Bots, privacy and suicide

I had the pleasure of serving last week on a panel at the Privacy and Security Forum with privacy consultant extraordinaire Elena Elkina and renowned privacy lawyer Mike Hintze. The topic of the panel was Good Bots and Bad Bots: Privacy and Security in the Age of AI and Machine Learning. Serendipitously, on the plane to D.C. earlier that morning someone had left a copy of the October issue of Wired magazine, the cover of which displayed a dark and grim image of Ryan Gosling, Harrison Ford, Denis Villeneuve, and Ridley Scott from the new dystopian film, Blade Runner 2049. Not only was this a great intro to the idea of bots (in the movie’s case, human-like androids), but the magazine contained two articles pertinent to our panel discussion: “Q: In, say, a customer service chat window, what’s the polite way to ask whether I’m talking to a human or a robot?” and “Stop the chitchat: Bots don’t need to sound like us.” Our panel dove into the ethics and legality of deception, in, say, a customer service bot pretending to be human.

While the idea was fresh in my mind, I wanted to take a moment to replay some of the concepts we touched upon for a wider audience and to talk about the case study we used in more detail than the forum allowed. First off, what did we mean by bots? I don’t claim this is a definitive definition, but we took the term, in this context, to mean two things:

  • Some form of human-like interface. This doesn’t mean a bot has to have the realism of the Replicants in Blade Runner, but it has some mannerisms that might lead a person to mistake it for another person. This goes back, as Elena pointed out, to the days of Alan Turing and his Turing test, years before any computer could even think about passing it. (“I see what you did there.”) The human-like interface has a potentially interesting property: are people more likely to let their guard down and share sensitive information if they think they are talking to another person? I don’t know the answer, and there may be some academic research on that point. If there isn’t, I submit that it would make for some interesting research.
  • The second is the ability to learn and be situationally aware. Again, this doesn’t require the sophistication of IBM’s Watson, only some ability to adapt to changing inputs from the person with whom it is interacting. This is key, like the above, to giving the illusion that a person is interacting with another person. As a counterexample, Tinder is littered with “bots” that recite scripts with limited, if any, ability to respond to interaction. (A minimal sketch of the difference follows below.)
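
To make that second characteristic concrete, here is a minimal, purely hypothetical Python sketch contrasting a scripted bot with one that adapts to what the user actually says. The script, replies, and matching rules are invented for illustration; they are not any real chat platform’s API or logic.

```python
# Hypothetical sketch: a scripted bot vs. a minimally "situationally aware" one.
# All names, replies, and matching rules here are illustrative only.

SCRIPT = ["Hi there!", "You seem fun :)", "Check out my profile!"]

def scripted_reply(turn_index: int, user_message: str) -> str:
    """Recites a fixed script regardless of what the user actually said."""
    return SCRIPT[min(turn_index, len(SCRIPT) - 1)]

def adaptive_reply(turn_index: int, user_message: str) -> str:
    """Adjusts to the user's input -- the property that can lead a person
    to mistake the bot for another person."""
    text = user_message.lower()
    if "are you a bot" in text:
        return "I'm an automated assistant."  # honest disclosure
    if user_message.strip().endswith("?"):
        return "Good question -- let me find out for you."
    return "Tell me more about that."

print(scripted_reply(0, "Are you a bot?"))  # "Hi there!" -- ignores the question
print(adaptive_reply(0, "Are you a bot?"))  # "I'm an automated assistant."
```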

Taxonomy of Risk

Now that we have a definition, what are some of the heightened risks associated with these unique characteristics of a bot that, say, a website doesn’t have? I use Dan Solove’s Taxonomy of Privacy as my go-to risk framework. Under the taxonomy, I see five heightened risks:

  1. Interrogation (questioning or probing for personal information): In order to be situationally aware, to “learn” more, a bot may ask questions of someone. Those questions could go too far. While humans have developed social filters that allow us to withhold inappropriate questions, a bot lacking a moral or social compass could ask questions that make the person uncomfortable or are invasive. My classic example of interrogation is an interview in which the interviewer asks the candidate whether they are pregnant or planning to become pregnant. Totally inappropriate in a job interview. One could imagine a front-line recruitment bot smart enough to know that pregnancy may affect the immediate job attendance of a new hire but not smart enough to know that it’s inappropriate to ask that question (and certainly illegal in the U.S. to use pregnancy as a discriminatory criterion in hiring).
  2. Aggregation (combining various pieces of personal information): Just as not all questions are interrogations, not all aggregation of data creates a privacy issue. The issue arises when data is combined in new and unexpected ways, resulting in the disclosure of information the individual didn’t want to disclose. Anyone could reasonably assume Target is aggregating sales data to stock merchandise and make broad decisions about marketing, but its ability to discern a teenager’s pregnancy from non-baby-related purchases was unexpected, and uninvited. For a pizza-ordering bot, consider the difference between knowing my last order was a vegetable pizza and discerning that I’m a vegetarian (something I didn’t disclose) because when I order for one it’s always vegetable, but when I order for more than one, the order includes meat dishes (see the sketch after this list).
  3. Identification (linking information to a particular individual): There may be perfectly legitimate reasons a bot would need to identify a person (to access that person’s bank account, for instance), but identification becomes an issue when the individual’s perception is that they would remain anonymous or at the very least pseudonymous. If I’m interacting with a bot as StarLord1999 and all of a sudden it calls me by the name Jason, I’m going to be quite perturbed.
  4. Exclusion (failing to let an individual know about the information that others have about her and to participate in its handling or use): As with aggregation, a situationally aware bot pulling information from various sources may alter its interaction in a way that excludes the individual from some service without the individual understanding why, based on data the individual doesn’t know it has. For instance, imagine a mortgage loan bot that pulls demographic information based on a user’s current address and steers them toward less favorable loan products. That practice sounds a lot like redlining, and if it has discriminatory effects, it could be illegal in the U.S.
  5. Decisional interference (intruding into an individual’s decision making regarding her private affairs): The classic example I use for decisional interference is China’s historic one-child policy, which interfered with a family’s decision making about their family make-up, namely how many children to have. So you ask, how can a bot have the same effect? Note that the law is only influential, albeit in a very strong way. A family can still physically have multiple children, hide those children, or take other steps to disobey the law, but the law will still have a manipulative effect on their decision making. A bot, because of its human-like interface and its advanced learning and situational knowledge, can be used to psychologically manipulate people. If the bot knows someone is psychologically prone to a particular style of argument (say, appealing to emotion), it can use that, and the information at its disposal, to subtly persuade them toward a certain decision. This is a form of decisional interference.
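
To make the aggregation risk in item 2 concrete, here is a hypothetical Python sketch of how a pizza-ordering bot could derive a dietary preference the customer never disclosed. The order data, keyword list, and inference rule are all invented for illustration.

```python
# Hypothetical sketch of unintended inference from aggregated order history.
# The data model, keyword list, and rule below are invented for illustration.

orders = [
    {"servings": 1, "items": ["vegetable pizza"]},
    {"servings": 1, "items": ["margherita pizza"]},
    {"servings": 4, "items": ["pepperoni pizza", "vegetable pizza"]},
    {"servings": 1, "items": ["veggie supreme"]},
]

MEAT_WORDS = {"pepperoni", "sausage", "ham", "chicken", "meat"}

def contains_meat(item: str) -> bool:
    return any(word in item.lower() for word in MEAT_WORDS)

# Look only at orders for one person: if none contain meat, the bot
# "learns" the customer is probably vegetarian -- a fact never disclosed.
solo_orders = [o for o in orders if o["servings"] == 1]
inferred_vegetarian = bool(solo_orders) and not any(
    contains_meat(item) for o in solo_orders for item in o["items"]
)

print("Inferred vegetarian:", inferred_vegetarian)  # True for this history
```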

Architecture and Policy

I’m not going to go into a detailed analysis of how to mitigate these issues, but I’ll touch on two thoughts: first, architectural design, and second, public policy analysis. Privacy-friendly architecture can be analyzed along two axes: identifiability and centralization. The more identified and more centralized the design, the less privacy friendly it is. It should be obvious that reducing identifiability reduces the risk of identification and aggregation (because you can’t aggregate external personal data about unidentified individuals), so I’ll focus here on centralization. Most people would mistakenly think of bots as being run by a centralized server, but this is far from the case. The Replicants in Blade Runner and “autonomous” cars are both prominent examples of bots that are decentralized. In fact, it should be glaringly apparent that a self-driving car operated by a server in some warehouse introduces unnecessary safety risks. The latency of the communication, the potential for command injection at the server or network layer, and the potential for service interruption are unacceptable. The car must be able to make decisions immediately, without delay or risk of failure. Now, decentralization doesn’t help with many of the bot-specific issues outlined above, but it does help with other, more generic privacy issues, such as insecurity, secondary use, and others.
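
As a rough illustration of that design choice, here is a hypothetical Python sketch (not any vendor’s actual architecture) in which the safety-critical decision loop runs entirely on the device and the central server is treated as an optional, best-effort extra, so latency or an outage can never block the decision.

```python
# Hypothetical sketch: the safety-critical decision stays on the device;
# the central server is an optional, best-effort enhancement only.

def local_decision(obstacle_distance_m: float) -> str:
    """Computed entirely on the device -- no network round-trip required."""
    return "brake" if obstacle_distance_m < 5.0 else "cruise"

def fetch_remote_hints() -> dict:
    """Optional call to a central service (e.g., map updates).
    Here it simply fails, standing in for latency or an outage."""
    raise TimeoutError("central server unreachable")

def decide(obstacle_distance_m: float) -> str:
    action = local_decision(obstacle_distance_m)  # never waits on the network
    try:
        fetch_remote_hints()                      # nice-to-have, not required
    except Exception:
        pass                                      # remote failure can't block the car
    return action

print(decide(3.0))  # -> "brake", even with the server unreachable
```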

Public policy analysis is something I wanted to introduce with my case study during the interactive portion of the session at the Privacy and Security Forum. The case study I presented was as follows:

Kik is a popular platform for developing bots (https://bots.kik.com/#/). Kik is a mobile chat application used by 300 million people worldwide, and an estimated 40% of U.S. teens have used the application at one time or another. The National Suicide Prevention Hotline, recognizing that most teens don’t use telephones, wants to interact with them in the services they use. The Hotline wants to create a bot to interact with those teens and suggest helpful resources. Where the bot recognizes a significant risk of suicide, rather than just casual inquiries or people trolling the service, the interactions will first be monitored by a human, who can then intervene in place of the bot if necessary.
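
A hedged sketch of how that human-in-the-loop escalation might be structured is below. The keyword scoring, thresholds, and function names are placeholders I invented for illustration; they are not the Hotline’s or Kik’s actual logic, and real triage would be far more careful.

```python
# Hypothetical sketch of the human-in-the-loop escalation described above.
# Keyword scoring and responses are placeholders, not real clinical logic.

HIGH_RISK_PHRASES = ("kill myself", "end my life", "no reason to live")
CASUAL_PHRASES = ("sad", "stressed", "lonely")

def assess_risk(message: str) -> str:
    """Very rough placeholder scoring of a single message."""
    text = message.lower()
    if any(p in text for p in HIGH_RISK_PHRASES):
        return "significant"
    if any(p in text for p in CASUAL_PHRASES):
        return "casual"
    return "unknown"

def notify_human_counselor(message: str) -> None:
    """Stand-in for paging a trained counselor to monitor the conversation."""
    print("[escalation] counselor notified to monitor this conversation")

def handle_message(message: str) -> str:
    if assess_risk(message) == "significant":
        # Human in the loop: the bot steps back while a counselor monitors
        # and can intervene in place of the bot if necessary.
        notify_human_counselor(message)
        return "I'm connecting you with a trained counselor right now."
    # Casual inquiries (or trolling) stay with the bot, which offers resources.
    return "It sounds like a lot is going on. Here are some resources that may help: ..."

print(handle_message("I'm feeling really stressed lately"))
```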

I’ll highlight one issue, decisional interference, to show why it’s not a black-and-white analysis. Here, one of the objectives of the service, and of the bot, is to prevent suicide. As a matter of public policy, we’ve decided that suicide is a bad outcome and that we want to help people who are depressed and potentially suicidal get the help they need. We want to interfere with this decision. Our bot must be carefully designed to promote this outcome, and we don’t want the bot to develop in a way that doesn’t reflect it; you could imagine a sophisticated enough bot going awry and actually encouraging callers to commit suicide. The point is, we’ve done that public policy analysis and determined what the socially acceptable outcome is. Many times, organizations have not thought through what decisions might be manipulated by the software they create and what public policy should guide the way they influence those decisions. Technology is not neutral. Whether it’s decisional interference or exclusion or any of the other numerous privacy issues, thoughtful analysis must precede design decisions.