Privacy Engineering 10 years on

[In July 2023, Kim Wuyts and Isabel Barbera invited me to present the keynote talk to the International Workshop on Privacy Engineering in Delft, Netherlands. Subsequent to that, and because we felt there wouldn’t be an overlapping audience, Nandita Narla and Nikita Samarin, invited me to give the same talk to another group of privacy engineers at the PEP23 workshop ahead of SOUPS in Anaheim, CA. For those who couldn’t be there at either event, I decided to write this blog post to summarize my talk.]

In September of 2013, I authored a blog post for the International Association of Privacy Professionals entitled “Is 2013 the year of the privacy engineer?”  The post came after myself and Stuart Shapiro provided two early workshops on privacy engineering to crowds at IAPP conferences that year and just months before our seminal paper on Privacy Engineering co-authored with, at the time, Information and Privacy Commissioner of Ontario Canada, Ann Cavoukian. The IAPP blog laid out a basic argument that addressing privacy issues needed to move out of legal departments and into engineering. Clearly, my prognostication was premature as to actual industry action, which still prefers legal solutions over technical ones. What about 2023, though? Is it finally time for the ascendence of the privacy engineer?

Before we can answer this question, we need to explore a bit about the concept of privacy engineering. Just a few weeks ago the IAPP published an infographic “Defining Privacy Engineering.”

The publication was the culmination of my push, while a member of the IAPP’s Privacy Engineering Advisory Board, to advocate for rigorous construction of the field. My concern was the term being diluted and applied to anything “technical” in the field of privacy. Anecdotal evidence suggested many lawyers and privacy analysts were deeming any type of coding or technical implementation as “engineering.” This led to bloggers and journalists also applying the engineering moniker to those doing the technical aspects of privacy based on what they were learning from the privacy professionals of the day.

A feedback loop ensued! Privacy professionals reading those blogs and articles similarly referred to technical privacy folks as engineers. Even those doing technical work started referring to themselves as privacy engineers, a form of self-selection bias, which only added fuel to the cycle. Of course, compounding this are job descriptions, which, despite being labeled as “privacy engineer” could just as easily fall under the title of “analyst,” “technologist” or some other non-engineering role. During my exploration of this topic, I found that HR departments at large not-to-be-named tech companies often require the engineering department to hire an “engineer” thus leading to an explosion of engineering roles with no engineering duties (“public relations engineer”, “finance engineer”, etc.)

This debate (“who is the true engineer”) that I’m putting forward is reminiscent of one from my childhood, the question of who is a punk and who is a poseur (aka, a suburbanite who spikes their hair on the weekends but doesn’t live the lifestyle of punk). Though I don’t recall the intensity of the debate at that time, in 1985 MTV (back when they played music and not 24 hours of Ridiculousness) put out a documentary on the LA punk scene entitled “Punks & Poseurs” which suggested the question was one of massive scrutiny within the subculture.

Lest you think the punk/privacy connection is fleeting, I present exhibit two into evidence: the color similarities of the album cover for the Exploited’s Punks Not Dead and CMU’s privacy engineering program logo “Privacy is Not Dead.”

While I don’t think any privacy engineers are going around beating up faux privacy engineers, the debate is worth having, in my opinion, and potentially more important to those relying on the services of such engineers. So let’s stage dive into the discussion:

What is Engineering?

There are a few authoritative definitions of engineering, but I like this one from the Accreditation Board for Engineering and Technology: “The profession in which a knowledge of the mathematical or physical sciences gained by study, experience and practice is applied with judgement to develop ways to utilize, economically, the materials and forces of nature for the benefit of mankind.” Applying this to privacy, it seems fairly logical that privacy takes the role of the “benefit of mankind.” I think the primary gist of what distinguishes an engineer from say an analyst or other professional working in the area is the application of knowledge of mathematical or physical sciences.

I’d like to analogize this to bridge building. You can design and build a bridge without “engineering” that bridge. However, you may end up with a bridge that collapses, as the Tacoma Narrows Bridge did in 1940. In response, a builder could naively suggest a stronger bridge, better materials, or more pillars, but it takes the analysis of an engineer, applying the mathematical and physical sciences, to determine the problem of aerolastic flutter and engineer a solution.

Another analogy I like comes from the realm of software engineering. Programming students are often given the task of sorting lists. Invariably, they produce a sorting program that compares all elements of the lists to every other element. Those who have more study in proper design of sorting algorithms know that this is inefficient, requiring, at worst, n times n passes (where n is the number of elements on the list). More efficient sorting algorithms exist such as “merge sort.” Other sorting algorithms can take advantage of qualities of the elements being sorted to be even more efficient. These are often described using big O notation, where O means on the order of. Inefficient algorithms, as just described, are usual O(n2) and more efficient ones are O(n log n). Hence, the study of big O notation and algorithms distinguishes the engineering of software versus the programming of software. Both an engineer and a programmer may write software to result in sorting a list, but the engineer has developed a more efficient program  by applying the study of mathematics and science to optimize the resources of the computer.

Unlike bridge engineering or software engineering, privacy engineering actually encompasses several different domains. In this way, it’s more similar to, say, safety engineering, where the quality being achieved (safety or privacy) can be desired in many distinct applications.

We can think about safety as a desired quality in the aforementioned bridge engineering, but also in industrial machine design, tool safety, food handling equipment, medical equipment, and many other fields. Someone designing safety into a bridge will not use the same skills and resources as someone building safety in food handling, though they will still have the generic application of math and science to solving the problem. Similarly, privacy engineering is an umbrella covering different disciplines. As part of the aforementioned attempt by the IAPP to define privacy engineering, the advisory board identified a non-exhaustive set of domains where you might apply different engineering techniques to build in privacy.

Software Engineering: This is what most people think about when thinking of privacy engineering. Software engineering covers the design and development of software and building in privacy during the collection, processing and sharing of data. The engineering comes into play during considerations of the tradeoffs of efficiency, utility and risks to individuals.

IT Architecture: When you combine multiple IT components (which may be software engineered), you may get emergent properties or exacerbate risks that were negligible in their individual components. IT Architecture includes considerations of protocols and interfaces between components, again optimizing between efficiency, utility and risks.

Data Science: Privacy in data science concerns the risks from data sets, combinations of data, inferences and such without consideration of the specifics of software collecting, processing or sharing that data. Data science for privacy often involves investigating identifiability of data or linkability between data sets.

HCI/UI-UX: It’s not all about the data. Many privacy harms occur during the interactions between individuals and computers. Spam, pop-up ads and other intrusions are a form of invasions into ones personal space. Manipulative designs may interfere with individuals’ autonomy and decision making. The study of Human Computer Interactions can make use of behavioral economics, psychology and other scientific analysis to minimize these privacy harms.

Business Processing Engineering: Business process engineering traditionally covers the design of efficient business processes. Think industrial manufacturing, though it can be applied to more white-collar business processes like HR and Marketing. While efficiency of resources is the primary goal of business processes, they can be designed to improve privacy as well. This is a nascent area of development, but one worth watching.

Physical Engineering: While there are few, if any, self-identified privacy engineers working in physical spaces, privacy has been a design consideration in the physical world for as long as there have been eaves to stand under. Not only in private spaces in homes, but hospitals are also a frequent target for privacy design. But what about engineering? There is increasing opportunity for physical engineering for privacy in regard to IOT devices, both in their design and the design of spaces to prevent ubiquitous surveillance by those devices.  

Systems Engineering: This is my primary area of work. Systems engineering considers the interconnectedness of all of the above: cyber-physical-social systems and the interactions and interfaces between them and individuals. Systems engineering looks at narrow risks, risk from emergent properties, cascading risks between system elements and tradeoffs of privacy with other quality and functional system requirements.

Next Steps for Privacy Engineering

As you can see, privacy engineering covers a wide swath of different domains. Besides the generic desire for more privacy, what cross-discipline language can we seek to define similarities? I  think we can draw upon the analogous safety engineering for guidance. What are safety engineers trying to do? They are attempting to reduce risk, to make products or service safer by reducing the likelihood of harmful events and the level of harm when those events occur. Similarly, privacy engineers need to focus on reducing privacy risk, the likelihood of privacy events and impacts of those events. [Note, if you’re thinking data breaches, you and I need to have a talk. Privacy is not security, but I won’t derail this blog post with a privacy versus security risks deep dive.]

Privacy risk research is still in its early development. Significant research still needs to be done into quantifying privacy risk, creating feedback loops to incorporate real world data back into risk analysis, measuring the effectiveness of controls on risk and producing ways of determining reasonable levels of risk tolerance. In addition there needs to be more work to tie controls to risk. Privacy professionals have worked for years at developing various technical and organizational controls meant to improve privacy. However, scant work has been done to tie those controls to actual risk reduction. It’s as if bridge engineers said using high-tensile steel improves bridge safety and had tons of data and science backing up the resilience and breakpoints of the steel. That statement seems logical but lacks measurable risk reduction. While you may measure the tensile strength of the steel, that’s different than measuring use of that steel in a particular design’s effect on risk. Similarly, there are many measures, metrics and KPIs for privacy, but what does the ε in differential privacy translate into as to whether a real threat actor is likely to attempt to re-identify the patient in a doctors office’s prescription for 325 mg of acetaminophen?

There also needs to be a causal connection between controls and risk reduction. Increasing the tensile strength of the steel in a bridge isn’t going to reduce risk if the weak point is the connection to the earth. Bridging this gap is what the Institute of Operational Privacy Design (IOPD) is working on in our newest standard. The IOPD is attempting to use structured assurance cases, something used in safety engineering, in privacy. The idea is that you state a claim, such as “This privacy risk has been mitigated” then make an argument as to why it should be considered mitigated. Finally, you provide evidence to support this argument. Lawyers should like this, because it’s similar to a legal argument, showing why some evidence supports some legal claim.

Moneyball and the lesson for Privacy

If you haven’t read the book, Moneyball: the Art of Winning an Unfair Game, or seen the movie staring Brad Pitt and Jonah Hill, I highly encourage you to see them out. I tend to be a “privacy” subject matter expert, but one of the most important aspects of that expertise is the ability to consume diverse content from other expects and apply it to the world of privacy. Whether its behavioral economics (from sources like the Freakonomics and Hidden Brain podcasts), how people learn (from YouTube channel Vertasium), or use of statistics and probability in baseball (i.e. Moneyball).

                I see a huge parallel between Baseball management in the pre-2000 era and the privacy management today. For a hundred plus years, Baseball team management was governed by intuition. Managers and scouts thought they knew what made up a good team. Baseball statistics were published for decades, but wasn’t used in a way that really optimized team performance. The concept of Sabermetrics (the empircal analysis of in game acitivity) began in the 1960s, grew in 1970s, but then transformed the business of Baseball in the 2000s (as chronicled in the book and movie; also see this podcast). Sabremetrics took the “intuition” out of Baseball and turned it into a science, one based on statistics and probability. Science can’t tell you a particular batter will hit a home run against a particular pitcher, but probability can show that you, if you make the same choices, game after game, you’re going to end up with result in a certain range.

                Similar transformations have hit other industries. I believe, the privacy profession is in line for this sort of refactoring. Attempts have been made for decades to create KPI (Key Performance Indicators) for privacy. One of my earliest introductions was a talk by Tracy Ann Kosa at an IAPP conference about a decade ago.  But most privacy program KPIs, in my opinion, still follow the Baseball analogy. Scouts were looking at player stats, but ultimately they were looking at the wrong statistics to determine game outcome. As Jonah Hill says in the movie, “Your goal is not to buy players, its to buy wins.” The privacy profession has, for its short life, relied mostly on heuristics, shortcuts that we intuitively sense, improve desired outcomes. Those shortcuts (such as GAPP, FIPPs, Principles, etc.) have become goals themselves and the metrics tied to those intermediary goals. But, like Baseball, the goal is not the player, the goal is the game. Analogously, a goal in privacy is not transparency, but rather whether a person has the ability to make decisions about things that affect them (with full knowledge and without being overwhelmed). Transparency is a sideshow. Transparency is worthless if it overwhelms or doesn’t support meaningful decision making.

                The point of this post is that metrics are important, rigorously important. Privacy needs to step out of the dark ages of intuition, superstition, and old wives tales.

The author is principal at Enterprivacy Consulting Group, a boutique consulting firm focused on privacy engineering, privacy by design and the NIST Privacy Framework.

Diagramming Data Transfers

International data transfer is probably one of my least favorite privacy exercises. Why? Probably my main dislike deals with the fact that its not really about privacy, but often more about protectionism. That being said, data transfers are a hot hot topic these days, both in Europe and in countries like China and Brazil. It wasn’t until recently that I realized if you look at GDPR from far away, you see there are really four key chapters

  • Chapter II – Principles
  • Chapter III – Rights of the Data Subject
  • Chapter IV – Controllers and Processors
  • Chapter V – Transfers

Fundamentally, most people, myself included, equate GDPR to principles of data processing, rights afforded the data subject and obligations of controllers and processers, but right up there with those key concepts is a whole chapter on international data transfers. At least in the minds of the GDPR authors, transfers are clearly not an afterthought but one of the four major components of the regulation.

With data transfers clearly an important element of GDPR, its important that the analysis of transfers be done with some care. I, for one, am a very visual person. I can analogize concepts visually much easier than verbally. Astute readers may have noticed the diagrams accompany my previous post on data transfer regarding the recent draft guidelines put out by the European Data Protection Board. In working through data transfer scenarios, I’ve found it extremely helpful to illustrate or diagram them.

Disambiguating “transfers” and “transmission” of data

Before diving into how to diagram data transfers, it’s important to distinguish the terms transfer and transmit. Transmission of data occurs when data goes from one place to another. If I send you an email with an attachment, I am transmitting data to you. That data is transmitted over multiple servers, through different service providers and maybe through different geographies. A transfer, however, under the GDPR is a legal chain by which a controller or processer transfers data to another controller or processor. In other words the transmission ≠ the transfer.

Perhaps an example would help. I use Dreamhost to host my website. On my website, I host a file (say a pretty infographic). You go to my website and download the file. The file is transmitted from Dreamhost (wherever their servers are) to you (wherever you are) However, there is not a legal transfer from Dreamhost to you. The transfer was from me to you. I may never have even had the file in my possession. Let’s illustrate that.

Simple transfer/transmission diagram

As you can see, I’ve used dotted lines to illustrate the transmission of data, the 0’s and 1’s flying over the internet from DreamHost to You. But the transfer, the metaphysical or conceptual transfer, from me to you, is illustrated with a solid line. Let’s look at an EU example.

European Union data transfers

EU data transfer diagram

Here, ChairFans, GmbH, a German company, sends a file to Star Analytics, Inc., in the US, to analyze. In this case the transmission is parallel to the transfer. ChairFans is transmitting data to Star Analytics and they are also transferring that data. Remember, the transmission refers to the physical bits flying across the Atlantic Ocean and the transfer refers to the act of one entity giving the other entity the data. If you’re still struggling, you can think of it this way. If a hacker broke into ChairFans and stole the data, the data would still be transmitted over the internet across the Atlantic Ocean, but ChairFans didn’t “transfer” data to the hacker. It was not a deliberative, intentional act of making the data available to the hacker.

Do we need a GDPR Chapter V transfer tool for this transfer? If there was

Diagramming

To diagram data transfers, I’m using diagrams.net, a free (and privacy friendly) tool to create diagrams. I’ll provide the file for all these diagrams at the end of this blog. I’ve also included a template for the shapes I’m using, which you can use to create your own data transfer diagrams. For the following example, I’m first going to illustrate the Use Case for supplementary measures in the EDPB Recommendations.

EDPB Recommendation

Use Case 1: Data storage for backup and other purposes that do not require access to data in the clear

A data exporter uses a hosting service provider in a third country to store personal data, e.g., for backup purposes. Notice, I’ve now added the labels, Exporter and Importer to the entities.

Illustration 3

Use Case 2: Transfer or pseudonymised Data

A data exporter first pseudonymises data it holds, and then transfers it to a third country for analysis, e.g., for purposes of research. This really isn’t distinguished from the previous example. I’ve added a gear icon to indicate the pseudonymization.

Illustration 4

Use Case 3: Encrypted data merely transiting third countries

A data exporter wishes to transfer data to a destination recognised as offering adequate protection in accordance with Article 45 GDPR. The data is routed via a third country.

Illustration 5

Use Case 4: Protected recipient

A data exporter transfers personal data to a data importer in a third country specifically protected by that country’s law, e.g., for the purpose to jointly provide medical treatment for a patient, or legal services to a client. No different than illustration 3

Use Case 5: Split or multi-party processing

The data exporter wishes personal data to be processed jointly by two or more independent processors located in different jurisdictions without disclosing the content of the data to them. Prior to transmission, it splits the data in such a way that no part an individual processor receives suffices to reconstruct the personal data in whole or in part. The data exporter receives the result of the processing from each of the processors independently, and merges the pieces received to arrive at the final result which may constitute personal or aggregated data.

Illustration 6

Use Case 6: Transfer to cloud service providers or other processor which require access to data in the clear

A data exporter uses a cloud service provider or other processor to have personal data processed according to its instructions in a third country.

Illustration 7

Use Case 7: Remote access to data for business purposes

A data exporter makes personal data available to entities in a third country to be used for shared business purposes. A typical constellation may consist of a controller or processor established on the territory of a Member State transferring personal data to a controller or processor in a third country belonging to the same group of undertakings, or group of enterprises engaged in a joint economic activity. The data importer may, for example, use the data it receives to provide personnel services for the data exporter for which it needs human resources data, or to communicate with customers of the data exporter who live in the European Union by phone or email. Here I’ve added an IT system to indicate that the Common Enterprise has remote access to that IT system.

Illustration 8

EDPB Guideline on Article 3 and Chapter V

Next up I’ll tackle the examples from the draft EDPB Guidelines on the Interplay Article 3 and Chapter V.

Example 1

Maria, living in Italy, inserts her personal data by filling a form on an online clothing website in order to complete her order and receive the dress she bought online at her residence in Rome. The online clothing website is operated by a company established in Singapore with no presence in the EU. In this case, the data subject (Maria) passes her personal data to the Singaporean company, but this does not constitute a transfer of personal data since the data are not passed by an exporter (controller or processor), since they are passed directly and on her own initiative by the data subject herself. Thus, Chapter V does not apply to this case. Nevertheless, the Singaporean company will need to check whether its processing operations are subject to the GDPR pursuant to Article 3(2).12

Illustration 10

Example 2

Company X established in Austria, acting as controller, provides personal data of its employees or customers to a company Z established in Chile, which processes these data as processor on behalf of X. In this case, data are provided from a controller which, as regards the processing in question, is subject to the GDPR, to a processor in a third country. Hence, the provision of data will be considered as a transfer of personal data to a third country and therefore Chapter V of the GDPR applies. Note, I’ve added the labels C and P to indicate Processor and Controller

Illustration 9

Example 3: Processor in the EU sends data back to its controller in a third country

XYZ Inc., a controller without an EU establishment, sends personal data of its employees/customers, all of them non-EU residents, to the processor ABC Ltd. for processing in the EU, on behalf of XYZ. ABC re-transmits the data to XYZ. The processing performed by ABC, the processor, is covered by the GDPR for processor specific obligations pursuant to Article 3(1), since ABC is established in the EU. Since XYZ is a controller in a third country, the disclosure of data from ABC to XYZ is regarded as a transfer of personal data and therefore Chapter V applies.

Illustration 10

Example 4: Processor in the EU sends data to a sub-processor in a third country

Company A established in Germany, acting as controller, has engaged B, a French company, as a processor on its behalf. B wishes to further delegate a part of the processing activities that it is carrying out on behalf of A to sub-processor C, a company established in India, and hence to send the data for this purpose to C. The processing performed by both A and its processor B is carried out in the context
of their establishments in the EU and is therefore subject to the GDPR pursuant to its Article 3(1), while the processing by C is carried out in a third country. Hence, the passing of data from processor B to sub-processor C is a transfer to a third country, and Chapter V of the GDPR applies.

Illustration 11

Example 5: Employee of a controller in the EU travels to a third country on a business trip

George, employee of A, a company based in Poland, travels to India for a meeting. During his stay in India, George turns on his computer and accesses remotely personal data on his company’s databases to finish a memo. This remote access of personal data from a third country, does not qualify as a transfer of personal data, since George is not another controller, but an employee, and thus an integral part of the controller (company A). Therefore, the disclosure is carried out within the same controller (A). The processing, including the remote access and the processing activities carried out by George after the access, are performed by the Polish company, i.e. a controller established in the Union subject to Article 3(1) of the GDPR.

Illustration 12

Example 6: A subsidiary (controller) in the EU shares data with its parent company (processor) in a
third country

The Irish Company A, which is a subsidiary of the U.S. parent Company B, discloses personal data of its employees to Company B to be stored in a centralized HR database by the parent company in the U.S. In this case the Irish Company A processes (and discloses) the data in its capacity of employer and hence as a controller, while the parent company is a processor. Company A is subject to the GDPR pursuant to Article 3(1) for this processing and Company B is situated in a third country. The disclosure therefore qualifies as a transfer to a third country within the meaning of Chapter V of the GDPR.

Illustration 13

Example 7: Processor in the EU sends data back to its controller in a third country

Company A, a controller without an EU establishment, offers goods and services to the EU market. The French company B, is processing personal data on behalf of company A. B re-transmits the data to A. The processing performed by the processor B is covered by the GDPR for processor specific obligations pursuant to Article 3(1), since it takes place in the context of the activities of its establishment in the EU. The processing performed by A is also covered by the GDPR, since Article 3(2) applies to A. However, since A is in a third country, the disclosure of data from B to A is regarded as a transfer to a third country and therefore Chapter V applies.

Illustration 14

My comments to the EDPB

Subsequent to the draft guidelines above, I made a comment to the EDPB on two scenarios they should cover. Those scenarios are detailed below.

A data subject contracts with X, GmbH (in Germany) which is a European Union based subsidiary of X, Inc (in the United States). However, the data subject never actually supplies personal data to X, GmbH as the data subject directly transmit data to X, Inc. in the United States. This is a Chapter V transfer of data requiring a transfer tool. X, GmbH and X, Inc. use standard contractual clauses in place governing the transfer of data. X, GmbH is the exporter and X, Inc. is the importer. 

Illustration 15

ABC, GmbH (in Germany) instructs employees to use a service provided by X, Inc., in the United States. Employees’ behavior is tracked via the service provided by X, Inc, thus X, Inc. is subject to GDPR for the data under Article 3.2(b). Because ABC, GmbH is “mak[ing] personal data, subject to this processing, available to…” X, Inc. via instructions to its employees, there is a transfer of data under Article V. ABC, GmbH and X, Inc. execute the standard contractual clauses with ABC, GmbH as the exporter and X, Inc. as the importer.

Illustration 16

I posed this scenario on LinkedIn

Company X, GmbH (DE) host data on an Australian data server. They contract with Company Y, Inc. in the United States to process data. Company Y’s employee working remotely in Australia, accesses the data on the data server. X, GmbH and Y, Inc. execute Standard Contractual Clauses to govern the transfer. There is a legal transfer of data because X, who has putative control over the data in Australia, gave access to Y, who has putative control over it’s employee in Australia. This despite the fact that the data never left Australia.

Illustration 17

If you want to explore these scenarios and make some of you’re own, download this file (be sure to right click and save the file to your desktop). Then go to https://diagrams.net then open the file from there.

Comment on Guidelines 05/2021 on the Interplay between the application of Article 3 and the provisions on international transfer as per Chapter V

Below are my comments on recent EDPB Guidelines which I’m submitting as part of their public consultation.

The guidelines need to provide an example which is a very common scenario:

A consumer data subject (located in the EU) registers with an online service. The service is being offered by a company in a third country, thus placing the company under the territorial scope of GPDR via Art. 3.2(a). However, when registering for the service, the data subject enters into an agreement with a subsidiary of the company located in the EU. For avoidance of doubt, the subsidiary in the EU, never possesses personal data of the data subject. It appears, in this scenario, that there is a legal “transfer” from the subsidiary in the EU to the parent company in the third country.

Example I

Illustration of data transmission and a legal data transfer

In the example illustrated above, a data subject contracts with X, GmbH (in Germany) which is a European Union based subsidiary of X, Inc (in the United States). However, the data subject never actually supplies personal data to X, GmbH as the data subject directly transmit data to X, Inc. in the United States. This is a Chapter V transfer of data requiring a transfer tool. X, GmbH and X, Inc. use standard contractual clauses in place governing the transfer of data. X, GmbH is the exporter and X, Inc. is the importer. 

A similar scenario exists when a business in the EU directs its employees to use an app (such as for Human Resource purposes) which is provided by a vendor in a third country which monitors the behavior of the employees (such as job time tracking), thus subjecting the vendor to GDPR under Art 3.2(b). Even though the employer never holds the data, this still appears to be a transfer under the guidance (“otherwise makes personal data available”). A clarifying example in the guidelines be helpful.

Example II

No alt text provided for this image
Illustration of a data transmission and legal data transfer

ABC, GmbH (in Germany) instructs employees to use a service provided by X, Inc., in the United States. Employees’ behavior is tracked via the service provided by X, Inc, thus X, Inc. is subject to GDPR for the data under Article 3.2(b). Because ABC, GmbH is “mak[ing] personal data, subject to this processing, available to…” X, Inc. via instructions to its employees, there is a transfer of data under Article V. ABC, GmbH and X, Inc. execute the standard contractual clauses with ABC, GmbH as the exporter and X, Inc. as the importer.

Respectfully submitted,

R. Jason Cronk

CHALLENGE:90 Trails in 100 Days

Challenge: 90 Trails in 100 Days

Sunrise over Myakka River State Park

In the fall of 2020, I went on a wonderful, adventurous, hike down from the rim of the Grand Canyon into Phantom Ranch on the river. The trip was arduous and the climb out was tough given my hailing from the vertically challenged state of Florida. Prior to this undertaking, I had begun training, in Florida, including perhaps Florida’s most vertical trail, the Torreya Challenge in Torreya State Park. After this adventure and the two months of training leading up to it, I became a bit of a couch potato in December and January. About the middle of January, I decided to do something about that. Seeing all the wonderful trails in my area, I set myself a challenge, to hike or bike 90 different trails in 100 days. I gave myself a hundred days to account for weather, work or other impediments to doing a trail each day.  

Now you may be asking, what this has to do with privacy. The short answer is not much, but there is always an angle. In using AllTrails, the trail mapping application, I discovered a nifty way to stalk people. See my previous blog post for more.

In February, thanks to Publix Supermarkets, I procured a large amount of trail mix. By the end, despite adding some more in March and April, I was down to one container.


My typical kit consistent of day pack with water reservoir, bear bell, bear mace, Chapstick, headphones to listen to Privacy podcasts, snacks (not pictured trail mix), and trail maps. Also not pictured are optional sunscreen and insect repellent.

Off I set on January 30th. The task seemed simple but, as with many things, implementation was more fraught than at first imagined. I tried to do longer trails or those  farther away when I had more time (like weekends) or during nice weather. One early trail, Pond Loop, at Okeeheepkee Prairie County Park, I completed on a rainy afternoon was only 0.5 miles. I decided after that to only include trails over 1 mile long. This lead me to a few times combining short trails into one “trail” or stretching a trail to it’s extreme (exploring every nook and cranny) to try to get that mile in. Trying to define a “trail” also led to some creative interpretations. Not all trails are simply laid out in the platonic idealized state. Some are based on forest service roads, some intersect and loop and figure 8. Some are out and back, retracing your steps. I had to break some long trails, like the 30+ miles of the St. Marks trail into more manageable pieces of about 16 mile chunks (8 out and 8 back). Some of my trips weren’t trails at all, but I counted them, like when I walked 6 miles home after dropping off a truck at the rental car company. I learned about new trails, which weren’t easily found, like the Capital to Coast trail still under construction, which when fully complete with be 120 miles of biking or shared use paths from Tallahassee to the Emerald coast. 

It was perhaps the best time to be out hiking and biking in Florida. Spring weather meant it wasn’t too hot, the parks were green and flowers were in full bloom. I saw so many animals, many that your rarely see as a weekend warrior.  In addition to the usual squirrels and lizards, I saw a bobcat, a family of boar, water moccasins and other snakes, a mole,  a red pileated woodpecker, a gopher tortoise, giant mosquitoes, ticks, spiders, and many more. I did not, however, see a bear. Not to say they weren’t there, but ever since I encountered a bear last year, I’ve been hiking with a bear bell and bear mace, so they’ve thankfully kept their distance. 

What lurks beneath? Creature from the black lagoon? Manatee? Large alligator? Something was moving fast and leaving a wake under the Crooked River in Tate’s Hell State Forest.

As a capstone to my challenge, I returned to Torreya State Park to take on the Torreya Challenge. It was a wonderous exhausting 4 hours which left me with a terrible head ache, but I made it. 90 different trails. 98 days in the making. 478 miles covered. Challenge complete. Level UP! 

More Galleries

Animals and Insects
Flora and Fungi
Doll's Head Trail
IMG_2133
Georgia
Econfina State Park
Emerald Coast
Torreya State Park - Torreya Trail and the Torreya Challenge Trail
Panoramas
# Location Trail Miles Type Links
1 Lake Talquin State Forest – Lines Track West Loop 3.8 Hike favicons (16×16)
2 Ellinor Klapp-Phipps Park West Loop 2.9 Hike
3 J.R. Alford Greenway Bluebird Loop 3.9 Hike
4 San Luis Mission Park San Luis Park Loop 1.98 Hike favicons (16×16)
5 Apalachicola National Forest GF&A Trail 5 Hike

6 Maclay Gardens State Park Shared Trail Loop 5.5 Bike https://www.floridastateparks.org/maclaygardens
7 Lake Talquin State Forest – Lines Track Talquin Loop (Blue) 6 Bike
8 Okeeheepkee Prairie County Park Pond Loop 0.5 Hike  
9 Apalachicola National Forest Munson Hills 8.4 Bike
10 Governors Park Fern Trail 3.4 Hike  
11 Three Rivers State Park Eagle Trail 3

Bike

12 St Marks Trail North Trail 16 Bike
13 Ochlockonee River WMA Old Cemetary Rd 3.9 Hike  
14 Lafayette Heritage Trail Park Lafayette Heritage Trail 6.9 Hike  
15 Wakulla Springs State Park Wakulla Springs Park Trail 10.1 Hike
16

Orchard Pond

Orchard Pond Trail 6.9 Bike  
17

Silver Lake Recreation Area

Silver Lake Habitat Trail 1.4 Hike
18 Timberlane Ravine Nature Preserve Timberlane Ravine Nature Trail 1.5 Hike
19 San Felasco Hammock Preserve State Park Moonshine Creek Trai 1.6 Hike
20 Lakeland Highlands Scrub  Lakeland Highlands Scrub Trail 3.1 Hike  
21 Catfish Creek Preserve State Park Campsite 2 White Trail 5.5 Hike
22 Black Creek Preserve Red Trail 4.9 Hike  
23 Tom Brown Park Magnolia MTB Trail 3 Bike  
24 Central Park Central Park Lake Loop  1.9 Hike  
25 Apalachicola National Forest Camel Lake Loop 8.2 Hike
26 J. R. Alford Greenway Yellow Loop 5.3 Bike  
27 Miccosukee Canopy Road Greenway Miccosukee Greenways Trail 15.5 Bike  
28 Bald Point State Park Loop 3.1 Hike
29 A.J. Henry Park A.J. Henry Park Trails 1.9 Hike
30 Alfred B. Maclay Gardens State Park Bike Loops 5.7 Bike
31 Wakulla State Forest Nemours Trail 1.6 Hike
32 Ochlocknee River WMA Cut Through Lewis Loop 2.5 Hike  
33 Kolomoki Mounds State Park Spruce Pine Trail 3.1 Hike
34 Wilson Hospice House Wilson Hospice House Trail 1.3 Hike  
35 Marjorie Turnbull Park Trail 1.6 Hike  
36 Tallahassee Morning Hike from Budget 5.4 Hike  
37 Gil Waters Preserve at Lake Munson Trail 1 Hike
38 Bald Point State Park Sandy Trails 4.2 Bike
39 Elinor Klapp-Phipps Park Redbug Trails 4.6 Bike
40 Tate’s Hell State Forest High Bluff Coastal Loop Trail 9 Bike
41 Ochlocknee State Park Pine Bluff Trail 1.2 Hike
42 St. Joseph Island Loggerhead Trail and Maritime Hammock Nature Trail 21.5 Both
43 Tom Brown Park West Cadillac Trail 3.1 Bike
44 J.R. Alford Greenway Long Leaf Trail 4 Hike  
45 Alfred B. Maclay Gardens State Park Ravine Trail 1.9 Bike
46 Cascades Park Cascades Park Loop 2.5 Hike
47 Apalachicola National Forest Oak Park Bridge Trail 4.8 Hike
48 St. George Island State Park Gap Point Trai 5.5 Hike
49 St. George Island State Park Sugar Hill Beach Old Road 7.3 Both
50

St. Marks Trail

Wakulla Springs to St. Marks 16 Bike  
51 Myakka River State Park Fox’s Low to Mossy Hammock 3 Hike
52 Myakka River State Park Mossy Hammock to Fox’s Low 9.5 Hike
53 Plant City Dean’s Ride 8.5 Bike  
54 River Rise Preserve State Park River Rise Preserve Trail 2.9 Hike
55 St. Marks Trail North Trail 9.8 Bike  
56 Tom Brown Park Subaru to Tom Brown 3.5 Hike  
57 Apalachicola National Forest Wright Lake White Trail 5.5 Hike
58 St. Mark WMA Plum Orchard to St Marks Via Port Leon 8.2 Hike  
59 Capital Circle SE  Captial Circle SE Shared Use Path 13 Bike
60 Econfina River State Park Blue and Orange Trail 12.5 Bike
61 Anita Davis Preserve at Lake Henrietta Park Lake Henrietta Trail 1.8 Hike

62 Goose Pond Trail Goose Pond Trail 2.7 Hike  
63 Wakulla State Forest Double Springs Trail with Petrik Spur 4.7 Bike
64 Fred George Greenway and Park Fred George Loop 1.5 Hike  
65 Lake Talquin State Forest Long Leaf Loop 3.9 Hike

favicons (16×16)

66

Guyte P. McCord Park

Sculpture Trail 1.2 Hike
67 Ochlockonee Bay Trail Ochlockonee Bay Trail 31.2 Bike  
68 Optimist Park Indian Head Trail 2 Hike
69 Chase Street Park Monticello “Ike Anderson” Bike Trail 4.5 Bike  
70 Apalachicola National Forest – Bradley Bay Wilderness Monkey Creek Trail Head on Florida Scenic Trail 4.9 Hike
71 FSU Bike Path Ocala Rd to Stadium Drive 2.3 Bike  
72 Seminole State Park Gopher Tortoise Nature Trail 1.8 Hike
73 J.R. Alford Greenway Wiregrass and Beggarwood Loop 3.3 Bike  
74 Lake Talquin State Park Lake Talquin State Park Trail 1.6 Hike
75 Capital to Coast Trail St. Marks to 319 24.3 Bike  
76 Constitution Park Dolls Head Trail 2.4 Hike  
77 East Roswell Park Park Loop Trail 2.1 Bike  
78 Chattahoochee-Oconee National Forest Andrews Cove Trail 3.8 Hike
79 Unicoi State Park Bike Trail 3.5 Bike
80 Unicoi State Park Anna Ruby Falls 1 Hike
81 Brinkley Glen Park Brinkley Glen Trail 1 Hike
82 Letchworth-Love Mounds Archaelogical State Park Letchworth Mounds Loop 2.7 Hike
83 St. Marks WMA Florida Scenic Trail 5.5 Hike  
84 Econfina State Park River Loop (Red Trail) 3.3 Hike
85 Torreya State Park Torreya Trail 6.6 Hike
86 Lafayette Heritage Trail Park East Cadillac Loop 2.6 Bike
87 Governors Park Fern Trail, Kohl’s Trail and Blairstone Multi-Use Trail 4.4 Hike  
88 Apalachee Regional Park Cross Country Loop 3.3 Hike
89 St. Andrews State Park Road and Pine Flatwoods Trail 5.5 Both
90 Torreya State Park Torreya Challenge 9.3 Hike
      478    

 

Google Calendar Privacy Vulnerability

Its interesting how events can lead one find privacy and security vulnerabilities. I’m reminded of the old Connections show, where James Burke would connect seemingly unrelated events in human history and show how one led to another. During my Winter 2021 Strategic Privacy by Design course, the United States did a time shift known as Daylight Saving Time, an anachronism from the days of agriculture where the government thought changing the time twice a year to adjust to changing sunlight would help farmers use time more effectively. As a result of this shift, some students in Europe showed up at the end of a lecture because I had adjust my clock, but they, obviously being in Europe, had not.

As a result of this timing error, I thought it might be good to create calendar items in Moodle (the LMS I use) for the Spring 2021 Strategic Privacy by Design course. The plan was to export the iCal file and send it to students so they would each be able to insert the important course events in their own calendar. I did just that into my calendar as well, which, unfortunately is in Google.

My eagle eyed assistant instructor, Maria, noticed when she was checking my schedule to send me an invite to a meeting, that should could see these items, even though I had set up to only share Free/Busy calendar (see below).

After digging around, I finally figure out what was going on. Visibility on each calendar item has options of: private, public or default visibility (meaning to default to the overall calendar’s visibility).

However, these calendar items had a class in the iCal file of public, which overrode my calendar’s default of Free/Busy only.

Those events were imported. I wanted to check, so I had my security intern invite me to three event, one she set to private, one she set to public and one she set to default visibility. As expected, despite my calendar set to Free/Busy only, the “public” event showed as public.

Your reaction may be, well this event is public, but two problems persist. 1) It still shows MY interest or possible attendance in this public event, not just whether I’m busy or free; and 2) when the sender has their calendar default to public and doesn’t realize that but sends you an invite to talk. I would suggest that my calendar settings should override the imported event’s settings, just to be on the safe side.

By the way, if anyone has a suggestion for a privacy friendly online calendar (so I can share my free/busy schedule), I’d appreciate hearing from you. I haven’t found a good alternative yet.

How to stalk someone via a trail mapping app.

For those that don’t know, I’m an avid hiker and biker. In fact, I’m currently undertaking a challenge that I created for myself to do 90 different trails in 100 days. Currently, I’m 2/3rds through that challenge with ~30 days to go. One of the keys tools I use for finding and following trails is a mobile App called AllTrails.

I’ve used it for years but now I’m using it daily. While I’ve known that trail apps have potential privacy problems  (I even included building a privacy friendly trail app as an example in my book, see illustration), my recent use has pinpointed how problematic.

 

Screenshot of AllTrails

In the App on a phone, when you pull up to explore the area looking for trails you’re presented the pinpoints of a bunch of curated trails, as shown at left. You can click on a trail and get a description, trail map, reviews, popular activities and features. There is a slight problem in that reviews, I think, are public by default, but it appears that when your profile is private or individual recordings are private, your reviews aren’t shown. 

In my search for trails, though, I’ve found lots of unlabeled trails. In other words, trails at parks, greenways and forests that haven’t been curated and cataloged. You can submit new trails for consideration, and I’ve done that with a few. I’ve also recorded via the app some hikes and walks that aren’t official trails, like when I dropped a rental truck off and walked home 5 miles because I needed to get a hike in that day or when my car was getting an oil change and I hiked to a park to kill time. Because of the challenge, I wanted to document these “hikes” to record my mileage. Now even though my recordings are private and my profile is private, uploading these recordings seemed even less problematic because they weren’t linked to an official trail and thus unfindable by the public. At least I assumed so. [Yes, privacy professionals, I know, AllTrails could be monetizing me by selling geolocation information to advertisers. I assume so, at least, with any app I use.

It turns out that my statement about recordings unlinked to trails is not quite accurate. In the App it appears to be true, but on the AllTrails website, you can look at curated trails OR community content. 

This community content contains all sorts of hikes people take, including official but uncurated trails, trips to visit grandmother in Ohio (I saw on where someone recorded their road trip) or walking around their neighborhood. I’ve yellowed out the map above to reduce the chance of someone finding this particular hiker’s location based on the road topography. Clicking on the recording in the list of community content leads to the details (shown below). As you can see this hiker left their house (black point) and walked around their neighborhood and turned off the recording as they approached their house at the end of the cul-de-sac (green point). Mousing over the endpoints yields the latitude and longitude to 5 decimal places, which is accurate to within a meter. I’ve attempted to obscure as much information as possible, like street names, exact lat/long and other houses, but I’m sure  someone with enough resources could identify this from the unique street outline. However, I’m not going to make it easy. 

You may be thinking, well this isn’t bad because I don’t know who lives at some random house (i.e. I don’t know their name, though it might be part of the public records on home ownership).  It other words its an attribute disclosure about this person (their walk details) but not an identity disclosure. I won’t debate the problems of attribute disclosures in this blog but that’s not what’s happening here. Clicking the profile icon will take you to their profile. Note, this person did at least not upload a picture of themselves so the profile icon (under the words Morning Hike on the left) is generic. Unfortunately, they DID include their full name (changed to a gender neutral generic name below). 

On my recent hiking challenge, I generally listen to podcasts, mostly privacy related. One I’ve become very fond of is Michael Bazzell’s “Privacy, Security and OSINT” podcast. It’s fairly frequent (I’m listening to podcasts daily now) and provides both tips on how to protect your privacy and OSINT (Open Source Intelligence) techniques, to which people need to be familiar with in order to protect their privacy. 

Of course, being a privacy by design specialist, my take is people shouldn’t have to go to extremes to protect their privacy. The onus is on organizations to build better products and services. AllTrails, I like your app, really, I do. But it needs so many improvements from a privacy perspective. So many, in fact, I’d be happy to offer you some free consulting. Just contact me rjc at enterprivacy.com.  I don’t mean to single AllTrails out. I’m sure this is a problem with many or most of the trail apps. AllTrails just happens to be the one I use. 

For others who don’t want their organizations to be on the cover of the NY Times , sign up for some privacy by design training or contact me about a consulting engagement. Become a privacy hero with your customers. 

One Way Tables

One of the potential downsides of relational data is the ease at which data can be related in both directions. A simple search of the table below will reveal not only if a known customer has COVID but a list of all COVID positive customers.

Supposed instead, you want to be able to determine if a customer has COVID but not easily obtain a list of all customers who have COVID. How might you do that? This can be accomplished using one-way functions (hashes).. By hashing the customer’s name (shown as H(name) function) and replacing that in the customer field, we no longer can get a list of customer’s with COVID. A simple SQL query will get the customer’s COVID status from their name: SELECT Covid from TABLE where Customer = hash(customer_name). The inverse, however, won’t get me the list of customers with COVID but a list of hashes: select customer from TABLE where COVID =”Positive”.

Now hashes are one way functions, meaning it’s very difficult to determine x, given H(x). However, because of the very few values of customer’s name, it’s easy to compute what’s called a rainbow table, that is a table of all possible hash values. We can then lookup the hash in the rainbow table and obtain the customer’s name for those customers that have COVID. The table below assumes we know all the names of customers, an attacker may not, but names are fairly common and even if we created our rainbow table with six thousand different common first names, that’s fairly trivial to compute.

We address this with a concept called salting, which is adding a random value to the Name before hashing it, i.e. Customer = H(Name+Salt). One problem though we can’t use the same salt for every customer or an attacker could precompute a rainbow table just adding the salt to each name before calculating the hash. We also need the salt BEFORE we do the hash. Now customer’s can’t be expected to remember the salt before looking it up, but we can give them an account number, which we can relate to the salt. A customer, wanting to look up their COVID status, enters their account number and name. The system retrieves the salt using the account number and hashes the name to determine the customer and look up their COVID status. [Note, salts should be incremented with each access to prevent replay attacks but that’s beyond the scope of this blog].

Light arrows show the link between rows, only discoverable with knowledge of the customer’s name.

Now an attacker wanting to identify all known COVID cases would need to hash all the possible customer names against all the possible salts. Padding the salt table with hundreds of thousands of random account numbers could increase the number of computations necessary for the attacker. In fact, you could prefill the account table with a million rows and a million salts and randomly assign account numbers to customers. Even if two customers got the same account number, they aren’t going to get the same Customer value unless they have the same name, H(name+salt). This also benefits us because if we have two people with the same name, they could be given different account numbers and distinct COVID statuses.

In order to create a rainbow table now an attacker must hash six thousand names with a million account numbers, or 109 rows. Not impossible but starting to slow them down.

I’m going to make it even a little bit harder. Right now, we can still use the COVID table to identify how many people are COVID positive or negative. This is especially problematic if each and every customer has the same status. We can do a little encryption and steganography to hide the information. First, for each record we can generate a random 64 byte value (say 787683b51cf3b49886fce82dc34a51a2 in Hexadecimal). If we need to encode a negative result, we ensure that value has the least significant bit of 0. If we need to encode a positive result, we ensure that value has the least significant bit of 1 (i.e. 787683b51cf3b49886fce82dc34a51a3). Note the change in the last digit from 2 to 3, changing the least significant bit and the COVID status. That value is then encrypted using a symmetric cypher (such as AES) without padding or authentication to verify the accuracy of the data. We can use part of the SALT as the encryption key (bolded in the table below)

Now, an attacker can’t search the COVID table to identify how many customers have COVID. Additionally, because we used an encryption algorithm without any validation, any of the values in the COVID column will decrypt with any of the encryption keys in the account table, just with erroneous least significant bits, thus not disclosing anyone’s COVID status at all. If we used a decryption algorithm with padding or authentication, it would provide a quicker backdoor for connecting records in the account table with records in the COVID table.

One more step to slow a potential attacker down is to iterate the hashing, in other words compute H( H( H( H( …. H(name+salt))))), say 100,000 times. While slowing down lookups slightly, it significantly increases the computation of a rainbow table for an attacker, who now must compute 100,000 x 109 hashes. Increasing the number of values in the account table, increasing the potential values for name (such as using given name and surname), and increasing the number of iterations will all serve to slow an attacker down.

The concept described above is sometimes called a translucent database, such that someone who has full access to the data (a database engineer) still can’t interpret it, but with a little extra knowledge (account number and name), a customer can still retrieve their COVID status.

Cookie pollution

Just moments ago I visited lightpollutionmap.info to look for someone to go camping this weekend where there was as little light pollution as possible given a reasonable drive from where I’m at. I was immediately presented with a pretty comprehensive cookie banner. The FIRST thing I noticed was that I couldn’t unselect unnecessary cookies, which included over 75 “statistics” cookies.

Most of those “statistics” cookies, of course, are really third party marketing trackers. The only option presented were “Allow selection” and “Allow all cookies” which were functionally identical because you couldn’t unselect any of the types of cookies. I noticed the button for only selecting necessary cookies was still there but invisible. I clicked it, only be to presented a blank screen. See the video below showing the cookie selection options disappearing. I also noticed the ad settings options to accept or reject vendors was disabled was disabled.

VIDEO of options being hidden

Artificial Motivations

Recently I had the pleasure of working with two start-ups in the AI space to help them consider privacy in their designs. Mr Young, a Canadian start-up, wants to use an intelligent agent to allow individuals to find resources available to them to improve their mental well-being. Eventus AI, a US based company, hopes to use AI to optimize the sales funnel from leads collected at events. Both recognized the potential privacy implications of their services and wanted to not only ensure compliance with legal obligations, but showcase privacy as an important aspect of their brand.

At the onset of my engagements, I had to think about how the intelligent agents driving them could threaten individuals’ privacy. In my previous work, threat actors were persons, organizations or governments, each with distinctive motives. People can be curious, seek revenge, trying to make money, or exert control.  Organizations are generally driven by making money or creating competitive advantage. Governments invade privacy for law enforcement or espionage purposes. Less angelic governments may invade privacy out of desires for control or repression of their citizens.

Figure 1 From the book Strategic Privacy by Design chapter on Actors

Typically, when I think of software, it isn’t a “threat actor” in my privacy model. They don’t have independent motives. They are tools made by developers but they don’t have motives on their own. The question arises though; does AI represent a different beast? Does AI have “motives” independent of its creator? Clearly, we haven’t reached a stage where HAL 9000 refuses Dave’s command or Skynet determines humanity as a threat to its existence, but could something slightly less sentient manifest motive?

Still, I would argue that AI is not similar to other software. It can, in the sense, present a privacy threat beyond the intent of its creator.  While not completely autonomous, AI does exhibit an ends justify the means approach to achieving its objective.  The difference between AI and, say, a human employee, is the human can put their business objective in context of other social norms, whereas AI lacks this contextual understanding. I liken it to the dystopian analogy of robots being programmed to prevent humans from harming one another and determining the best way is to exterminate all humans. Problem solved! No more humans harming other humans. Like the genie granting a wish, they do exactly what they are told, sometimes with unintended and far-reaching consequences.

The motivation that I would ascribe to AI then is “programmatic goal-seeking.” It is not that AI seeks to invade privacy for independent purposes; rather it seeks whatever it’s been programmed to seek (such as ‘increasing engagements’). Privacy is the beautiful pasture bulldozed on AI’s straight-line path to its destination.

The question now becomes, from the perspective of a developer trying to build an AI into a system, how do you prevent privacy being a casualty of that relentless pursuit? I make no claims that my suggestion below in any way supplants all of the efforts to consider ethics in AI development (failed or successful), but rather this is the approach I take complements others. I think it gets us far along in a pragmatic and systematic way.

Before looking at tactics in the AI context (or anywhere really) there is a fundamental construct the reader must understand: the difference between data and information. Consider a photo of a person. The data is the photo – the bits, bytes, interpretations of how color should be rendered, etc. But a photo is rich with much information. It probably displays the gender of the individual, their hair color, their age, their ethnic background, perhaps their economic or social status. Even without geotagging, if the photograph has a distinctive background it could reveal the person’s location. Their subject’s hairstyle and dress and the quality and makeup of the photo might suggest the decade it was recorded. Giving over that photo to someone not only gives them the bits and the bytes but also gives them all of that rich information.

In general, for privacy by design, I use Jaap-Henk Hoepman’s strategies and tactics to reduce privacy risks. Just as they can be applied to other threat actors, I think they are equally applicable here. Returning to how to use Hoepman’s strategies against AI, consider the following example:

Your company has been tasked with designing an AI based solution to sort through thousands of applicants to find the one best suited for a job. You’re concerned the solution might adversely discriminate against candidates from ethnic minority populations. If you’re questioning whether this is even a “privacy” issue, I’d point you to the concept of Exclusion under the Solove Taxonomy.  We’re (well the AI) is potentially using information, ethnicity, without knowledge and participation of the individuals, an Exclusion violation.

How then can we seek to prevent this potential privacy violation?

Two immediate tactics come to mind. These are by no means the only tactics that could or should be employed but illustrative. The first is stripping which falls under the Minimize strategy and my ARCHITECT supra-strategy. Stripping refers to removing unnecessary attributes. Here, the attribute we need to remove is ethnicity. This isn’t as simple as removing ethnicity as a data point given to the AI. Rather, returning to the distinction between data and information, we need to examine any instance where ethnicity could be inferred from data, such as a name or cultural distinctions in the way candidates my respond to certain questions. This also includes ensuring that training data doesn’t contain hidden biases in its collection.

The second tactic is auditing which falls under the Demonstrate strategy and my SUPERVISE supra-strategy. AI already employs validation data to ensure that the AI is properly goal seeking (i.e. achieving its primary purpose). Review of this validation process should be used to also continue ensuring that the AI isn’t inferring ethnicity somehow (that we failed to strip out) and using that information inappropriately as part of its goal seeking objective. If it turns out it is, then, similar to a human employee, the AI might need retraining with new, further sanitized, data.

While AI represents a new and potentially scary future, with proper design considerations and strategic systematic approaches, we reduce the potential privacy risks they would otherwise create.