Moneyball and the lesson for Privacy

If you haven't read the book Moneyball: The Art of Winning an Unfair Game, or seen the movie starring Brad Pitt and Jonah Hill, I highly encourage you to seek them out. I tend to be a "privacy" subject matter expert, but one of the most important aspects of that expertise is the ability to consume diverse content from other experts and apply it to the world of privacy. Whether it's behavioral economics (from sources like the Freakonomics and Hidden Brain podcasts), how people learn (from the YouTube channel Veritasium), or the use of statistics and probability in baseball (i.e., Moneyball).

I see a huge parallel between baseball management in the pre-2000 era and privacy management today. For a hundred-plus years, baseball team management was governed by intuition. Managers and scouts thought they knew what made up a good team. Baseball statistics were published for decades, but they weren't used in a way that really optimized team performance. The concept of sabermetrics (the empirical analysis of in-game activity) began in the 1960s, grew in the 1970s, and then transformed the business of baseball in the 2000s (as chronicled in the book and movie; also see this podcast). Sabermetrics took the "intuition" out of baseball and turned it into a science, one based on statistics and probability. Science can't tell you a particular batter will hit a home run against a particular pitcher, but probability can show you that, if you make the same choices game after game, you're going to end up with results in a certain range.

Similar transformations have hit other industries. I believe the privacy profession is in line for this sort of refactoring. Attempts have been made for decades to create KPIs (Key Performance Indicators) for privacy. One of my earliest introductions was a talk by Tracy Ann Kosa at an IAPP conference about a decade ago. But most privacy program KPIs, in my opinion, still follow the baseball analogy: scouts were looking at player stats, but ultimately they were looking at the wrong statistics to determine game outcomes. As Jonah Hill says in the movie, "Your goal is not to buy players, it's to buy wins." The privacy profession has, for its short life, relied mostly on heuristics, shortcuts that we intuitively sense improve desired outcomes. Those shortcuts (such as GAPP, the FIPPs, privacy principles, etc.) have become goals themselves, with metrics tied to those intermediary goals. But, as in baseball, the goal is not the player; the goal is the game. Analogously, a goal in privacy is not transparency, but rather whether a person has the ability to make decisions about things that affect them (with full knowledge and without being overwhelmed). Transparency is a sideshow. Transparency is worthless if it overwhelms or doesn't support meaningful decision making.

The point of this post is that metrics are important, and rigor in choosing them is just as important. Privacy needs to step out of the dark ages of intuition, superstition, and old wives' tales.

The author is principal at Enterprivacy Consulting Group, a boutique consulting firm focused on privacy engineering, privacy by design and the NIST Privacy Framework.

Diagramming Data Transfers

International data transfer is probably one of my least favorite privacy exercises. Why? Probably my main dislike is that it's not really about privacy, but often more about protectionism. That being said, data transfers are a hot, hot topic these days, both in Europe and in countries like China and Brazil. It wasn't until recently that I realized that if you look at the GDPR from far away, you see there are really four key chapters:

  • Chapter II – Principles
  • Chapter III – Rights of the Data Subject
  • Chapter IV – Controllers and Processors
  • Chapter V – Transfers

Fundamentally, most people, myself included, equate the GDPR with principles of data processing, rights afforded the data subject, and obligations of controllers and processors, but right up there with those key concepts is a whole chapter on international data transfers. At least in the minds of the GDPR's authors, transfers are clearly not an afterthought but one of the four major components of the regulation.

With data transfers clearly an important element of the GDPR, it's important that the analysis of transfers be done with some care. I, for one, am a very visual person. I can grasp concepts visually much more easily than verbally. Astute readers may have noticed the diagrams accompanying my previous post on data transfers regarding the recent draft guidelines put out by the European Data Protection Board. In working through data transfer scenarios, I've found it extremely helpful to illustrate or diagram them.

Disambiguating “transfers” and “transmission” of data

Before diving into how to diagram data transfers, it's important to distinguish the terms transfer and transmit. Transmission of data occurs when data goes from one place to another. If I send you an email with an attachment, I am transmitting data to you. That data is transmitted over multiple servers, through different service providers, and maybe through different geographies. A transfer, however, under the GDPR is a legal act by which a controller or processor passes data to another controller or processor. In other words, the transmission ≠ the transfer.

Perhaps an example would help. I use DreamHost to host my website. On my website, I host a file (say, a pretty infographic). You go to my website and download the file. The file is transmitted from DreamHost (wherever their servers are) to you (wherever you are). However, there is not a legal transfer from DreamHost to you. The transfer was from me to you. I may never have even had the file in my possession. Let's illustrate that.

Simple transfer/transmission diagram

As you can see, I've used dotted lines to illustrate the transmission of data, the 0s and 1s flying over the internet from DreamHost to you. But the transfer, the metaphysical or conceptual transfer from me to you, is illustrated with a solid line. Let's look at an EU example.

European Union data transfers

EU data transfer diagram

Here, ChairFans, GmbH, a German company, sends a file to Star Analytics, Inc., in the US, to analyze. In this case the transmission is parallel to the transfer. ChairFans is transmitting data to Star Analytics and is also transferring that data. Remember, the transmission refers to the physical bits flying across the Atlantic Ocean, and the transfer refers to the act of one entity making the data available to the other entity. If you're still struggling, you can think of it this way: if a hacker broke into ChairFans and stole the data, the data would still be transmitted over the internet across the Atlantic Ocean, but ChairFans didn't "transfer" data to the hacker. It was not a deliberate, intentional act of making the data available to the hacker.

Do we need a GDPR Chapter V transfer tool for this transfer?

Diagramming

To diagram data transfers, I'm using diagrams.net, a free (and privacy-friendly) tool for creating diagrams. I'll provide the file for all these diagrams at the end of this post. I've also included a template for the shapes I'm using, which you can use to create your own data transfer diagrams. For the following examples, I'm first going to illustrate the use cases for supplementary measures in the EDPB Recommendations.

EDPB Recommendation

Use Case 1: Data storage for backup and other purposes that do not require access to data in the clear

A data exporter uses a hosting service provider in a third country to store personal data, e.g., for backup purposes. Notice I've now added the labels Exporter and Importer to the entities.

Illustration 3

Use Case 2: Transfer of pseudonymised data

A data exporter first pseudonymises data it holds, and then transfers it to a third country for analysis, e.g., for purposes of research. Visually, this really isn't distinguished from the previous example; I've added a gear icon to indicate the pseudonymisation.

Illustration 4

Use Case 3: Encrypted data merely transiting third countries

A data exporter wishes to transfer data to a destination recognised as offering adequate protection in accordance with Article 45 GDPR. The data is routed via a third country.

Illustration 5

Use Case 4: Protected recipient

A data exporter transfers personal data to a data importer in a third country specifically protected by that country's law, e.g., for the purpose of jointly providing medical treatment for a patient, or legal services to a client. Visually, this is no different from Illustration 3.

Use Case 5: Split or multi-party processing

The data exporter wishes personal data to be processed jointly by two or more independent processors located in different jurisdictions without disclosing the content of the data to them. Prior to transmission, it splits the data in such a way that no part an individual processor receives suffices to reconstruct the personal data in whole or in part. The data exporter receives the result of the processing from each of the processors independently, and merges the pieces received to arrive at the final result which may constitute personal or aggregated data.

Illustration 6

Use Case 6: Transfer to cloud service providers or other processors which require access to data in the clear

A data exporter uses a cloud service provider or other processor to have personal data processed according to its instructions in a third country.

Illustration 7

Use Case 7: Remote access to data for business purposes

A data exporter makes personal data available to entities in a third country to be used for shared business purposes. A typical constellation may consist of a controller or processor established on the territory of a Member State transferring personal data to a controller or processor in a third country belonging to the same group of undertakings, or group of enterprises engaged in a joint economic activity. The data importer may, for example, use the data it receives to provide personnel services for the data exporter for which it needs human resources data, or to communicate with customers of the data exporter who live in the European Union by phone or email. Here I’ve added an IT system to indicate that the Common Enterprise has remote access to that IT system.

Illustration 8

EDPB Guideline on Article 3 and Chapter V

Next up I'll tackle the examples from the draft EDPB Guidelines on the interplay between Article 3 and Chapter V.

Example 1

Maria, living in Italy, inserts her personal data by filling in a form on an online clothing website in order to complete her order and receive the dress she bought online at her residence in Rome. The online clothing website is operated by a company established in Singapore with no presence in the EU. In this case, the data subject (Maria) passes her personal data to the Singaporean company, but this does not constitute a transfer of personal data since the data are not passed by an exporter (controller or processor); they are passed directly and on her own initiative by the data subject herself. Thus, Chapter V does not apply to this case. Nevertheless, the Singaporean company will need to check whether its processing operations are subject to the GDPR pursuant to Article 3(2).

Illustration 10

Example 2

Company X established in Austria, acting as controller, provides personal data of its employees or customers to a company Z established in Chile, which processes these data as processor on behalf of X. In this case, data are provided from a controller which, as regards the processing in question, is subject to the GDPR, to a processor in a third country. Hence, the provision of data will be considered a transfer of personal data to a third country and therefore Chapter V of the GDPR applies. Note, I've added the labels C and P to indicate Controller and Processor.

Illustration 9

Example 3: Processor in the EU sends data back to its controller in a third country

XYZ Inc., a controller without an EU establishment, sends personal data of its employees/customers, all of them non-EU residents, to the processor ABC Ltd. for processing in the EU, on behalf of XYZ. ABC re-transmits the data to XYZ. The processing performed by ABC, the processor, is covered by the GDPR for processor specific obligations pursuant to Article 3(1), since ABC is established in the EU. Since XYZ is a controller in a third country, the disclosure of data from ABC to XYZ is regarded as a transfer of personal data and therefore Chapter V applies.

Illustration 10

Example 4: Processor in the EU sends data to a sub-processor in a third country

Company A established in Germany, acting as controller, has engaged B, a French company, as a processor on its behalf. B wishes to further delegate a part of the processing activities that it is carrying out on behalf of A to sub-processor C, a company established in India, and hence to send the data for this purpose to C. The processing performed by both A and its processor B is carried out in the context of their establishments in the EU and is therefore subject to the GDPR pursuant to its Article 3(1), while the processing by C is carried out in a third country. Hence, the passing of data from processor B to sub-processor C is a transfer to a third country, and Chapter V of the GDPR applies.

Illustration 11

Example 5: Employee of a controller in the EU travels to a third country on a business trip

George, employee of A, a company based in Poland, travels to India for a meeting. During his stay in India, George turns on his computer and accesses remotely personal data on his company's databases to finish a memo. This remote access of personal data from a third country does not qualify as a transfer of personal data, since George is not another controller, but an employee, and thus an integral part of the controller (company A). Therefore, the disclosure is carried out within the same controller (A). The processing, including the remote access and the processing activities carried out by George after the access, are performed by the Polish company, i.e. a controller established in the Union subject to Article 3(1) of the GDPR.

Illustration 12

Example 6: A subsidiary (controller) in the EU shares data with its parent company (processor) in a third country

The Irish Company A, which is a subsidiary of the U.S. parent Company B, discloses personal data of its employees to Company B to be stored in a centralized HR database by the parent company in the U.S. In this case the Irish Company A processes (and discloses) the data in its capacity of employer and hence as a controller, while the parent company is a processor. Company A is subject to the GDPR pursuant to Article 3(1) for this processing and Company B is situated in a third country. The disclosure therefore qualifies as a transfer to a third country within the meaning of Chapter V of the GDPR.

Illustration 13

Example 7: Processor in the EU sends data back to its controller in a third country

Company A, a controller without an EU establishment, offers goods and services to the EU market. The French company B, is processing personal data on behalf of company A. B re-transmits the data to A. The processing performed by the processor B is covered by the GDPR for processor specific obligations pursuant to Article 3(1), since it takes place in the context of the activities of its establishment in the EU. The processing performed by A is also covered by the GDPR, since Article 3(2) applies to A. However, since A is in a third country, the disclosure of data from B to A is regarded as a transfer to a third country and therefore Chapter V applies.

Illustration 14

My comments to the EDPB

Subsequent to the draft guidelines above, I made a comment to the EDPB on two scenarios they should cover. Those scenarios are detailed below.

A data subject contracts with X, GmbH (in Germany), which is a European Union based subsidiary of X, Inc. (in the United States). However, the data subject never actually supplies personal data to X, GmbH, as the data subject directly transmits data to X, Inc. in the United States. This is a Chapter V transfer of data requiring a transfer tool. X, GmbH and X, Inc. have standard contractual clauses in place governing the transfer of data. X, GmbH is the exporter and X, Inc. is the importer.

Illustration 15

ABC, GmbH (in Germany) instructs employees to use a service provided by X, Inc., in the United States. Employees' behavior is tracked via the service provided by X, Inc., thus X, Inc. is subject to the GDPR for that data under Article 3.2(b). Because ABC, GmbH is "mak[ing] personal data, subject to this processing, available to…" X, Inc. via instructions to its employees, there is a transfer of data under Chapter V. ABC, GmbH and X, Inc. execute the standard contractual clauses with ABC, GmbH as the exporter and X, Inc. as the importer.

Illustration 16

I posed this scenario on LinkedIn:

Company X, GmbH (DE) hosts data on an Australian data server. They contract with Company Y, Inc. in the United States to process data. Company Y's employee, working remotely in Australia, accesses the data on the data server. X, GmbH and Y, Inc. execute Standard Contractual Clauses to govern the transfer. There is a legal transfer of data because X, which has putative control over the data in Australia, gave access to Y, which has putative control over its employee in Australia. This is despite the fact that the data never left Australia.

Illustration 17

If you want to explore these scenarios and make some of your own, download this file (be sure to right-click and save the file to your desktop). Then go to https://diagrams.net and open the file from there.

Comment on Guidelines 05/2021 on the Interplay between the application of Article 3 and the provisions on international transfers as per Chapter V

Below are my comments on the recent EDPB Guidelines, which I'm submitting as part of their public consultation.

The guidelines need to provide an example of a very common scenario:

A consumer data subject (located in the EU) registers with an online service. The service is being offered by a company in a third country, thus placing the company under the territorial scope of the GDPR via Art. 3.2(a). However, when registering for the service, the data subject enters into an agreement with a subsidiary of the company located in the EU. For avoidance of doubt, the subsidiary in the EU never possesses personal data of the data subject. It appears, in this scenario, that there is a legal "transfer" from the subsidiary in the EU to the parent company in the third country.

Example I

Illustration of data transmission and a legal data transfer

In the example illustrated above, a data subject contracts with X, GmbH (in Germany), which is a European Union based subsidiary of X, Inc. (in the United States). However, the data subject never actually supplies personal data to X, GmbH, as the data subject directly transmits data to X, Inc. in the United States. This is a Chapter V transfer of data requiring a transfer tool. X, GmbH and X, Inc. have standard contractual clauses in place governing the transfer of data. X, GmbH is the exporter and X, Inc. is the importer.

A similar scenario exists when a business in the EU directs its employees to use an app (such as for human resources purposes) provided by a vendor in a third country that monitors the behavior of the employees (such as job time tracking), thus subjecting the vendor to the GDPR under Art. 3.2(b). Even though the employer never holds the data, this still appears to be a transfer under the guidance ("otherwise makes personal data available"). A clarifying example in the guidelines would be helpful.

Example II

Illustration of a data transmission and legal data transfer

ABC, GmbH (in Germany) instructs employees to use a service provided by X, Inc., in the United States. Employees' behavior is tracked via the service provided by X, Inc., thus X, Inc. is subject to the GDPR for that data under Article 3.2(b). Because ABC, GmbH is "mak[ing] personal data, subject to this processing, available to…" X, Inc. via instructions to its employees, there is a transfer of data under Chapter V. ABC, GmbH and X, Inc. execute the standard contractual clauses with ABC, GmbH as the exporter and X, Inc. as the importer.

Respectfully submitted,

R. Jason Cronk


Challenge: 90 Trails in 100 Days

Sunrise over Myakka River State Park

In the fall of 2020, I went on a wonderful, adventurous hike down from the rim of the Grand Canyon to Phantom Ranch on the river. The trip was arduous and the climb out was tough given my hailing from the vertically challenged state of Florida. Prior to this undertaking, I had begun training in Florida, including on perhaps Florida's most vertical trail, the Torreya Challenge in Torreya State Park. After this adventure and the two months of training leading up to it, I became a bit of a couch potato in December and January. About the middle of January, I decided to do something about that. Seeing all the wonderful trails in my area, I set myself a challenge: to hike or bike 90 different trails in 100 days. I gave myself a hundred days to account for weather, work, or other impediments to doing a trail each day.

Now you may be asking what this has to do with privacy. The short answer is not much, but there is always an angle. In using AllTrails, the trail mapping application, I discovered a nifty way to stalk people. See my previous blog post for more.

In February, thanks to Publix Supermarkets, I procured a large amount of trail mix. By the end, despite adding some more in March and April, I was down to one container.


My typical kit consisted of a day pack with water reservoir, bear bell, bear mace, Chapstick, headphones to listen to privacy podcasts, snacks (trail mix not pictured), and trail maps. Also not pictured are optional sunscreen and insect repellent.

Off I set on January 30th. The task seemed simple but, as with many things, implementation was more fraught than at first imagined. I tried to do longer trails, or those farther away, when I had more time (like weekends) or during nice weather. One early trail, the Pond Loop at Okeeheepkee Prairie County Park, which I completed on a rainy afternoon, was only 0.5 miles. I decided after that to only include trails over 1 mile long. This led me a few times to combine short trails into one "trail" or stretch a trail to its extreme (exploring every nook and cranny) to try to get that mile in. Trying to define a "trail" also led to some creative interpretations. Not all trails are laid out in some platonic idealized state. Some are based on forest service roads; some intersect and loop and figure-eight. Some are out and back, retracing your steps. I had to break some long trails, like the 30+ miles of the St. Marks Trail, into more manageable chunks of about 16 miles (8 out and 8 back). Some of my trips weren't trails at all, but I counted them, like when I walked 6 miles home after dropping off a truck at the rental car company. I learned about new trails that aren't easily found, like the Capital to Coast Trail, still under construction, which when fully complete will be 120 miles of biking or shared-use paths from Tallahassee to the Emerald Coast.

It was perhaps the best time to be out hiking and biking in Florida. Spring weather meant it wasn't too hot, the parks were green, and flowers were in full bloom. I saw so many animals, many that you rarely see as a weekend warrior. In addition to the usual squirrels and lizards, I saw a bobcat, a family of boar, water moccasins and other snakes, a mole, a red pileated woodpecker, a gopher tortoise, giant mosquitoes, ticks, spiders, and many more. I did not, however, see a bear. Not to say they weren't there, but ever since I encountered a bear last year, I've been hiking with a bear bell and bear mace, so they've thankfully kept their distance.

What lurks beneath? Creature from the black lagoon? Manatee? Large alligator? Something was moving fast and leaving a wake under the Crooked River in Tate’s Hell State Forest.

As a capstone to my challenge, I returned to Torreya State Park to take on the Torreya Challenge. It was a wondrous, exhausting 4 hours that left me with a terrible headache, but I made it. 90 different trails. 98 days in the making. 478 miles covered. Challenge complete. Level UP!

More Galleries

Animals and Insects
Flora and Fungi
Doll's Head Trail
Georgia
Econfina State Park
Emerald Coast
Torreya State Park - Torreya Trail and the Torreya Challenge Trail
Panoramas
# Location Trail Miles Type Links
1 Lake Talquin State Forest – Lines Track West Loop 3.8 Hike
2 Ellinor Klapp-Phipps Park West Loop 2.9 Hike
3 J.R. Alford Greenway Bluebird Loop 3.9 Hike
4 San Luis Mission Park San Luis Park Loop 1.98 Hike
5 Apalachicola National Forest GF&A Trail 5 Hike

6 Maclay Gardens State Park Shared Trail Loop 5.5 Bike https://www.floridastateparks.org/maclaygardens
7 Lake Talquin State Forest – Lines Track Talquin Loop (Blue) 6 Bike
8 Okeeheepkee Prairie County Park Pond Loop 0.5 Hike  
9 Apalachicola National Forest Munson Hills 8.4 Bike
10 Governors Park Fern Trail 3.4 Hike  
11 Three Rivers State Park Eagle Trail 3 Bike
12 St Marks Trail North Trail 16 Bike
13 Ochlockonee River WMA Old Cemetary Rd 3.9 Hike  
14 Lafayette Heritage Trail Park Lafayette Heritage Trail 6.9 Hike  
15 Wakulla Springs State Park Wakulla Springs Park Trail 10.1 Hike
16 Orchard Pond Orchard Pond Trail 6.9 Bike
17 Silver Lake Recreation Area Silver Lake Habitat Trail 1.4 Hike
18 Timberlane Ravine Nature Preserve Timberlane Ravine Nature Trail 1.5 Hike
19 San Felasco Hammock Preserve State Park Moonshine Creek Trail 1.6 Hike
20 Lakeland Highlands Scrub  Lakeland Highlands Scrub Trail 3.1 Hike  
21 Catfish Creek Preserve State Park Campsite 2 White Trail 5.5 Hike
22 Black Creek Preserve Red Trail 4.9 Hike  
23 Tom Brown Park Magnolia MTB Trail 3 Bike  
24 Central Park Central Park Lake Loop  1.9 Hike  
25 Apalachicola National Forest Camel Lake Loop 8.2 Hike
26 J. R. Alford Greenway Yellow Loop 5.3 Bike  
27 Miccosukee Canopy Road Greenway Miccosukee Greenways Trail 15.5 Bike  
28 Bald Point State Park Loop 3.1 Hike
29 A.J. Henry Park A.J. Henry Park Trails 1.9 Hike
30 Alfred B. Maclay Gardens State Park Bike Loops 5.7 Bike
31 Wakulla State Forest Nemours Trail 1.6 Hike
32 Ochlocknee River WMA Cut Through Lewis Loop 2.5 Hike  
33 Kolomoki Mounds State Park Spruce Pine Trail 3.1 Hike
34 Wilson Hospice House Wilson Hospice House Trail 1.3 Hike  
35 Marjorie Turnbull Park Trail 1.6 Hike  
36 Tallahassee Morning Hike from Budget 5.4 Hike  
37 Gil Waters Preserve at Lake Munson Trail 1 Hike
38 Bald Point State Park Sandy Trails 4.2 Bike
39 Elinor Klapp-Phipps Park Redbug Trails 4.6 Bike
40 Tate’s Hell State Forest High Bluff Coastal Loop Trail 9 Bike
41 Ochlocknee State Park Pine Bluff Trail 1.2 Hike
42 St. Joseph Island Loggerhead Trail and Maritime Hammock Nature Trail 21.5 Both
43 Tom Brown Park West Cadillac Trail 3.1 Bike
44 J.R. Alford Greenway Long Leaf Trail 4 Hike  
45 Alfred B. Maclay Gardens State Park Ravine Trail 1.9 Bike
46 Cascades Park Cascades Park Loop 2.5 Hike
47 Apalachicola National Forest Oak Park Bridge Trail 4.8 Hike
48 St. George Island State Park Gap Point Trail 5.5 Hike
49 St. George Island State Park Sugar Hill Beach Old Road 7.3 Both
50 St. Marks Trail Wakulla Springs to St. Marks 16 Bike
51 Myakka River State Park Fox’s Low to Mossy Hammock 3 Hike
52 Myakka River State Park Mossy Hammock to Fox’s Low 9.5 Hike
53 Plant City Dean’s Ride 8.5 Bike  
54 River Rise Preserve State Park River Rise Preserve Trail 2.9 Hike
55 St. Marks Trail North Trail 9.8 Bike  
56 Tom Brown Park Subaru to Tom Brown 3.5 Hike  
57 Apalachicola National Forest Wright Lake White Trail 5.5 Hike
58 St. Mark WMA Plum Orchard to St Marks Via Port Leon 8.2 Hike  
59 Capital Circle SE Capital Circle SE Shared Use Path 13 Bike
60 Econfina River State Park Blue and Orange Trail 12.5 Bike
61 Anita Davis Preserve at Lake Henrietta Park Lake Henrietta Trail 1.8 Hike

62 Goose Pond Trail Goose Pond Trail 2.7 Hike  
63 Wakulla State Forest Double Springs Trail with Petrik Spur 4.7 Bike
64 Fred George Greenway and Park Fred George Loop 1.5 Hike  
65 Lake Talquin State Forest Long Leaf Loop 3.9 Hike
66 Guyte P. McCord Park Sculpture Trail 1.2 Hike
67 Ochlockonee Bay Trail Ochlockonee Bay Trail 31.2 Bike  
68 Optimist Park Indian Head Trail 2 Hike
69 Chase Street Park Monticello “Ike Anderson” Bike Trail 4.5 Bike  
70 Apalachicola National Forest – Bradley Bay Wilderness Monkey Creek Trail Head on Florida Scenic Trail 4.9 Hike
71 FSU Bike Path Ocala Rd to Stadium Drive 2.3 Bike  
72 Seminole State Park Gopher Tortoise Nature Trail 1.8 Hike
73 J.R. Alford Greenway Wiregrass and Beggarwood Loop 3.3 Bike  
74 Lake Talquin State Park Lake Talquin State Park Trail 1.6 Hike
75 Capital to Coast Trail St. Marks to 319 24.3 Bike  
76 Constitution Park Dolls Head Trail 2.4 Hike  
77 East Roswell Park Park Loop Trail 2.1 Bike  
78 Chattahoochee-Oconee National Forest Andrews Cove Trail 3.8 Hike
79 Unicoi State Park Bike Trail 3.5 Bike
80 Unicoi State Park Anna Ruby Falls 1 Hike
81 Brinkley Glen Park Brinkley Glen Trail 1 Hike
82 Letchworth-Love Mounds Archaeological State Park Letchworth Mounds Loop 2.7 Hike
83 St. Marks WMA Florida Scenic Trail 5.5 Hike  
84 Econfina State Park River Loop (Red Trail) 3.3 Hike
85 Torreya State Park Torreya Trail 6.6 Hike
86 Lafayette Heritage Trail Park East Cadillac Loop 2.6 Bike
87 Governors Park Fern Trail, Kohl’s Trail and Blairstone Multi-Use Trail 4.4 Hike  
88 Apalachee Regional Park Cross Country Loop 3.3 Hike
89 St. Andrews State Park Road and Pine Flatwoods Trail 5.5 Both
90 Torreya State Park Torreya Challenge 9.3 Hike
Total 478 miles

 

Google Calendar Privacy Vulnerability

It's interesting how events can lead one to find privacy and security vulnerabilities. I'm reminded of the old Connections show, where James Burke would connect seemingly unrelated events in human history and show how one led to another. During my Winter 2021 Strategic Privacy by Design course, the United States did a time shift known as Daylight Saving Time, an anachronism from the days of agriculture when the government thought changing the time twice a year to adjust to changing sunlight would help farmers use time more effectively. As a result of this shift, some students in Europe showed up at the end of a lecture because I had adjusted my clock, but they, obviously being in Europe, had not.

As a result of this timing error, I thought it might be good to create calendar items in Moodle (the LMS I use) for the Spring 2021 Strategic Privacy by Design course. The plan was to export the iCal file and send it to students so they would each be able to insert the important course events into their own calendars. I imported the events into my calendar as well, which, unfortunately, is in Google.

My eagle-eyed assistant instructor, Maria, noticed when she was checking my schedule to send me a meeting invite that she could see these items, even though I had set my calendar to only share free/busy information (see below).

After digging around, I finally figured out what was going on. Visibility on each calendar item has three options: private, public, or default visibility (meaning it defaults to the overall calendar's visibility).

However, these calendar items had a CLASS property in the iCal file set to PUBLIC, which overrode my calendar's default of free/busy only.
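If you want to check an exported calendar yourself, below is a minimal sketch (plain Python, standard library only) that flags events whose CLASS property is PUBLIC. The filename is just a placeholder for whatever .ics file you exported, and the parsing is deliberately simplistic (it ignores folded lines).

def public_events(ics_path):
    """Return summaries of VEVENTs marked CLASS:PUBLIC in an .ics file."""
    flagged, event, in_event = [], {}, False
    with open(ics_path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if line == "BEGIN:VEVENT":
                in_event, event = True, {}
            elif line == "END:VEVENT":
                if event.get("CLASS", "").upper() == "PUBLIC":
                    flagged.append(event.get("SUMMARY", "(no title)"))
                in_event = False
            elif in_event and ":" in line:
                key, _, value = line.partition(":")
                event[key.split(";")[0].upper()] = value
    return flagged

for summary in public_events("course_events.ics"):
    print("Visible despite free/busy-only sharing:", summary)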

Those events were imported. I wanted to check, so I had my security intern invite me to three events: one she set to private, one she set to public, and one she set to default visibility. As expected, despite my calendar being set to free/busy only, the "public" event showed as public.

Your reaction may be, "well, this event is public," but two problems persist: 1) it still shows MY interest or possible attendance in this public event, not just whether I'm busy or free; and 2) the sender may have their calendar defaulting to public without realizing it and send you an invite to what they think is a private conversation. I would suggest that my calendar settings should override the imported event's settings, just to be on the safe side.

By the way, if anyone has a suggestion for a privacy friendly online calendar (so I can share my free/busy schedule), I’d appreciate hearing from you. I haven’t found a good alternative yet.

One Way Tables

One of the potential downsides of relational data is the ease with which data can be related in both directions. A simple search of the table below will reveal not only whether a known customer has COVID but also a list of all COVID-positive customers.

Suppose instead you want to be able to determine whether a customer has COVID but not easily obtain a list of all customers who have COVID. How might you do that? This can be accomplished using one-way functions (hashes). By hashing the customer's name (shown as the H(name) function) and replacing the name in the customer field with the hash, we can no longer get a list of customers with COVID. A simple SQL query will get a customer's COVID status from their name: SELECT Covid FROM TABLE WHERE Customer = hash(customer_name). The inverse, however, won't get me the list of customers with COVID, only a list of hashes: SELECT Customer FROM TABLE WHERE Covid = 'Positive'.
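To make that concrete, here is a minimal sketch in Python using SQLite, with SHA-256 standing in for H(); the table and column names and the sample records are illustrative, not from any real schema.

import hashlib
import sqlite3

def h(name):
    # One-way function: SHA-256 of the customer's name.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (customer TEXT PRIMARY KEY, status TEXT)")
for name, status in [("Alice Jones", "Negative"), ("John Smith", "Positive")]:
    conn.execute("INSERT INTO covid VALUES (?, ?)", (h(name), status))

# Forward lookup: knowing the name still retrieves the status.
print(conn.execute("SELECT status FROM covid WHERE customer = ?",
                   (h("John Smith"),)).fetchone())

# Reverse lookup: querying by status yields only opaque hashes, not names.
print(conn.execute("SELECT customer FROM covid WHERE status = 'Positive'").fetchall())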

Now, hashes are one-way functions, meaning it's very difficult to determine x given H(x). However, because there are relatively few possible values for a customer's name, it's easy to compute what's called a rainbow table, that is, a table of all possible hash values. We can then look up the hash in the rainbow table and obtain the customer's name for those customers that have COVID. The table below assumes we know all the names of customers; an attacker may not, but names are fairly common, and even if we created our rainbow table with six thousand different common first names, that's fairly trivial to compute.
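Here's a sketch of that attack, again in Python with SHA-256 as the assumed hash; the short list of names just stands in for the thousands of common names a real attacker would precompute.

import hashlib

def h(name):
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

# Precompute H(name) for every candidate name -- the "rainbow" (lookup) table.
common_names = ["Alice Jones", "Bob Lee", "Carol King", "John Smith"]
lookup = {h(name): name for name in common_names}

# Hashes copied out of the COVID table's Customer column for positive rows.
positive_hashes = [h("John Smith")]
for digest in positive_hashes:
    print("Re-identified:", lookup.get(digest, "unknown"))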

We address this with a concept called salting, which means adding a random value to the name before hashing it, i.e., Customer = H(Name + Salt). One problem, though: we can't use the same salt for every customer, or an attacker could precompute a rainbow table by just adding the salt to each name before calculating the hash. We also need the salt BEFORE we do the hash. Now, customers can't be expected to remember the salt before looking it up, but we can give them an account number, which we can relate to the salt. A customer wanting to look up their COVID status enters their account number and name. The system retrieves the salt using the account number, hashes the name with it to determine the customer, and looks up their COVID status. [Note: salts should be incremented with each access to prevent replay attacks, but that's beyond the scope of this blog.]

Light arrows show the link between rows, only discoverable with knowledge of the customer’s name.
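Below is a sketch of that salted design in Python: an account table maps account numbers to salts, and the COVID table is keyed on H(name + salt). The account numbers, salts, and data values are all illustrative, and SHA-256 is again my stand-in for H().

import hashlib
import secrets

def h(value):
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

accounts = {}      # account number -> salt
covid_table = {}   # H(name + salt) -> status

def enroll(name, status):
    account_number = secrets.token_hex(4)    # handed to the customer
    salt = secrets.token_hex(16)
    accounts[account_number] = salt
    covid_table[h(name + salt)] = status
    return account_number

def lookup(account_number, name):
    salt = accounts[account_number]           # salt found via account number
    return covid_table.get(h(name + salt))    # then hash name + salt

acct = enroll("John Smith", "Positive")
print(lookup(acct, "John Smith"))             # -> Positive
print(lookup(acct, "Jane Smith"))             # wrong name -> None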

Now an attacker wanting to identify all known COVID cases would need to hash all the possible customer names against all the possible salts. Padding the salt table with hundreds of thousands of random account numbers increases the number of computations necessary for the attacker. In fact, you could prefill the account table with a million rows and a million salts and randomly assign account numbers to customers. Even if two customers got the same account number, they aren't going to get the same Customer value, H(name + salt), unless they have the same name. This also benefits us because if we have two people with the same name, they can be given different account numbers and distinct COVID statuses.

In order to create a rainbow table now, an attacker must hash six thousand names against a million account numbers, on the order of 10⁹ rows. Not impossible, but starting to slow them down.

I'm going to make it even a little bit harder. Right now, we can still use the COVID table to see how many people are COVID positive or negative. This is especially problematic if each and every customer has the same status. We can do a little encryption and steganography to hide the information. First, for each record we generate a random 128-bit value (say 787683b51cf3b49886fce82dc34a51a2 in hexadecimal). If we need to encode a negative result, we ensure that value has a least significant bit of 0. If we need to encode a positive result, we ensure that value has a least significant bit of 1 (i.e., 787683b51cf3b49886fce82dc34a51a3). Note the change in the last digit from 2 to 3, changing the least significant bit and thus the COVID status. That value is then encrypted using a symmetric cipher (such as AES) without padding or authentication to verify the accuracy of the data. We can use part of the salt as the encryption key (bolded in the table below).
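Here's a rough sketch of that encode/decode step. It assumes the third-party Python "cryptography" package for AES; the key is just the example hex value above reused for illustration. Because there is no padding or authentication, decrypting with the wrong key doesn't fail, it simply yields another random-looking block, which is exactly the property relied on here.

import secrets
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encode_status(positive, key):
    block = bytearray(secrets.token_bytes(16))                 # random 128-bit value
    block[-1] = (block[-1] & 0xFE) | (1 if positive else 0)    # status hidden in the LSB
    encryptor = Cipher(algorithms.AES(key), modes.ECB(),
                       backend=default_backend()).encryptor()
    return encryptor.update(bytes(block)) + encryptor.finalize()  # one block, no padding

def decode_status(ciphertext, key):
    decryptor = Cipher(algorithms.AES(key), modes.ECB(),
                       backend=default_backend()).decryptor()
    block = decryptor.update(ciphertext) + decryptor.finalize()
    return bool(block[-1] & 0x01)

key = bytes.fromhex("787683b51cf3b49886fce82dc34a51a2")   # e.g., part of the salt
stored = encode_status(True, key)
print(decode_status(stored, key))                         # True: correct key recovers the status
print(decode_status(stored, secrets.token_bytes(16)))     # wrong key still "decrypts", to a meaningless bit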

Now, an attacker can’t search the COVID table to identify how many customers have COVID. Additionally, because we used an encryption algorithm without any validation, any of the values in the COVID column will decrypt with any of the encryption keys in the account table, just with erroneous least significant bits, thus not disclosing anyone’s COVID status at all. If we used a decryption algorithm with padding or authentication, it would provide a quicker backdoor for connecting records in the account table with records in the COVID table.

One more step to slow a potential attacker down is to iterate the hashing, in other words compute H(H(H(H(…H(name + salt))))), say 100,000 times. While slowing down lookups slightly, it significantly increases the cost of computing a rainbow table for an attacker, who now must compute 100,000 × 10⁹ hashes. Increasing the number of values in the account table, increasing the potential values for name (such as using given name and surname), and increasing the number of iterations all serve to slow an attacker down.
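A sketch of that iteration (often called key stretching) follows; the manual loop shows the idea, and the standard library's pbkdf2_hmac does essentially the same job with a better-studied construction. Inputs and iteration count are illustrative.

import hashlib

def iterated_hash(name, salt, iterations=100_000):
    digest = (name + salt).encode("utf-8")
    for _ in range(iterations):               # H(H(H(...H(name + salt))))
        digest = hashlib.sha256(digest).digest()
    return digest.hex()

print(iterated_hash("John Smith", "some-salt")[:16])

# Roughly equivalent idea using the standard library's PBKDF2:
print(hashlib.pbkdf2_hmac("sha256", b"John Smith", b"some-salt", 100_000).hex()[:16])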

The concept described above is sometimes called a translucent database, such that someone who has full access to the data (a database engineer) still can’t interpret it, but with a little extra knowledge (account number and name), a customer can still retrieve their COVID status.

Cookie pollution

Just moments ago I visited lightpollutionmap.info to look for somewhere to go camping this weekend with as little light pollution as possible within a reasonable drive of where I'm at. I was immediately presented with a pretty comprehensive cookie banner. The FIRST thing I noticed was that I couldn't unselect unnecessary cookies, which included over 75 "statistics" cookies.

Most of those "statistics" cookies, of course, are really third-party marketing trackers. The only options presented were "Allow selection" and "Allow all cookies," which were functionally identical because you couldn't unselect any of the types of cookies. I noticed the button for only selecting necessary cookies was still there but invisible. I clicked it, only to be presented with a blank screen. See the video below showing the cookie selection options disappearing. I also noticed the ad settings options to accept or reject vendors were disabled.

VIDEO of options being hidden

Breaking down “Personal Data”

I've railed for years against the use of PII (Personally Identifiable Information) as unhelpful in the privacy sphere. This term is used in some US legislation and has unfortunately made its way into the vernacular of the cyber-security industry and privacy professionals. Use of the term PII is necessarily limiting and does not allow organizations to see the breadth of privacy issues that may accompany non-identifying personal data. This post is meant to shed light on the nuances in different types of data. While I'll reference definitions found in the GDPR, this post is not meant to be legislation specific.

Personal Data versus Non-personal Data

The GDPR defines Personal Data as "any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person." The key term in the definition is the phrase "relating to." This broadly refers to any data or information that has anything to do with a particular person, regardless of whether that data helps identify the person or whether that person is known. This contrasts with non-personal data, which has no relationship to an individual.

Personal Data: “John Smith’s eyes are blue.”

In this phrase, there are three pieces of personal data. The first is the first name, John, which relates to an individual, John Smith. The second is his last name, Smith. Finally, the third is blue eyes, which also relates to John Smith.

Anonymous Data: “People’s eyes are blue.”

No personal data is indicated in the above sentence as the data doesn’t relate to an individual, identified or identifiable. It relates to people in general.

Identified Data versus Pseudonymous Data

Much consternation has been exhibited over the concept of pseudonymized data. The GDPR defines pseudonymisation as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person." The key phrase in this definition is that the data can no longer be attributed to an individual without additional information. Let me break this down.

Identified Data: “John Smith’s eyes are blue.”
The same phrase we used in our example for Personal Data is identified because the individual, John Smith, is clearly identified in the statement.

Pseudonymous (Identifiable) Data: “User X’s eyes are blue.”
Here we have processed the individual's name and replaced it with User X. In other words, it's been pseudonymized. However, it is still identifiable. From the definition above, Personal Data is data relating to an identified or identifiable individual. Blue eyes are still related to an identifiable individual, User X (aka John Smith). We just don't know who he is at the moment. Potentially, we can combine information that links User X to John Smith. Where some people struggle is understanding that there must be some form of separation between the use of the User X pseudonym and User X's underlying identity. Store both in one table without any access controls and you've essentially pierced the veil of pseudonymity. WARNING: Here is where it can get tricky. Blue eyes are potentially identifying. If John Smith is the only user with blue eyes, it becomes much easier to identify User X as John Smith. This is a huge pitfall, as most attributable data is potentially re-identifying when combined with some other data.

Identifying Data versus Attributable Data

In looking at the phrase “John Smith’s eyes are blue” we can distinguish between identifying data and attributable data.

Identifying Data: “John Smith”
Without going into the debate over the number of John Smiths in the world, we can consider a person's name fairly identifying. While "John Smith" isn't necessarily uniquely identifying, a name, as a type of data, can be uniquely identifying.

Attributable Data: “blue eyes”
Blue eyes is an attribution. It can be attributable to a person, in the case of our phrase “John Smith’s eyes are blue.” It can be attributable to a pseudonym: “User X’s eyes are blue.” As we’ll see below, it can also be attributed anonymously.

Anonymous Data versus Anonymized Data

The GDPR doesn't define anonymous data, but Recital 26 refers to "anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable." In the first example, I distinguished Personal Data from Anonymous Data, which didn't relate to a specific individual. Now we need to consider the scenario where we have clearly Personal Data which we anonymize (or render anonymous in such a manner that the data subject is not or no longer identifiable).

Anonymous Data: “People’s eyes are blue.”
For this statement, we were never talking about a specific individual; we're making a generalized statement about people and an attribute shared by some people.

Anonymized Data: “User’s eyes are blue.”
For this statement, we took Identified Data ("John Smith's eyes are blue") and processed it in a way that is potentially anonymous. We've now returned to the conundrum presented with Pseudonymous Data. Specifically, if John Smith is the only user with blue eyes, then this is NOT anonymous. Even if John Smith is one of a handful of users with blue eyes, the degree of anonymity is fairly low. This is the concept of k-anonymity, whereby a particular individual is indistinguishable from at least k-1 other individuals in the data set. However, even this may not give sufficient anonymization guarantees. Consider a medical dataset of names, ethnicities and heart conditions. A hospital releases an anonymized list of heart conditions (3 people with heart failure, 2 without). Someone with outside knowledge (that those of Japanese descent rarely have heart failure, and the names of the patients) could make a fairly accurate guess as to which patients had heart failure and which did not. This revelation brought about the concept of l-diversity in anonymized data. The point here is that unlike Anonymous Data, which never related to a specific individual, Anonymized Data (and Pseudonymized Data) should be carefully examined for potential re-identification. Anonymizing data is a potential minefield.
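As a rough illustration of the k-anonymity check (a sketch, not a prescribed method), the snippet below groups records by their quasi-identifiers and reports the smallest group size; the tiny made-up dataset mirrors the heart-failure example above.

from collections import Counter

# Illustrative released dataset: quasi-identifiers plus a sensitive attribute.
records = [
    {"ethnicity": "Japanese", "zip": "32301", "condition": "No heart failure"},
    {"ethnicity": "Japanese", "zip": "32301", "condition": "No heart failure"},
    {"ethnicity": "Other",    "zip": "32301", "condition": "Heart failure"},
    {"ethnicity": "Other",    "zip": "32301", "condition": "Heart failure"},
    {"ethnicity": "Other",    "zip": "32301", "condition": "Heart failure"},
]
quasi_identifiers = ("ethnicity", "zip")

group_sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
k = min(group_sizes.values())
print(f"Dataset is {k}-anonymous over {quasi_identifiers}")
# Note: even a group with k > 1 leaks its condition if every member shares it,
# which is the homogeneity problem that l-diversity addresses.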

If you need help navigating this minefield, please feel free to reach out to me at Enterprivacy Consulting Group.

Bots, privacy and suicide

I had the pleasure of serving last week on a panel at the Privacy and Security Forum with privacy consultant extraordinaire Elena Elkina and renowned privacy lawyer Mike Hintze. The topic of the panel was Good Bots and Bad Bots: Privacy and Security in the Age of AI and Machine Learning. Serendipitously, on the plane to D.C. earlier that morning someone had left a copy of the October issue of Wired magazine, the cover of which displayed a dark and grim image of Ryan Gosling, Harrison Ford, Denis Villeneuve, and Ridley Scott from the new dystopian film Blade Runner 2049. Not only was this a great intro to the idea of bots (in the movie's case, human-like androids), but the magazine contained two articles pertinent to our panel discussion: "Q: In, say, a customer service chat window, what's the polite way to ask whether I'm talking to a human or a robot?" and "Stop the chitchat: Bots don't need to sound like us." Our panel dove into the ethics and legality of deception, in, say, a customer service bot pretending to be human.

While the idea was fresh in my mind, I wanted to take a moment to replay some of the concepts we touched upon for a wider audience and talk about the case study we used in more detail than the forum allowed. First off, what did we mean by bots? I don't claim this is a definitive definition, but we took the term, in this context, to mean two things:

  • Some form of human-like interface. This doesn't mean they have to have the realism of the Replicants in Blade Runner, but some mannerisms by which a person might mistake the bot for another person. This goes back, as Elena pointed out, to the days of Alan Turing and his Turing test, years before any computer could even think about passing it. ("I see what you did there.") The human-like interface has a potentially interesting property: are people more likely to let their guard down and share sensitive information if they think they are talking to another person? I don't know the answer to that, and there may be some academic research on that point. If there isn't, I submit that it would make for some interesting research.
  • The second is the ability to learn and be situationally aware. Again, this doesn't require the super sophistication of IBM's Watson, but any ability to adapt to changing inputs from the person with whom it is interacting. This is key, like the above, to giving the illusion that a person is interacting with another person. By counterexample, Tinder is littered with "bots" that recite scripts with limited, if any, ability to respond to interaction.

Taxonomy of Risk

Now that we have a definition, what are some of the heightened risks associated with these unique characteristics of a bot that, say, a website doesn't have? I use Dan Solove's Taxonomy of Privacy as my go-to risk framework. Under the taxonomy I see five heightened risks:

  1. Interrogation (questioning or probing for personal information): In order to be situationally aware, to "learn" more, a bot may ask questions of someone. Those questions could go too far. While humans have developed social filters, which allow us to withhold inappropriate questions, a bot lacking a moral or social compass could ask questions which make the person uncomfortable or are invasive. My classic example of interrogation is an interview where the interviewer asks the candidate if they are pregnant or planning to become pregnant. Totally inappropriate in a job interview. One could imagine a front-line recruitment bot smart enough to know that pregnancy may impact the immediate job attendance of a new hire but not smart enough to know that it's inappropriate to ask that question (and certainly illegal in the U.S. to use pregnancy as a discriminatory criterion in hiring).
  2. Aggregation (combining various pieces of personal information): Just as not all questions are interrogations, not all aggregation of data creates a privacy issue. It is when data is combined in new and unexpected ways, resulting in the disclosure of information the individual didn't want to disclose. Anyone could reasonably assume Target is aggregating sales data to stock merchandise and make broad decisions about marketing, but the ability to discern the pregnancy of a teenager from non-baby-related purchases was unexpected, and uninvited. For a pizza-ordering bot, consider the difference between knowing my last order was a vegetable pizza and discerning that I'm a vegetarian (something I didn't disclose) because when I order for one it's always vegetable, but if I order for more than one, it includes meat dishes.
  3. Identification (linking of information to a particular individual): There may be perfectly legitimate reasons a bot would need to identify a person (to access that person's bank account, for instance), but identification becomes an issue when the individual's perception is that they would remain anonymous or at the very least pseudonymous. If I'm interacting with a bot as StarLord1999 and all of a sudden it calls me by the name Jason, I'm going to be quite perturbed.
  4. Exclusion (failing to let an individual know about the information that others have about her and participate in its handling or use): As with aggregation, a situationally aware bot, pulling information from various sources, may alter its interaction in a way that excludes the individual from some service without the individual understanding why, and based on data the individual doesn't know it has. For instance, imagine a mortgage loan bot that pulls demographic information based on a user's current address and steers them towards less favorable loan products. That practice sounds a lot like redlining and, if it has discriminatory effects, could be illegal in the U.S.
  5. Decisional Interference (intruding into an individual's decision making regarding her private affairs): The classic example I use for decisional interference is China's historic one-child policy, which interfered with a family's decision making about their family make-up, namely how many children to have. So you ask, how can a bot have the same effect? Note the law is only influential, albeit in a very strong way. A family can still physically have multiple children, hide those children or take other steps to disobey the law, but the law is still going to have a manipulative effect on the decision making. A bot, because of its human interface and advanced learning and situational knowledge, can be used to psychologically manipulate people. If the bot knows someone is psychologically prone to a particular type of argument style (say, appealing to emotion), it can use that, and information at its disposal, to subtly persuade you towards a certain decision. This is a form of decisional interference.

Architecture and Policy

I'm not going to go into a detailed analysis of how to mitigate these issues, but I'll touch on two thoughts: first, architectural design, and second, public policy analysis. Privacy-friendly architecture can be analyzed along two axes: identifiability and centralization. The more identified and more centralized the design, the less privacy friendly it is. It should be obvious that reducing identifiability reduces the risks of identification and aggregation (because you can't aggregate external personal data about unidentified individuals), so I'll focus here on centralization. Most people would mistakenly think of bots as necessarily being run by a centralized server, but this is far from the case. The Replicants in Blade Runner and "autonomous" cars are both prominent examples of bots which are decentralized. In fact, it should be glaringly apparent that a self-driving car operated by a server in some warehouse introduces unnecessary safety risks. The latency of the communication, the potential for command injection at the server or network layer, and the potential for service interruption are unacceptable. The car must be able to make decisions immediately, without delay or risk of failure. Now, decentralization doesn't help with many of the bot-specific issues outlined above, but it does help with other more generic privacy issues, such as insecurity, secondary use and others.

Public policy analysis is something I wanted to introduce with my case study during the interactive portion of the session at the Privacy and Security Forum. The case study I presented was as follows:

Kik is a popular platform for developing bots. https://bots.kik.com/#/ Kik is a mobile chat application used by 300 million people worldwide, and an estimated 40% of US teens have used the application at one time or another. The National Suicide Prevention Hotline, recognizing that most teens don't use telephones, wants to interact with them in the services they use. The Hotline wants to create a bot to interact with those teens and suggest helpful resources. Where the bot recognizes a significant risk of suicide, rather than just casual inquiries or people trolling the service, the interactions will first be monitored by a human who can then intervene in place of the bot, if necessary.

I'll highlight one issue, decisional interference, to show why it's not a black-and-white analysis. Here, one of the objectives of the service, and the bot, is to prevent suicide. As a matter of public policy, we've decided that suicide is a bad outcome and we want to help people who are depressed and potentially suicidal get the help they need. We want to interfere with this decision. Our bot must be carefully designed to promote this outcome. We don't want the bot to develop in a way that doesn't reflect this. You could imagine a sophisticated enough bot going awry and actually encouraging callers to commit suicide. The point is, we've done that public policy analysis and determined what the socially acceptable outcome is. Many times organizations have not thought through what decisions might be manipulated by the software they create and what public policy should guide the way they influence those decisions. Technology is not neutral. Whether it's decisional interference or exclusion or any of the other numerous privacy issues, thoughtful analysis must precede design decisions.

Financial Cryptography, Anguilla, Hettinga and Cate

As many of you probably know, the Lesser Antilles in the Caribbean were battered this week by Hurricane Irma, one of, if not the, most powerful storms to hit the Caribbean. Barbuda was essentially annihilated (90% of buildings damaged or destroyed). Anguilla, a country of only 11,000, hasn't fared too well either.

What some in the Financial Cryptography/crypto-currency community may not know is that Anguilla was home to the first and several subsequent International Financial Cryptography Conferences, starting in 1997. The association, IFCA, still bears that legacy with the .ai TLD of its domain, ifca.ai. The papers presented at IFCA conferences and the discussions held by many of its participants paved the way for Tor, Bitcoin and the crypto-currency revolution. Eventually, the conference grew too big for this tiny island.

More directly, two of the founders of that conference, Bob Hettinga and Vince Cate, currently live on Anguilla. While they have since separated from IFCA, their early evangelism was instrumental in the robust, thriving and important work developed at the conference over the years. Bob and Vince were both directly affected by the hurricane, though not their spirit. Here is a quote from Bob about the aftermath:

Have a big wide racing stripe of missing aluminum from front to back. Sheets of water indoors when it rains. Gonna tarp it over tomorrow though. Sort of a rear vent across the back of the great room. See the videos. Blew out three steel accordion shutters covering three sliding glass doors. 200mph winds? No plywood on the island, much less aluminum.

Airport has no tower anymore. SXM airport had its brand new terminal demolished. Royal Navy tried to land at the commercial pier but the way was blocked. Rum sodomy and the lash.

We need money of course, blew a bunch prepping, etc. No reserves to speak of even then. Gotta pay people to do a bunch of stuff, but the ATMs are down so people are just doing shit anyway. Works in our favor except to buy actual stuff. We hadda a little cash stashed in the house and paid about half of that boarding up three of the six demolished sliding glass doors. Tomorrow we’ll put the MICR numbers off one of [Mrs RAH’s] checks here, so you can wire us money if you want. Heh. It still might be weeks before we get it. Or days. Just no idea right now.

If we can get the materials replacing the aluminum on the roof can be done in a couple days. Getting it here under normal circumstances would take weeks.

Certainly months now.

Then we have to rip out and replace all the Sheetrock in the house, basically all the closets and bathrooms and the inner bedroom walls. Again days to fix, months to do.

All the gas pumps on the island were flattened. Literally blown away. That makes thing interesting. A couple grocery stores are open, but who knows how long there inventories will hold up.

See “Royal Navy” above.

Both cisterns fine. 38000 gallons total. Full. Wind ripped off the gutters so no discernible salt in the water. And little floating litter on top. We’ve got water, and we can pull buckets out of the cisterns until the cows come home. I’d been looking at building a hand pump out of PVC and check valves, and now I have some incentive.

Car needs brake work. Threw the error codes the day after the storm. Supposed to park it and have the shop come and get it. Heh. No shop anymore. No money so it works out. 🙂 Car exists to charge the smartphones at the moment. Full tank. Life is good

See the rain from inside Bob’s house during the Hurricane.

I’m making this post to request that the community support Bob, Vince and the island of Anguilla. We can’t help the entire Caribbean but we can donate to efforts to support them and our adoptive island of Anguilla.

For donations to support the island of Anguilla, please consider donating to the Help Anguilla Rebuild Now fund or individual recipients on this page

To help Bob or Vince, please contact them directly (or you can contact me). I will update this page with funding options as I’m made aware of them.

My contact information:

Email: rjc at privacymaverick

Twitter @privacymaverick

LinkedIn