FAA and flight routes

Recently, over the objections of many privacy advocates and airplane owners, the FAA moved to make more flight route information open and publicly available. Specifically, the FAA operates a program called BARR (Block Aircraft Registration Request), which allows certain aircraft to be exempt from the public records of flight routes. The FAA collects these flight routes for every flight in and out of the US in order to deal with traffic control issues, congestion, etc. However, this is clearly sensitive information for some aircraft: those with at-risk passengers or cargo, those doing surprise inspections on facilities, etc.

This is very typical of the government’s modus operandi: collect potentially sensitive information and then attempt to secure it (or its analogue, making it illegal to look at information that is clear for all to see). In this case, though, the government has made the criteria for exclusion much more stringent, barring those who may have a legitimate interest in securing their flights but don’t meet the government’s threshold. This very much reminds me of the law surrounding employee addresses (or other PII) for government employees in Florida. Florida has very broad public records laws and, generally, one can get the home addresses of government employees. There are numerous exceptions for judges, law enforcement, child protective service employees, etc.: those deemed by the state legislature to be high risk. This solution is no solution, though: first, it deprives the citizenry of public records and adds to the long list of exemptions to the public records laws; second, it deprives individuals of control of their personal information. A better solution would be not to collect home address information from employees, or at least to give them the option of not supplying that information in the first place.

Returning to the FAA issue, there are other options. Assuming it does need the data for legitimate purposes, the FAA could collect the information in such a way that it doesn’t have the information directly but only in the aggregate (to assess congestion, etc.). If it needs the information for legitimate law enforcement purposes, it could use some key escrow or blinding method to store the information but only have it available with a valid court order. Without knowing all the functional requirements of the system, it’s hard to design a privacy-protective method, but my purpose is to say that it could be done... if someone cared enough to do it.

Financial Cryptography 2012

I’m trying to organize a legal panel for the conference on Financial Cryptography and Data Security 2012. The 16th annual conference will be held at the Divi Flamingo Beach Resort in Bonaire. The subject of the legal panel will be

Privacy: Technology versus the Law.
The widespread adoption of PETs (privacy enhancing technologies) faces many obstacles, including misunderstanding by corporate leaders, a lack of technical skills within organizations, and business case justification. Are laws and regulations another impediment? Regulators seem to favor one of two approaches: outright bans on the collection of information, or requiring information to be collected and then attempting to layer security on top. What laws and regulations do our panelists consider problematic, and what solutions do we have for getting government to support rather than hinder the adoption of PETs?

Aggregation without revelation

I was given the following problem:

You have a web-based mail system for which you want to provide auto-correcting spell checking. The spell checking system needs not only to correct common words, but also to account for proper names of places and obscure languages. Most importantly, privacy considerations need to be taken into account so as not to store or leak one individual’s spelling errors. How do you do it?

Anonymization of data may be insufficient. See the 2006 AOL search log release.

Most intelligent spell checkers work by recording a word and then the changes made to that word. If lots of people make the same changes, chances are it’s because of a typographical error or spelling error. You can then suggest that as a correction to others who type in the original, potentially misspelled, word. How do you do that without storing the misspelled word or the correction unless you have it in its aggregate form?

Here is my suggested solution:

Take the user’s email address and reduce it modulo 256 (1 byte). That essentially puts everybody into one of 256 buckets. The reason will be clear later.

On the client side (via Javascript), also take each word as it’s typed and reduce that word to 3 bytes. So you’ll have some collisions, but not more than a few hundred in the worst case. (Some statistical analysis should be done, but this is my back-of-the-envelope calculation.)
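As a rough sketch of the two bucketing steps above: the post only says to "modulo" the email and the word, so hashing them first with SHA-256 is an assumption made here to spread values evenly, and the function names `user_bucket` and `word_bucket` are hypothetical. The post has this running client-side in Javascript; Python is used purely for illustration.

```python
import hashlib


def user_bucket(email: str) -> int:
    """Map an email address to one of 256 buckets (0..255).

    Assumption: hash first, then keep one byte, which is the same as
    reducing the hash modulo 256.
    """
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).digest()
    return digest[0]


def word_bucket(word: str) -> int:
    """Compress a typed word to a 3-byte value (0..2**24 - 1)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:3], "big")
```

Many different words share each 3-byte value, which is what keeps the server from knowing exactly which word was typed.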

If the user does not change the word, do nothing.

If the user makes a correction, take the original and the correction and use M-of-N secret splitting to split that information into N parts. For this particular exercise, you’ll split the information into 256 parts. You’ll then take the z-th part (z being the value of the user’s email modulo 256 from above) and send that back to the server. In and of itself, that part reveals nothing.
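One way to picture the M-of-N step is Shamir's secret sharing, which is swapped in here as an assumption since the post doesn't name a particular scheme. A detail the description leaves open is that parts sent by different users only combine if everyone splits the same polynomial, so this sketch derives the coefficients deterministically from the word pair itself. That choice is mine, and it weakens secrecy against a server willing to guess word pairs. The names `make_share` and `reconstruct`, the prime, and the 64-byte limit are all hypothetical; the server-side reconstruction is included only so the round trip can be checked.

```python
import hashlib

# Assumption: a Mersenne prime large enough for secrets of up to 64 bytes.
PRIME = 2**521 - 1


def _coefficients(secret: bytes, threshold: int) -> list[int]:
    """Polynomial of degree threshold-1 whose constant term is the secret.

    Higher coefficients are derived from the secret (not random) so that
    every user who makes the same correction splits the same polynomial.
    """
    coeffs = [int.from_bytes(secret, "big")]
    for i in range(1, threshold):
        digest = hashlib.sha512(secret + i.to_bytes(2, "big")).digest()
        coeffs.append(int.from_bytes(digest, "big") % PRIME)
    return coeffs


def make_share(original: str, correction: str, threshold: int, bucket: int) -> tuple[int, int]:
    """Return the single part (x, y) that a user in `bucket` (0..255) sends."""
    secret = f"{original}\x00{correction}".encode("utf-8")
    if len(secret) > 64:
        raise ValueError("word pair too long for this sketch")
    x = bucket + 1  # x = 0 would expose the secret itself
    y = 0
    for c in reversed(_coefficients(secret, threshold)):  # Horner evaluation
        y = (y * x + c) % PRIME
    return x, y


def reconstruct(shares: list[tuple[int, int]]) -> tuple[str, str]:
    """Recover (original, correction) from at least `threshold` distinct parts."""
    secret_int = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        # Lagrange interpolation evaluated at x = 0
        secret_int = (secret_int + yj * num * pow(den, -1, PRIME)) % PRIME
    raw = secret_int.to_bytes((secret_int.bit_length() + 7) // 8, "big")
    original, correction = raw.decode("utf-8").split("\x00")
    return original, correction
```

With a threshold of 3, parts from users in three different buckets are enough to recover the pair, while two users in the same bucket send identical parts and add nothing. How the server groups parts that belong to the same correction is left out of this sketch.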

However, once the server has M parts, it can reconstruct the original and the correction and make the pair available for others to autocorrect. When will this happen? Well, the first person provides the first part, but the next person to send in a correction won’t necessarily provide a second part: they have a 1/256 chance of providing the same part as the first person. So how many people do you need to get M parts? It comes down to probability. You obviously need at least M people, but M people don’t guarantee you M different parts. By adjusting M, the administrator can set a threshold for how many people are needed to have a certain percentage chance of collecting enough parts. For example, M might be chosen so that 30 people give you a 90% chance, 50 people a 95% chance and 100 people a 99% chance. Once the information is revealed, you’re confident it’s based on aggregate errors, not one person’s.
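The probability in question can be computed exactly rather than guessed. The sketch below assumes each contributor lands in one of the 256 buckets uniformly and independently (an idealization of real email addresses), and tracks the distribution of the number of distinct parts seen so far. The function names are hypothetical.

```python
def prob_threshold_reached(contributors: int, threshold: int, n_parts: int = 256) -> float:
    """Probability that `contributors` users supply at least `threshold` distinct parts."""
    # dist[d] = probability that exactly d distinct parts have been seen so far
    dist = [0.0] * (n_parts + 1)
    dist[0] = 1.0
    for _ in range(contributors):
        nxt = [0.0] * (n_parts + 1)
        for d, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[d] += p * d / n_parts  # a part we already hold
            if d < n_parts:
                nxt[d + 1] += p * (n_parts - d) / n_parts  # a new part
        dist = nxt
    return sum(dist[threshold:])


def contributors_needed(threshold: int, confidence: float, n_parts: int = 256) -> int:
    """Smallest number of contributors giving at least `confidence` of reaching `threshold`."""
    if not (0 < threshold <= n_parts and 0.0 < confidence < 1.0):
        raise ValueError("need 0 < threshold <= n_parts and 0 < confidence < 1")
    k = threshold
    while prob_threshold_reached(k, threshold, n_parts) < confidence:
        k += 1
    return k
```

An administrator could run `contributors_needed(M, 0.9)` for a few candidate values of M and pick the one that matches the number of people they want behind each revealed correction.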

Now, when a user types in an email, after each word, the system can calculate the 3-byte compression and send that out to the server. If the server finds any matches, it returns the list of corrections (which may be only a few items long or a hundred). The client side then compares the word against the corrections and determines which are likely and which are not (i.e. “hapyy” to “corrugated” would not be a reasonable match, but “hapyy” to “happy” would be, even though, say, “hapyy” and “corugated” both fall in the same 3-byte bucket).
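The post doesn't say how the client decides which corrections are "reasonable". One plausible test, used here as an assumption, is plain edit distance with a small cutoff. The names `edit_distance` and `plausible_corrections` and the default cutoff of 2 are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def plausible_corrections(typed: str, candidates: list[str], max_distance: int = 2) -> list[str]:
    """Keep only the server's candidates that are close to what was typed."""
    return [c for c in candidates if edit_distance(typed, c) <= max_distance]
```

For the example above, `plausible_corrections("hapyy", ["corrugated", "happy"])` keeps only "happy", since it is one substitution away while "corrugated" is far more.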

So here you have a solution which only reveals information in the aggregate and gives some level of security so the server doesn’t know what the person is typing.

Google’s strategy

This post is related to privacy insofar as it’s hard these days to talk about Google and not mention privacy in the same breath (at least for anyone involved in the privacy profession).

I’ve been thinking about Google a lot lately: strategically, competitively, and in terms of their product mix. I’ve come to a rather radical conclusion that I haven’t seen elsewhere, so I thought I’d share. Ostensibly, Google’s mission is to “organize the world’s information and make it universally accessible.” However, strategically they seem to be on a mission to become the universal middleman.

Consider the number of Google products and services that aim to position them between consumers and producers:

Chrome

Chrome Frame

Chrome OS

Android

Android Market

goo.gl (URL shortener)

Offers, AdWords and other advertising products

TV

Wallet

Page Speed Service

Public DNS

Google can’t physically come between consumers and producers (as, say, an ISP could, though they may be going this route too with Google Fiber), but they can interpose themselves virtually. Because of the need to be adopted on either the consumer or producer side of the equation, they add value by offering services that are free (or, like AdWords, provide a benefit that outweighs the cost). The goal of these services is, again, to get Google in the middle of the equation.

Where the lines between producer and consumer are blurred, such as in interpersonal communications, Google also offers a suite of products to be the conduit between the parties: Google+, Voice, Gmail, Groups, YouTube.

What’s interesting is that when you start viewing the competition in this light, there really is no competition. Apple is doing the same thing with its operating systems, iTunes store and app market, but there are two obvious distinctions: 1) Apple is only positioning itself between consumers and producers of information, not products; it isn’t positioning itself between consumers and producers of consumer goods, nor in the consumer-as-producer (i.e. social) market; and 2) Apple is attempting to seek rents, taking advantage of its position to maximize its income.

The next competitor is Facebook, which can take advantage of its social networking site. To a degree it’s trying to leverage that to get between producers and consumers with its advertising platform and with its application space, but this strategy is necessarily limited.

Microsoft probably has the broadest competitive suite of products and services. However, its strategy doesn’t seem as cohesive.

There are a host of other companies that compete with Google in niche markets, but nobody (except maybe Microsoft) has a breadth-of-services strategy aimed at wedging itself into every transaction.

How to store all the world’s phone calls.

I have a potential interview coming up with a certain technology company, which shall not be named, though let’s just say their motto should be Go Big or Go Home. In preparing me for the interview, the recruiting contact made a number of suggestions, including being prepared to answer the following question:

Expect questions designed to test your analytical and technical ability (example: how would you store all the phone calls in the world?).
I’ve been tossing and turning all night thinking about this question, even though I’m sure it won’t be in the interview. How would I answer it? I think the question is more, how would I not answer it. That question raises a dozen more:
What is the purpose of storing all the phone calls?
Who are you storing it for? The originators? The world?
When you say store all the world’s phone calls, do you mean the metadata (i.e. who called whom and when) or do you mean actual recordings of the conversations?
If we’re talking recordings, there are many legal and societal implications before we even address the engineering.
How will this information be indexed? Retrieved? Accessed?
Do we need it searchable or just filed away?
How long do you need to store this information?
The security and privacy implications of such a system are huge. Consider a phone app that listens for sensitive data much like a keylogger. See http://blogs.computerworld.com/17785/sensory_malware_android_app_listens_then_steals_credit_card_data
In terms of pure data, you’re not really looking at much (compared to music audio or video). Voice-quality audio is about 8 kbit/s, or 1 kbyte/s. Assuming the average phone user spends one hour per day on the phone, that’s 1 kbyte/s × 60 seconds × 60 minutes = 3,600 kbytes = 3.6 Mbytes. Not much by today’s standards. Assuming 1 billion phones worldwide, that’s 3.6 billion megabytes per day (or 3.6 million gigabytes). Now this isn’t insubstantial. We’re talking 3,600 terabytes PER DAY. Google’s search index uses around 1,000 terabytes; YouTube about 45 terabytes.
I would have to say that one approach would be to use distributed storage. Let each user store their own phone calls locally on their phone. Now we’re back at 3.6 MB per phone user per day, which could easily be stored on a 1 GB SD card. In fact, you could store roughly nine months’ worth of phone calls on each phone. Of course, we’re back to the original questions I asked: why are we storing this? Does storing it in a distributed fashion like this meet our functional requirements? Even if we’re doing this for searching purposes, we could index the phone calls in a centralized location and allow them to be searched, but pull up the actual phone calls from the phones. Obviously each phone call would be stored on at least two phones, so you have some redundancy, but you could have even more by having each phone user agree to store (encrypted, of course) others’ phone calls. Using a 10 to 1 factor would still allow the storage of nearly a month’s worth of phone calls on each phone equipped with a 1 GB card.
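The back-of-the-envelope numbers are easy to check. This sketch uses only the figures stated above (8 kbit/s voice, one hour a day, 1 billion phones, a 1 GB card) with decimal units; the constant and function names are made up for illustration.

```python
VOICE_KBIT_PER_S = 8            # voice-quality bitrate quoted in the post
CALL_SECONDS_PER_DAY = 60 * 60  # one hour of calls per user per day
PHONES = 10**9                  # assumed number of phones worldwide
CARD_MB = 1000                  # a 1 GB SD card, decimal megabytes


def daily_kbytes_per_user() -> int:
    """Kilobytes of call audio one user generates per day."""
    return VOICE_KBIT_PER_S // 8 * CALL_SECONDS_PER_DAY


def daily_terabytes_worldwide() -> float:
    """Terabytes generated per day across all phones (1 TB = 10**9 kB)."""
    return daily_kbytes_per_user() * PHONES / 10**9


def days_on_card(replication: int = 1) -> float:
    """Days of calls that fit on one card if each phone stores `replication` users' worth."""
    mb_per_day = daily_kbytes_per_user() / 1000 * replication
    return CARD_MB / mb_per_day
```

`days_on_card()` comes out a little under 278 days, about nine months, and `days_on_card(10)` a little under 28 days, which matches the "nearly a month" figure for the 10 to 1 redundancy case.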
Now that I’ve thought it through, I hope they ask this question…

Schizophrenia on Capitol Hill

As you may be aware, Congress is considering a bill that would require internet service providers to retain data on their customers’ assigned IP addresses (as well as name, address, credit card number and a host of other data) for up to 12 months. Though the bill ostensibly targets child pornographers, the legislation goes so far beyond that purpose that one legislator proposed an amendment to rename the bill the Keep Every American’s Digital Data for Submission to the Federal Government Without a Warrant Act.


While this bill is horribly privacy invasive, for the obvious reasons, Congress has chosen to accept the familiar bugaboo of child protection as justification enough to impose this burden. It’s especially odd given the number of privacy-protective laws also being proposed in this Congress. However, such schizophrenic action is by no means uncommon. In fact, I would have to say it’s par for the course, not only for Congress but for most businesses as well: collect more data and then ratchet up security in an attempt to protect that data.


Of course, there is another approach. The one I’ve been advocating, and that was recently picked up by none other than Forbes’ Kashmir Hill, is Privacy By Design.



What do you want to be when you grow up?

Most people’s eyes glaze over when I say I’m a “privacy professional.” They don’t have the foggiest idea of what I mean. This is a problem both outside and within the corporate world. Some jobs are easily identifiable with a single word. If I say I’m a fireman, for the most part, people can understand what I do for a living. There may be some variation, such as whether I work mostly on forest fires or structure fires, but in general, the public understands what I do on a daily basis.

When you start talking about information technology professionals, the confusion begins. Even simplifying that term to computer professional, people will tend to put you into one of two camps: someone who programs computers or someone who fixes them. If they associate you with the latter, you’re now their new best friend to come get rid of that nasty virus. However, the range of information technology professionals is much broader. You have programmers for sure, but also database developers, web developers, driver developers, component programmers, and designers (who know graphical interfaces). You have a host of administrators: network admins, sys admins, database admins, etc. Then you have the engineers and architects who design the system. Then the support personnel. Then you start moving into layers of abstraction, and there are IT governance folks who develop data policies and auditors who enforce the policy. The IT world is rich with nuanced professionals, none of whom would consider their job the same as the others’, yet from the outside they are easily lumped together: “oh, you’re a computer guy.”

Apparently the same is true of lawyers. Some lawyers have a very public persona. The criminal defense lawyers on TV, the personal injury attorneys on every billboard: this is what the public sees. People also know you need to go to lawyers for things like divorces and real estate, but still tend to lump everybody into either civil law or criminal law. The truth is that lawyers are much more diverse, and once they develop a specialty they really have limited ability to advise on other issues. More importantly, some laws are so intricate that many corporate lawyers specialize in very narrow niches: Foreign Corrupt Practices, mergers and acquisitions, etc.

Returning to my issue at hand. People ask me all the time what I do or what I want to do. My response, “I’m in privacy,” is clearly unintelligible to people outside the field. Even people who you think might have an inkling of an idea still seem perplexed that someone could claim such as their professional vocation. My point in this blog post is really to help myself come up with a simple, easy explanation... an elevator pitch that I can give when confronted with the question.

My girlfriend had a good suggestion. I actually asked her last night what she thought I did. She had a better notion than most, but still an incomplete one. She says she often struggles explaining it to her friends. She suggested a three-pronged approach: first say what it is, second explain what it does and third give an example.

1. What is the title? “Privacy Architect”

Now I would say even most seasoned privacy professionals would have a hard time understanding what this means. The vast majority of privacy pros come from a regulatory compliance background; they either don’t have the IT knowledge or just don’t think in terms of building in privacy up front.

2.  What does a privacy architect do?  “I help companies design their computer systems and business practices with their customers’ privacy in mind.”

While this is a really incomplete picture, I think it’s simple enough to convey most of what I do. My work may not be limited to computer systems and business practices. I think about the privacy of stakeholders other than customers. This doesn’t really encompass the legal issues at hand, nor the brand building that having a good privacy reputation can bring.

3.  Give an example of what you do. “Say a company wants to start an online bookstore. What books one purchases can reveal very personal interests: medical issues, political positions or financial status. Customers may want to keep such information private. What I do is help design the online store to empower their customers to preserve their privacy while still allowing the business to serve those customers.”

I’d really appreciate anyone’s feedback on this. Does it capture what I do? Is it simple and understandable?

Helping your customers help themselves

Watch this video: http://www.ted.com/talks/mikko_hypponen_fighting_viruses_defending_the_net.html

Now consider, how can a company help a customer protect themselves against this kind of attack?  Even given the inherent weaknesses of credit cards, consider this option:

The video says that hackers monetize compromised computers by installing key loggers. The hackers then search through the keyed data looking for credit card numbers. “But we can’t do anything if the customer’s computer is compromised!”  B.S.

What’s the solution?  Don’t have your customer enter all the digits of the credit card via their keyboard.  Display an onscreen keypad for entry of the last four digits.  This thwarts the CC thieves and demonstrates to your customers that you are attempting to protect them even on potentially compromised systems.  Simple, easy, effective.

Social Security Number redux

This article describes the continuing problem of banks using social security numbers as identifiers and partial passwords. It seems the FTC has previously identified the problem of SSNs being utilized as authentication tokens and has even gone so far as to propose legislation to reduce the over-reliance on them in industry. I would like to point out that my simple solution doesn’t require any legislation: simply publish SSNs on the internet as a ubiquitous identifier, thereby reducing their value to identity thieves and as authentication tokens.