Does Disney screen for sex offenders? Should they?

I’m sitting here with a 4-day park pass from Disney that I purchased in 2000. It has two days used and two days remaining. I’ve been reluctant to use it because I know Disney will want me to convert it into one of their less anonymous park passes, requiring either biometric identification or my name. Disney has been marching towards full-scale identification of its park visitors for years. I still remember the days in the ’70s when park tickets consisted of coupons for rides in the various themed sections of the park. In 1977 they began issuing 2-day park passes to resort guests (and this was our family vacation, so that’s how we rolled) and shortly thereafter began selling all-access passports. These quickly became the norm, eliminating the per-ride coupon books by the end of 1981, and thus began their need to track patrons for fraud purposes. I call this the “anti-fraud surveillance” business model.

I remember the last time I used this ticket, in 2001: the ticket taker at the entrance suggested I take it to customer service after I entered to have it upgraded. I politely ignored them. It’s not that I haven’t been to Disney since then, but I haven’t had an opportunity to use this particular ticket. I’ve been wanting to go again recently, if nothing else to get some use out of this ticket I paid for so many years ago and have been holding onto. However, if I go, it may be my last visit. Disney is getting too creepy for me. It did get me thinking, though: if Disney continues down the road of identifying and tracking guests, will they start screening for sex offenders?

To date I have no knowledge that they do so for their guests, though they do for their employees. Disney does have a problem with sex offenders on their property. However, the problem isn’t that Disney is overrun with offenders; quite the contrary. They have a public perception problem. Because they are geared towards children, every incident becomes a public relations fiasco.

Screening for sex offenders is difficult. There are lots of false positives, and many more false negatives as registrants find ways of skirting the system. Even given the heightened scrutiny that Disney is under, I think they would be reluctant to embark on such an effort. However, as they collect more data about their visitors, they may be inclined to use correlation data to screen and monitor guests. Single male spending too much time around It’s a Small World? Group of teens going from shop to shop but not riding the rides? False profiling is real and problematic. Of course, it’s something I know nothing about.
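To see why the false positives pile up, here’s some back-of-the-envelope arithmetic with entirely invented numbers (Disney publishes nothing of the sort). Even a screen that catches 90% of registrants and wrongly flags only 1% of everyone else flags mostly innocent guests, because registrants are rare:

```python
# Base-rate arithmetic for hypothetical guest screening (invented numbers).
visitors = 100_000          # guests screened in a day
offender_rate = 0.001       # assume 1 in 1,000 guests is a registrant
sensitivity = 0.90          # fraction of registrants the screen catches
false_positive_rate = 0.01  # fraction of innocent guests wrongly flagged

true_positives = visitors * offender_rate * sensitivity                  # 90
false_positives = visitors * (1 - offender_rate) * false_positive_rate   # ~999

flagged = true_positives + false_positives
print(f"Of {flagged:.0f} flagged guests, only {true_positives / flagged:.0%} "
      f"are actual registrants.")
```

With these numbers, more than 90% of the people pulled aside are false alarms, and every one of them is a potential public relations fiasco in its own right.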


(Re | De) – Identification

Lots of concern has been put forth about re-identifying people from de-identified data. For those who may not know the term, de-identification is the process of removing identifying information from data sets. Think health records where you remove the person’s name, date of birth, address, etc. and leave the raw health data.

The risks of re-identification have been both exaggerated and underplayed. I’d like to put forth a small framework for thinking about re-identification and risk. There are two variables I’d like you first to consider. The first is whether the information can be re-identified with public data, private data or theoretical data. For instance, in the infamous Netflix case, individuals’ movie-viewing habits were re-identified through publicly available data. Specifically, Netflix released de-identified data containing only the names of rentals, the dates of rentals and a unique identifier to correlate renters. The purpose was to provide data that would allow teams to develop a better suggestion engine. However, by correlating this data with public IMDB posts, researchers were able to identify a subset of the renters who had rented certain movies on certain days and then posted reviews of the films on IMDB.
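As a minimal sketch of that style of linkage attack, with invented records standing in for both data sets: join the “anonymous” rentals to the public reviews on (title, date) and look for anonymous IDs that line up with exactly one reviewer.

```python
from collections import defaultdict

# Invented stand-ins for the two data sets.
anonymous_rentals = [
    ("user_81774", "The Big Lebowski", "2005-03-02"),
    ("user_81774", "Brazil",           "2005-03-09"),
    ("user_52290", "Top Gun",          "2005-03-04"),
]
public_reviews = [
    ("jane_doe", "The Big Lebowski", "2005-03-02"),
    ("jane_doe", "Brazil",           "2005-03-09"),
]

# Index public reviews by (title, date), then collect which reviewers
# each anonymous renter lines up with.
review_index = {(title, date): name for name, title, date in public_reviews}
matches = defaultdict(set)
for user_id, title, date in anonymous_rentals:
    name = review_index.get((title, date))
    if name:
        matches[user_id].add(name)

# An anonymous ID whose pattern fits a single reviewer is re-identified.
for user_id, names in matches.items():
    if len(names) == 1:
        print(f"{user_id} is probably {names.pop()}")
```

A real attack needs several matching (title, date) pairs before the inference is strong, but the mechanics are exactly this simple.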

Contrast that scenario with what happened in the re-identification of Paula Broadwell. Paula Broadwell was CIA Director Petraeus’ biographer and was carrying on an illicit affair with him. Suffice it to say the FBI was able to identify her through the correlation of two privately held pieces of information. She logged into a joint Google account shared by her and Director Petraeus. Google had the IP addresses but not her name. The FBI traced those IP addresses to the hotels where they originated, but that still didn’t reveal her name. Mrs. Broadwell was uncovered because only she fit the unique pattern of having stayed at those hotels at the times the Gmail account was accessed from those IP addresses. Even the anonymous information about IP addresses and login times, when combined with the hotels’ private guest records, was enough to re-identify Mrs. Broadwell.
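The mechanics here are an intersection rather than a join. A sketch with invented names, dates and registers: whoever appears in every hotel’s guest list for the relevant login dates fits the pattern.

```python
# Invented stand-ins: login events traced to hotel networks, plus the
# guest register each hotel produced for that date.
login_events = [
    ("2012-05-01", "hotel_A"),
    ("2012-06-14", "hotel_B"),
    ("2012-08-03", "hotel_C"),
]
guest_registers = {
    ("2012-05-01", "hotel_A"): {"P. Broadwell", "J. Smith", "A. Jones"},
    ("2012-06-14", "hotel_B"): {"P. Broadwell", "K. Lee"},
    ("2012-08-03", "hotel_C"): {"P. Broadwell", "J. Smith"},
}

# Only a guest present in every register fits the login pattern.
candidates = set.intersection(*(guest_registers[event] for event in login_events))
print(candidates)  # {'P. Broadwell'} -- one person fits all three stays
```

Each additional login event shrinks the candidate set, which is why only one person was left standing.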

The third classification is the combination of data with theoretical data. I don’t have a good example of this, but consider again some health data about the patients at a clinic. If someone had been sitting in the parking lot monitoring the cars, they could theoretically combine a record of which cars arrived on which days with which treatments were performed on those days. With enough days of monitoring, a good portion of the clinic’s population might be re-identified, or at the least correlated to the car they came in.

So the first question one needs to ask is:

1) Is the data needed for re-identification publicly, privately or only theoretically available?

The second question to consider for risk analysis is one of scale.

2) When combined with other available data sets, will re-identification be possible for one, many or all of the anonymized population?

For example, in the Paula Broadwell case, combining the data identified only one person; repeating the effort for every Gmail account holder would have required a similar investment each time. This is an economic barrier to privacy, one of the three privacy vectors I often talk about. In the Netflix case, combining the data allowed for re-identification of a subset of the renters, though probably not a large one.

Clearly, public data sets pose more re-identification risk than private data sets, which in turn pose more risk than theoretical data sets.

Once you’ve identified the means of re-identification, you then have to take into consideration the particularized risks that will befall re-identified persons. Consider not just compliance risks and objective harms like identity theft, but also subjective harms and the loss of trust that may befall the organization releasing the information. There are policy arguments to weigh when considering releasing data for public or private review, and by no means does the privacy argument rule the day. But before you can make the determination, you must have all the available information in front of you.


Unlimited talk, text and data?

This is a bit off topic for me, but I wanted to post about the mobile phone industry’s common practice of touting unlimited talk, text and data. I question the practice as it relates to data. While “unlimited” in popular vernacular would seem to imply infinite, in reality we are constrained by the physical limitations of our existence. Unlimited talk in the confines of a month means we can’t exceed 43,200 minutes in a 30-day month (44,640 in a 31-day month). It’s an impossibility for one phone to use more minutes than the month contains. Similarly with text: we are limited to the physical number of texts we can type or receive within a month. I won’t guess at the number a teenage girl could reach, but suffice it to say it’s pretty high. Now, I don’t know whether the companies limit or monitor automated software. What if I download an app that sends out 1,000 texts per minute? Would they notice? Would they care? After all, it’s unlimited, right? Truth be told, I haven’t read my contract or others, but I suspect they put the brakes on automated texting.

Now consider data. Most, if not all, plans throttle data once you’ve reached a certain threshold. 4G, at least as specified, has a theoretical limit of 1 gigabit per second. That’s about 324,000 gigabytes in a 30-day month. However, most service providers throttle usage down to 3G or 2G speeds once a certain number of gigabytes have been used (my provider’s threshold is 5 GB). At 2G speed, running full tilt all month, the maximum you could theoretically get is 61.79 GB. To say that I still have “unlimited” data is a stretch by any imagination.
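The arithmetic, as a quick sketch. I’m assuming a 30-day month throughout; the roughly 190 kbit/s “2G” rate is back-calculated from the 61.79 GB figure above, not a published spec.

```python
# Theoretical maximum monthly transfer implied by a link speed.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month

def max_monthly_gb(bits_per_second: float) -> float:
    """Gigabytes transferable in a month running flat out, 8 bits per byte."""
    return bits_per_second * SECONDS_PER_MONTH / 8 / 1e9

print(max_monthly_gb(1e9))      # 4G at 1 Gbit/s      -> 324,000 GB
print(max_monthly_gb(190.7e3))  # ~190.7 kbit/s "2G"  -> ~61.79 GB
```

So the throttle cuts the theoretical ceiling by a factor of more than five thousand.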

I’m not a trial attorney, but this is a lawsuit in the making.


Note: I’ve been informed that several carriers do offer truly unlimited service (or have in the past). I’m going based on my experience and my carrier, which touts unlimited data but then throttles after 5 GB.

Wal-Mart does not “know what’s up”

A friend recently shared the “Wal-Mart knows what’s up” meme on Facebook. The image, which appeared to have quite a few shares, is meant to imply that Wal-Mart is aware enough to know that beer pong enthusiasts will be buying lots of ping pong balls to go with their red Solo cups (wait, where are the kegs, Wal-Mart?). The truth is Wal-Mart probably has no idea why people who buy red Solo cups at their store also buy ping pong balls. They do know that it happens, though, because Big Data analysis tells them so. So what does Wal-Mart do in response? They put the two oft-purchased items together to increase sales. They assume that many people want to buy these items together, and that placing them together will increase sales.

What I love about this example is that it is a good use of Big Data which doesn’t necessarily implicate privacy issues. They don’t need to track individual purchases or purchasers; they only need to know that there is a correlation between cups and balls. They don’t need to pry into people’s lives as to why they purchase these together, but simply know that they do. However, some risk remains if Big Data turns into Big Brother. If they did make the connection, could they require ID for purchases of ping pong balls along with Solo cups? Sorry, you must be 21 to purchase these items together.

Maybe in retrospect Wal-Mart does know what is up (the correlation of purchases) but not why (the causation). For more information on Big Data, I suggest the Mayer-Schonberger and Cukier book of the same name.
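For the curious, the correlation-only analysis Wal-Mart needs is classic market basket mining, and it works on anonymous checkout lines. A minimal sketch with made-up transactions, no purchaser identities required:

```python
from collections import Counter
from itertools import combinations

# Made-up baskets; each set is one anonymous checkout.
baskets = [
    {"solo cups", "ping pong balls", "chips"},
    {"solo cups", "ping pong balls"},
    {"solo cups", "napkins"},
    {"bread", "milk"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

n = len(baskets)
for (a, b), count in pair_counts.items():
    support = count / n
    # Lift > 1: the pair co-occurs more often than independent chance.
    lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
    if lift > 1:
        print(f"{a} + {b}: support={support:.2f}, lift={lift:.2f}")
```

Nothing in that computation knows who bought anything, which is exactly the point.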


SSNs and the identity theft of children

The Social Security Administration has called for comments on proposed rule changes relating to the issuance and re-issuance of social security numbers to children. I’ve previously written on SSNs, but this gives me an opportunity to present my solution directly to the SSA. I have no doubt they will not implement either of the solutions I’m proposing (too radical), but if it puts a bug in someone’s ear, the tide may eventually change. Below is my commentary. You still have until April 12th to submit your comments. I’d appreciate any feedback, and I’ll wait in case any constructive criticism allows me to refine my commentary to the agency.

Background


As stated in the request for comments, the Social Security Administration began issuing social security numbers as a means of tracking persons within the administration of the social security program. Due to the widespread use of this identifier, it quickly became the de facto identifier when interacting with federal agencies and spread also to state agencies. Due to its increased use by government agencies, Congress passed the Privacy Act of 1974, which, in part, restricts the collection of social security numbers by government bodies to instances specifically authorized by law.

Because the Internal Revenue Service uses social security numbers for tax purposes, and because employers must file tax forms with employees’ social security numbers, the numbers became natural near-unique identifiers for employers to track their employees.

For similar reasons, the financial industry, with similar reporting requirements, found that the near ubiquity of social security numbers within the US adult population made them a useful near-unique identifier for tracking customers, in addition to employees, in their information systems. With the Privacy Act inapplicable to private parties, businesses and other persons (such as landlords) routinely ask for social security numbers for a variety of purposes and uses. Most often the only restrictions on social security numbers concern maintaining reasonable security against unauthorized disclosure.

As the use of social security numbers for tracking purposes continued to gain traction, it became routine for people to apply for social security numbers for minor children, sometimes even at birth, despite the fact that the children would not interact with the Social Security Administration for many years or engage in reportable financial transactions.


The purported problems


The Social Security Administration has recognized that a problem exists when a minor receives a social security number and that number is used by someone who is not intended to be identified by it. There are three classes of problems in this situation. The first is that the issued social security number is used by another person for employment purposes, which erroneously places that number into service within the Social Security Administration’s information system for purposes of calculating payments and benefits. The second is the use of that number as an identifier when interacting with other government agencies, adversely affecting the intended recipient of the number (such as by associating them with criminal activity or depleting benefits due to them). The third is when that number is fraudulently given to a private party and that use adversely affects the intended recipient (such as by creating a negative credit history).


The real problems


The real problems stem not from the adverse effects that misuse of a social security number has on certain minors, but from the extent to which our society has been allowed to misuse social security numbers for purposes for which they are neither intended nor useful. Specifically:


Social security numbers are not inextricably tied to a unique individual


Though ostensibly used to identify a unique person, nothing ties the actual person to the number, and as the past 20 years of increasing identity theft have shown, many people may be linked (some legitimately, but most illegitimately) to many social security numbers. A name, with which we choose to identify ourselves (Robert, Rob, Bob, Bobby, etc.), is closer in character to an identifying number (123456789) than to a unique biometric trait which cannot be shared with others.

Before any solution can be considered, the lack of one-to-one correspondence must be accepted. Attempting to perpetuate the myth that social security numbers identify unique individuals only leaves us open to further problems.


Semi-private social security numbers are misused as an authentication token


Current law attempts to keep social security numbers only in the hands of the recipients and those parties with a need to know. However, the ubiquitous use of social security numbers means the set of persons with access to them is exceedingly large. The problem is exacerbated in the case of minors, who may not themselves know their social security number even though their parents, siblings and other family members do. The idea that the numbers are semi-private has led businesses to accept an individual’s knowledge of a social security number as proof that they are that individual. In security parlance, it has become an authentication token in addition to an identifier. However, as is clear from recent history, the widespread availability of social security numbers means they should be treated as semi-public rather than semi-private. While knowledge of one may, to an extent, suggest one’s identity (similar to knowledge of semi-private facts like one’s mother’s maiden name), it cannot be a wholesale authentication token.


The solutions


Make social security numbers public


All social security numbers should be publicly published along with the name of the recipient. This would solve the problem of social security numbers being inappropriately relied upon as an authentication token. While the concern may be raised that governments and industry currently rely on the semi-private nature of social security numbers to authenticate the individuals they interact with, the explosion of identity theft and the use of social security numbers to fraudulently access information show that this reliance is unfounded and dangerous. Publicizing the numbers would prevent such inappropriate reliance.

Once this reliance has been severed, the public tying of social security numbers to names produces no more risk to the individual than publicizing their names.


Issue new social security numbers on demand and reuse old ones


Perhaps one of the biggest problems associated with the purported uniqueness of social security numbers is that they tie an individual to a credit history. Credit reporting agencies often store aliases, multiple addresses and other non-unique identifiers in their files, yet have the ability to distinguish one John Smith from another so that one’s bad behavior doesn’t adversely affect the other’s credit. The purported uniqueness of social security numbers allows one John Smith (or someone using that alias) to access the good credit of another if he knows the number and can furnish it on a credit application. The social security number should not be the key that unlocks the credit kingdom. Businesses must use it only as one possible identifying element, not THE identifying element of a credit file (just as a name is not unique within a credit file).

This solution actually works only if social security numbers are not public. If they are public (per the proposed solution above), businesses may be inclined to track when numbers are issued to individuals and then reissued to new individuals, thus allowing them to tie a number to an individual at a specific time.

This solution does not hinder the Social Security Administration from tracking an individual within its own system. Presumably the administration already has the ability to track individuals with multiple numbers, in order to facilitate the existing remedies for misuse of social security numbers.


Don’t issue social security numbers until necessary to begin tracking payments and benefits in the social security system


Finally, from the perspective of the Social Security Administration, there is no need to issue a number until such time as the individual interacts with the administration for the purpose of tracking payments or applying for benefits. The problem identified with the misuse of minors’ social security numbers is part and parcel a result of the administration’s issuance of numbers years before necessary. The mandate of the administration is to administer the social security program, not to manage a national identifier. Others’ misappropriation of the number for secondary purposes does not change this mandate, and the administration should be attempting to roll back these secondary uses, not exacerbate society’s reliance on them.


Questions posed


1. Is age 13 the appropriate cut-off for application of the revised policy?

No. The problems presented affect all persons with social security numbers, regardless of age. The solution should attack the entire threat in a systemic way, not attempt to put band-aids on individual issues. The solution proposed by the Social Security Administration attacks the symptom, not the disease.


2. Are the circumstances that we propose for assigning a new SSN to children age 13 and under appropriate?

While appropriate, they are insufficient (see the answer below).


3. Are there other circumstances that would warrant assigning a new SSN to children age 13 and under?


Per my proposed solution above, new social security numbers should be issued to everyone on demand. This approach would help not only those children adversely affected by the misuse of their social security numbers; it would also attack the systemic over-reliance on social security numbers which is the root cause of the identified problems. This would have a wider-ranging effect and greater social benefit than the solution proposed by the Social Security Administration.


What privacy issues can inform us about the health care debate in the US

Health care is singled out in the United States as one of the industries needing significant privacy protection. Health information disclosed without the permission of the data subject can result in embarrassment, stigmatization and discrimination. Today, I’d like to consider the discriminatory aspects of health information privacy, more specifically the potential for insurance price discrimination. In designing a privacy-preserving system, some information must be kept and utilized. Doctors need access to your prior health history to assist them in diagnosing your condition. Insurance companies need to know what they are paying for and want to know your history so they can price your policy based on your risk.

Consider for a moment two scenarios on the extreme ends of things. In the first, let’s assume that with perfect knowledge, an insurance company could predict with absolute certainty what ailments will befall you and how much cost you’ll impose on the insurance company in the future. In the second, we assume that the insurance company can have no information about you; everybody is a black box, and the only information the insurance company can use to price its service is the overall health of society.

In the second scenario it’s obvious the insurance company knows nothing about you, but I would submit that in both scenarios privacy is perfectly protected from adverse discrimination. But how can that be, you say? In the first, the insurance company knows EVERYTHING about me. Insurance is about pooled risk. In the former scenario, there is no pooled risk. The insurance company collects payments from you in exact proportion to the cost burden you impose on the company. You’re just funneling your money through an insurance carrier so they can take out their profit margin and forward payment to your health care provider. In this case, the justification for the insurance company vanishes and they go out of business, leaving no discriminatory-pricing privacy concerns.
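A toy illustration of the two extremes, with made-up expected costs and margin:

```python
# Toy model of the two extremes of insurer knowledge (made-up numbers).
expected_costs = {"alice": 1200.0, "bob": 300.0, "carol": 9000.0}
margin = 0.10  # insurer's cut

# Extreme 1: perfect knowledge -- premium equals your exact expected
# cost plus margin, so there is no pooling and no reason to insure.
perfect = {name: cost * (1 + margin) for name, cost in expected_costs.items()}

# Extreme 2: zero knowledge -- everyone pays the population average
# plus margin, so there is nothing individual to discriminate on.
average = sum(expected_costs.values()) / len(expected_costs)
blind = {name: average * (1 + margin) for name in expected_costs}

print(perfect)  # {'alice': 1320.0, 'bob': 330.0, 'carol': 9900.0}
print(blind)    # everyone pays 3850.0
```

In the first case carol pays her own way plus a middleman’s fee; in the second, nobody’s health data moves their price at all. Discriminatory pricing only exists in between.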

Unfortunately, we don’t exist at either extreme but rather somewhere in the middle. Insurance companies know something about us (our gender, our age, etc.) but don’t have perfect knowledge, and even if they had perfect knowledge, they wouldn’t have perfect predictability. However, both options remain in terms of being privacy protective: provide insurance without any discrimination, or eliminate insurance companies from the equation. As long as you try to play somewhere in the middle, there will be a struggle between the insurance carrier’s desire to acquire information about the insured and the insured’s desire to hide it. Of course, you also run into the problem of adverse selection. Those with pre-existing conditions or risky markers want the insurance company ignorant, but those who are healthy want to provide their information to insurance companies for a beneficial rate reduction. The insurance company then presumes that if you don’t prove you’re not a risk, you belong in the higher risk category, thus undermining the insured’s ability to hide that information.

Stepping outside the privacy issues, from a health policy perspective, what distinguishes egalitarian insurance (whereby everyone pays the same rates) from the elimination of insurance (whereby everybody pays for their own health care)? Many people favor a social policy whereby people are not subject to the lottery of life. In other words, the fact that you were born with a propensity for some ailment shouldn’t matter: we as a society should step up to the plate and help out those who lost. Fewer people are as sympathetic to those whose adverse health is caused by behavior (such as smoking). If we agree as a society to bear the burden of the former but not the latter, the question becomes how, and what about when the former causes the latter? Is a propensity for addictive behavior a sympathetic enough condition that perhaps we should pay for the lung cancer treatment of smokers? Or what about the case where lack of action (failure to get regular checkups) results in catching cancer at a later, more costly-to-cure stage? Is that the sympathetic genetic condition or the behavioral failure to get checked, and thus who bears the burden of payment? Do we then reach a society that is even MORE privacy invasive, because we must monitor your actions to make sure you aren’t costing society more than expected (fining you for not getting an annual checkup, monitoring your urine to make sure you ate your daily vegetables and drank enough water)?

I think there is another problem with the ignorant-insurance-company scenario, and that’s the failure of third-party payers to keep health care costs in check. If no one is questioning whether a particular test is necessary, the doctor’s incentive is to order it. No harm, no foul, more money in their pockets. This could be overcome by giving money directly to the insured based on their known ailments and letting them allocate it to their health care according to the expected costs. We just have to be fine with knowing some may buy beer and cigarettes rather than pay for a proctology exam.

Ultimately, I think a solution is achievable IF (and it’s a big IF) we can agree on the principles we want our society to adhere to and not get stuck in the weeds arguing the minutiae.


Privacy Policy: Disclosure to Law Enforcement

Edith Ramirez, FTC Chair, speaks at the IAPP Global Summit.

At the IAPP Global Summit in Washington, D.C., which just ended, I didn’t get a chance to ask my question of newly appointed FTC Chair Edith Ramirez. She had only been in office five days, and privacy is at the top of her agenda. She had previously been scheduled for a Q&A, but because of her new appointment, the questions were posed by a moderator and the audience was not allowed to participate.

Had I been given the opportunity, I would have asked the following question: Would the FTC consider action against a company for unfair and deceptive practices if it turned over information to a government agency in violation of the company’s stated privacy policy? Or is such an enforcement action verboten?

I suspect I know the answer.

To date, to my knowledge, the FTC has never made such a complaint against a company. However, the potential is there.

I would like to examine two different clauses from privacy statements and the particular risks they pose to users of those services. Here is one common clause I’ve found in many privacy statements:

We may disclose any subscriber information to law enforcement agencies without further consent or notification to the subscriber upon lawful request from such agencies. We will cooperate fully with law enforcement agencies.

Notice the phrase “lawful request.” Such a policy does not preclude the scenario where a law enforcement agency simply asks for the information: no subpoena, warrant or national security letter. The request is lawful. No law prohibits the agent from making the request, and no law prevents the company from disclosing the information to anybody (except the FTC’s enforcement of the company’s own privacy statement). Could such a policy be deceptive? To the average consumer, the term “lawful request” seems to imply that the company will respond to legal process such as the aforementioned court-recognized documents. However, to a lawyer arguing before the FTC, the phrase could be read as I’ve described above: nothing unlawful, therefore the request was lawful. The clause could be the result of sloppy draftsmanship or crafty lawyering.

Contrast that with the pertinent section of Facebook’s Data Use Policy:

We may access, preserve and share your information in response to a legal request (like a search warrant, court order or subpoena) if we have a good faith belief that the law requires us to do so. This may include responding to legal requests from jurisdictions outside of the United States where we have a good faith belief that the response is required by law in that jurisdiction, affects users in that jurisdiction, and is consistent with internationally recognized standards.

Notice they include the requirement that they must have a good faith belief that the law requires them to comply. Not that it allows them to comply, but that it requires them to. This is a significant difference in function. Under the previous construction, the company will comply with a request if the law allows it; under the Facebook policy, they will comply only if the law requires it. I also appreciate the out they provide themselves for international requests, that compliance must be consistent with internationally recognized standards, possibly providing a legal basis to refuse some dictator’s decree. However, it would be nice if it were stronger still.

Facebook and Real Names and social circle segmentation

At the IAPP Global Summit in Washington, D.C., Jules Polonetsky (@JulesPolonetsky) conducted a public discussion with Facebook Chief Privacy Officer Erin Egan. During the audience Q&A portion of the discussion, I posed two questions: essentially, what does Facebook do to ensure its developers provide the contextual clues that help the Facebook audience know what information is being shared and with whom, and secondly, why does Facebook insist on a real names policy despite the fact that a clear minority of its audience rejects the idea?

I’ll spare you an analysis of the response to the first question, which essentially amounted to “context is important and our developers know that.” The response to the second question, though, bears further investigation. Erin answered, essentially, that it’s a means to encourage good community standards; that being anonymous or pseudonymous on the Internet leads (or allows) people to engage in behavior that they, shall we say, wouldn’t want their mother to see them doing. Jules chimed in that at AOL they saw rampant disregard for social norms due to the pseudonymous nature of that forum. I later approached Jules and suggested that, while the pseudonyms may play a role, another factor may have contributed more to AOL’s raucous nature. Unlike Facebook, AOL was primarily based on public forums (chat rooms, bulletin boards, groups). Facebook, though it has some of those features, is primarily based on private forums: private messages, postings on friends’ walls. The public forums do exist, but they are largely an ancillary service to Facebook’s primary use (sharing old high school photos). I would put forth the hypothesis that this is the major contributing factor to people being on their best behavior. If they are obnoxious, rude, crude or otherwise inappropriate, users have the ability to ban those people from their private spaces (ignore their posts, unfriend them or block them). Even the public spaces generally have moderators who can remove unwanted visitors.

Facebook is perhaps the ultimate big data company. I would suggest Facebook researchers (they have those, right?) do some data analysis on how many adverse reports they get about people in public spaces versus private spaces. Do users mostly avail themselves of self-help (unfriending) or resort to reporting to Facebook? Of those complaints, how many of the reported users appear to be using pseudonyms and how many appear to be using real names? Inquiring minds want to know. If, as I suspect, the public spaces are much more rife with complaints and pseudonymous users, then perhaps Facebook could require real names for access to public content as opposed to the private spaces.
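If anyone at Facebook wanted to run that analysis, the shape of it is simple. A sketch assuming a hypothetical reports.csv export, one row per abuse report, with invented column names (space_type of “public” or “private”, name_type of “real” or “pseudonym”):

```python
import pandas as pd

# Hypothetical export: one row per abuse report.
reports = pd.read_csv("reports.csv")

# Complaint counts broken out by where the behavior happened and
# whether the reported account appears pseudonymous.
breakdown = (
    reports.groupby(["space_type", "name_type"])
           .size()
           .unstack(fill_value=0)
)
print(breakdown)
```

If the public/pseudonym cell dwarfs the others, that’s evidence for a real-names rule scoped to public spaces only.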

Many people have justifiable reasons not to use their real names. A one-size-fits-all policy is not appropriate for a space of 1 billion users (*cough cough*). In the real world, while we use our real names, people engage in social circle segmentation. What I tell my doctor I don’t tell my neighbor. The information I give to my boss may differ from the picture I paint at my kid’s little league game. In those environments, context plays a role in allowing us to socially segment our acquaintances into circles of what we share. While concepts like Google Circles and Facebook Smart Lists allow people to segment their audiences on those platforms, this is often difficult and mentally taxing to do. Easier is to segment one’s friends onto different platforms (Facebook for school friends, LinkedIn for professional contacts, Google for online friends, Twitter for… well, it varies by the person). Pseudonyms on platforms allow for a quick brain response of who am I right now and who is my audience. I don’t have to worry about my boss seeing the picture of me with the lampshade on my head at a party. Each of my social circles is in a nice distinct bucket. Just some food for thought, Facebook.


Predictive policing

At 7:15 this morning I was rudely awakened by a police SWAT team banging on the door. I’m currently in a cold northeastern city visiting a friend (whom I happened to take to the airport last night to fly to my home state of Florida). He offered to let me stay here for a few days until I return to D.C. It’s a great savings of a few hundred dollars in hotel nights, and the solitude has given me an opportunity to concentrate on some much needed work. Of course, solitude is not exactly what I had this morning. First there was a knock. As I peered bleary-eyed out the window to see if it was an obnoxious solicitor, the knocking grew furious. “Police, open up!” was the shout. I scurried towards the door in only underwear and a t-shirt. I opened it to approximately 10-15 police officers in full gear (bulletproof vests, helmets, guns). I stated to the officer at the door (who clearly recognized that I wasn’t whom they were looking for) that I was a house guest. He showed me a picture and asked if I recognized the man, and I said no. He gave me his card and asked me to have the resident (my friend) call him.

I passed the information on to my friend, who called the detective and spoke with him at length. Apparently, this was not the first time the house had been visited by the police. The detective explained that the suspect, wanted in connection with a shooting, and his family were listing this address as theirs. My friend explained that he had been there for three months and that the owner of the house, who previously lived there, had been there many years. The detective offered to email my friend the picture of the suspect and asked to be contacted if he saw him in the neighborhood.

My friend called me back to discuss the incident, and we discussed it in light of the book I had been reading the previous day at the house. That book was Big Data by Viktor Mayer-Schonberger and Kenneth Cukier. Predictive policing, while not quite Minority Report, is the use of big-data-style analysis for policing. The concept is fairly straightforward: amalgamate large amounts of information relevant to criminal behavior and find connections that were heretofore unidentifiable. While arrests won’t be made solely as a result of predictive policing, suspicious actors could be uncovered and scrutinized, thereby improving the efficiency of the police department. The risk, however, is having innocent associations place certain members of the population under enhanced scrutiny while others commit crimes. In the old days this was called profiling, and while dispassionate data analysis could be beneficial in removing stereotypes and biases from policing, the risk remains of being caught in an associative bucket of bad guys. My friend, who innocently occupies an address picked by criminals, now potentially will be forever associated with them. Will his car be pulled over more often than not, as police hope to catch him in the act? What other subtle things will threaten his peaceable right to be let alone now? Will credit reporting agencies ding his credit score because he shared an address with a family of criminals?

My ex-girlfriend used to carry her social security card in her wallet, much to my dismay. I pleaded with her not to, but her retort was that she had no credit history worthy of stealing, so what was the risk? She had a somewhat legitimate need, as her driver’s license had a different name than her birth certificate, due to a custody battle and a judge’s decree when she was just a toddler. She used the SSN as an alternative proof of her name when her license didn’t match. It is an unfortunate byproduct of living in a society that is hellbent on using identity as a means of security. But the risks to her were clear. What happens when her identity is stolen for criminal purposes? Or when a criminal uses her identity to commit a violent crime and her name is then tied as an alias to that criminal? While law enforcement making contact with a John Smith might do a double take before arresting him on an outstanding warrant, her unique name would not be so lucky.

While efficiency in ferreting out criminals is a goal worth pursuing, appropriate safeguards must be in place to avoid unwarranted connections. Further, oppression, warrantless searches and identity tattoos (à la WWII Germany) make policing efficient, but that doesn’t make them ethical. Society must weigh the political repercussions before embarking on the use of big data in this realm.