Privacy Primer
Lecturer: Professor Fred B. Schneider
Lecture notes by Fred B. Schneider
Authentication is useful in connection with authorization,
because permission to perform an operation so often depends on
which principal is requesting that operation (or, more generally, on whose behalf
that operation is being requested).
However, the knowledge that an operation has been requested on behalf of
some person could
reveal things about that person---possibly even things that person would
prefer to have kept secret.
We defined privacy as the "right of an individual to decide
for himself/herself on what terms his/her attributes should be revealed" and,
therefore, we conclude that authentication is not necessarily
privacy preserving.
In fact, any program or service that processes
data about a person could reveal information
that person would wish to have kept secret.
An authentication service is but one example,
so privacy should potentially be a concern to the developers of many systems.
This lecture---a primer on privacy---is intended for these developers.
Specifically, we discuss
- the historical, practical, and legal impetus
for supporting privacy, and
- guidelines and techniques for preserving a reasonable level of privacy
in software systems.
Thus we cover not only a context for implementing privacy but also
a set of concrete steps that most would accept as constituting a "good faith"
effort at supporting privacy in a system.
The Right to Privacy
An obligation to protect individual privacy
predates the existence and challenges of cyberspace.
Over 2000 years ago,
Hebrew law imposed restrictions on erecting a structure
opposite the windows of your neighbor's house; and the Talmud
stated that a person should not look into his neighbor's house.
English common law (in 1603) restricted the crown from invading the
privacy of subjects
("The house of every one is to him as his castle and fortress.")
and also provided for the punishment of eavesdroppers on conversations.
The secrecy of postal mail in the U.S. has roots in the postal system
the British government created in 1710 for the colonies [sic];
in 1825, the U.S. Congress asserted that prying into another person's (postal) mail
is illegal, followed by an 1878 U.S. Supreme Court ruling that (even)
the U.S. Government,
which operates the postal service,
requires a search warrant in order to open first class mail.
The
U.S. Constitution
nowhere explicitly mentions a "right to privacy".
Scholars and the courts, though, do find
elements of a "right to privacy" scattered throughout
several of the amendments, as follows.
- First Amendment:
"Congress shall make no law respecting an establishment of religion, or
prohibiting the free exercise thereof; or abridging the freedom of
speech, or of the press; or the right of the people peaceably to
assemble, and to petition the Government for a redress of grievances."
Freedom of speech and
of the press is seen as important to political discourse, because
it allows positions to be discussed and refined without the specter of government,
with its vested interest in a status quo,
impeding discussion.
Similarly, freedom of assembly is interpreted to include preserving secrecy
of membership and meeting attendance lists;
this too prevents fear of repercussion from affecting whether somebody
participates in the debate
(especially as an advocate of a perhaps unpopular view).
These freedoms, in effect, stipulate that a speaker has control
over certain content
(speech, press, artifacts of assembly);
or, equivalently, an individual can decide when and on what
terms information is revealed
(as our definition of privacy requires).
- Third Amendment:
"No Soldier shall, in time of peace be quartered in any house, without
the consent of the Owner, nor in time of war, but in a manner to be
prescribed by law."
Harking back to English common law
("The house of every one is to him as his castle...")
this amendment stipulates that at home one is free from observation by government
military personnel.
Presumably, the watchful eye of a live-in government soldier could have a chilling
effect on what is said and done at home.
- Fourth Amendment:
"The right of the people to be secure in their persons, houses,
papers, and effects, against unreasonable searches and seizures, shall
not be violated, and no Warrants shall issue, but upon probable cause,
supported by Oath or affirmation, and particularly describing the
place to be searched, and the persons or things to be seized."
This right implies that certain recorded information can remain completely
under your control (including being kept secret)
unless there are grounds to obtain a search warrant.
That the government must specify ("particularly
describing the place to be searched, and the persons or things to be
seized") is very important from a privacy viewpoint.
It implies that a search warrant does not give the government freedom to do
a blanket search.
Presumably, search warrants are issued only when there is
reason to believe a crime has been committed and evidence needs to be retrieved
or protected from being destroyed.
So again the individual (for the most part) can decide when and on what
terms information is revealed.
- Fifth Amendment:
"No person shall be held to answer for a capital, or otherwise infamous
crime, unless on a presentment or indictment of a Grand Jury, except
in cases arising in the land or naval forces, or in the Militia, when
in actual service in time of War or public danger; nor shall any
person be subject for the same offense to be twice put in jeopardy of
life or limb; nor shall be compelled in any criminal case to be a
witness against himself, nor be deprived of life, liberty, or
property, without due process of law; nor shall private property be
taken for public use, without just compensation."
One's innermost thoughts and memories are now protected from being revealed
in response to
the traditional levers of interrogation---prison, confiscation of
property, etc.
- Ninth Amendment:
"The enumeration in the Constitution, of certain rights, shall not be
construed to deny or disparage others retained by the people."
So the people (and not the government) retain any rights not enumerated in the
Constitution.
The Constitution's silence on matters that we might include in a "right to
privacy" thus is interpreted as not prohibiting those rights.
- Fourteenth Amendment:
"(1) ... No State shall make or enforce any law which shall abridge the
privileges or immunities of citizens of the United States; nor shall
any State deprive any person of life, liberty, or property, without
due process of law; ..."
This is often interpreted to say that, as long as they do not impinge on
another individual's rights, the rights of the individual (including privacy)
outweigh those of the state; the Due Process clause of the Fourteenth Amendment
guarantees as much.
There is, for sure, a tension between a "right to privacy" and society's
desire to discourage illegal activity by successfully prosecuting crime,
since successful prosecution usually requires
thorough investigation (which, by its nature, must
override the individual's right to decide whether information about
him/her is revealed).
This tension is typically resolved differently in the U.S. than in other countries,
reflecting different views of privacy.
Europeans, for example, tend to believe your data remains yours, even after you
have released it to somebody else; Americans tend to believe
that whoever owns data has a right to disseminate it.
Moreover, at any given time, different countries and even
different communities within a country may resolve this tension
between the individual and society
differently, selecting different points in what amounts to a spectrum of possibilities.
Life changed rather significantly, for example, in the U.S. after the 9/11 terrorist
attacks:
authentication became mandatory for passengers on airplanes,
and
various individual rights to control information were relinquished
(e.g., a new legal process allows the government to learn what books
you check out of a library)
in the name of facilitating investigations to anticipate future
terrorist activities.
Finally, note that the definition of privacy used in this lecture,
while intended and well suited for cyberspace,
is decidedly information-centric, hence narrow.
Privacy is seen by most as being far broader.
Some argue that privacy encompasses a broader right to autonomy,
including rights to solitude and intimacy.
The United Nations
1948 Universal Declaration
of Human Rights stipulates a right of privacy in article 12:
"No one shall be subjected to arbitrary interference with his privacy,
family, home or correspondence, nor to attacks upon his honour and
reputation. Everyone has the right to the protection of the law
against such interference or attacks."
Guidelines for Privacy in Cyberspace
Various sets of guidelines have been proposed and adopted by national
and international bodies.
These guidelines outline obligations that ensure a software
system does not compromise an individual's right of privacy.
They are rarely codified as laws (though they certainly do inform the
content of legislation and the interpretation of existing law),
so a developer's obligation to satisfy the guidelines is ("only") a moral one.
One of the better known sets of guidelines is the
one
adopted by the
Organization for Economic Cooperation and Development
(OECD), with its instantiation as the
U.S.
Fair Information Principles and Practices (FIPP).
These principles refine into a set of obligations
what it means for an individual to control what and how information is revealed;
each obligation concerns some dimension of a system, its relationship with the individual,
or its relationship with other systems.
- Collection Limitation:
- Collect the minimum amount of information that is needed for the relationship
or transaction at issue.
Do the collection by lawful and fair means, with the knowledge and consent of the individual.
- Data Quality:
- Any information that is stored should be relevant, accurate, timely, and complete.
- Purpose Specification:
- The uses to which data will be put should be specified at the time the data are collected.
- Use Limitation:
- Data should be used only for the purposes given in the Purpose Specification (and for
which the individual understands they will be used), except under two conditions:
(i) with prior consent of the individual, or
(ii) with the appropriate legal authority.
- Security:
- The information should be maintained in a way that protects against loss, destruction,
unauthorized access, modification, unauthorized use, or disclosure.
- Openness and Notice:
- People should be able to ascertain the existence of data systems and their purposes and
uses. There should be no secret data systems.
- Individual Participation:
- An individual has a right to
- know if he/she is a subject of the system,
- access information about himself/herself,
- challenge the quality of that information, and
- correct and amend that information.
- Accountability:
- The organization collecting and using the information can be held
responsible for abiding by these principles through enforcement and/or redress.
Some Developer Guidelines
Besides the legal, moral, or ethical reasons for enforcing privacy rights,
there are also good business reasons for doing so.
Ignoring privacy erodes customer trust and
exposes the developer to negative press.
A good underlying strategy, then, is:
Collect personal information only if there is a compelling business and customer value
proposition.
Here, we consider personal information to be
- anything collected directly from a user,
- anything about a user that is gathered indirectly
(e.g., metadata in documents),
- any data about a user's usage behavior
(e.g., logs, history of edits and revisions, etc.), and
- any data relating to the user's system
(e.g., system configuration, IP address, etc.).
The different kinds of personal information warrant different treatments.
It is worthwhile to distinguish between:
- Anonymous Data:
Data that has no connection to an individual.
In the absence of other information, anonymous data has no intrinsic link to a user and,
therefore, special care is not required in managing this information.
Hair color is an example.
- Personally Identifiable Information (PII):
This includes identifiers that can be used, directly or by derivation,
to identify, contact, or locate the person
to whom the information pertains.
Name, social security number, and full telephone number are examples.
- Sensitive PII:
PII that is subject to abuse and therefore should be protected by special means.
Examples include credit card numbers (financial abuse),
ethnic heritage or sexual orientation (bases of discrimination),
and mother's maiden name (useful for identity theft).
- Hidden Information:
Information that is not visible to a user in all views is problematic.
The Principle of Openness and Notice dictates
that users be made aware
that this information exists and be given appropriate control over sharing it.
For example, "cookies" are hidden information because they are downloaded to
a web browser without user oversight, stored by the web browser, and
later uploaded back to a server (again without user oversight).
Another example is found in the Microsoft Office suite:
when somebody copies a graph produced by Excel into a
PowerPoint slide, the entire spreadsheet is actually copied,
even though only the graph is visible on the slide,
so the full spreadsheet is available for reading by anyone with
access to the slide.
The Principle of Collection Limitation says that information
should be collected only with the consent of users.
Not collecting information is obviously best, since no consent would then
be needed.
So give very careful thought to whether a piece of information is really
needed before you build software to collect it.
And instead of viewing the question from your perspective as a software
developer, put yourself in the shoes of a paranoid privacy advocate
who isn't paid by your employer.
When information must be collected, three general modes of consent exist:
- opt-in: Here the user needs to take a specific
action (e.g., checking a box)
to signify his/her consent to collection.
Thus, the default---no action by the user---is to withhold consent.
- opt-out: Here the user needs to take a specific
action (e.g., checking a box)
to withdraw his/her consent to collection.
Thus, the default---no action by the user---is to give consent.
- implicit consent: Here, consent is implicitly granted
as part of performing
some other action.
Thus, the default is to give consent and, moreover, the user might not be aware that
consent is being given.
Visiting a URL is an example.
The very act of visiting a web page conveys certain information about the user
(e.g., cookies, a return IP address, etc) to the web server.
Opt-in consent should be preferred, because it increases the chances that consent is
being given in a thoughtful and deliberate way (even
knowing that many users click "yes" to anything,
which is neither thoughtful nor deliberate).
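The three consent modes differ precisely in their defaults, and making that default explicit in code helps avoid collecting data without consent. A minimal Python sketch (the names are hypothetical, not from these notes):

```python
from enum import Enum

class ConsentMode(Enum):
    OPT_IN = "opt-in"      # default (no user action): consent withheld
    OPT_OUT = "opt-out"    # default (no user action): consent given
    IMPLICIT = "implicit"  # consent conveyed by performing some other action

def may_collect(mode: ConsentMode, user_action_taken: bool) -> bool:
    """Whether collection is permitted, given the consent mode and whether
    the user took the mode's specific action (e.g., checked a box)."""
    if mode is ConsentMode.OPT_IN:
        return user_action_taken      # the action grants consent
    if mode is ConsentMode.OPT_OUT:
        return not user_action_taken  # the action withdraws consent
    return True                       # implicit: consent assumed
```

Note how the defaults fall out: with no user action, opt-in forbids collection while opt-out and implicit consent both permit it, which is exactly why opt-in is the privacy-preserving choice.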
The Principle of Collection Limitation also says that
users should be given notice about what information is being collected.
Here, we can distinguish between, on the one hand,
prominent notice, which is designed to catch the user's attention, and, on the other hand,
discoverable notice, where the user may have to take actions in order to find the notice.
In theory, prominent notice should be preferred but
it is not difficult to imagine settings
where (in practice) prominent notice would become a nuisance.
There is also the matter of when a notice is to be displayed.
With just-in-time display, notice is given when the data is about to
be collected.
This is useful if the collection act itself
is not predictable (e.g., collecting data in response
to a system crash) and it gives the user an opportunity to examine exactly which data
is being collected.
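Just-in-time notice amounts to a gate in front of the transmission step. In this hypothetical Python fragment (the function and callback names are assumptions for illustration), the exact payload is shown to the user, and it is sent only on an explicit opt-in decision:

```python
def send_crash_report(report: dict, ask_user) -> bool:
    """Transmit a crash report only after just-in-time notice and consent.

    `ask_user` is a callable that receives the exact payload and returns
    True or False; in a real program it would present a dialog showing
    the data about to be collected."""
    consented = ask_user(report)  # just-in-time notice + opt-in choice
    if consented:
        # upload(report)  # actual transmission elided in this sketch
        return True
    return False                  # default (no consent): collect nothing
```

Passing the payload itself to the consent prompt is the point of the pattern: the user can examine exactly which data is about to leave the machine, which is impossible with first-run or installation-time disclosure.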
Under first-run disclosure, the notice is displayed either the first time
a given user runs the program or the first time the program is started after installation.
An opportunity is afforded here
to paint a broad picture of data collection choices in the context
of what the program will do.
However, a user who has not much experience with a program might not understand
the ramifications of choices that are made at this point.
Finally, in some settings it is sensible to provide installation-time disclosure.
If the system installer is different from the system user, then the user
does not see the disclosure---a problem, unless the system administrator is
authorized to make privacy choices on behalf of all users (as is frequently the case
in a corporate setting).
Authentication and Privacy
If we lived in a world with exactly one form of identification, then
all actions by an individual could
be linked and privacy would be quite limited.
One way to prevent the correlation of actions is
by associating multiple identities with each person.
We each might (and most of us do)
carry multiple different credit cards, membership cards, etc.
Each card gives a different number for our identity,
and that number is what is employed to track our actions.
Use of just one of these forms of identification prevents the agency with
which we are interacting from associating
an action with those actions we might have taken using a different form of identification.
The member of the "NRA" is never seen by that organization
to also be a member of the "Pacifists Club", for example.
Keeping the various separate identities uncorrelated is important
for supporting privacy but, unfortunately, it is becoming more and more difficult
to do.
For one thing, it is not all that convenient to have to carry multiple cards
or to remember all those different identifiers.
And many people find the idea of carrying fewer identity cards
an appealing proposition, despite the reductions in privacy it implies.
In addition,
the current drive for fully-mediated access brings an increase in the
need for and prevalence of authentication systems.
These authentication systems
erode privacy by compelling individuals to reveal ever more
about what they do and when.
The desire of businesses to collect PII cheaply and unobtrusively
(for marketing and advertising), leveraging the prevalence of authentication
systems, coupled with the
resilience of digital information, fosters the increased collection and retention
of PII.
And governments too are under pressure to streamline their interactions and reduce costs;
linked PII collection provides a means to respond.
As a system developer, you may well be caught in these currents.
Be mindful that adding an authentication system is not without risks to privacy.
Ponder the possibilities of:
- Covert identification, where the presence of your authentication
system makes it possible to identify an individual without that individual's consent
or knowledge.
For example, some establishments use a driver's license to validate a customer's age,
but in so doing they might also learn the customer's hair color, height, weight, etc.
- Chilling effects because the authentication system actually becomes
a degenerate form of authorization and thus creates an (unintended) opportunity for social
control.
- Over-collection of PII, which becomes more likely
because the cost of collecting and correlating information
is no longer high.
- Unnecessary linking of one identifier to another, which
then allows anyone with knowledge of either identifier to access the information
bound to either and also allows correlations to be computed.
- Failure to retire information, since information that has
been deleted is no longer capable of compromising privacy.