Privacy Primer
Lecturer: Professor Fred B. Schneider
Lecture notes by Fred B. Schneider
Authentication is useful in connection with authorization,
because permission to perform an operation so often depends on
which principal is requesting that operation (or, more generally, on whose behalf
that operation is being requested).
However, the knowledge that an operation has been requested on behalf of
some person could
reveal things about that person---possibly even things that person would
prefer to have kept secret.
We defined privacy as the "right of an individual to decide
for himself/herself on what terms his/her attributes should be revealed" and,
therefore, we conclude that authentication is not necessarily
privacy preserving.
In fact, any program or service that processes
data about a person could reveal information
that person would wish to have kept secret.
An authentication service is but one example,
so privacy should potentially be a concern to the developers of many systems.
This lecture---a primer on privacy---is intended for these developers.
Specifically, we discuss
- the historical, practical, and legal impetus
for supporting privacy, and
- guidelines and techniques for preserving a reasonable level of privacy
in software systems.
Thus we cover not only a context for implementing privacy but also
a set of concrete steps that most would accept as constituting a "good faith"
effort at supporting privacy in a system.
The Right to Privacy
An obligation to protect individual privacy
predates the existence and challenges of cyberspace.
Over 2000 years ago,
Hebrew law imposed restrictions on erecting a structure
opposite the windows of your neighbor's house; and the Talmud
stated that a person should not look into his neighbor's house.
English common law (in 1603) restricted the crown from invading the
privacy of subjects
("The house of every one is to him as his castle and fortress.")
and also provided for the punishment of eavesdroppers on conversations.
The secrecy of postal mail in the U.S. has roots in the postal system
the British government created in 1710 for the colonies [sic];
in 1825, the U.S. Congress asserted that prying into another person's (postal) mail
is illegal, followed by an 1878 U.S. Supreme Court ruling that (even)
the U.S. Government,
which operates the postal service,
requires a search warrant in order to open first class mail.
The
U.S. Constitution
nowhere explicitly mentions a "right to privacy".
Scholars and the courts, though, do find
elements of a "right to privacy" scattered throughout
several of the amendments, as follows.
- First Amendment:
"Congress shall make no law respecting an establishment of religion, or
prohibiting the free exercise thereof; or abridging the freedom of
speech, or of the press; or the right of the people peaceably to
assemble, and to petition the Government for a redress of grievances."
Freedom of speech and
of the press is seen as important to political discourse, because
it allows positions to be discussed and refined without the specter of government,
with its vested interest in a status quo,
impeding discussion.
Similarly, freedom of assembly is interpreted to include preserving secrecy
of membership and meeting attendance lists;
this too prevents fear of repercussion from affecting whether somebody
participates in the debate
(especially as an advocate of a perhaps unpopular view).
These freedoms, in effect, stipulate that a speaker has control
over certain content
(speech, press, artifacts of assembly);
or, equivalently, an individual can decide when and on what
terms information is revealed
(as our definition of privacy requires).
- Third Amendment:
"No Soldier shall, in time of peace be quartered in any house, without
the consent of the Owner, nor in time of war, but in a manner to be
prescribed by law."
Harking back to English common law
("The house of every one is to him as his castle...")
this amendment stipulates that at home one is free from observation by government
military personnel.
Presumably, the watchful eye of a live-in government soldier could have a chilling
effect on what is said and done at home.
- Fourth Amendment:
"The right of the people to be secure in their persons, houses,
papers, and effects, against unreasonable searches and seizures, shall
not be violated, and no Warrants shall issue, but upon probable cause,
supported by Oath or affirmation, and particularly describing the
place to be searched, and the persons or things to be seized."
This right implies that certain recorded information can remain completely
under your control (including being kept secret)
unless there are grounds to obtain a search warrant.
That the government must specify ("particularly
describing the place to be searched, and the persons or things to be
seized") is very important from a privacy viewpoint.
It implies that a search warrant does not give the government freedom to do
a blanket search.
Presumably, search warrants are issued only when there is
reason to believe a crime has been committed and evidence needs to be retrieved
or protected from being destroyed.
So again the individual (for the most part) can decide when and on what
terms information is revealed.
- Fifth Amendment:
"No person shall be held to answer for a capital, or otherwise infamous
crime, unless on a presentment or indictment of a Grand Jury, except
in cases arising in the land or naval forces, or in the Militia, when
in actual service in time of War or public danger; nor shall any
person be subject for the same offense to be twice put in jeopardy of
life or limb; nor shall be compelled in any criminal case to be a
witness against himself, nor be deprived of life, liberty, or
property, without due process of law; nor shall private property be
taken for public use, without just compensation."
One's innermost thoughts and memories are now protected from being revealed
in response to
the traditional levers of interrogation---prison, confiscation of
property, etc.
- Ninth Amendment:
"The enumeration in the Constitution, of certain rights, shall not be
construed to deny or disparage others retained by the people."
So the people (and not the government) retain any rights not enumerated in the
Constitution.
The Constitution's silence on matters that we might include in a "right to
privacy" thus is interpreted as not prohibiting those rights.
- Fourteenth Amendment:
"(1) ... No State shall make or enforce any law which shall abridge the
privileges or immunities of citizens of the United States; nor shall
any State deprive any person of life, liberty, or property, without
due process of law; ..."
This is often interpreted to say that, as long as they do not impinge on
another individual's rights, the rights of the individual (including privacy)
outweigh those of the state; the Due Process clause of the Fourteenth Amendment
guarantees as much.
There is, for sure, a tension between a "right to privacy" and society's
desire to discourage illegal activity by successfully prosecuting crime,
since successful prosecution usually requires
thorough investigation (which, by its nature, must
override the individual's right to decide whether information about
him/her is revealed).
This tension is typically resolved differently in the U.S. than in other countries,
reflecting different views of privacy.
Europeans, for example, tend to believe your data remains yours, even after you
have released it to somebody else; Americans tend to believe
that whoever owns data has a right to disseminate it.
Moreover, at any given time, different countries and even
different communities within a country may resolve this tension
between the individual and society
differently, selecting different points in what amounts to a spectrum of possibilities.
Life changed rather significantly, for example, in the U.S. after the 9/11 terrorist
attacks:
authentication became mandatory for passengers on airplanes,
and
various individual rights to control information were relinquished
(e.g., a new legal process allows the government to learn what books
you check out of a library)
in the name of facilitating investigations to anticipate future
terrorist activities.
Finally, note that the definition of privacy used in this lecture,
while intended and well suited for cyberspace,
is decidedly information-centric, hence narrow.
Privacy is seen by most as being far broader.
Some argue that privacy encompasses a broader right to autonomy,
including rights to solitude and intimacy.
The United Nations
1948 Universal Declaration
of Human Rights stipulates a right of privacy in article 12:
"No one shall be subjected to arbitrary interference with his privacy,
family, home or correspondence, nor to attacks upon his honour and
reputation. Everyone has the right to the protection of the law
against such interference or attacks."
Guidelines for Privacy in Cyberspace
Various sets of guidelines have been proposed and adopted by national
and international bodies.
These guidelines outline obligations that ensure a software
system does not compromise an individual's right of privacy.
They are rarely codified as laws (though they certainly do inform the
content of legislation and the interpretation of existing law),
so a developer's obligation to satisfy the guidelines is ("only") a moral one.
One of the better known sets of guidelines is the
one
adopted by the
Organization for Economic Cooperation and Development
(OECD), with its instantiation as the
U.S.
Fair Information Principles and Practices (FIPP).
These principles refine into a set of obligations
what it means for an individual to control what and how information is revealed;
each obligation concerns some dimension of a system, its relationship with the individual,
or its relationship with other systems.
- Collection Limitation:
- Collect the minimum amount of information that is needed for the relationship
or transaction at issue.
Do the collection by lawful and fair means, with the knowledge and consent of the individual.
- Data Quality:
- Any information that is stored should be relevant, accurate, timely, and complete.
- Purpose Specification:
- The uses to which data will be put should be specified at the time the data are collected.
- Use Limitation:
- Data should be used only for the purposes given in the Purpose Specification (and for
which the individual understands they will be used), except under two conditions:
(i) with prior consent of the individual, or
(ii) with the appropriate legal authority.
- Security:
- The information should be maintained in a way that protects against loss, destruction,
unauthorized access, modification, unauthorized use, or disclosure.
- Openness and Notice:
- People should be able to ascertain the existence of data systems and their purposes and
uses. There should be no secret data systems.
- Individual Participation:
- An individual has a right to
- know if he/she is a subject of the system,
- access information about himself/herself,
- challenge the quality of that information, and
- correct and amend that information.
- Accountability:
- The organization collecting and using the information can be held
responsible for abiding by these principles through enforcement and/or redress.
Some Developer Guidelines
Besides the legal, moral, or ethical reasons for enforcing privacy rights,
there are also good business reasons for doing so.
Ignoring privacy erodes customer trust and
exposes the developer to negative press.
A good underlying strategy, then, is:
Collect personal information only if there is a compelling business and customer value
proposition.
Here, we consider personal information to be
- anything collected directly from a user,
- anything about a user that is gathered indirectly
(e.g., metadata in documents),
- any data about a user's usage behavior
(e.g., logs, history of edits and revisions, etc.), and
- any data relating to the user's system
(e.g., system configuration, IP address, etc.).
The different kinds of personal information warrant different treatments.
It is worthwhile to distinguish between:
- Anonymous Data:
Data that has no connection to an individual.
In the absence of other information, anonymous data has no intrinsic link to a user and,
therefore, special care is not required in managing this information.
Hair color is an example.
- Personally Identifiable Information (PII):
This includes identifiers that can be used, directly or by derivation,
to identify, contact, or locate the person
to whom the information pertains.
Name, social security number, and full telephone number are examples.
- Sensitive PII:
PII that is subject to abuse and therefore should be protected by special means.
Examples include credit card numbers (financial abuse),
ethnic heritage or sexual orientation (bases of discrimination),
and mother's maiden name (useful for identity theft).
- Hidden Information:
Information that is not visible to a user in all views is problematic.
The Principle of Openness and Notice dictates
that users be made aware
that this information exists and be given appropriate control over sharing it.
For example, "cookies" are hidden information because they are downloaded to
a web browser without user oversight, stored by the web browser, and
later uploaded back to a server (again without user oversight).
Another example is found in the Microsoft Office suite:
when somebody copies a graph produced by Excel into a
PowerPoint slide, the entire spreadsheet is actually copied,
even though only the graph is visible on the slide,
so the full spreadsheet is available for reading by anyone with
access to the slide.
The Principle of Collection Limitation says that information
should be collected only with the consent of users.
Not collecting information is obviously best, since no consent would then
be needed.
So give very careful thought to whether a piece of information is really
needed before you build software to collect it.
And instead of viewing the question from your perspective as a software
developer, put yourself in the shoes of a paranoid privacy advocate
who isn't paid by your employer.
When information must be collected, three general modes of consent exist:
- opt-in: Here the user needs to take a specific
action (e.g., checking a box)
to signify his/her consent to collection.
Thus, the default---no action by the user---is to withhold consent.
- opt-out: Here the user needs to take a specific
action (e.g., checking a box)
to withdraw his/her consent to collection.
Thus, the default---no action by the user---is to give consent.
- implicit consent: Here, consent is implicitly granted
as part of performing
some other action.
Thus, the default is to give consent and, moreover, the user might not be aware that
consent is being given.
Visiting a URL is an example.
The very act of visiting a web page conveys certain information about the user
(e.g., cookies, a return IP address, etc) to the web server.
Opt-in consent should be preferred, because it increases the chances that consent is
being given in a thoughtful and deliberate way (even
knowing that many users click "yes" to anything,
which is neither thoughtful nor deliberate).
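The three consent modes differ precisely in their defaults, and making that default explicit in code helps avoid collecting data without consent. A minimal Python sketch (the names are hypothetical, not from these notes):

```python
from enum import Enum

class ConsentMode(Enum):
    OPT_IN = "opt-in"      # default (no user action): consent withheld
    OPT_OUT = "opt-out"    # default (no user action): consent given
    IMPLICIT = "implicit"  # consent conveyed by performing some other action

def may_collect(mode: ConsentMode, user_action_taken: bool) -> bool:
    """Whether collection is permitted, given the consent mode and whether
    the user took the mode's specific action (e.g., checked a box)."""
    if mode is ConsentMode.OPT_IN:
        return user_action_taken      # the action grants consent
    if mode is ConsentMode.OPT_OUT:
        return not user_action_taken  # the action withdraws consent
    return True                       # implicit: consent assumed
```

Note how the defaults fall out: with no user action, opt-in forbids collection while opt-out and implicit consent both permit it, which is exactly why opt-in is the privacy-preserving choice.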
The Principle of Collection Limitation also says that
users should be given notice about what information is being collected.
Here, we can distinguish between, on the one hand,
prominent notice, which is designed to catch the user's attention, and, on the other hand,
discoverable notice, where the user may have to take actions in order to find the notice.
In theory, prominent notice should be preferred but
it is not difficult to imagine settings
where (in practice) prominent notice would become a nuisance.
There is also the matter of when a notice is to be displayed.
With just-in-time display, notice is given when the data is about to
be collected.
This is useful if the collection act itself
is not predictable (e.g., collecting data in response
to a system crash) and it gives the user an opportunity to examine exactly which data
is being collected.
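Just-in-time notice amounts to a gate in front of the transmission step. In this hypothetical Python fragment (the function and callback names are assumptions for illustration), the exact payload is shown to the user, and it is sent only on an explicit opt-in decision:

```python
def send_crash_report(report: dict, ask_user) -> bool:
    """Transmit a crash report only after just-in-time notice and consent.

    `ask_user` is a callable that receives the exact payload and returns
    True or False; in a real program it would present a dialog showing
    the data about to be collected."""
    consented = ask_user(report)  # just-in-time notice + opt-in choice
    if consented:
        # upload(report)  # actual transmission elided in this sketch
        return True
    return False                  # default (no consent): collect nothing
```

Passing the payload itself to the consent prompt is the point of the pattern: the user can examine exactly which data is about to leave the machine, which is impossible with first-run or installation-time disclosure.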
Under first-run disclosure, the notice is displayed either the first time
a given user runs the program or the first time the program is started after installation.
An opportunity is afforded here
to paint a broad picture of data collection choices in the context
of what the program will do.
However, a user who has not much experience with a program might not understand
the ramifications of choices that are made at this point.
Finally, in some settings it is sensible to provide installation-time disclosure.
If the system installer is different from the system user, then the user
does not see the disclosure---a problem, unless the system administrator is
authorized to make privacy choices on behalf of all users (as is frequently the case
in a corporate setting).
Authentication and Privacy
If we lived in a world with exactly one form of identification, then
all actions by an individual could
be linked and privacy would be quite limited.
One way to prevent the correlation of actions is
by associating multiple identities with each person.
We each might (and most of us do)
carry multiple different credit cards, membership cards, etc.
Each card gives a different number for our identity,
and that number is what is employed to track our actions.
Use of just one of these forms of identification prevents the agency with
which we are interacting from associating
an action with those actions we might have taken using a different form of identification.
The member of the "NRA" is never seen by that organization
to also be a member of the "Pacifists Club", for example.
Keeping the various separate identities uncorrelated is important
for supporting privacy but, unfortunately, it is becoming more and more difficult
to do.
For one thing, it is not all that convenient to have to carry multiple cards
or to remember all those different identifiers.
And many people find the idea of carrying fewer identity cards
an appealing proposition, despite the reductions in privacy it implies.
In addition,
the current drive for fully-mediated access brings an increase in the
need for and prevalence of authentication systems.
These authentication systems
erode privacy by compelling individuals to reveal ever more
about what they do and when.
The desire of businesses to collect PII cheaply and unobtrusively
(for marketing and advertising), leveraging the prevalence of authentication
systems, coupled with the
resilience of digital information, fosters the increased collection and retention
of PII.
And governments too are under pressure to streamline their interactions and reduce costs;
linked PII collection provides a means to respond.
As a system developer, you may well be caught in these currents.
Be mindful that adding an authentication system is not without risks to privacy.
Ponder the possibilities of:
- Covert identification, where the presence of your authentication
system makes it possible to identify an individual without that individual's consent
or knowledge.
For example, some establishments use a driver's license to validate a customer's age,
but in so doing they might also learn the customer's hair color, height, weight, etc.
- Chilling effects because the authentication system actually becomes
a degenerate form of authorization and thus creates an (unintended) opportunity for social
control.
- Over-collection of PII, which becomes more likely
because the cost of collecting and correlating information
is no longer high.
- Unnecessary linking of one identifier to another, which
then allows anyone with knowledge of either identifier to access the information
bound to either and also allows correlations to be computed.
- Failure to retire information, since information that has
been deleted is no longer capable of compromising privacy.