Privacy engineering is a relatively new field. People working in privacy have tended to be in legal and
policy roles and, until recently, haven’t directly worked on designing, building, and shipping products.
Privacy engineers are an important part of implementing “privacy by design.” We are responsible for
tackling privacy issues early in the software development lifecycle. But there’s far more to being a
privacy engineer. Privacy is not just a technical problem. It’s nuanced and contextual.
I have written about my education in privacy engineering and how I’ve applied it to
my teaching. Being a privacy engineer is a wide-ranging role that I didn’t learn about in
school. I am now beginning to understand how much interpretation is required, what questions to ask, and
when I need to draw from other disciplines. After working in the field for more than two years, I have
summarized what I do as a privacy engineer into three categories: classify, contextualize, and
communicate.
At Good Research, one of the services we offer is to analyze websites and mobile apps looking for “bad”
or unexpected behavior regarding personal or sensitive data. To describe these three C’s in more detail,
I’ll use the example of when I am given a dataset of network traffic generated from a mobile app with
the goal of identifying whether there are potential privacy violations.
Classifying involves organizing and categorizing the data well enough to get an initial understanding of
what information has been exchanged between the app and other parties. I run some scripts and use other
tools to group and classify what I’m seeing. While classification does inherently require some
interpretation, there isn't a lot in this step. I'm not assigning a value (e.g. dangerous, useful) or
flagging a behavior (e.g. unexpected, privacy violation). The main goal of this step is to group things
together, like all of the transmissions that went to the same place, or all of the messages that weren’t
encrypted. Essentially, I’m trying to figure out who receives what information. I do need to decide what
information matters, depending on why I’m looking at this dataset in the first place, so there is some
interpretation, but more on that in a later post.
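To make the grouping concrete, here is a minimal sketch of what a classification script might look like. It is an illustration only, not the tooling used at Good Research: the sample requests, the host names, and the `classify` helper are all hypothetical, and the URL scheme is used as a rough stand-in for "was this encrypted?".

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical captured requests; a real capture would come from a proxy log
# or a packet capture export and would carry many more fields.
requests = [
    {"url": "https://api.example-app.test/login", "body": "user=alice"},
    {"url": "http://ads.example-tracker.test/pixel?id=123", "body": ""},
    {"url": "https://ads.example-tracker.test/event", "body": "id=123"},
]


def classify(requests):
    """Group requests by destination host and collect the unencrypted ones."""
    by_host = defaultdict(list)
    unencrypted = []
    for req in requests:
        parts = urlsplit(req["url"])
        by_host[parts.hostname].append(req)
        # Assumption: anything not sent over https counts as unencrypted.
        if parts.scheme != "https":
            unencrypted.append(req)
    return dict(by_host), unencrypted


by_host, unencrypted = classify(requests)
for host, reqs in sorted(by_host.items()):
    print(host, len(reqs))
print("unencrypted:", [r["url"] for r in unencrypted])
```

Note that the sketch only groups and counts. It does not label anything as good or bad, which matches the intent of this step.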
Once I have a decent understanding of who is getting what information, I add context. I find out why
this data is or isn’t needed. For example, if I observe a third party receiving personal information,
then I need to know more about both the information and the third party. As you can imagine, the third
party might have a perfectly legitimate reason for receiving that data. But out of context that might
not be so obvious. Alternatively, if I am assessing whether an app is sending geolocation information, I
can look for something like latitude and longitude, but there are many ways to figure out where an app
is being used. Three different pieces of information individually might not reveal location, but
together, they can. For example, information like Wi-Fi network name, signal strength, and other nuanced
identifiers can be pieced together and matched against an already existing catalog to identify the
location of the device, sometimes with surprising accuracy.
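The idea that several indirect signals can add up to a location can be sketched as a simple check over a request payload. This is a hedged illustration: the key names in `DIRECT_KEYS` and `INDIRECT_KEYS`, the threshold of two indirect signals, and the assumption that payloads are flat JSON objects are all mine, and real apps use many other names and encodings.

```python
import json

# Hypothetical key names; real payloads vary widely.
DIRECT_KEYS = {"lat", "latitude", "lon", "lng", "longitude"}
INDIRECT_KEYS = {"ssid", "bssid", "rssi", "signal_strength", "cell_id"}


def location_signals(payload):
    """Return the direct and indirect location-related keys in a flat JSON payload."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return {"direct": set(), "indirect": set()}
    if not isinstance(data, dict):
        return {"direct": set(), "indirect": set()}
    keys = {k.lower() for k in data}
    return {"direct": keys & DIRECT_KEYS, "indirect": keys & INDIRECT_KEYS}


def may_reveal_location(payload, min_indirect=2):
    """Flag a payload with explicit coordinates or several indirect signals together."""
    signals = location_signals(payload)
    return bool(signals["direct"]) or len(signals["indirect"]) >= min_indirect


print(may_reveal_location('{"ssid": "CafeNet", "rssi": -52}'))
print(may_reveal_location('{"ssid": "CafeNet"}'))
```

A flag from a check like this is only a prompt to look closer. Whether the combination actually reveals location, and whether the recipient has a legitimate reason to receive it, still needs the context described above.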
At other times I look for potential instances of ID bridging, which is the practice of linking multiple
identifiers for the same user on different devices and across multiple apps. These identifiers are not
only collected and shared but also bridged or linked together by the developer or a third party so as to
know that they belong to the same individual. ID bridging permits tracking across apps, across devices,
and over time to allow third parties to assemble comprehensive advertising profiles, often without the
user’s consent. Developers and third parties use a variety of identifiers, and I need to know which ones
are persistent and which ones are changeable, and who receives them and under what conditions.
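One rough way to surface ID bridging candidates is to look for requests in which a single recipient receives two or more identifiers at once. The sketch below assumes identifiers travel as URL query parameters. The parameter names and the persistence labels in `ID_PERSISTENCE` are hypothetical placeholders, not a statement about any real SDK.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

# Illustrative assumption: parameter names and how persistent each identifier is.
ID_PERSISTENCE = {
    "ad_id": "resettable",
    "device_id": "persistent",
    "email_hash": "persistent",
}


def bridging_candidates(urls):
    """Map each host to the sets of identifiers it received together in one request."""
    found = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        ids = frozenset(k for k, _ in parse_qsl(parts.query) if k in ID_PERSISTENCE)
        if len(ids) >= 2:
            found[parts.hostname].add(ids)
    return dict(found)


# Hypothetical sample traffic.
urls = [
    "https://t.example-tracker.test/e?ad_id=abc&device_id=xyz",
    "https://api.example-app.test/e?ad_id=abc",
]
result = bridging_candidates(urls)
for host, id_sets in result.items():
    for ids in id_sets:
        print(host, {name: ID_PERSISTENCE[name] for name in sorted(ids)})
```

Co-occurrence shows only that a recipient could link the identifiers, not that it does. Pairing a resettable identifier with a persistent one is the kind of pattern worth examining more closely.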
Communication is the hardest part of being a privacy engineer. Classifying the data is relatively
straightforward. Adding context is more subjective. However, communication relies a lot on my
interpretations. I often have to share my findings with a nontechnical audience and make sure all the
nuance, complexities, and context are clear and accessible. Before writing a report or preparing a
presentation, I ask: who is the audience, what is their goal, and what are my main points. Importantly,
once delivered, I want to make sure they understand and that I can address their questions and comments.
Despite how this process sounds, it’s not linear. After adding context, I almost always go back and
reclassify the data. And I rarely feel like the contextualize step is totally complete. There’s never
enough context! Finally, good communication requires a feedback loop.
Privacy engineering draws from many different disciplines. I use my experience and refer to
conversations, blogs, and many other sources to understand whether what I am observing is expected.
There is a lot of judgment needed, and as I continue to learn more, I will be sure to share with you!
Thanks to Cassia Artanegara and Will Monge.