Eric Khumalo, Data Scientist & Privacy Engineer
August, 2022
Privacy Engineering in the Real World

Earlier this year I was asked to speak at a privacy engineering conference about the role of a privacy engineer. This was a great opportunity to share how the field came to be and what we actually do. The conference recruited hundreds of speakers to talk about a wide range of related topics, and thousands registered. However, weeks before the conference was scheduled to start, many participants pointed out privacy issues with the registration process, specifically the trackers embedded in the registration system.

At Good Research, we did early studies on internet trackers and published several articles and papers on them. One issue that came up repeatedly was the sheer number of trackers on websites. Why is this an issue? What are the implications? And what can privacy engineers do to address it?

In our work, many of the privacy issues we see are due to inexperience, ignorance, or, in most cases, a lack of guidelines. This is encouraging, because we can address these issues with models and examples that inform communities about better practices. In fact, building these models and examples is part of the role of a privacy engineer. I don’t just clean up bad privacy behaviors or write cryptographic protocols. I am responsible for interpreting privacy and am empowered to architect solutions into a product before it ships. Essentially, privacy engineers build the tools and practices that make privacy better.

Unfortunately, especially in the development space, we see that common toolkits for solving functional problems tend not to take into account potential issues of privacy, security, or ethics. The ability to easily pull together code from multiple third parties creates a situation where those third parties interact directly with end users in ways that developers may not fully understand. For example, adding a weather widget to a website not only displays the weather; it also gives the widget’s author a means to track the users on that page.

As more and more functionality is provided by third parties, more and more control over the user experience and data gathering is handed over to them. Consequently, developers need to understand more about the third parties they are using, as well as how to detect when those third parties are not acting in accordance with best practices. Equally important is for SDK developers to disclose what tracking their SDKs do in exchange for the functionality they provide. Ultimately, a product can be thought of as a conversation among users, developers, and third parties, in which everyone needs to be on the same page so that people know what they are getting and what they are giving in exchange.
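One small, concrete step toward understanding the third parties on a page is listing which external hosts the page’s HTML tells the browser to contact, since every such request hands that host the visitor’s IP address, browser details, and often the referring page. The Python sketch below is illustrative only: `ThirdPartyFinder` and `third_party_hosts` are hypothetical helpers, the hostnames are made-up `.example` domains, and the first-party check is a naive suffix match rather than a proper registrable-domain lookup. It inspects static markup only, so it will miss trackers that scripts inject at runtime; a real audit would also record the network requests an instrumented browser actually makes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags whose attribute makes the browser fetch a resource from some host.
# Assumption: this short list is illustrative, not exhaustive.
RESOURCE_ATTRS = {"script": "src", "iframe": "src", "img": "src", "link": "href"}


class ThirdPartyFinder(HTMLParser):
    """Hypothetical helper: collects hosts of resources outside the first party."""

    def __init__(self, first_party_host):
        super().__init__()
        self.first_party_host = first_party_host.lower()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <img ... />.
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        url = dict(attrs).get(attr_name)
        if not url:
            return
        host = urlparse(url).hostname  # None for relative URLs like "/app.js"
        if host and not self._is_first_party(host):
            self.hosts.add(host)

    def _is_first_party(self, host):
        # Naive check: the first-party host itself or any of its subdomains.
        return host == self.first_party_host or host.endswith("." + self.first_party_host)


def third_party_hosts(html, first_party_host):
    """Return the sorted list of external hosts referenced by static HTML."""
    finder = ThirdPartyFinder(first_party_host)
    finder.feed(html)
    finder.close()
    return sorted(finder.hosts)


if __name__ == "__main__":
    page = (
        '<script src="/app.js"></script>'
        '<script src="https://widgets.weather.example/w.js"></script>'
        '<img src="https://px.tracker.example/p.gif">'
        '<link rel="stylesheet" href="https://static.conf.example/s.css">'
    )
    # Prints ['px.tracker.example', 'widgets.weather.example']
    print(third_party_hosts(page, "conf.example"))
```

The count this produces is only a starting point: as the post argues, what matters is what each of those hosts does with the requests it receives.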

People reported that the privacy engineering conference’s registration website had a large number of trackers. However, it’s not about the number. Just one misbehaving tracker can be a problem. In fact, I’ll write all about trackers next month! At Good Research, we describe building privacy solutions as aligning policies, practices, and promises. In short: make it clear to your users what you promise to do; do what you say you are going to do; and adhere to both your internal principles and the applicable laws and regulations.

When you’re collecting data, you should always ask, “Why do we need this? What will we do with it? How will we communicate that to our users?” Without addressing these questions, you risk losing trust and earning resentment instead. Resentment is an important design issue that is often overlooked. Managing it is crucial to building a good product over time, and it is something a good privacy engineer always considers. Resentment is a hidden killer of products, masked by short-term metrics that read high numbers as engagement when they may instead reflect a lack of real choice.

In the conference’s case, using a generic registration toolkit packed with trackers pointed to several issues that could build resentment and erode trust. The first is not knowing your audience. People signing up for a privacy conference expect to be able to opt in to sharing information rather than being forced to fill it in to register. Second, it’s unclear whether the registration questions were even necessary. Why did the organizers need this information? Looking like a blatant data grab is never good. In the best case, it is simply annoying. In the worst case, it turns off potential attendees and users, and speakers as well. An argument that often comes up in these situations is that “nothing is free.” In other words, users should expect to trade some of their data for a free conference, and if they are not comfortable with that, they shouldn’t attend. This argument doesn’t fly.

Software today is like Legos: you pull together lots of pieces from third parties and glue them together with custom code and logic to perform specific functions, so the third-party tools ultimately play a major role. Third-party SDKs provide a lot of functionality, but they sit face-to-face with your users, leaving those users exposed to whatever the third parties are doing unless the proper checks are in place.

When you need to register thousands of people for a conference that takes months to organize, the more you can rely on standard off-the-shelf software for tasks like registration, the more time you have for everything else. So it isn’t really a surprise that a large privacy engineering conference used an off-the-shelf software package for registration. What is surprising is how privacy-invasive that package is, which reflects the kinds of challenges privacy engineers face regularly.

A key privacy engineering task is to understand and manage third-party software so that you get the functionality you need while protecting users’ privacy. This requires both auditing software and third-party SDKs that are open about their practices, and neither is common today. Because of this, sloppy practices abound, and registration pages for privacy conferences end up with too many trackers. The number of trackers is striking and makes for good graphs, but it only takes one tracker acting out of turn to cause a problem for a user’s privacy. We need to understand that, fundamentally, all networked software is potentially privacy-invasive, and we can’t assume that because a piece of software’s functionality is mundane, its privacy risk is equally mundane. One of the major challenges in privacy engineering is simply acknowledging that the problem exists, so that the proper resources and expertise can be applied to fix it.

Thanks to Nathan Good, Will Monge, and Jessica Traynor.