Mind the Gap ... between privacy laws killing your data pipelines, visiting IAPP and how HTTP is privacy's nemesis

Photo by Mikel Parera / Unsplash

New week, new showdown! And once again it was an interesting week for navigating the stormy waters between Privacy Island and Engineering Peninsula. 🗺

Today's core idea: the fabric of the internet is privacy's main issue

It must have been some 12 years ago when I visited CERN, the birthplace of the World Wide Web. We were camping nearby and went for a "quick tour". We ended up visiting 3 days in a row to tour the facility and its museum. It really is holy ground to folks like myself, but while standing in front of the birth certificate of the web I could not imagine that the network underneath it would become privacy's number one issue later on. Read on to learn how I believe that relates to CNIL's recent explanation of why Google Analytics is still illegal.

On the topic of visits: my buddy Bart was at the IAPP Privacy forum and brought some interesting points back to the office. Furthermore, I came across a previously overlooked item from Protocol about the challenges privacy laws and privacy-by-design bring to your data pipelines, which of course is just our cup of tea! Finally, we get a bit more technical with MongoDB introducing queryable encryption.

Have fun reading, and I'm looking forward to discussing these topics over an IRL coffee or beer. Please reach out!

Best and thanks,

-Pim from STRM Privacy


Takeaways from the IAPP Privacy Forum in The Hague

Meeting people IRL is more fun than TCP/IP packets put together as Zoom or Teams calls. That might be just my preference, but given the attendance my buddy Bart ran into at the IAPP Privacy Intensive in The Hague, many Dutch Privacy People agree.

Judging by his texts and summaries, it was an interesting two days. So how, 4 years after GDPR, is privacy structured in organisations? Is privacy engineering about Engineering, or just about engineering Privacy Policies?

Discuss it on LinkedIn in Bart's write-up 👇

Bart Voorn on LinkedIn: #GDPR #privacybydesign #privacypros
Over the past days, I’ve been attending a privacy conference in The Hague. Key takeaways in a semi 🧶🧵: [1] Quite a space. In the Netherlands, we apparently...

Told You So: privacy by design laws will kill your data pipelines

A car is totaled when the cost to repair it exceeds its total value. By that logic, Privacy by Design legislation could soon be totaling data pipelines at some of the most powerful tech companies.
Privacy by Design laws will kill your data pipelines
The legislation could make old data pipelines more trouble than they’re worth.

I'm not sure how I missed this article before, but in this interesting piece over at Protocol, Hirsh Chitkara dives into the implications of privacy regulations and privacy by design principles for data pipelines (and the applications that consume them).

His point: porting existing data fabrics to the age of Privacy by Design will be so costly and rough that you might as well just start over.

The link is not an advertorial, but it reads like a background story to STRM's thesis. We are more optimistic than the author, though, about the possibility of an evolutionary shift from existing architectures. We wrote and presented some thoughts on how to add these principles to existing (large scale) data systems.

“Because [privacy] was never a consideration to begin with, now it’s increasingly difficult to untangle things.”

Privacy streams, anyone?

Technical privacy #1: MongoDB adds (and open sources) queryable encryption, so you can search data without exposing it.

A Long-Awaited Defense Against Data Leaks May Have Just Arrived
MongoDB claims its new “Queryable Encryption” lets users search their databases while sensitive data stays encrypted. Oh, and its cryptography is open source.

We've argued before that privacy in practice is a balancing act. And as such, some trade-off between the data you have and the data you can use (or better: the purposes for which you can use that data) is inevitable.

In a new release, MongoDB presents a solution that brings together the utility of data with important security (and privacy) considerations: queryable encryption.

It's basically exactly what it says: a way to store encrypted data that you can still search through. Wired explains how that works and how you can use it. The article emphasises the security argument more than privacy, but it's easy to see how this benefits operational privacy as well, for instance in operations like counts for analytics (and believe me: tons of "advanced" analytics rely on simple counts of a lot of metrics!).
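
To make the "count without seeing the data" idea a bit more concrete, here is a toy sketch. Big caveat: this is NOT how MongoDB implements Queryable Encryption. It uses a much simpler technique, a keyed hash (HMAC) as a so-called blind index, and the key, the sample values and the function names are all made up for illustration.

```python
import hashlib
import hmac

# Hypothetical key that stays with the data owner; the database never sees it.
CLIENT_KEY = b"demo-key-kept-on-the-client"


def blind_token(value: str, key: bytes = CLIENT_KEY) -> str:
    """Keyed hash of a value: equal inputs give equal tokens,
    but a token cannot be recomputed or checked without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# What the database stores: opaque tokens only, no plaintext country codes.
events = ["NL", "FR", "NL", "DE", "NL"]
stored_tokens = [blind_token(country) for country in events]


def count_matches(tokens: list[str], query_token: str) -> int:
    """Server-side count: compares opaque tokens, never plaintext."""
    return sum(1 for t in tokens if hmac.compare_digest(t, query_token))


# The client turns its question ("how many NL events?") into a token first.
nl_count = count_matches(stored_tokens, blind_token("NL"))
print(nl_count)  # 3
```

The trade-off is visible right away: because equal values produce equal tokens, anyone who can read the stored tokens still learns how often each value occurs. That is exactly the kind of leak more advanced schemes try to limit.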

I'm checking with our engineering team when that MongoDB connector for STRM is ready. Reach out if you can use it already 😉


I'm getting a bit annoyed with yet another item in this newsletter about Google (although that symmetrical consent was good news), but we cannot skip it as it is so essential.

The French Data Protection Authority CNIL explained GA is still illegal despite proposed changes by Big G.

Of course a lot of organisations run Google Analytics ("GA"), and that alone makes this latest explanation by the French DPA impactful. But there's a deeper issue.

Google is dominant, exposes a lot of "free" services, and through that network is able to observe, link and identify almost everyone's behavior online. It is perhaps at its core more of an anti-trust than a privacy issue.

The essential concern in the rulings and perspective shared by the CNIL lies in the fabric of the internet. And it's not cookies as you might expect. It's IP addresses.

The fabric of the internet is privacy's main issue

HTTP (the set of rules defining how computers exchange information over the public internet) simply needs a machine's network location to work properly: requests travel over IP, so the server always sees the address a request came from. And servers want to know who they are talking to: "I'm A and I would like to request resource B from you".

With more and more personal devices, machine IDs are increasingly 1:1 personal identifiers. And that's a problem: behavior (hits from an IP address) in a set of services that belongs to one mothership can be linked together through the use of the IP. And with maps, ads, fonts etc. it's hard to NOT come across a Google service on any given webpage.

There are solutions, like proxying: a network of machines passing on requests to obfuscate the original source. But those are not consumer-default choices and they require (a) an understanding of this issue, (b) high computer literacy and (c) a willingness to face a user experience nightmare for the sake of privacy.
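
A minimal sketch of that linking step, with entirely made-up log records (the service names, pages and IP addresses are illustrative; the IPs come from ranges reserved for documentation). No cookies involved: grouping by IP is enough to stitch a trail together.

```python
from collections import defaultdict

# Invented request logs from three services owned by the same company.
hits = [
    {"service": "maps", "ip": "203.0.113.7", "page": "restaurant-finder.example/amsterdam"},
    {"service": "fonts", "ip": "203.0.113.7", "page": "health-forum.example/sleep"},
    {"service": "ads", "ip": "203.0.113.7", "page": "news.example/politics"},
    {"service": "fonts", "ip": "198.51.100.23", "page": "news.example/sports"},
]


def link_by_ip(records):
    """Group hits from different services into one trail per IP address."""
    trails = defaultdict(list)
    for record in records:
        trails[record["ip"]].append((record["service"], record["page"]))
    return dict(trails)


profiles = link_by_ip(hits)
for ip, trail in profiles.items():
    print(ip, "->", len(trail), "hits across", len({s for s, _ in trail}), "services")
```

One address, three unrelated websites, one profile. The websites themselves never shared anything with each other; the common embedded services did the work.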
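
The proxying idea can be sketched as a small simulation. Nothing here touches a real network: the Request, Service and Relay classes and the addresses are hypothetical stand-ins, just to show which address ends up in the service's log.

```python
from dataclasses import dataclass


@dataclass
class Request:
    source_ip: str
    resource: str


class Service:
    """Stand-in for a web service that logs the address of every caller."""

    def __init__(self):
        self.seen_ips = []

    def handle(self, request: Request) -> str:
        self.seen_ips.append(request.source_ip)
        return f"contents of {request.resource}"


class Relay:
    """Stand-in for a proxy: forwards the request under its own address."""

    def __init__(self, own_ip: str, upstream: Service):
        self.own_ip = own_ip
        self.upstream = upstream

    def handle(self, request: Request) -> str:
        forwarded = Request(source_ip=self.own_ip, resource=request.resource)
        return self.upstream.handle(forwarded)


# Direct visit: the service logs the visitor's own address.
direct_service = Service()
direct_service.handle(Request("203.0.113.7", "/fonts.css"))

# Relayed visit: the service only ever sees the relay's address.
relayed_service = Service()
relay = Relay("192.0.2.1", relayed_service)
relay.handle(Request("203.0.113.7", "/fonts.css"))

print(direct_service.seen_ips)   # ['203.0.113.7']
print(relayed_service.seen_ips)  # ['192.0.2.1']
```

Note that the relay itself still sees the original address, so a single proxy mostly moves the trust question to whoever runs it. That is why such networks tend to chain several machines together.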

So we had HTTP/1, HTTP/2 and we are already at HTTP/3. But the IP remains an essential element.

Is it perhaps time for HTTP/4nonymous?

French data protection watchdog: Tweaking Google Analytics won’t make it legal
Google Analytics’ use is not legal without a new deal that would replace the disgraced EU-US data processing agreement, French data watchdog CNIL recently clarified on its website, which also dashed hopes that the tool could be reconfigured to allow data transfers to the US.

And that's it for this week!

Thanks for reading! Make sure to subscribe and let us know what you think of Mind the Gap.

Photo by Junseong Lee on Unsplash

STRM is privacy by design for data without the compromise