Digitl | Personal Data: Anonymization & Pseudonymization

Personal Data: About Data Anonymization and Pseudonymization

2022-10-24 | Article | Insights

Summary

Continuing our data privacy series, this article focuses on some of the most important methods for de-identifying personal data, namely data anonymization and pseudonymization. Along with an overview of the different techniques, which were previously introduced in our GDPR Compliance Checklist, this article describes how these techniques come into play when using Google Analytics, for a more privacy-safe and future-proof setup.

Recap: What is personal data?

“[Personal data] means any information relating to an identified or identifiable natural person (‘data subject’)” (Article 3(1) of Regulation (EU) 2018/1725). Different pieces of information, which – collected together – can lead to the identification of a particular individual, also constitute personal data. In other words, individuals may be identified directly from the information in question, or indirectly from that information in combination with other information.

Examples of personal data include:

a name and surname;
a home address;
an email address in the form of name.surname@company.com;
an Internet Protocol (IP) address;
the advertising identifier of your phone.

Anonymization

According to Recital 26 of the GDPR, personal data that has been rendered anonymous, so that the individual is not identified or is no longer identifiable, is no longer considered personal data. For data to be truly anonymized, the anonymization must be irreversible (see Figure 1).

In the process of anonymization, personal identifiers, which could directly or indirectly lead to an individual being identified, are removed. Once data is truly anonymized and individuals are no longer identifiable, the data will not fall within the scope of the GDPR anymore and it becomes safer to use.

While there may be incentives to process certain data in anonymized form, this method may devalue the data, so that it is no longer useful for some purposes. Therefore, before collecting and anonymizing specific pieces of information, consider the purpose for which the data is collected in the first place.
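The removal of identifiers described above can be illustrated with a minimal Python sketch. The field names, the sample record and the age-band helper are assumptions made for illustration only. Whether the result is truly anonymous depends on the rest of the dataset, since the remaining fields may still identify someone in combination.

```python
def age_band(age: int) -> str:
    """Generalize an exact age into a ten-year band, e.g. 34 -> '30-39'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def anonymize_record(record: dict, direct_identifiers: set, generalizers: dict) -> dict:
    """Drop direct identifiers and coarsen selected fields.

    Nothing that maps the output back to the input is kept,
    so the step cannot be reversed from the output alone.
    """
    out = {}
    for field, value in record.items():
        if field in direct_identifiers:
            continue  # remove the identifier entirely
        if field in generalizers:
            value = generalizers[field](value)  # reduce precision
        out[field] = value
    return out


# Hypothetical sample record
record = {"name": "Jane Doe", "email": "jane.doe@example.com", "age": 34, "plan": "premium"}
anonymous = anonymize_record(record, {"name", "email"}, {"age": age_band})
print(anonymous)
```

The coarsened age also shows the trade-off mentioned above: the record is harder to link to a person, but it can no longer answer questions that need the exact value.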

Pseudonymization

As stated in Article 4(5) of the GDPR, “[Pseudonymization] means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person”.

In order to pseudonymize a data record that includes personal information, identifying fields within that data record are replaced by one or more artificial identifiers or pseudonyms. There can be a single pseudonym for a collection of replaced fields or a pseudonym per replaced field. Additional information that would allow this data to be attributed to a specific individual is usually a key file, in which the pseudonymized data is linked to the personal data (see Figure 2).
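As a rough sketch of this idea, the following Python example replaces the identifying fields of each record with a single random pseudonym and collects the original values in a separate key table. The function names, field names and sample data are hypothetical. In practice the key table would be stored separately and protected by its own technical and organizational measures.

```python
import secrets


def pseudonymize(records, identifying_fields):
    """Replace identifying fields with one pseudonym per record.

    Returns (pseudonymized_records, key_table). The key table links each
    pseudonym to the original values and must be kept separately.
    """
    pseudonymized = []
    key_table = {}
    for record in records:
        pseudonym = secrets.token_hex(8)  # random, not derived from the data
        key_table[pseudonym] = {f: record[f] for f in identifying_fields}
        cleaned = {k: v for k, v in record.items() if k not in identifying_fields}
        cleaned["pseudonym"] = pseudonym
        pseudonymized.append(cleaned)
    return pseudonymized, key_table


def reidentify(record, key_table):
    """Reverse the pseudonymization of one record using the key table."""
    restored = {k: v for k, v in record.items() if k != "pseudonym"}
    restored.update(key_table[record["pseudonym"]])
    return restored


# Hypothetical sample data
records = [
    {"name": "Jane Doe", "email": "jane.doe@example.com", "plan": "premium"},
    {"name": "John Roe", "email": "john.roe@example.com", "plan": "basic"},
]
pseudonymized, key_table = pseudonymize(records, ["name", "email"])
```

Without the key table, the pseudonymized records carry only the pseudonym and the non-identifying fields. With it, `reidentify` restores the original record, which is why the process counts as reversible.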

Unlike anonymized data, pseudonymized data is still considered personal data under the GDPR because the process is reversible: given the right key, you can identify the individual. Recital 26 explains “…personal data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person.” Hence, pseudonymized personal data still falls within the scope of the GDPR.

Anonymization and Pseudonymization in Server-Side Tracking

Google Analytics 4 (GA4) provides several features designed to help website owners better comply with GDPR requirements. Moreover, if you are already using server-side Google Tag Manager (sGTM), i.e. your Tag Manager is running in a server-side environment, there are additional settings you can enable in order to de-identify personal data.

With an sGTM setup, you create an endpoint in a server environment that you own. It acts as a proxy between the hits sent from browsers and devices and the actual endpoints that collect those hits. Here, anonymization and pseudonymization can be applied to modify personal data before it is sent to your Google Analytics endpoint.

Under the GDPR, IP addresses are considered personal data. GA4, however, anonymizes the IP address, so the full IP address, which could identify an individual, is never logged or stored. This is always enabled, so no manual action is required. IP anonymization here means that the last octet of a visitor’s IPv4 address (or the last 80 bits of an IPv6 address) is replaced with zeros. For example, the IP address 203.0.113.154 would be anonymized to 203.0.113.0. This way, basic geolocation information can still be derived from the remaining part of the IP address, although it is less accurate.
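The masking rule described above (zeroing the last octet of an IPv4 address or the last 80 bits of an IPv6 address) can be sketched in Python with the standard `ipaddress` module. This illustrates the rule only and is not Google's implementation. The function name is hypothetical and the sample addresses come from the ranges reserved for documentation.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the last 8 bits of an IPv4 or the last 80 bits of an IPv6 address."""
    ip = ipaddress.ip_address(address)
    # Keep the first 24 of 32 bits (IPv4) or the first 48 of 128 bits (IPv6).
    keep_bits = 24 if ip.version == 4 else 48
    # strict=False lets ip_network() zero the host bits instead of raising.
    network = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.154"))
print(anonymize_ip("2001:db8:1234:5678:9abc:def0:1234:5678"))
```

An address with an octet above 255 is rejected by `ipaddress.ip_address` with a `ValueError`, so malformed input is caught before any masking takes place.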

With sGTM, however, you can also redact the entire IP address. While anonymization is the process of turning data into a form that does not identify a specific individual, and where identification is unlikely, redaction refers to obscuring all or part of the data. If you choose to redact the visitor’s IP address in sGTM, the entire IP address is removed from the hit sent to GA4. In other words, the IP address never reaches the Google Analytics endpoint.
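Conceptually, redaction on a proxy means dropping the field before the hit is forwarded. The Python sketch below models a hit as a plain dictionary. It is not sGTM code, and the field names (including `ip_address`) are assumptions chosen for illustration.

```python
def redact_fields(hit: dict, fields_to_redact: set) -> dict:
    """Return a copy of the hit without the redacted fields."""
    return {k: v for k, v in hit.items() if k not in fields_to_redact}


# Hypothetical incoming hit as received by the server-side endpoint
incoming_hit = {
    "event_name": "page_view",
    "page_location": "https://www.example.com/pricing",
    "ip_address": "203.0.113.154",  # hypothetical field name
}

# The forwarded hit no longer contains the IP address at all
outgoing_hit = redact_fields(incoming_hit, {"ip_address"})
print(outgoing_hit)
```

Unlike masking, which keeps a coarse version of the address, the redacted hit carries no IP-derived information, so the receiving endpoint cannot derive geolocation from it.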

Furthermore, sGTM lets you pseudonymize the Google Analytics client ID, another user identifier, so that it can no longer be attributed to a specific individual without additional information, i.e. the “key”. Cleaning your campaign parameters or hiding the referrer are additional ways in which you can protect sensitive data in server-side tracking.
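One possible way to implement these two measures is sketched below in Python: a keyed hash (HMAC-SHA256) for the client ID, and a helper that removes selected query parameters from a URL. Both are assumptions made for illustration, not the mechanism sGTM itself uses. The function names, the secret key and the parameter list are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def pseudonymize_client_id(client_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from the client ID with a keyed hash.

    The same client ID and key always give the same pseudonym, so sessions
    can still be stitched together. Without the key, the pseudonym cannot
    be recomputed from a known client ID.
    """
    return hmac.new(secret_key, client_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_query_params(url: str, params_to_drop: set) -> str:
    """Remove the given query parameters from a URL, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in params_to_drop
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical values for illustration
secret_key = b"store-this-key-separately"
pseudonym = pseudonymize_client_id("1234567890.1666600000", secret_key)
clean_url = strip_query_params(
    "https://www.example.com/?utm_source=newsletter&email=jane%40example.com&page=2",
    {"email"},
)
```

Because the secret key plays the role of the “additional information”, it should be kept apart from the analytics data. Anyone holding it can recompute the mapping from known client IDs to pseudonyms.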