How to Create Google Analytics 4 Custom Channel Grouping with BigQuery

2023-01-26 | Article | Insights

As we know, the default channel grouping is a predefined set of channels created by Google Analytics to categorise website traffic. Whereas custom channel grouping allows us to segment website traffic based on individual business goals and objectives. These custom channel groups can help us understand the performance of different marketing channels and eventually make informed decisions about our marketing strategy. Custom channel grouping is not yet a feature in GA4, but we can simulate it via BigQuery. In doing so, we do not need to change our custom UTM parameters setup. For the majority of us, the definition of UTM parameters is not an easy process. When we reach a clear structure, it's not a good idea to change it because the GA4 does not yet support Custom Channel Grouping (traffic goes to groups we don't want). By creating custom channels, we can set the groups as we want for GA4, and we can keep the structure as it is without affecting the UA that potentially works in parallel.

Before we create custom channel groupings in BigQuery, a little background knowledge on session-level acquisition channels is required. In Google Analytics 4, the traffic acquisition parameters are used to identify the source and medium of a user's first session. These parameters are stored in the trafficSource field in the ga_sessions_* table in BigQuery. Please note that this information is related only to the first session of the user; if a user arrives at our website via multiple sources, their first source and medium touchpoint will be stored in the trafficSource field, rather than the source and medium of each session. But the good news is that we can unnest the event parameters to capture additional information about user interactions, including the source and medium of the corresponding session.

With this in mind, let's start writing the query. Within the first subquery of the following script we unnest the event_params with key as source and medium to access these fields but since we have unnested them, we would need to regroup them by user_pseudo_id and session_id using the MAX function. So far we have written a subquery that produces the session-level source and medium which we can use as a basis to create our custom channel grouping in the next step. Now we can use the CASE statement to create the custom groupings based on source and medium. For example, in the second part of the query we have redefined and cleaned the direct, paid search, social paid, organic, email, affiliate, and the rest of the channels are grouped under ‘other’ . These rules are created based on the manual traffic grouping defined in GA help centre page on channel grouping.

WITH
base AS (
  SELECT
  DISTINCT user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id') AS session_id,
  MAX((SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'source')) AS source,
  MAX((SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'medium')) AS medium,
  FROM
  -- Replace the following table with your own data
  `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
  WHERE
  -- Please adjust the following time range to your own needs
  _table_suffix BETWEEN 'start_date' AND 'end_date'
  GROUP BY
  1,2
)
SELECT
user_pseudo_id,
session_id,
source,
medium,
-- Feel free to adjust the following grouping to your own needs
CASE
WHEN (source='(direct)'OR source IS NULL) AND (REGEXP_CONTAINS(medium, r'^(\(not set\)|\(none\))$') OR medium IS NULL) THEN 'direct'
WHEN REGEXP_CONTAINS(source, r'^(google|bing)$') AND REGEXP_CONTAINS (medium, r'^(.*cp.*|ppc|paid.*)$') THEN 'paid_search'
WHEN REGEXP_CONTAINS(source,r'^(twitter|facebook|fb|instagram|ig|linkedin|pinterest)$') AND REGEXP_CONTAINS(medium, r'^(.*cp.*|ppc|paid.*)$') THEN 'social_paid'
WHEN medium = 'organic' THEN 'organic_search'
WHEN REGEXP_CONTAINS(source, r'^(email|e-mail|e_mail|e mail)$') OR REGEXP_CONTAINS(medium, r'^(email|e-mail|e_mail|e mail)$') THEN 'email'
WHEN REGEXP_CONTAINS(medium, r'^(affiliate|affiliatesl)$') THEN 'affiliate'
WHEN medium = 'referral' THEN 'referral'
ELSE 'other'
END AS custom_channel_grouping
FROM
base


When applying custom channel grouping for your specific needs and goals, here are some tips to keep in mind:

  1. Be specific: Creating specific custom channel groupings allows us to understand the performance of different traffic sources on a granular level. For example, instead of grouping all small traffic to ‘other’, create separate groupings for each channel.
  2. Use a consistent naming convention: Consistently naming the custom channel groupings will keep the data clean and consistent and smoothens the further understanding and analysis of data.
  3. Keep it simple: It is recommended to keep the number of custom channel groupings to a minimum. Having too many custom channel groupings can make it difficult to analyse the data, especially when it comes to evaluating the marketing channels.

It is vital to review and adjust custom channel groupings regularly, to ensure they continue to align with the business goals and provide meaningful insights.

Do you need more Info?

Contact