Path analysis with GA4 data in BigQuery

2023-03-09 | Article | Insights

In this article we will focus on querying the user journey paths on our website or app. This query can provide many insights, such as identifying common paths or patterns, understanding the behaviour of different user segments, and optimising user journeys. We can also gain a good understanding of the role of individual channels within the user journey to inform both budget allocation on channels and the user experience, helping to optimise marketing and increase conversions in the future.

In the following query, we get the sequence of channels and timestamps per user. This can be extended to sequences of devices or filtered for paths that lead to a conversion goal. To get the timestamps of each touchpoint we will get the event timestamp of the session_start event in the first subquery. For getting the source and medium combination in the second subquery we will unnest the event parameters to capture the source and medium of the individual sessions (We have talked about how to query source and medium for individual sessions in this article ). These two tables are then joined together on the user id to bring timestamps of session starts and source/medium combinations in one table. In the last part of the query, we will build our sequences of timestamps and source/medium using an aggregation function called ‘STRING_AGG’ which takes all timestamps and source/medium combinations from a user and concatenates them into a single string sorted by ascending timestamp. See the transformation example below:

 WITH
#---get the session start timestamp
 t1 AS (
 SELECT
   DISTINCT user_pseudo_id,
   (
   SELECT
     value.int_value
   FROM
     UNNEST(event_params)
   WHERE
     KEY = 'ga_session_id') AS session_id,
   event_timestamp,
 FROM
   -- Replace the following table WITH your own data
   `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
 WHERE
   event_name='session_start' ),
 #---get the source/medium combination of each session
 t2 AS(
 SELECT
   DISTINCT user_pseudo_id AS user_pseudo_id_m,
   (
   SELECT
     value.int_value
   FROM
     UNNEST(event_params)
   WHERE
     KEY = 'ga_session_id') AS session_id_m,
   CONCAT( MAX((
       SELECT
         value.string_value
       FROM
         UNNEST(event_params)
       WHERE
         KEY = 'source')), ' / ', MAX((
       SELECT
         value.string_value
       FROM
         UNNEST(event_params)
       WHERE
         KEY = 'medium'))) AS source_medium,
 FROM
   -- Replace the following table WITH your own data
   `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
 GROUP BY
   1,
   2),
 #--- join session start timestamps and the channels of each session
t3 AS(
 SELECT
   t1.*,
   t2.source_medium
 FROM
   t1
 LEFT JOIN
   t2
 ON
   t1.user_pseudo_id=t2.user_pseudo_id_m
   AND t1.session_id=t2.session_id_m
 ORDER BY
   1,
   2,
   3)
#--- aggregate per user the timestamps and touchpoints in order
SELECT
 user_pseudo_id,
 STRING_AGG(CAST(event_timestamp AS STRING), ' > ' ORDER BY event_timestamp) AS timestamp,
 STRING_AGG(source_medium, ' > 'ORDER BY event_timestamp) AS touchpoints,
FROM
 t3
GROUP BY 1
ORDER BY
 1,
 2

This report can be extended for various use cases. Here are some tips to get inspired:

1. Product category path analysis: This query can be adjusted to a product category level to learn about their user journey. For example, the number of touchpoints and the length of time from first visit to purchase. By learning about the individual journeys of the product categories, we can create more effective marketing activities at the product category level to optimise our budget planning.

Most path taken for view/conversion of product category A

2. Conversion path analysis: We can break down this report by different conversion goals such as add to cart and purchase. By analysing the specific paths that users take on their way to purchase, we can understand the common touchpoints and interactions that lead to this goal and adjust our marketing efforts accordingly, potentially delivering more conversions and revenue.

Most path taken that leads to a specific conversion goal

Happy querying!

Do you need more Info?

Contact