GA 4 User Segmentation for Content-Based Websites in Google Cloud BigQuery

2023-02-15 | Article | Insights

Content-based websites are often struggling with having a deep understanding of the behaviour and interest of their users. User segmentation is a helpful analysis to gain insights about the different user groups on content-based websites. There are a variety of ML algorithms such as K-means that can be used for user segmentation. But it is also possible to start simply and quickly to segment our users by some basic measurements, for instance by the number of pages they visit, the time they spend on the site, and the frequency of their visits. By doing so, we can gain insights into how different types of users interact with our content which can help us improve user experience and increase engagement.

In this article, we will go through a query that produces a basic table for segmentation by visit frequency and engagement of our user groups daily. First, the date and device type are queried (since this report is daily and we would like to distinguish between devices). For getting the frequency of visits per day, we use the COUNT clause and the window function to get the count of distinct session ids partitioned by user id, device type, and date. With the same logic, we would get the pageviews with the count of event name page_view. We now would need the time on the website in seconds (that is why we divide the engagement_time_msec by 1000). We get this information with a SUM clause and a window function. In the last query, we will get the sum of page views and engagement time per date, device, and frequency of visits.

WITH base AS(
SELECT
DISTINCT
event_date,
device.category as device_type,
user_pseudo_id,
COUNT (DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'))
      OVER (PARTITION BY user_pseudo_id,event_date,device.category) AS number_of_sessions,
COUNTIF(event_name = 'page_view')
      OVER (PARTITION BY user_pseudo_id,event_date,device.category) AS number_of_pageviews,


SUM(((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec'))/1000)
      OVER (PARTITION BY user_pseudo_id,event_date,device.category) AS timeOnSite_sec
FROM
-- Replace the following table with your own data
`bigquery_project_id.dataset_id.events_*`)
SELECT
event_date,
device_type AS device,
number_of_sessions,
SUM(number_of_pageviews) AS sum_of_page_views,
SUM(timeOnSite_sec) AS sum_of_timeOnSite_sec
FROM base
GROUP BY
1,2,3
ORDER BY
1,2,3,4,5

This query will generate a daily report of pageviews and engagement time per frequency of visits, distinguished by device type. Some of the insights and use cases of this table are the following:
  • Monitoring reports: By setting up a reporting dashboard on the development of these segments over time, we can drive insights in a variety of areas. For instance, analysis of the health of the website, detection of abnormalities, and analysis of campaign activity impact on different user groups.
  • Better understanding of user groups: Segmentation provides insights into which groups are most loyal to our business and which are struggling (one time visitors with low engagement time). These insights can help to create tactics on how to reactivate the low-engaged users and eventually increase high-quality traffic.
  • The basis for campaign strategy: ​​by tailoring our marketing efforts to specific groups of users based on their engagement, interests, or other characteristics, we can increase the effectiveness and efficiency of our campaigns.

Feel free to extend and adjust this query to your business-specific needs. Happy querying!

Do you need more Info?

Contact us