2023-01-26 | Article | Insights
Joining Google Analytics 4 data with other data sources in BigQuery can be beneficial in many cases. For example, joining GA4 data with CRM data can give us a complete picture of how users interact with products in both the online and offline world. Or joining GA data with data from other marketing platforms, such as CM360 and DV360, can help us to understand the conversion paths more in detail and eventually attribute conversions on a more granular level to the specific channels that drove them. These are just a few examples of the many use cases for joining Google Analytics 4 data with other data sources in BigQuery. The specific use cases will depend on the data we have available and the goals of the analysis.
In this article, we will go through an example where Google Analytics 4 data is joined with Google Merchant Center data in BigQuery.
Google Merchant Center gives us information about products. For instance, product id, product group, description, images, and performance data such as clicks, impressions, and conversion data. Google Analytics 4 data, on the other hand, includes information about products and interaction with products, such as product id, add-to-cart, revenue, and conversion on the website. By joining these two data sources in BigQuery on the product id level, we can gain a granular understanding of our products. For example, we can see which product groups are the most popular, how different product groups are related to the content of our website, gain insights on the performance of products, and make data-driven decisions on which products to keep in stock and which to phase out.
We can start writing the query with querying fields from GA4 data. As we can see in the first subquery, the item id, revenue and sold quantity are extracted for the last seven days. Within the condition we specify that we would like to exclude the products that had no sale. Within the second subquery, some product information such as id, type, and availability are queried from GMC data. As mentioned, there is various information on products available in this table, so feel free to explore the relevant ones for your report and analysis. In the last part of the query, we simply bring these two sources together by applying a LEFT JOIN, using the product id as the key between the two data sources. A LEFT JOIN in BigQuery is a kind of a join that returns all the rows from the left table (our sold products in GA4), along with the matching rows from the right table (our GMC product detail information). If there is no matching product information in the GMC data, the result will contain NULL values for product type and availability columns.
WITH ga AS( SELECT items.item_id AS item_id_ga, SUM( items.item_revenue) AS revenue, SUM(ecommerce.total_item_quantity) AS quantity FROM #--- Replace the following table with your own data `project_id.dataset.analytics_.events_*`, UNNEST(items) AS items WHERE #--- Please adjust the following time range to your own needs _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 8 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND items.item_revenue IS NOT NULL AND ecommerce.total_item_quantity IS NOT NULL GROUP BY 1), gmc AS ( SELECT DISTINCT offer_id, item_group_id, product_type, availability, #--- Replace the following table with your own data FROM `project_id.dataset.Products_merchant_id` ) #--- Here we bring the two data sources together SELECT ga.item_id_ga, item_group_id, product_type, availability, revenue, quantity, FROM ga LEFT JOIN gmc ON ga.item_id_ga=gmc.offer_id
One of the use cases of the output generated by this table is to create a top-performance product group list that can be used as a basis for planning campaign strategies in the following period. This ranking can be calculated by creating rule-based thresholds based on the revenue, sold quantity, and availability of products in the previous seven days.