Enhanced Ecommerce in Shopify. Part one: Data collection

Shopify is a proprietary platform, hence some aspects of how it works are not thoroughly explained in the documentation or on the community forums. One such subject is the default integration with Google Analytics (GA). There are tons of articles and tutorials on the web that explain how to enable the default GA integration in your store in a few clicks, but none of them explains which data you can collect with the help of this integration. It is said to provide data in the “Enhanced Ecommerce” format, which is a rather common choice these days. This format makes it possible to collect a wide spectrum of data about your customers, their behavior and their choices. But again, what information is actually collected and available in GA reports depends on each particular implementation.

The first of the two articles I plan to write on this subject sheds more light on how the default GA integration works and what data it provides for your GA reports. In the second article I’ll try to add more custom metrics to the collected data. I hope this information will be useful to you, even if you are OK with your existing GA reports. Maybe it will give you some ideas on how to improve them.

Short disclaimer before we go any further.

  • All information in this article is based on publicly available sources.
  • When I say “Google Analytics” I refer to the Universal Analytics for web.
  • In October 2020, Google released a new version of Google Analytics, also known as “Google Analytics 4 properties”. At the time of writing Shopify has not yet provided a default integration for it. I’m sure they will do it later. When they do, it will be a good topic for a new article, but for now I talk about Universal Analytics only.
  • I presume that you already use the default GA integration on your Shopify account with the “Enhanced Ecommerce” option enabled.

A brief introduction to Enhanced Ecommerce

The standard Ecommerce tracking in GA collects only two types of data:

  • information about the items added to the cart;
  • information about successful transactions (a.k.a. purchases).

With the enhanced version we may collect much more data:

  • impression data: information about the products that have been viewed;
  • product data: information about the products that we’ve interacted with;
  • promotion data: information about promo assets like banners or any other ads that have been viewed;
  • action data: information that explains what we do, such as click, purchase, add a product to the cart or remove it from the cart.

Each data type listed above is collected with a set of predefined objects, events, actions and hit types. We can’t change them or modify their structure, but we can fill them with data and send them to GA.

For example, if we want to track product details on the product page, we should use a “productFieldObject” in combination with the “detail” action and the “pageview” hit type.

A “productFieldObject” has the following structure:

{  id,  <== mandatory field
name, <== mandatory field

In combination with the “detail” action, Google recommends filling in the id, name, brand, category and variant fields. Choosing the list of fields for each particular situation is a bit confusing: there are no clear instructions on this subject, or at least I couldn’t find them in the documentation. Hence, I use Google’s examples as a reference and modify them if needed.
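To make the structure above more tangible, here is a minimal sketch of a helper that builds a product object. The field names are the ones I found in Google’s Enhanced Ecommerce documentation; the helper name `buildProductFieldObject` and the filtering logic are my own assumptions, not part of the GA API. As far as I can tell from the documentation, at least one of id or name has to be set, so that is what the check enforces.

```javascript
// Field names documented for an Enhanced Ecommerce product object.
const PRODUCT_FIELDS = [
  'id', 'name', 'brand', 'category', 'variant',
  'price', 'quantity', 'coupon', 'position'
];

// Hypothetical helper: keeps only the known fields and drops empty values.
function buildProductFieldObject(source) {
  const product = {};
  for (const field of PRODUCT_FIELDS) {
    const value = source[field];
    if (value !== undefined && value !== null && value !== '') {
      product[field] = value;
    }
  }
  // Assumption based on my reading of the docs: id or name must be present.
  if (!('id' in product) && !('name' in product)) {
    throw new Error('A product needs at least an id or a name');
  }
  return product;
}

const detailProduct = buildProductFieldObject({
  id: 'P12345',
  name: 'Android Warhol T-Shirt',
  brand: 'Google',
  category: 'Apparel',
  variant: 'black',
  color: 'dropped, because it is not an Enhanced Ecommerce field'
});
// In the browser the result could be passed on: ga('ec:addProduct', detailProduct);
```

Filtering against a fixed list is just one way to keep unexpected fields out of the hit; a real setup may prefer to fail loudly instead.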

Enhanced analytics does not collect data automatically, thus we should write some code.

Step 1. Add a bootstrap code to the <head> tag

(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
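The minified bootstrap is hard to read, so here is a readable sketch of what, as I understand it, the snippet does: it creates a stub `ga` function that queues commands, stores a timestamp and injects the real library asynchronously. The function name `installGaStub` and the `win`/`doc` parameters are my own additions (they stand in for `window` and `document` so that the logic can be tried outside a browser).

```javascript
// Readable sketch of the minified bootstrap; not Google's literal code.
function installGaStub(win, doc, scriptUrl, name) {
  win.GoogleAnalyticsObject = name;
  // Until analytics.js is loaded, every ga(...) call is only stored in a queue.
  win[name] = win[name] || function () {
    (win[name].q = win[name].q || []).push(arguments);
  };
  win[name].l = 1 * new Date(); // timestamp of the initialization

  // Load the real library asynchronously, before the first <script> on the page.
  const script = doc.createElement('script');
  const first = doc.getElementsByTagName('script')[0];
  script.async = 1;
  script.src = scriptUrl;
  first.parentNode.insertBefore(script, first);
}

// Browser-only usage; skipped when there is no window (e.g. in Node).
if (typeof window !== 'undefined') {
  installGaStub(window, document,
    'https://www.google-analytics.com/analytics.js', 'ga');
}
```

The queue is the reason we can call `ga('create', ...)` right after the bootstrap: the commands wait in `ga.q` until the library arrives and replays them.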

Step 2. Prepare the product data for sending to GA

/* init default tracker object for our GA account */
ga('create', 'UA-XXXXX-Y');
/* load the Enhanced Ecommerce plugin */
ga('require', 'ec');

/* add product data to the tracker with the 'ec:addProduct' command */
ga('ec:addProduct', {
  'id': 'P12345',
  'name': 'Android Warhol T-Shirt',
  'category': 'Apparel',
  'brand': 'Google',
  'variant': 'black'
});

/* set associated action data to 'detail' i.e. product details */
ga('ec:setAction', 'detail');
/* send product details view data with the hitType 'pageview' */
/* hitType - something like a scope of data */
ga('send', 'pageview');

Now, as soon as we open the product page, this code is executed: the product data is transformed into a special format and sent to GA as GET parameters of an image request to the Google servers.
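To give an idea of that “special format”, here is a simplified sketch that encodes a product detail view with Measurement Protocol parameter names (`t` for the hit type, `pa` for the product action, `pr1id`, `pr1nm` and so on for the first product). The function `buildDetailHitUrl` and the client id value are hypothetical, and the real library adds many more parameters, so treat it as an illustration rather than a copy of what analytics.js does.

```javascript
// Sketch: encode a product detail view as Measurement Protocol parameters.
function buildDetailHitUrl(trackingId, clientId, product) {
  const params = new URLSearchParams({
    v: '1',            // protocol version
    t: 'pageview',     // hit type
    tid: trackingId,   // GA property id
    cid: clientId,     // anonymous client id
    pa: 'detail',      // Enhanced Ecommerce product action
    pr1id: product.id,
    pr1nm: product.name,
    pr1ca: product.category,
    pr1br: product.brand,
    pr1va: product.variant
  });
  return 'https://www.google-analytics.com/collect?' + params.toString();
}

const exampleHit = buildDetailHitUrl('UA-XXXXX-Y', '555', {
  id: 'P12345',
  name: 'Android Warhol T-Shirt',
  category: 'Apparel',
  brand: 'Google',
  variant: 'black'
});
```

Knowing the parameter names makes it easier to recognize the Enhanced Ecommerce part of a request in the network tab.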

If you inspect your network traffic, you may find a request like this:

https://www.google.fr/pagead/1p-conversion/761607038/?random=85073946&cv=9&fst=1609281614630&num=1&label=-3HECIiFqM8BEP7mlOsC&bg=ffffff&guid=ON&resp=GooglemKTybQhCsO&u_h=1080&u_w=1920&u_ah=1053&u_aw=1920&u_cd=24&u_his=6&u_tz=60&u_java=false&u_nplug=3&u_nmime=4&gtm=2oabu0&sendb=1&ig=1&data=event%3Dpage_view%3Bpage_path%3D%2Fproducts%2Ftop-helder%3Bpage_title ... 

A moment later our product’s data should be available in GA reports. Voilà!

The whole process looks rather simple: plug in the GA script (step 1), prepare the data (step 2), use it in reports. It looks like anyone can do it, and that’s true. If you have some time to get through the exhaustive documentation, you can set up a custom data collection that covers all your needs. Unfortunately, in most cases there is no time or no people available, so the default Shopify integration becomes a very appealing choice. It takes care of each step for you and it’s free of charge.

Like most free solutions it comes with a catch. In this case there are two of them:

  1. This solution is not transparent. We don’t really know what data is being collected and how it’s being collected. It’s a black box.
  2. We can’t customize it. Shopify does not give us any means to extend or modify the collected data. Of course we may install Google Tag Manager on top of this solution to get more data, but how could we be sure that we don’t collect the same data twice?

Let’s see what we can do about all this.

Down the rabbit hole

The GA client script cannot collect any Ecommerce data on its own. It delegates this part of the job to site developers and gives them a list of available APIs. Using these APIs may be quite a challenging task, and something may not work as you expect. In this case we may use the GA debug library. When it is enabled, it writes detailed logs to the JavaScript console. With some luck we may find something useful there to fix our bug or to understand how the GA gears work.

Those who like to mess with the code may activate trace logs as follows:

// replace standard boot code with the code below
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
// activate trace mode before any ga() call
window.ga_debug = {trace: true};
...
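The two ingredients of debug mode can be summarized in a short sketch: the debug build of the library (`analytics_debug.js` instead of `analytics.js`) and the `ga_debug` trace flag. The helper `gaBootConfig` is a hypothetical name of mine, used only to put both pieces side by side.

```javascript
// Hypothetical helper: choose the library URL and the debug flags together.
function gaBootConfig(debug) {
  return {
    scriptUrl: debug
      ? 'https://www.google-analytics.com/analytics_debug.js'
      : 'https://www.google-analytics.com/analytics.js',
    gaDebug: debug ? { trace: true } : undefined
  };
}

const config = gaBootConfig(true);
if (typeof window !== 'undefined' && config.gaDebug) {
  // Has to be set before the first ga() call.
  window.ga_debug = config.gaDebug;
}
// config.scriptUrl would then be used as the script source in the boot code.
```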

For casual users there is a Chrome extension created by Google, the Google Analytics Debugger. I find it very handy: activate the extension, refresh the page and all trace logs are in the console. I recommend it to everyone who works with GA on a daily basis.

With the help of this extension we can see what data is being sent to GA from every page on our site:

GA trace logs for the product page with a default GA integration on Shopify

Now we have a general impression of what data we may expect to see in our GA reports. If we want to know how to modify it, we should go even further, into the murky waters of minified JS files.

Follow the white rabbit

Let’s look for a file with a charming name `trekkie.storefront.[long hash value].min.js` in the source code of a product page. This file contains a Shopify library that controls all the data tracking services Shopify has a default integration with. This script installs, configures and feeds the GA library with all necessary data.

If you open this file in your favorite editor, you will probably see something like this:

Don’t worry. It’s OK. At first Neo couldn’t read it and didn’t understand a thing either. It takes time and patience to master this skill. Find any online js deminifying service and convert the file to a more readable form. Here is a copy of what I’ve got after deminification (github link).

What is the trekkie library for? It looks to me like a boot script for all analytics services with a list of predefined callback functions for various data collection events. IMHO it’s efficient and simple enough to extend and maintain. A single JS library is shared across all the site pages, so with client caching enabled it is loaded only once. A server-side script, in turn, adds a tiny page- or template-specific piece of JS code that triggers a data collection event and passes all the necessary data to it, like product details or a list of products in the cart.
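The division of labor described above can be sketched in a few lines. This is not Shopify’s code: `createTracker`, the event name and the handler body are hypothetical, and `send` merely stands in for the `ga()` command queue.

```javascript
// Sketch of a shared tracking library: handlers are registered once,
// page-specific code only triggers events. All names are hypothetical.
function createTracker(send) {
  const handlers = {};
  return {
    on(eventName, handler) {
      (handlers[eventName] = handlers[eventName] || []).push(handler);
    },
    track(eventName, payload) {
      (handlers[eventName] || []).forEach((handler) => handler(payload, send));
    }
  };
}

// `send` records the commands instead of calling the real ga().
const sentCommands = [];
const tracker = createTracker((...args) => sentCommands.push(args));

// Shared library part: loaded and cached once for all pages.
tracker.on('Viewed Product', (product, send) => {
  send('ec:addProduct', product);
  send('ec:setAction', 'detail');
  send('send', 'pageview');
});

// Page-specific part: a tiny inline script rendered by the server on a product page.
tracker.track('Viewed Product', { id: 'P12345', name: 'Android Warhol T-Shirt' });
```

The appeal of this layout is that a new page type only needs one more `track` call, while all the GA-specific knowledge stays in the shared file.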

The initialization script checks for Enhanced Ecommerce support and if it’s enabled, the script applies this format for some events.

Lines 664–674

As we may see below, the default GA integration applies the Enhanced format in four cases only: when we view a product page, when we add a product to the cart, when we are on the first checkout step and when we arrive on the payment confirmation page (a.k.a. the “thank you” page).

Lines 688–693

That does not give us much data to work with in GA reports. There is no promotion or impression data and almost no action data. We get a general overview of sales performance, but almost no data about user behavior, which would be very useful if we wanted to estimate the effect of internal promotional campaigns, quick shop/quick search/wish list buttons, the engagement of a new grid layout on the category page and so on. This list may be endless. Everything depends on your business needs. If you want additional insights, you should think of a custom solution or find an application that can provide them.

Let’s have a closer look at these four cases.

Lines 614–663

All events share the same format for product details. Nothing special here except the id value: if a product does not have a SKU (stock-keeping unit), the variant’s id takes its place.
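The id fallback can be expressed as a one-liner. The sketch below uses hypothetical input field names (`sku`, `variantId`, `variantName`); the real ones in the Shopify library may differ.

```javascript
// Sketch: use the SKU as the product id, fall back to the variant id
// when the SKU is missing or empty. Input field names are hypothetical.
function toGaProduct(item) {
  return {
    id: item.sku ? item.sku : String(item.variantId),
    name: item.name,
    variant: item.variantName
  };
}

const withSku = toGaProduct({ sku: 'TSHIRT-BLK', variantId: 1001, name: 'T-Shirt', variantName: 'black' });
const withoutSku = toGaProduct({ sku: '', variantId: 1002, name: 'T-Shirt', variantName: 'white' });
```

One practical consequence: the same product may show up in GA reports under two kinds of ids, depending on whether its SKU is filled in.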

Lines 716–719

At the beginning of the checkout process Shopify sends a list of the products in the cart plus a checkout step number. When we successfully complete all checkout steps, the same list is sent again plus some additional order information.

As in the product object, the id may take one of two values. We’ve already seen that. What I find odd here is that the revenue value is set to the total value when the total is defined and is not zero. Why is that, if there is a dedicated revenue variable (t.revenue) for this? If you know the answer, please let me know in the comments.
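Here is how I read the revenue rule, written out as a tiny sketch. The function name and the shape of the `order` object are assumptions of mine; only the `total`/`revenue` precedence comes from the deminified code.

```javascript
// Sketch of the revenue rule: `total` wins when it is defined and not zero,
// otherwise the dedicated `revenue` value is used.
function resolveRevenue(order) {
  return order.total ? order.total : order.revenue;
}

// The result would end up in the purchase action, for example:
// ga('ec:setAction', 'purchase', { id: orderId, revenue: resolveRevenue(order) });
```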

In three out of four cases the data is sent to GA when a user opens the page, which is trivial to arrange on any Shopify site regardless of the theme being used. Spotting the moment when a user adds a product to the cart is far more challenging from a technical point of view, especially for a generic solution that has to work with any theme structure.

Let’s think. How could we catch the “add to cart” event? Whenever we add a product to the cart, we submit a form with a “/cart/add” action. Adding an event listener for submit events may do the trick: as soon as a user clicks the “add to cart” button, the event listener kicks in, the event is canceled with “event.preventDefault()”, we send all that we need to GA and then we resubmit the form with “form.submit()”. Elegant and simple. This is exactly how Shopify does it. Here is the full version of the file (github link).
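The “cancel, track, resubmit” pattern can be sketched as follows. This is my own simplified version, not the code from the Shopify file: `handleCartFormSubmit` and the `track` callback are hypothetical names, and the GA commands in the usage part are only an example.

```javascript
// Sketch of the "cancel, track, resubmit" pattern for "add to cart" forms.
function handleCartFormSubmit(event, track) {
  const form = event.target;
  const action = form.getAttribute('action') || '';
  if (action.indexOf('/cart/add') === -1) {
    return false; // not an "add to cart" form, leave it untouched
  }
  event.preventDefault(); // hold the native submission
  track(form);            // send the data to GA
  form.submit();          // resubmit; submit() does not fire 'submit' again
  return true;
}

// Browser-only usage; skipped when there is no document (e.g. in Node).
if (typeof document !== 'undefined') {
  document.addEventListener('submit', (event) => {
    handleCartFormSubmit(event, () => {
      ga('ec:setAction', 'add');
      ga('send', 'event', 'EnhancedEcommerce', 'Added Product');
    });
  });
}
```

Because `form.submit()` does not trigger another submit event, the listener does not end up in a loop. A more careful version might wait for the hit to be delivered before resubmitting.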

Lines 240–247

But what if we add products via Ajax requests? The previous trick with event listeners will not help here. It requires a more sophisticated approach. It turns out the solution lies on the surface, but I wish there was another one.

Lines 249–258
Lines 282–309

Shopify overwrites the native fetch() function and the XMLHttpRequest object without saying a word about it in the documentation, so your XHR and fetch calls are not served directly by the native APIs as you might have thought.
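To show the idea without copying the Shopify code, here is a minimal sketch of wrapping a fetch-like function. The names `wrapFetch` and `onCartAdd` are hypothetical, and the URL check is deliberately naive. The wrapper passes every call through to the original function and only observes successful “/cart/add” responses; XMLHttpRequest could be handled in a similar spirit by patching its prototype methods.

```javascript
// Sketch: wrap a fetch-like function so that successful "/cart/add" calls
// are reported to a callback. Not Shopify's actual code.
function wrapFetch(nativeFetch, onCartAdd) {
  return function wrappedFetch(input, init) {
    const url = typeof input === 'string' ? input : (input && input.url) || '';
    const result = nativeFetch.call(this, input, init);
    if (url.indexOf('/cart/add') !== -1) {
      // Observe the response without changing what the caller receives.
      result.then(
        (response) => { if (response.ok) onCartAdd(url, init); },
        () => {} // network errors are left to the caller
      );
    }
    return result;
  };
}

// Browser-only usage; skipped when there is no window (e.g. in Node).
if (typeof window !== 'undefined' && typeof window.fetch === 'function') {
  window.fetch = wrapFetch(window.fetch, (url) => {
    console.log('Product added via fetch:', url);
  });
}
```

Even in this tiny form the wrapper sees the URL and the request options of every call, which is exactly why such a technique deserves to be documented.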

Why are you doing this?

As a developer I understand why it was done this way. It is the only way to cover all Ajax cases, and without it the whole GA integration would look less appealing. It’s like champagne without bubbles: nobody needs it. I’m sure someone smart and savvy weighed all the pros and cons before rolling this solution out to production, whatever the cost.

Yes, what about the cost? When you mess with native functions or objects, you should be prepared for the consequences. Overwriting a native object in your private project may not be a big deal when you are aware of what you are doing. In the worst case you will earn yourself a sleepless night fixing the bug. When you do the same thing secretly on a larger scale, you put at risk thousands of people who don’t expect that a problem may arise from this particular direction.

Besides purely technical risks there is a question of trust and privacy. A lot of data is transferred via Ajax requests, including passwords and other sensitive information. I wouldn’t want this information to be available to anyone except the party it was addressed to. Sniffing Ajax traffic is not good at all. It’s rather easy to fix a technical bug, but it’s much harder to regain people’s trust. Shopify should be more transparent with its customers about sensitive features like this one.

Let me finish the first part of this article on this bright note. I congratulate those who were patient enough to read this far. I hope it was not in vain and you’ve learned something new about the default Shopify integration with Enhanced Ecommerce: how it works, what data goes to GA servers and how to check the data.

In the second part I want to make a small experiment. I’ll try to add custom product metrics to the default GA integration. We’ll see how we can do this and whether it’s worth the trouble. Stay tuned!

Takeaway from this article

  • The default GA integration gives enough data for a general overview of a shop’s performance.
  • If you need promotion data, impression data or specific action data, you should look for a custom GA setup or for third-party applications.
  • The default GA integration overwrites some native web APIs (fetch and XMLHttpRequest).

Curious developer at The Other Store