April 28, 2023
Ruben Alanis, Lead Analyst, Digital Analytics
Data Standards, according to the EPA (Environmental Protection Agency), are documented agreements on the representation, format, definition, structuring, tagging, transmission, manipulation, use, and management of data. Data Standards are widely used in every industry and shared among competitors to some degree. To successfully create Data Standards, an organization needs to reach an agreement on how to organize and define its information. This includes creating Data Standards for its web analytics or product analytics tools, such as GA4. If you are unfamiliar with GA4, check out our guide here.
In this article, we will focus on the thought process behind creating Data Standards for GA4’s ‘purchase’ event. The same standards can be applied to the rest of the GA4 events.
Many groups or departments within a company will want to share information with their coworkers, partners, and service providers or use the data for different purposes such as reporting. This will be difficult if the information is unstandardized. GA4 processes information in any format since it is meant to be customized by the user. For this reason, developers often use different formats to name or populate variables in the dataLayer.
Not having data standards can be costly for a company in terms of resources and communication, and it can hinder reporting. A good example of unstandardized data is the date format. The same date can be written as YYYYMMDD, YYYY-MM-DD, YYYY/MM/DD, MM/DD/YYYY, or DD/MM/YYYY.
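To make the cost concrete, here is a minimal sketch of what happens when date formats are mixed. The helper name `toStandardDate` and the choice of YYYY-MM-DD as the agreed target are assumptions for illustration. The helper only rearranges the parts of the string; it does not check that the month and day are valid calendar values.

```javascript
// Hypothetical helper: converts a date string from a declared source format
// into one agreed standard (YYYY-MM-DD is assumed as the standard here).
const DATE_PATTERNS = {
  "YYYYMMDD": /^(?<y>\d{4})(?<m>\d{2})(?<d>\d{2})$/,
  "YYYY-MM-DD": /^(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})$/,
  "YYYY/MM/DD": /^(?<y>\d{4})\/(?<m>\d{2})\/(?<d>\d{2})$/,
  "MM/DD/YYYY": /^(?<m>\d{2})\/(?<d>\d{2})\/(?<y>\d{4})$/,
  "DD/MM/YYYY": /^(?<d>\d{2})\/(?<m>\d{2})\/(?<y>\d{4})$/,
};

function toStandardDate(value, sourceFormat) {
  const pattern = DATE_PATTERNS[sourceFormat];
  if (!pattern) throw new Error(`Unknown format: ${sourceFormat}`);
  const match = pattern.exec(value);
  if (!match) throw new Error(`"${value}" does not match ${sourceFormat}`);
  const { y, m, d } = match.groups;
  return `${y}-${m}-${d}`;
}

// The same raw value means two different days depending on the format.
const asMonthFirst = toStandardDate("03/04/2023", "MM/DD/YYYY"); // "2023-03-04"
const asDayFirst = toStandardDate("03/04/2023", "DD/MM/YYYY");   // "2023-04-03"
console.log(asMonthFirst, asDayFirst);
```

Without a documented format, nothing in the value "03/04/2023" tells a report which of the two readings is correct.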
The best way to standardize the data in GA4 is by creating a dataLayer with specific rules and structure. The structure must include every transaction-level and product-level attribute. Every variable of the dataLayer must be defined, have an expected format, and be used consistently wherever it appears.
The first step to take before creating custom events or custom definitions is to get familiar with GA4’s dataLayer documentation to see which properties are provided by GA4 and which ones apply to the website or product. We will also need to follow Google’s standards. For example, Google requires that the currency be populated in the 3-letter ISO 4217 format.
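A format rule like the currency one can be checked before an event is sent. The sketch below only tests the shape of the value (three uppercase letters); it does not confirm that the code is actually assigned in ISO 4217. The function name `hasCurrencyShape` is hypothetical.

```javascript
// Hypothetical check: does the value have the shape of an ISO 4217
// alphabetic code (exactly three uppercase letters)? This is a shape
// test only; it does not look the code up in the ISO 4217 list.
function hasCurrencyShape(value) {
  return typeof value === "string" && /^[A-Z]{3}$/.test(value);
}

console.log(hasCurrencyShape("USD")); // true
console.log(hasCurrencyShape("usd")); // false
console.log(hasCurrencyShape("US$")); // false
```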
The dataLayer can be used differently depending on the business’s needs. To illustrate how to apply Data Standards in the dataLayer, we will use a music store as an example. Our music store is an international company that sells its products across markets and has different site editions. This example walks through the thought process of creating basic Data Standards for the dataLayer of GA4’s ‘purchase’ event.
The music store sells the following products: instruments, sheet music, strings, and accessories. Each product has different attributes. Here are the attributes for each product list:
· Instruments:
o Instrument name: Violin, guitar, piano, flute, etc.
o Instrument type: Electric or Acoustic
o Brand/Maker: Yamaha, Steinway, name of the luthier
o Color: Black, white, yellow, orange
· Sheet Music:
o Title: Title of the composition or sheet music book
o Sheet music type: Baroque, Classical, Romantic, Contemporary, Modern
o Publisher: Name of the company that publishes the book
o Composer: Name of the composer
o Instrument name: Instrument associated with the book
o Level: Beginner, Intermediate, Expert
· Strings:
o String set name: Name of the string set
o Instrument name: Instrument the strings are for
o Instrument type: Electric or acoustic
o Brand: Brand of the strings
· Accessories:
o Accessory name: The name of the accessory
o Accessory type: The category of the accessory
o Brand: Brand of the product
o Instrument name: Instrument the accessory supports
To translate these item lists, products, and attributes to GA4, we will need to map the attributes to the variables available in GA4.
As of now, the best way to add product-level dimensions in GA4 is to add the attributes to item_category and item_category2 through item_category5. This means that GA4 allows 5 levels of product segmentation. The names of these variables do not describe what their values contain. Therefore, every item category level will need to be documented and used consistently across the website:
· item_category: Includes instrument type, sheet music type, accessory type
· item_category2: Will display the instrument name associated with the product (exclude items from the ‘Instruments’ list)
· item_category3: Composer or writer of the book
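As a minimal sketch of how this documentation could be turned into code, the function below fills the three documented category levels from a product's attributes. The function name, the product list labels, and the attribute keys (`instrumentType`, `sheetMusicType`, and so on) are assumptions made for this example. Using the instrument type as the item_category for strings is also an assumption. Only the meaning of each level comes from the list above.

```javascript
// Hypothetical mapper from product attributes to the documented
// item_category levels. Attribute key names are assumptions.
function toItemCategories(productList, attrs) {
  const categories = {};

  // item_category: instrument type, sheet music type, or accessory type.
  if (productList === "Sheet Music") {
    categories.item_category = attrs.sheetMusicType;
  } else if (productList === "Accessories") {
    categories.item_category = attrs.accessoryType;
  } else {
    // Assumption: Instruments and Strings both use the instrument type.
    categories.item_category = attrs.instrumentType;
  }

  // item_category2: associated instrument, skipped for the Instruments list.
  if (productList !== "Instruments") {
    categories.item_category2 = attrs.instrumentName;
  }

  // item_category3: composer or writer, which only sheet music carries.
  if (attrs.composer !== undefined) {
    categories.item_category3 = attrs.composer;
  }
  return categories;
}

const sheetMusic = toItemCategories("Sheet Music", {
  sheetMusicType: "Romantic",
  instrumentName: "Violin",
  composer: "Pyotr Ilyich Tchaikovsky",
});
console.log(sheetMusic);
```

Keeping the mapping in one function means every event that sends items can reuse the same rules instead of re-implementing them per page.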
To keep consistency in reporting, every value must be displayed in English even if the user is in a site edition in a different language.
The next step is to map the GA4 dimensions to the product lists, since not all dimensions apply to every product.
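One way to record that mapping is as a lookup that lists which category levels apply to each product list. The sketch below derives the lookup from the three item category levels documented above. The names `APPLICABLE_CATEGORIES` and `findUnexpectedCategories` are hypothetical, and giving Strings the same levels as Accessories is an assumption.

```javascript
// Hypothetical lookup: which documented category levels apply to each
// product list. Derived from the level definitions; not an official table.
const APPLICABLE_CATEGORIES = {
  "Instruments": ["item_category"],
  "Sheet Music": ["item_category", "item_category2", "item_category3"],
  "Strings": ["item_category", "item_category2"],
  "Accessories": ["item_category", "item_category2"],
};

// Returns the category parameters present on an item that its product
// list is not expected to use.
function findUnexpectedCategories(productList, item) {
  const allowed = APPLICABLE_CATEGORIES[productList];
  if (!allowed) throw new Error(`Unknown product list: ${productList}`);
  return Object.keys(item)
    .filter((key) => /^item_category\d*$/.test(key))
    .filter((key) => !allowed.includes(key));
}

const strings = {
  item_id: "SKU_34567",
  item_category: "Acoustic", // sample value
  item_category2: "Violin",
  item_category3: "Pyotr Ilyich Tchaikovsky", // does not apply to strings
};
console.log(findUnexpectedCategories("Strings", strings));
// → [ 'item_category3' ]
```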
Every business is unique and will need insights that GA4 does not provide by default. Custom dimensions can supply this information. For example, suppose the shipping department of our music store requires one report to evaluate whether orders are being shipped on time and another to show where the products are being sent (country and city). GA4 only provides the purchase date and the location of the user’s IP address.
The new variables will be named:
· shipping_date: Shipping date in YYYY-MM-DD format
· shipping_country: Country ISO code
· shipping_city: City name in English
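These three definitions can be expressed as simple format rules. The following is a sketch under stated assumptions: "Country ISO code" is taken to mean a two-letter ISO 3166-1 alpha-2 code, and the names `SHIPPING_RULES` and `findShippingErrors` are hypothetical. The "in English" rule for the city cannot be checked by shape, so the sketch only requires non-empty text.

```javascript
// Hypothetical format rules for the new shipping variables.
// Assumption: "Country ISO code" means ISO 3166-1 alpha-2 (e.g. "US").
const SHIPPING_RULES = {
  shipping_date: (v) => /^\d{4}-\d{2}-\d{2}$/.test(v),
  shipping_country: (v) => /^[A-Z]{2}$/.test(v),
  // The English-name rule cannot be verified by shape; only require text.
  shipping_city: (v) => typeof v === "string" && v.trim().length > 0,
};

// Returns the names of variables that are missing or badly formatted.
function findShippingErrors(shipping) {
  return Object.keys(SHIPPING_RULES).filter((name) => {
    const value = shipping[name];
    return value === undefined || !SHIPPING_RULES[name](value);
  });
}

console.log(findShippingErrors({
  shipping_date: "05/02/2023", // wrong format
  shipping_country: "US",
  shipping_city: "Austin",
}));
// → [ 'shipping_date' ]
```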
Other departments might need more information and details of the purchase or the items, and these new requirements will need to be documented before adding them to any current implementation.
To visualize how this would look in a dataLayer, we will create an items array with a collection of items. The collection includes a black electric violin (SKU_12345), the sheet music for Tchaikovsky’s Violin Concerto (SKU_23456), an Evah Pirazzi string set (SKU_34567), and a violin polisher (SKU_45678).
Considering the item attributes and the requirements from the shipping department, the new purchase dataLayer would look like this:
item_name: "Violin Concerto in D major",
item_brand: "International Music Company",
item_name: "Violin String Set",
item_brand: "W.E. Hill & Sons",
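As a hedged sketch of the overall shape, a purchase push for the four items could look like the following. The SKUs and the item names and brands shown above come from the text. Everything marked "sample" is an assumption for illustration: the transaction ID, prices, remaining names, brands and category values, and the shipping values. Placing the shipping variables inside the `ecommerce` object and using `item_variant` for the color are also assumptions.

```javascript
// Sketch of a GA4 purchase push; values marked "sample" are assumptions.
// globalThis is window in the browser, so this also runs outside one.
globalThis.dataLayer = globalThis.dataLayer || [];

dataLayer.push({ ecommerce: null }); // clear any previous ecommerce object
dataLayer.push({
  event: "purchase",
  ecommerce: {
    transaction_id: "T_12345",   // sample
    currency: "USD",             // 3-letter ISO 4217 code
    value: 1120.0,               // sample: sum of the item prices below
    shipping_date: "2023-05-02", // YYYY-MM-DD (sample)
    shipping_country: "US",      // ISO code (sample)
    shipping_city: "Austin",     // city name in English (sample)
    items: [
      {
        item_id: "SKU_12345",
        item_name: "Electric Violin",  // sample
        item_brand: "Yamaha",          // sample
        item_variant: "Black",         // assumption: color as item_variant
        item_category: "Electric",     // instrument type
        price: 1000.0,                 // sample
        quantity: 1,
      },
      {
        item_id: "SKU_23456",
        item_name: "Violin Concerto in D major",
        item_brand: "International Music Company",
        item_category: "Romantic",     // sheet music type (sample)
        item_category2: "Violin",      // associated instrument
        item_category3: "Pyotr Ilyich Tchaikovsky", // composer
        price: 25.0,                   // sample
        quantity: 1,
      },
      {
        item_id: "SKU_34567",
        item_name: "Violin String Set",
        item_brand: "W.E. Hill & Sons",
        item_category: "Acoustic",     // instrument type (sample)
        item_category2: "Violin",
        price: 80.0,                   // sample
        quantity: 1,
      },
      {
        item_id: "SKU_45678",
        item_name: "Violin Polisher",  // sample
        item_category: "Maintenance",  // accessory type (sample)
        item_category2: "Violin",
        price: 15.0,                   // sample
        quantity: 1,
      },
    ],
  },
});
```

Every value is in English and every variable follows its documented format, regardless of which site edition the purchase was made on.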
These Data Standards should be used in all GA4 actions that apply to the business, such as select_item, view_item, add_to_cart, refund, etc. All GA4 event actions can be found here.
In summary, good GA4 Data Standards documentation must include the following:
· Complete list of products/services and attributes
· List of GA4 events and dataLayer variables
· Thorough definition and expected format of every variable
· Mapping of every GA4 event, parameter, and item category
· dataLayer examples to illustrate the output
Data Standards are documented agreements on how data is described, represented, and structured. According to the Harvard Business Review, companies that treat data like a product can reduce the time it takes to implement it in new use cases by as much as 90%.
The key to successfully creating Data Standards is consistent, detailed, and centralized documentation. Every parameter needs a clear definition and format so that every team knows how to implement and use it. Data Standards should be designed to last, but they can be improved and expanded over time. A governance structure will be needed to handle cases where a new standard becomes popular. New developments should always follow the existing rules.
As a consulting company, Blend360 can create Data Standards for every GA4 dataLayer to improve data analysis and data governance. The GA4 Data Standards will be determined with specific rules that define every data element and how it should be recorded, providing integrity, accuracy, and consistency. This will eliminate the need to translate naming conventions between different sections of the website and create guidelines for future implementations. To learn more about how we can help you improve your data governance and analysis, get in touch with us.