Is online conversion rate influenced by proximity to a physical store?

Avatar

Stewart Robertson


Google Analytics 360

How to use GA 360 data and BigQuery’s Geography functions to determine if proximity to a physical store influences online user behaviour

BigQuery’s Geographic functions provide a simple way to measure the distance between a session location and a physical location. We can use this data to analyse if conversion rate is influenced by proximity to a store.

Here we will walk you through how to run an analysis to test this for your own brand.

Pre-requisites

In order to run this test, you will need to have:

  • A list of physical store locations with the latitude and longitude of each
  • Google Analytics 360 (GA 360) session data, exported to BigQuery tables.

If you don’t know the latitude and longitude of your stores, there are free mapping websites that can identify them for you.

You will also need to know how to both import your store list into BigQuery, and how to run queries against data in BigQuery in order to run this analysis. If you’re unsure how to do either, Google has some easy to navigate walk throughs here.

 

Building the query

In this guide, we have built up the query in stages using a WITH clause to alias each sub-query. This allows us to explain the reasoning behind the method, making it easier to replicate.

However, we have also provided the full query at the end of the guide which can be copy and pasted with only minimal changes required, should you wish to head straight there.

 

Step 1 – Importing store locations to BigQuery

To run this analysis, you will need to have your list of store locations with latitude and longitude values in a BigQuery table, ensuring that the fields are labelled as latitude and longitude respectively. If you have this list of store locations in a local file, you can follow this guide to import it correctly.

 

Step 2 – Extracting the latitude and longitude values

The first sub-query you will need to do is extracting the latitude and longitude values for each store:

The data returned will look like the following:

store lat long
Leeds 53.8008 -1.5491
Manchester 53.4808 -2.2426
Cardiff 51.4816 -3.1791
Norwich 52.6309 1.2974

 

Step 2 – Identifying session locations

Next, you need to extract the latitude and longitude of session locations. This data is available using the geoNetwork.latitude and geoNetwork.longitude fields in the GA360 BigQuery Export.

The first few results of this query look a little like this:

sessionId latitude longitude
1543771226.77848 51.5074 -0.1278
1543762386.56107 53.796 -1.7594
1543719115.29403 55.8642 -4.2518
1543782918.54377 53.1269 -1.2625

 

Note that:

  1. In order to get a unique session id column we need to combine visitId and fullVisitorId which I have done by simply concatenating these values separated by a full-stop
  2. Occasionally no location can be found for a session which shows up as a value of ‘0.000‘ in the latitude & longitude fields in the export. These will need to be filtered out
  3. geoNetwork.latitude and geoNetwork.longitude fields are stored as strings so we CAST them to FLOAT64
  4. In this example we filtered the sessions to the year ending 30th November 2018.

 

Step 3 – Calculate the distance to the nearest store

Once you have extracted the latitude and longitude for physical locations and sessions, you will need to calculate the distance between each session and the nearest store. This can be calculated via BigQuery’s Geography functions using the following function:

Calculate the distance from each store to each session using a CROSS JOIN, and then GROUP BY sessionId taking the minimum of each calculated distance in order to find the nearest store as follows:

This will map out the sessionId to distance from the nearest store location:

sessionId distance
1543771226.77848 50.005559
1543762386.56107 30.792333
1543719115.29403 66.11263
1543782918.54377 115.960936

 

Step 4 – Calculating the conversion rate

In this example, we have defined a conversion as any session with a transaction. However, you can choose what to use here according to the definition of conversion you would like to measure.

In our case we check if the transactions count is greater than zero using this expression:

IFNULL(totals.transactions, 0) > 0 

Note that we have used the IFNULL operator in this example – this is because BigQuery returns a NULL value, rather than a zero in cases where a session has zero transactions.

Pull this in using an INNER JOIN from the original export, to the SessionLocations:

The result should look like the following:

sessionId distance hasConverted
1543771226.77848 50.005559 False
1543762386.56107 30.792333 True
1543719115.29403 66.11263 False
1543782918.54377 115.960936 False

 

Step 5 – Aggregating values by distance

There are two main ways you can aggregate these values by distance:

  1. You can manually choose aggregation ‘buckets’ based on human intuition. For example, you may want to test if there is a difference in conversion rate for sessions within five miles of a store
  2. You can divide the data up into segments automatically, plot it out and identify any trends.

Manually chosen distance buckets

For this method, you need to:

  1. Define a sub-query with our distance buckets and their thresholds
  2. Do a CROSS JOIN to our session conversion data then filter rows where the distance is outside the threshold
  3. GROUP BY ‘bucket’, then COUNT the sessions and SUM the conversions in order to calculate the conversion rate.

It is important to ensure that you only allocate each session to one distance bucket; it’s easy to cross check the number of rows returned from this query against the ConveredByDistance sub-query to make sure this is the case.

This query will return your final results:

distanceBucket sessionCount conversionCount conversionRate
1. Within 1 Miles 13453322 421148 0.031304
2. Between 1 and 5 Miles 1457340 45268 0.031062
3. Between 5 and 10 Miles 2638155 89445 0.033904
4. Between 10 and 25 Miles 3197742 104457 0.032666
5. Over 25 Miles 429943 14556 0.033856

 

Automatically calculated distance buckets

This method has a different approach to deriving the Thresholds sub query:

  1. Use  BigQuery’s APPROX_QUANTILES() function to split your set of distances into quantiles. For this example I have split it into 32 quantiles
  2. UNNEST this array and use the RANK() function to number each row
  3. Having numbered each row, you can now INNER JOIN this sub-query to itself with a row offset. This will enable you to get an upper-bound and a lower-bound value from the set of quantiles
  4. Continue as before replacing the manually constructed Thresholds with your calculated ones.

For this approach, use the following Thresholds sub query:

Note that we replace the first lower bound with zero and the last upper bound by infinity to ensure we do not miss any values as the original values returned from APPROX_QUANTILES are only approximate min and max values.

This query returns a table similar to the output from the manually chosen distance buckets, but has more data which allows you to visualise conversion rate by distance by plotting it on a scatter chart like this:

In this example, there was no clear trend between store proximity and conversion rate. However, it’s feasible that a trend may be spotted; for example, we might see online conversion rates drop as sessions become closer to physical stores with users more likely to pop into the store. Or, we might see higher conversion rates for sessions carried out near a store, as people see the physical store and decide to look online when they get home.

Either way, carrying out this analysis gives you the data to understand if you should remarket to these users differently or even tailor their on-site experience giving more prominence to in-store features or calls-to-action.

The complete query

As promised, here is the complete query you can use to calculate whether proximity to a physical store influences session behaviour.

In order to use this query, you will need to replace the table identifiers with your own table identifiers and uncomment the correct section for the Thresholds sub-query depending on whether you would like to use automatically calculated distance buckets or to define your own. You may also want to filter for certain dates.

Get the latest digital insights straight to your inboxSubscribe here


Analytics and Data ScienceInsights

The challenges of considered purchases and harnessing behavioural modelling to predict the future

Read article