Can we use a customer’s first product purchase to calculate their life-time value?

Stewart Robertson


Analytics and Data Science

Using CRM, web and app analytics data can help marketers identify high-value customers; this is useful in creating targeted, customized campaigns to attract high-value customers and maintain customer retention.

We can also use data to identify if a particular product attracts customers with a high life-time value (LTV); i.e. if someone purchases a particular product, are they more likely to purchase in the future? A product which increases the likelihood of someone subsequently placing more orders and spending more money is much more valuable than just its retail price.

In this blog, we will run through how to identify a product’s life-time value using Google BigQuery. For the purpose of this analysis, life-time value is calculated as the total value of all transactions, excluding the first one, for a given user. The first transaction is not included in the model as we are specifically measuring subsequent LTV.

Building the query

For this analysis, you will need to run the following sub-queries:

  1. Extract a list of SKUs X sessionId
  2. Extract a list of transacting sessions filterable on whether or not this is the first transaction
  3. Calculate a LTV (as defined above) for each uniqueVisitorId by grouping (2.) by visitorId and taking a SUM() of all non-first transactions.

A SKU or stock keeping unit is a unique code that is used to identify a product.

These can then be pulled together:

  1. INNER JOIN(1.) to (2.) on sessionId and filter WHERE isFirstTransaction = true to give a visitorId –> sku in first transaction
  2. INNER JOIN this result to (3.) above and GROUP BY sku, aggregating the AVG() of revenue to give our final result.

We have provided the full query at the end of the guide; this can be copy and pasted with minimal changes required.

Step 1 – extracting SKUs

The first query you will to do is to extract the SKUs.

The data returned should look like the following:
sessionId sku action_type
1164614722088736449.1526934611 065833 6
756813470025851696.1526937398 065473 6
756813470025851696.1526937398 065838 6
951009676598137343.1526891377 066749 6

 

Step 2 – extracting transaction sessions

Next, you’ll need to extract the transaction sections. You can easily extract where a transaction takes place using a totals.transactions > 0 filter; you will then need to determine if this is the first transaction for a given user using BigQuery’s analytic functions to PARTITION BY visitorId then add a filterable column by checking for those rows where RANK()= 1

You will receive data in the following format:
visitorNumber fullVisitorId sessionId revenue isFirstTransaction
12 1010643915505925760 1010643915505925760.1520173083 24.990000 True
1 1011055334738282173 1011055334738282173.1532413628 41.990000 True
3 1018335016535395271 1018335016535395271.154281634 143.131644 True
1 1026859594613413750 1026859594613413750.1528977270 15.000000 True
6 1034048691243105163 1034048691243105163.1538805374 20.590000 True

 

Step 3 – calculating the LTV

You can calculate the life-time value using the transactingSessions sub-query from above, by grouping visitorId where isFirstTransaction = false.

This will return the following information:
fullVisitorid value
1151702946562056361 100.50
1206704950534240620 27.30
3906282176117214333 597.57
5346764750611188410 112.98
564307924829126123 47.99

 

Step 4 – pulling everything together

Once you have the sku, transactingSessions and ltv subqueries defined, you can put together your final query which joins up all three to calculate the life-time value of a product by doing the following:

  • INNER JOIN sku to transactingSessions on sessionId and filter WHERE isFirstTransaction = true to give a sku –> visitorId lookup
  • INNER JOIN this result to ltv and GROUP BY sku, aggregating the AVG() of revenue to give our final result.

In the following query, we have included additional product information (where available) and some global life-time value statistics – this means we are able to compare the life-time value of our products, compared to other products worldwide. Including this in the query is not essential.

 

You should receive data which looks like the following:
entrySku v2Product
Name
vsProduct
Category
ltvMean
ForSku
ltvCount
ForSku
ltvStdDeviation
forSku
ltvGlobal
Mean
LtvGloBal
Count
ltdGlobal
StdDeviation
064435 HERO EMBROIDERED SWEAT Tops 15336.064379 5 32504.365766 179.073483 124226 373.043939
063761 PETAL SLEEVE TEE Tops 5210.292667 15 18885.019969 179.073483 124226 373.043939
061874 CLEO COTTON BARDOT TOP sale-tops 3765.860000 1 NaN 179.073483 124226 373.043939
063214 OLVIA OPEN 2 PART COURT (not set) 3613.370000 1 NaN 179.073483 124226 373.043939

 

Considerations

While this analysis is useful in calculating a life-time value for a product, there are a few considerations that need to be taken into account when interpreting the results.

For one, the analysis measures a correlation in historic data so cannot be used to attribute cause. This means that, even if product A is shown to produce valuable customers, creating more ads and product listings for product A doesn’t necessarily mean those who convert will go on to become high value customers.

Another key consideration is that this analysis includes recent customer acquisitions, which may involve customers who may still be in the return period and yet to make a return. As we are not measuring the difference in LTV between groups based on initial product purchased (rather than seeking to measure LTV in isolation), this bias is consistent throughout all calculations and can be ignored as long as there is a stable product set.

We are also unable to see how likely the average LTV by sku is likely to vary by chance alone. This means the data could be skewed if, for example, your highest value customer is the only customer who bought product A in their first transaction. To overcome this, you will need to check sample size and carry out more statistical tests.

A third flaw with this dataset is that we do not know if new customers are likely to return in the future; this should average out over time, however you should avoid reaching conclusions if a new product has low or high life-time values.

Want to get the most out of your data? Speak to our data and analytics team and find out how we can help.