Can we use a customer’s first product purchase to calculate their life-time value?

Stewart Robertson

Stewart Robertson

Analytics and Data Science

Using CRM, web and app analytics data can help marketers identify high-value customers; this is useful in creating targeted, customised campaigns to attract high-value customers and maintain customer retention.

We can also use data to identify if a particular product attracts customers with a high life-time value (LTV); i.e. if someone purchases a particular product, are they more likely to purchase in the future? A product which increases the likelihood of someone subsequently placing more orders and spending more money is much more valuable than just its retail price.

In this blog, we will run through how to identify a product’s life-time value using Google BigQuery. For the purpose of this analysis, life-time value is calculated as the total value of all transactions, excluding the first one, for a given user. The first transaction is not included in the model as we are specifically measuring subsequent LTV.

Building the query

For this analysis, you will need to run the following sub-queries:

  1. Extract a list of SKUs X sessionId
  2. Extract a list of transacting sessions filterable on whether or not this is the first transaction
  3. Calculate a LTV (as defined above) for each uniqueVisitorId by grouping (2.) by visitorId and taking a SUM() of all non-first transactions.

A SKU or stock keeping unit is a unique code that is used to identify a product.

These can then be pulled together:

  1. INNER JOIN(1.) to (2.) on sessionId and filter WHERE isFirstTransaction = true to give a visitorId –> sku in first transaction
  2. INNER JOIN this result to (3.) above and GROUP BY sku, aggregating the AVG() of revenue to give our final result.

We have provided the full query at the end of the guide; this can be copy and pasted with minimal changes required.

Step 1 – extracting SKUs

The first query you will to do is to extract the SKUs.

The data returned should look like the following:


Step 2 – extracting transaction sessions

Next, you’ll need to extract the transaction sections. You can easily extract where a transaction takes place using a totals.transactions > 0 filter; you will then need to determine if this is the first transaction for a given user using BigQuery’s analytic functions to PARTITION BY visitorId then add a filterable column by checking for those rows where RANK()= 1


You will receive data in the following format:


Step 3 – calculating the LTV

You can calculate the life-time value using the transactingSessions sub-query from above, by grouping visitorId where isFirstTransaction = false.

This will return the following information:


Step 4 – pulling everything together

Once you have the sku, transactingSessions and ltv subqueries defined, you can put together your final query which joins up all three to calculate the life-time value of a product by doing the following:

  • INNER JOIN sku to transactingSessions on sessionId and filter WHERE isFirstTransaction = true to give a sku –> visitorId lookup
  • INNER JOIN this result to ltv and GROUP BY sku, aggregating the AVG() of revenue to give our final result.

In the following query, we have included additional product information (where available) and some global life-time value statistics – this means we are able to compare the life-time value of our products, compared to other products worldwide. Including this in the query is not essential.



You should receive data which looks like the following:
064435HERO EMBROIDERED SWEATTops15336.064379532504.365766179.073483124226373.043939
063761PETAL SLEEVE TEETops5210.2926671518885.019969179.073483124226373.043939
061874CLEO COTTON BARDOT TOPsale-tops3765.8600001NaN179.073483124226373.043939
063214OLVIA OPEN 2 PART COURT(not set)3613.3700001NaN179.073483124226373.043939



While this analysis is useful in calculating a life-time value for a product, there are a few considerations that need to be taken into account when interpreting the results.

For one, the analysis measures a correlation in historic data so cannot be used to attribute cause. This means that, even if product A is shown to produce valuable customers, creating more ads and product listings for product A doesn’t necessarily mean those who convert will go on to become high value customers.

Another key consideration is that this analysis includes recent customer acquisitions, which may involve customers who may still be in the return period and yet to make a return. As we are not measuring the difference in LTV between groups based on initial product purchased (rather than seeking to measure LTV in isolation), this bias is consistent throughout all calculations and can be ignored as long as there is a stable product set.

We are also unable to see how likely the average LTV by sku is likely to vary by chance alone. This means the data could be skewed if, for example, your highest value customer is the only customer who bought product A in their first transaction. To overcome this, you will need to check sample size and carry out more statistical tests.

A third flaw with this dataset is that we do not know if new customers are likely to return in the future; this should average out over time, however you should avoid reaching conclusions if a new product has low or high life-time values.

Want to get the most out of your data? Speak to our data and analytics team and find out how we can help.

Get the latest digital marketing insights delivered to your inbox