Using CRM, web and app analytics data can help marketers identify high-value customers; this is useful in creating targeted, customised campaigns to attract high-value customers and maintain customer retention.
We can also use data to identify if a particular product attracts customers with a high life-time value (LTV); i.e. if someone purchases a particular product, are they more likely to purchase in the future? A product which increases the likelihood of someone subsequently placing more orders and spending more money is much more valuable than just its retail price.
In this blog, we will run through how to identify a product’s life-time value using Google BigQuery. For the purpose of this analysis, life-time value is calculated as the total value of all transactions, excluding the first one, for a given user. The first transaction is not included in the model as we are specifically measuring subsequent LTV.
Building the query
For this analysis, you will need to run the following sub-queries:
- Extract a list of SKUs X sessionId
- Extract a list of transacting sessions filterable on whether or not this is the first transaction
- Calculate a LTV (as defined above) for each uniqueVisitorId by grouping (2.) by visitorId and taking a SUM() of all non-first transactions.
A SKU or stock keeping unit is a unique code that is used to identify a product.
These can then be pulled together:
- INNER JOIN(1.) to (2.) on sessionId and filter WHERE isFirstTransaction = true to give a visitorId –> sku in first transaction
- INNER JOIN this result to (3.) above and GROUP BY sku, aggregating the AVG() of revenue to give our final result.
We have provided the full query at the end of the guide; this can be copy and pasted with minimal changes required.
Step 1 – extracting SKUs
The first query you will to do is to extract the SKUs.
The data returned should look like the following:
sessionId | sku | action_type |
1164614722088736449.1526934611 | 065833 | 6 |
756813470025851696.1526937398 | 065473 | 6 |
756813470025851696.1526937398 | 065838 | 6 |
951009676598137343.1526891377 | 066749 | 6 |
Step 2 – extracting transaction sessions
Next, you’ll need to extract the transaction sections. You can easily extract where a transaction takes place using a totals.transactions > 0 filter; you will then need to determine if this is the first transaction for a given user using BigQuery’s analytic functions to PARTITION BY visitorId then add a filterable column by checking for those rows where RANK()= 1
You will receive data in the following format:
visitorNumber | fullVisitorId | sessionId | revenue | isFirstTransaction |
12 | 1010643915505925760 | 1010643915505925760.1520173083 | 24.990000 | True |
1 | 1011055334738282173 | 1011055334738282173.1532413628 | 41.990000 | True |
3 | 1018335016535395271 | 1018335016535395271.154281634 | 143.131644 | True |
1 | 1026859594613413750 | 1026859594613413750.1528977270 | 15.000000 | True |
6 | 1034048691243105163 | 1034048691243105163.1538805374 | 20.590000 | True |
Step 3 – calculating the LTV
You can calculate the life-time value using the transactingSessions sub-query from above, by grouping visitorId where isFirstTransaction = false.
This will return the following information:
fullVisitorid | value |
1151702946562056361 | 100.50 |
1206704950534240620 | 27.30 |
3906282176117214333 | 597.57 |
5346764750611188410 | 112.98 |
564307924829126123 | 47.99 |
Step 4 – pulling everything together
Once you have the sku, transactingSessions and ltv subqueries defined, you can put together your final query which joins up all three to calculate the life-time value of a product by doing the following:
- INNER JOIN sku to transactingSessions on sessionId and filter WHERE isFirstTransaction = true to give a sku –> visitorId lookup
- INNER JOIN this result to ltv and GROUP BY sku, aggregating the AVG() of revenue to give our final result.
In the following query, we have included additional product information (where available) and some global life-time value statistics – this means we are able to compare the life-time value of our products, compared to other products worldwide. Including this in the query is not essential.
You should receive data which looks like the following:
entrySku | v2Product Name |
vsProduct Category |
ltvMean ForSku |
ltvCount ForSku |
ltvStdDeviation forSku |
ltvGlobal Mean |
LtvGloBal Count |
ltdGlobal StdDeviation |
064435 | HERO EMBROIDERED SWEAT | Tops | 15336.064379 | 5 | 32504.365766 | 179.073483 | 124226 | 373.043939 |
063761 | PETAL SLEEVE TEE | Tops | 5210.292667 | 15 | 18885.019969 | 179.073483 | 124226 | 373.043939 |
061874 | CLEO COTTON BARDOT TOP | sale-tops | 3765.860000 | 1 | NaN | 179.073483 | 124226 | 373.043939 |
063214 | OLVIA OPEN 2 PART COURT | (not set) | 3613.370000 | 1 | NaN | 179.073483 | 124226 | 373.043939 |
Considerations
While this analysis is useful in calculating a life-time value for a product, there are a few considerations that need to be taken into account when interpreting the results.
For one, the analysis measures a correlation in historic data so cannot be used to attribute cause. This means that, even if product A is shown to produce valuable customers, creating more ads and product listings for product A doesn’t necessarily mean those who convert will go on to become high value customers.
Another key consideration is that this analysis includes recent customer acquisitions, which may involve customers who may still be in the return period and yet to make a return. As we are not measuring the difference in LTV between groups based on initial product purchased (rather than seeking to measure LTV in isolation), this bias is consistent throughout all calculations and can be ignored as long as there is a stable product set.
We are also unable to see how likely the average LTV by sku is likely to vary by chance alone. This means the data could be skewed if, for example, your highest value customer is the only customer who bought product A in their first transaction. To overcome this, you will need to check sample size and carry out more statistical tests.
A third flaw with this dataset is that we do not know if new customers are likely to return in the future; this should average out over time, however you should avoid reaching conclusions if a new product has low or high life-time values.
Want to get the most out of your data? Speak to our data and analytics team and find out how we can help.