How to use a customer’s first product purchase to calculate life-time value

Using CRM, web and app analytics data can help marketers identify high-value customers; this is useful in creating targeted, customised campaigns to attract high-value customers and maintain customer retention.

We can also use data to identify if a particular product attracts customers with a high life-time value (LTV); i.e. if someone purchases a particular product, are they more likely to purchase in the future? A product which increases the likelihood of someone subsequently placing more orders and spending more money is much more valuable than just its retail price.

In this blog, we will run through how to identify a product’s life-time value using Google BigQuery. For the purpose of this analysis, life-time value is calculated as the total value of all transactions, excluding the first one, for a given user. The first transaction is not included in the model as we are specifically measuring subsequent LTV.

Building the query

For this analysis, you will need to run the following sub-queries:

Extract a list of SKUs X sessionId
Extract a list of transacting sessions filterable on whether or not this is the first transaction
Calculate a LTV (as defined above) for each uniqueVisitorId by grouping (2.) by visitorId and taking a SUM() of all non-first transactions.

A SKU or stock keeping unit is a unique code that is used to identify a product.

These can then be pulled together:

INNER JOIN(1.) to (2.) on sessionId and filter WHERE isFirstTransaction = true to give a visitorId –> sku in first transaction
INNER JOIN this result to (3.) above and GROUP BY sku, aggregating the AVG() of revenue to give our final result.

We have provided the full query at the end of the guide; this can be copy and pasted with minimal changes required.

Step 1 – extracting SKUs

The first query you will to do is to extract the SKUs.

[sql]
WITH Sku AS (
SELECT
CONCAT(fullVisitorId, ‘.’, CAST(visitId AS STRING)) as sessionId,
p.productSku as sku,
h.eCommerceAction.action_type

FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
INNER JOIN UNNEST (hits) AS h
INNER JOIN h.product as p
# https://support.google.com/analytics/answer/3437719?hl=en action_type == 6 is a purchase
WHERE h.eCommerceAction.action_type = ‘6’

)

SELECT * FROM Sku limit 4
[/sql]

The data returned should look like the following:

sessionId	sku	action_type
1164614722088736449.1526934611	065833	6
756813470025851696.1526937398	065473	6
756813470025851696.1526937398	065838	6
951009676598137343.1526891377	066749	6

Step 2 – extracting transaction sessions

Next, you’ll need to extract the transaction sections. You can easily extract where a transaction takes place using a totals.transactions > 0 filter; you will then need to determine if this is the first transaction for a given user using BigQuery’s analytic functions to PARTITION BY visitorId then add a filterable column by checking for those rows where RANK()= 1

[sql]
WITH sessionsWithTransactions AS (
SELECT
CONCAT(fullVisitorId, ‘.’, CAST(visitId AS STRING)) as sessionId,
fullVisitorId,
visitNumber,
CAST (totals.totalTransactionRevenue AS FLOAT64)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
totals.transactions > 0
),

orderedTransactions AS (
SELECT
visitNumber,
fullVisitorId,
sessionId,
revenue,
RANK() OVER (PARTITION BY fullVisitorId ORDER BY visitNumber) as rank
FROM sessionsWithTransactions
),

transactingSessions AS (
SELECT
visitNumber,
fullVisitorId,
sessionId,
revenue,
rank = 1 as isFirstTransaction
FROM orderedTransactions
)

SELECT * FROM transactingSessions LIMIT 5
[/sql]

You will receive data in the following format:

visitorNumber	fullVisitorId	sessionId	revenue	isFirstTransaction
12	1010643915505925760	1010643915505925760.1520173083	24.990000	True
1	1011055334738282173	1011055334738282173.1532413628	41.990000	True
3	1018335016535395271	1018335016535395271.154281634	143.131644	True
1	1026859594613413750	1026859594613413750.1528977270	15.000000	True
6	1034048691243105163	1034048691243105163.1538805374	20.590000	True

Step 3 – calculating the LTV

You can calculate the life-time value using the transactingSessions sub-query from above, by grouping visitorId where isFirstTransaction = false.

orderedTransactions AS (
SELECT
visitNumber,
fullVisitorId,
sessionId,
revenue,
RANK() OVER (PARTITION BY fullVisitorId ORDER BY visitNumber) as rank
FROM sessionsWithTransactions
),

transactingSessions AS (
SELECT
visitNumber,
fullVisitorId,
sessionId,
revenue,
rank = 1 as isFirstTransaction
FROM orderedTransactions
),

ltv AS (
SELECT
fullVisitorid,
SUM(revenue) AS value

FROM transactingSessions WHERE isFirstTransaction = false
GROUP BY fullVisitorId

)

SELECT * FROM ltv LIMIT 5
[/sql]

This will return the following information:

fullVisitorid	value
1151702946562056361	100.50
1206704950534240620	27.30
3906282176117214333	597.57
5346764750611188410	112.98
564307924829126123	47.99

Step 4 – pulling everything together

Once you have the sku, transactingSessions and ltv subqueries defined, you can put together your final query which joins up all three to calculate the life-time value of a product by doing the following:

INNER JOIN sku to transactingSessions on sessionId and filter WHERE isFirstTransaction = true to give a sku –> visitorId lookup
INNER JOIN this result to ltv and GROUP BY sku, aggregating the AVG() of revenue to give our final result.

In the following query, we have included additional product information (where available) and some global life-time value statistics – this means we are able to compare the life-time value of our products, compared to other products worldwide. Including this in the query is not essential.

[sql]
# Change this line to refer to your own GA360 dataset:
WITH source AS (SELECT * FROM “bigquery-public-data.google_analytics_sample.ga_sessions_*`,

sku AS (
SELECT
CONCAT(fullVisitorId, ‘.’, CAST(visitId AS STRING)) as sessionId,
p.productSku as sku,
h.eCommerceAction.action_type

FROM Source INNER JOIN UNNEST (hits) AS h INNER JOIN h.product as p
where h.eCommerceAction.action_type = ‘6’ # https://support.google.com/analytics/answer/3437719?hl=en action_type == 6 is a purchase
),

sessionsWithTransactions AS (
SELECT
CONCAT(fullVisitorId, ‘.’, CAST(visitId AS STRING)) as sessionId,
fullVisitorId,
visitNumber,
CAST (totals.totalTransactionRevenue AS FLOAT64)/1000000 AS revenue
FROM source
WHERE
totals.transactions > 0
),

orderedTransactions AS (
SELECT
visitNumber,
fullVisitorId,
sessionId,
revenue,
RANK() OVER (PARTITION BY fullVisitorId ORDER BY visitNumber) as rank
FROM sessionsWithTransactions
),

transactingSessions AS (
SELECT
visitNumber,
fullVisitorId,
sessionId,
revenue,
rank = 1 as isFirstTransaction
FROM orderedTransactions
),

ltv AS (
SELECT
fullVisitorid,
SUM(revenue) AS value

FROM transactingSessions WHERE isFirstTransaction = false
GROUP BY fullVisitorId

## New stuff starts here
########################

skuToVisitor AS (
SELECT
sku.sku,
fullVisitorId

FROM
transactingSessions
INNER JOIN
sku on sku.sessionId = transactingSessions.sessionId

WHERE transactingSessions.isFirstTransaction = TRUE
),

globalStats AS (
SELECT
AVG(ltv.value) AS ltvGlobalMean,
COUNT(1) AS ltvGlobalCount,
STDDEV(ltv.value) AS ltvGlobalStdDeviation
FROM ltv
),

additionalProductDetail AS (
SELECT
p.productSku as sku,
ANY_VALUE(p.v2ProductName) AS v2ProductName,
ANY_VALUE(p.v2ProductCategory) AS v2ProductCategory

FROM Source INNER JOIN UNNEST (hits) AS h INNER JOIN h.product as p

GROUP BY 1
),

finalResult AS (
SELECT
skuToVisitor.sku AS entrySku,
AVG(ltv.value) AS ltvMeanForSku,
COUNT(1) AS ltvCountForSku,
STDDEV(ltv.value) AS ltvStdDeviationForSku

FROM skuToVisitor
INNER JOIN ltv ON ltv.fullVisitorId = skuToVisitor.fullVisitorId

GROUP BY 1
)

SELECT
entrySku,
v2ProductName,
v2ProductCategory,
ltvMeanForSku,
ltvCountForSku,
ltvStdDeviationForSku,
ltvGlobalMean,
ltvGlobalCount,
ltvGlobalStdDeviation
FROM
finalResult
INNER JOIN additionalProductDetail on additionalProductDetail.sku = finalResult.entrySku
CROSS JOIN globalStats
ORDER BY ltvMeanForSku DESC
[/sql]

You should receive data which looks like the following:

entrySku	v2Product Name	vsProduct Category	ltvMean ForSku	ltvCount ForSku	ltvStdDeviation forSku	ltvGlobal Mean	LtvGloBal Count	ltdGlobal StdDeviation
064435	HERO EMBROIDERED SWEAT	Tops	15336.064379	5	32504.365766	179.073483	124226	373.043939
063761	PETAL SLEEVE TEE	Tops	5210.292667	15	18885.019969	179.073483	124226	373.043939
061874	CLEO COTTON BARDOT TOP	sale-tops	3765.860000	1	NaN	179.073483	124226	373.043939
063214	OLVIA OPEN 2 PART COURT	(not set)	3613.370000	1	NaN	179.073483	124226	373.043939

Considerations

While this analysis is useful in calculating a life-time value for a product, there are a few considerations that need to be taken into account when interpreting the results.

For one, the analysis measures a correlation in historic data so cannot be used to attribute cause. This means that, even if product A is shown to produce valuable customers, creating more ads and product listings for product A doesn’t necessarily mean those who convert will go on to become high value customers.

Another key consideration is that this analysis includes recent customer acquisitions, which may involve customers who may still be in the return period and yet to make a return. As we are not measuring the difference in LTV between groups based on initial product purchased (rather than seeking to measure LTV in isolation), this bias is consistent throughout all calculations and can be ignored as long as there is a stable product set.

We are also unable to see how likely the average LTV by sku is likely to vary by chance alone. This means the data could be skewed if, for example, your highest value customer is the only customer who bought product A in their first transaction. To overcome this, you will need to check sample size and carry out more statistical tests.

A third flaw with this dataset is that we do not know if new customers are likely to return in the future; this should average out over time, however you should avoid reaching conclusions if a new product has low or high life-time values.

Want to get the most out of your data? Speak to our data and analytics team and find out how we can help.

Can we use a customer’s first product purchase to calculate their life-time value?

Building the query

Step 1 – extracting SKUs

The data returned should look like the following:

Step 2 – extracting transaction sessions

You will receive data in the following format:

Step 3 – calculating the LTV

This will return the following information:

Step 4 – pulling everything together

You should receive data which looks like the following:

Considerations

Search Laboratory’s evolving relationship with machine learning and AI

Collecting, injecting, modelling, and activating data in GA4

Why digital marketers should unify online and offline customer data

Can we use a customer’s first product purchase to calculate their life-time value?

Building the query

Step 1 – extracting SKUs

The data returned should look like the following:

Step 2 – extracting transaction sessions

You will receive data in the following format:

Step 3 – calculating the LTV

This will return the following information:

Step 4 – pulling everything together

You should receive data which looks like the following:

Considerations

Related insights

Search Laboratory’s evolving relationship with machine learning and AI

Collecting, injecting, modelling, and activating data in GA4

Why digital marketers should unify online and offline customer data