Which Outcome Matters?
Here is a common scenario: an A/B test was conducted, where a random sample of units (e.g. customers) was selected for a campaign and received Treatment A. Another sample was selected to receive Treatment B. "A" could be a communication or offer and "B" could be no communication or no offer. "A" could be 10% off and "B" could be 20% off. Two groups, two different treatments. We treat A and B as two discrete treatments, but the same ideas extend without loss of generality to more than two treatments and to continuous treatments.
So, the campaign runs and results are made available. With our backend system, we can track which of these units took the action of interest (e.g. made a purchase) and which did not. Further, for those that did, we log the intensity of that action. A common scenario is that we can track purchase amounts for those that purchased. This is often called an average order amount or revenue per buyer metric, or any of a hundred other names that all mean the same thing: for those that purchased, how much did they spend, on average?
For some use cases, the marketer is interested in the former metric, the purchase rate. For example, did we drive more (potentially first-time) buyers in our acquisition campaign with Treatment A or B? Sometimes we are interested in driving the revenue per buyer higher, so we put the emphasis on the latter.
More often, though, we are interested in driving revenue in a cost-effective manner, and what we really care about is the revenue that the campaign produced overall. Did Treatment A or B drive more revenue? We don't always have balanced sample sizes (perhaps due to cost or risk avoidance), so we divide the measured revenue by the number of candidates that were treated in each group (call these counts N_A and N_B). We want to compare this measure between the two groups, so the standard contrast is simply:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_NmHjbi_qPoTaRbMODRe30w.webp)
This is just the mean revenue for Treatment A minus the mean revenue for Treatment B, where each mean is taken over the entire set of targeted units, irrespective of whether they responded or not. Its interpretation is likewise straightforward: what is the increase in average revenue per promoted unit when going from Treatment B to Treatment A?
Of course, this last measure accounts for both of the prior metrics: it is the response rate multiplied by the mean revenue per responder.
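As a quick illustration of this decomposition, here is a minimal sketch in pandas (the toy column names treatment and revenue are hypothetical placeholders; zeros denote non-responders):

import pandas as pd

# hypothetical toy data: revenue per targeted unit, zeros for non-responders
df = pd.DataFrame({
    'treatment': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'revenue':   [0.0, 50.0, 0.0, 0.0, 0.0, 0.0, 80.0, 0.0],
})

for t, grp in df.groupby('treatment'):
    response_rate = (grp.revenue > 0).mean()                         # share of responders
    rev_per_responder = grp.loc[grp.revenue > 0, 'revenue'].mean()   # mean among responders
    rev_per_targeted = grp.revenue.mean()                            # mean over everyone targeted
    # identity: rev_per_targeted == response_rate * rev_per_responder
    print(t, response_rate, rev_per_responder, rev_per_targeted)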
Uncertainty?
How much a buyer spends is highly variable, and a couple of large purchases in one treatment group or the other can skew the mean significantly. Likewise, sample variation can be significant. So, we want to understand how confident we are in this comparison of means and quantify the "significance" of the observed difference.
So, you throw the data into a t-test and stare at the p-value. But wait! Unfortunately for the marketer, the vast majority of the time the purchase rate is relatively low (sometimes VERY low), and hence there are a lot of zero revenue values, often the vast majority. The t-test assumptions may be badly violated. Very large sample sizes may come to the rescue, but there is a more principled way to analyze this data that is useful in multiple ways, as will be explained.
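For reference, the naive approach would look something like this sketch with SciPy (assuming the same hypothetical df from above); it runs, but the zero-inflated, right-skewed outcome is far from the normality the test leans on:

from scipy import stats

rev_a = df.loc[df.treatment == 'A', 'revenue']
rev_b = df.loc[df.treatment == 'B', 'revenue']

# Welch's t-test on per-targeted-unit revenue; equal_var=False avoids assuming
# equal variances, but the normality assumption remains strained by the mass at zero
t_stat, p_value = stats.ttest_ind(rev_a, rev_b, equal_var=False)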
Example Dataset
Let's start with a sample dataset to make things practical. One of my favorite direct marketing datasets is from the KDD Cup 98.
import io
import zipfile

import numpy as np
import pandas as pd
import requests

url = 'https://kdd.ics.uci.edu/databases/kddcup98/epsilon_mirror/cup98lrn.zip'
filename = 'cup98LRN.txt'

# download the zip archive and extract the learning dataset
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

pdf_data = pd.read_csv(filename, sep=',')
pdf_data = pdf_data.query('TARGET_D >= 0')
pdf_data['TREATMENT'] = np.where(pdf_data.RFA_2F > 1, 'A', 'B')
pdf_data['TREATED'] = np.where(pdf_data.RFA_2F > 1, 1, 0)
pdf_data['GT_0'] = np.where(pdf_data.TARGET_D > 0, 1, 0)
pdf_data = pdf_data[['TREATMENT', 'TREATED', 'GT_0', 'TARGET_D']]
In the code snippet above we download the zip file (the learning dataset specifically), extract it, and read it into a Pandas data frame. The dataset is campaign history from a non-profit organization that was seeking donations through direct mailings. There are no treatment variants within this dataset, so instead we pretend, segmenting the dataset based on the frequency of past donations. We call this indicator TREATMENT (as the categorical) and create TREATED as the binary indicator for 'A'. Consider this the results of a randomized control trial where a portion of the sample population was treated with an offer and the remainder were not. We track each individual and accumulate the amount of their donation.
So, if we examine this dataset, we see that there are about 95,000 promoted individuals, split roughly equally across the two treatments:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_XhKq9V5wrHOoNdv-337O6g.webp)
Treatment A has a larger response rate, but overall the response rate in the dataset is only around 5%. So, we have 95% zeros.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_d4Rey_jyrOFQFdQCRqTLgw.webp)
For those that donated, Treatment A appears to be associated with a lower average donation amount.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_oV16Oan-_R3-ZBrk5h-R2g.webp)
Combining everyone that was targeted, Treatment A appears to be associated with a higher average donation amount; the higher response rate outweighs the lower donation amount for responders, but not by much.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_uPBb_wAAR2TLlHlraK-WwQ.webp)
Finally, the histogram of the donation amount is shown here, pooled over both treatments, which illustrates the mass at zero and a right skew.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_3o4B59kt_EOoY7ESCPGLBQ.webp)
A numerical summary of the two treatment groups quantifies the phenomenon observed above: while Treatment A appears to have driven significantly higher response, those that were treated with A donated less on average when they responded. The net of these two measures, the one we are ultimately after, the overall mean donation per targeted unit, appears to still be higher for Treatment A. How confident we are in that finding is the subject of this analysis.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_5ubGZIXHhdFAekgvoK4IUA.webp)
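A summary along these lines can be reproduced with a groupby aggregation (a minimal sketch; the exact columns in the table above may differ):

summary = pdf_data.groupby('TREATMENT').agg(
    n_targeted=('TARGET_D', 'size'),                        # units promoted
    response_rate=('GT_0', 'mean'),                         # share who donated
    mean_donation_responders=('TARGET_D',
                              lambda x: x[x > 0].mean()),   # mean among donors
    mean_donation_targeted=('TARGET_D', 'mean'),            # mean over everyone targeted
)
print(summary)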
Gamma Hurdle
One way to model this data, and answer our research question in terms of the difference between the two treatments in the average donation per targeted unit, is with the Gamma Hurdle distribution. Similar to the better known Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) distributions, this is a mixture distribution where one part pertains to the mass at zero and the other, for the cases where the random variable is positive, is the gamma density function.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_h0sDO1A0w0QgnmTzKIp5uQ.webp)
Here π represents the probability that the random variable y is > 0; in other words, it's the probability of the gamma process. Likewise, (1 - π) is the probability that the random variable is zero. In terms of our problem, this pertains to the probability that a donation is made and, if so, its value.
Let's start with the component parts of using this distribution in a regression: logistic and gamma regression.
Logistic Regression
The logit function is the link function here, relating the log odds to the linear combination of our predictor variables, which with a single variable such as our binary treatment indicator, looks like:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_qkP9KSclpft2fxW438VfcA.webp)
where π represents the probability that the outcome is a "positive" (denoted as 1) event such as a purchase and (1 - π) represents the probability that the outcome is a "negative" (denoted as 0) event. Further, π, the quantity of interest above, is defined by the inverse logit function:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_kLFG7jGpUEZOHqbp-DKFJA.webp)
Fitting this model is very simple: we need to find the values of the two betas that maximize the likelihood of the data (the outcome y), which, assuming N iid observations, is:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_iQBYBYICkhBKmuC0OCcilg.webp)
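As a sketch of what that maximization looks like by hand, here is the negative log-likelihood passed to SciPy's optimizer (illustrative only; the function and variable names are my own):

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_likelihood(betas, treated, y):
    # probability of response under the current parameter values
    p = expit(betas[0] + betas[1] * treated)
    # Bernoulli log-likelihood, negated so that minimizing maximizes the likelihood
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2),
                  args=(pdf_data.TREATED.values, pdf_data.GT_0.values))
# result.x holds the maximum likelihood estimates of the two betas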
We could use any of multiple libraries to quickly fit this model, but we will demonstrate PYMC as the means to build a simple Bayesian logistic regression.
Without any of the normal steps of the Bayesian workflow, we fit this simple model using MCMC.
import pymc as pm
import arviz as az
from scipy.special import expit

with pm.Model() as logistic_model:
    # noninformative priors
    intercept = pm.Normal('intercept', 0, sigma=10)
    beta_treat = pm.Normal('beta_treat', 0, sigma=10)

    # linear combination of the treated variable, passed through
    # the inverse logit to squish the linear predictor between 0 and 1
    p = pm.invlogit(intercept + beta_treat * pdf_data.TREATED)

    # individual-level binary outcome (respond or not)
    pm.Bernoulli(name='logit', p=p, observed=pdf_data.GT_0)

    idata = pm.sample(nuts_sampler="numpyro")

az.summary(idata, var_names=['intercept', 'beta_treat'])
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_PCma1nanJ8DhgDOzwG-zjA.webp)
If we construct a contrast of the two treatment mean response rates, we find that, as expected, the mean response rate for Treatment A is 0.026 larger than that of Treatment B, with a 94% credible interval of (0.024, 0.029).
# create a new variable in the posterior which contrasts Treatment A - B
idata.posterior['TREATMENT A - TREATMENT B'] = expit(
    idata.posterior.intercept + idata.posterior.beta_treat
) - expit(idata.posterior.intercept)

az.plot_posterior(
    idata,
    var_names=['TREATMENT A - TREATMENT B']
)
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_w8gMLIEOtSjShj8Z39vVtg.webp)
Gamma Regression
The next component is the gamma distribution. One common parameterization of its probability density function is shown below:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_VQCD_xGfUOsWpygyORrDtw.webp)
This distribution is defined for strictly positive random variables and, in business settings, is used for values such as costs, customer spending, and insurance claim amounts.
Since the mean and variance of gamma are defined in terms of α and β according to the formulas:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_MzdTGteHRdwkAiVJcWDsbg.webp)
for gamma regression, we can parameterize by α and β or by μ and σ. If we define μ as a linear combination of predictor variables, then we can express the gamma distribution's α and β in terms of μ:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_AN3LxtD93LctbhEdNrXHVA.webp)
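To see this reparameterization concretely, here is a quick sketch that checks the implied mean and variance with SciPy (with shape α and mean μ, the rate is β = α/μ; note SciPy parameterizes gamma with a scale equal to 1/β):

from scipy import stats

shape_alpha = 2.0        # gamma shape parameter
mu = 15.0                # desired mean
beta = shape_alpha / mu  # rate parameter implied by the mean

# scipy's gamma takes shape and scale (= 1 / rate)
dist = stats.gamma(a=shape_alpha, scale=1 / beta)
print(dist.mean())  # 15.0, recovering mu
print(dist.var())   # mu**2 / alpha = 112.5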
The gamma regression model assumes the log link (the inverse link is another common option), which is intended to "linearize" the relationship between predictor and outcome:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_zHxn8BgnahZoguvgfN7KUA.webp)
Following almost exactly the same methodology as for the response rate, we limit the dataset to responders only and fit the gamma regression using PYMC.
# limit the data to responders only
pdf_responders = pdf_data.query('GT_0 == 1')

with pm.Model() as gamma_model:
    # noninformative priors
    intercept = pm.Normal('intercept', 0, sigma=10)
    beta_treat = pm.Normal('beta_treat', 0, sigma=10)
    shape = pm.HalfNormal('shape', 5)

    # linear combination of the treated variable, passed through
    # exp to ensure the linear predictor is positive
    mu = pm.Deterministic('mu', pm.math.exp(intercept + beta_treat * pdf_responders.TREATED))

    # individual-level donation amount for responders
    pm.Gamma(name='gamma', alpha=shape, beta=shape / mu, observed=pdf_responders.TARGET_D)

    idata = pm.sample(nuts_sampler="numpyro")

az.summary(idata, var_names=['intercept', 'beta_treat'])
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_vSZpBr3JIRb2nA3902nSxA.webp)
# create a new variable in the posterior which contrasts Treatment A - B
idata.posterior['TREATMENT A - TREATMENT B'] = np.exp(
    idata.posterior.intercept + idata.posterior.beta_treat
) - np.exp(idata.posterior.intercept)

az.plot_posterior(
    idata,
    var_names=['TREATMENT A - TREATMENT B']
)
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_UDfCK0GRXrj8cDpo4gubOg.webp)
Again, as expected, we see that the mean lift for Treatment A has an expected value equal to the sample value of -7.8. The 94% credible interval is (-8.3, -7.3).
The components shown above, response rate and average amount per responder, are about as simple as we can get. But it is a straightforward extension to add additional predictors in order to 1) estimate Conditional Average Treatment Effects (CATE) when we expect the treatment effect to differ by segment or 2) reduce the variance of the average treatment effect estimate by conditioning on pre-treatment variables.
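For example, extending the logistic component with a single pre-treatment covariate and an interaction might look like the following sketch (the column PRIOR_DONATIONS is a hypothetical placeholder, not a field in this dataset):

with pm.Model() as logistic_model_cate:
    intercept = pm.Normal('intercept', 0, sigma=10)
    beta_treat = pm.Normal('beta_treat', 0, sigma=10)
    beta_prior = pm.Normal('beta_prior', 0, sigma=10)
    # the interaction term lets the treatment effect differ by the covariate
    beta_interact = pm.Normal('beta_interact', 0, sigma=10)

    p = pm.invlogit(intercept
                    + beta_treat * pdf_data.TREATED
                    + beta_prior * pdf_data.PRIOR_DONATIONS
                    + beta_interact * pdf_data.TREATED * pdf_data.PRIOR_DONATIONS)
    pm.Bernoulli('response', p=p, observed=pdf_data.GT_0)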
Hurdle Model (Gamma) Regression
At this point, it should be pretty clear where we are headed. For the hurdle model, we have a conditional likelihood, depending on whether the specific observation is zero or greater than zero, as shown above for the gamma hurdle distribution. We can fit the two component models (logistic and gamma regression) simultaneously, and we get their product for free, which in our example is an estimate of the donation amount per targeted unit.
It would not be difficult to fit this model using a likelihood function with a switch statement depending on the value of the outcome variable, but PYMC has this distribution already encoded for us.
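For intuition, that switch-statement likelihood would look something like this sketch in plain NumPy/SciPy (illustrative only; the PYMC model below is what we actually fit):

import numpy as np
from scipy import stats

def gamma_hurdle_log_likelihood(y, pi, alpha, beta):
    zero = (y == 0)
    # zeros contribute log(1 - pi)
    ll_zero = np.log(1 - pi) * zero.sum()
    # positives contribute log(pi) plus the gamma log-density
    ll_pos = (np.log(pi) + stats.gamma.logpdf(y[~zero], a=alpha, scale=1 / beta)).sum()
    return ll_zero + ll_pos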
import pymc as pm
import arviz as az

with pm.Model() as hurdle_model:
    ## noninformative priors ##
    # logistic
    intercept_lr = pm.Normal('intercept_lr', 0, sigma=5)
    beta_treat_lr = pm.Normal('beta_treat_lr', 0, sigma=1)

    # gamma
    intercept_gr = pm.Normal('intercept_gr', 0, sigma=5)
    beta_treat_gr = pm.Normal('beta_treat_gr', 0, sigma=1)

    # alpha
    shape = pm.HalfNormal('shape', 1)

    ## mean functions of predictors ##
    p = pm.Deterministic('p', pm.invlogit(intercept_lr + beta_treat_lr * pdf_data.TREATED))
    mu = pm.Deterministic('mu', pm.math.exp(intercept_gr + beta_treat_gr * pdf_data.TREATED))

    ## likelihood ##
    # psi is pi
    pm.HurdleGamma(name='hurdlegamma', psi=p, alpha=shape, beta=shape / mu, observed=pdf_data.TARGET_D)

    idata = pm.sample(cores=10)
If we examine the trace summary, we see that the results are exactly the same as for the two component models fit separately.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_jirR0FKbDagR9LAY3l8OTw.webp)
As noted, the mean of the gamma hurdle distribution is π * μ, so we can create a contrast:
# create a new variable in the posterior which contrasts Treatment A - B
idata.posterior['TREATMENT A - TREATMENT B'] = (
    expit(idata.posterior.intercept_lr + idata.posterior.beta_treat_lr)
    * np.exp(idata.posterior.intercept_gr + idata.posterior.beta_treat_gr)
) - (
    expit(idata.posterior.intercept_lr)
    * np.exp(idata.posterior.intercept_gr)
)

az.plot_posterior(
    idata,
    var_names=['TREATMENT A - TREATMENT B']
)

The expected value of this contrast is 0.043, with a 94% credible interval of (-0.0069, 0.092). We could interrogate the posterior to see what proportion of times the donation per targeted unit is predicted to be higher for Treatment A, and apply any other decision functions that make sense for our case, including adding a fuller P&L to the estimate (i.e. including margins and cost).
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_ne_kjD6As2ja2eiXh4Pc2g.webp)
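Interrogating the posterior in that way is a one-liner; for instance, the proportion of posterior draws in which Treatment A comes out ahead (a sketch against the contrast variable created above):

contrast = idata.posterior['TREATMENT A - TREATMENT B']
# share of posterior draws where Treatment A yields a higher donation per targeted unit
prob_a_better = (contrast > 0).mean().item()
print(f'P(Treatment A > Treatment B) = {prob_a_better:.3f}')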
Notes: Some implementations parameterize the gamma hurdle model differently, where the probability of zero is π, and hence the mean of the gamma hurdle involves (1 - π) instead. Also note that at the time of this writing there appears to be an issue with the NUTS samplers in PYMC, and we had to fall back on the default Python implementation to run the above code.
Summary
With this approach, we get the same inference as fitting the two models separately, plus the extra benefit of the third metric. Fitting these models with PYMC gives us all the benefits of Bayesian analysis, including the injection of prior domain knowledge and a full posterior with which to answer questions and quantify uncertainty!
Credits:
- All images are the authors, unless otherwise noted.
- The dataset used is from the KDD 98 Cup sponsored by Epsilon. https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html (CC BY 4.0)