Dr. Owns

January 16, 2025

Geo-randomization & how you might approach a problem with no historical data

One of my favorite things to talk to other data scientists or product leaders about is experiments.

  • A lot of experiments fail. Sometimes an idea works at one company and fails for another.
  • Sometimes you run an experiment and find out later the data isn’t capable of correctly answering the questions you have. 😢
  • But when an experiment works out, it’s a fun story. I’d like to share one of those stories about how we used a geo-randomized test to trace out how many “premium” jobs our job board can handle.

Background: I work at a company called IntelyCare. We help connect nurses with various work opportunities (full-time, part-time, contracts, per-diem… the whole menu).

  • One way we help nurses find their next job is a job board, which we launched in late 2022. (Take a look if you want. intelycare.com/jobs).
  • Things change quickly around here, but if you go to our job board in the year 2024, you’ll notice two possible ways of sorting jobs: by date and by relevance.

Why it matters: The sort-by-relevance feature is our current best lever to guarantee a good experience for paying customers. It also gives us an opportunity to improve the overall efficiency of our job board, which is key to attracting future paid listings.

  • Unfortunately, we can’t put every job at the top of a search result. This means we face a tradeoff between the quantity of premium listings and the quality of the experience each one gets, measured as increased applicants.

How it works: Relevance here doesn’t mean what it normally means. Sorry.

  • We give each job a score between 0 and 100. When filling a page with jobs, sorting by relevance means we sort the results by that score. That’s it! For brevity, we’ll say any job with a score higher than the default (0) is “boosted.”
  • I know what you’re thinking, “This isn’t relevance!” And you’re right, at least in the normal sense of the word. The score doesn’t vary across job-seekers or search terms. A better name would be “relevant to Google.” We’re OK with that because a huge share of our job-board traffic comes from Google, as shown below. It is the one job seeker we care about above all others.
“Sort by relevance” here is shorthand for “relevant to Google.” (image by author)
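
To make the mechanics concrete, here is a minimal sketch of a sort like the one described above. The `Job` class, its field names, and the newest-first tie-break are assumptions for illustration; the post only says that results are sorted by score.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Job:
    job_id: str
    posted: date
    score: int = 0  # default score; anything above 0 counts as "boosted"


def sort_by_relevance(jobs):
    # Highest score first. Ties fall back to newest first, which is an
    # assumption: the post does not say how equal scores are ordered.
    return sorted(jobs, key=lambda j: (-j.score, -j.posted.toordinal()))


jobs = [
    Job("a", date(2023, 5, 1)),
    Job("b", date(2023, 5, 3)),
    Job("c", date(2023, 4, 20), score=100),
]
ordered_ids = [j.job_id for j in sort_by_relevance(jobs)]
print(ordered_ids)  # ['c', 'b', 'a']
```

The boosted job "c" jumps to the top even though it is the oldest listing, and everything else keeps its chronological order underneath.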

In Math: We have N jobs. Every day we generate a vector of N integers between 0 and 100. We feed this vector into a black box that’s mostly just Google. If we do a good job, the black box rewards us with many job applications.

By putting the “right” jobs at the top of the page (loaded word there), we can improve upon a chronological sort. Before we can identify the right jobs, we need to know whether Google actually rewards higher-placed jobs and, if so, by how much.

Day 0: Making progress when you know nothing

Sometimes, just to justify all the simplifying assumptions I’m going to make later, I start a project by writing down the math equation I’d like to solve. I imagine ours looks something like this:

  • S is our vector of relevancy scores. There are N jobs, so each s_i (an element of S) corresponds to a different job. A function called applies turns S into a scalar. Each day we’d like to find the S that makes that number as large as possible, that is, the relevancy scores that generate the greatest number of job applications for intelycare.com/jobs.
  • applies is a fine objective function on Day 0. Later on our objective function could change (e.g. revenue, lifetime value). Applies are easy to count, though, and that lets me spend my complexity tokens elsewhere. It’s Day 0. We’ll come back to these questions on Day 1.
  • Problem. We know nothing about the applies function until we start feeding it relevancy scores. 😱
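
Written out from the definitions above, one way to state the Day 0 objective is the following. Treat it as a sketch built from the prose rather than the exact notation of the original equation; the integer range 0 to 100 comes from the score definition earlier in the post.

```latex
S^{*} \;=\; \arg\max_{S \,\in\, \{0, 1, \dots, 100\}^{N}} \operatorname{applies}(S),
\qquad S = (s_1, s_2, \dots, s_N)
```

Here $s_i$ is the score for job $i$, and $\operatorname{applies}(S)$ is the total number of job applications received on a day when the board is sorted using $S$.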

First things first: Seeing that we know nothing about the applies function, our first question is, “how do we choose an ongoing wave of daily S vectors so we can learn what the applies function looks like?”

  • We know (1) which jobs are boosted and when, and (2) how many applies each job receives each day. Note the absence of page-load data. It’s Day 0! You might not have all the data you want on Day 0, but if we’re clever we can make do with what we have.
  • Note the subtle change in our objective. Earlier our goal was to accomplish some business objective (maximize applies), and eventually we’ll come back to that goal. We’ve taken off the business hat for a minute and put on our science hat. Our only goal now is to learn something. If we can learn something, we can use it (later) to help achieve some business objective.🤓
  • Since our goal is to learn something, above all we want to avoid learning nothing. Remember it’s Day 0 and we have no guarantee that the Google Monster will pay any attention to how we sort things. We may as well go for broke and make sure this thing even works before throwing more time at improving it.

How do we choose an initial wave of daily S vectors? We’ll give every job a score of 0 (default score), and choose a random subset of jobs to boost to 100.

  • Maybe I’m stating the obvious, but it has to be random if you want to isolate the effect of page-position on job applications. We want the only difference between boosted jobs and other jobs to be their relative ordering on the page as determined by our relevance scores. [I can’t tell you how many phone screens I’ve conducted where a candidate doubled down on running an A/B test with the good customers in one group and the bad customers in the other group. In fairness, I’ve also vetted marketing vendors who do the same thing 😭].
  • The randomness will be nice later on for other reasons. It’s likely that some jobs benefit from page-placement more than others. We’ll have an easier time identifying those jobs with a big, randomly-generated dataset.
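
A minimal sketch of that first design, assuming job IDs are plain strings. The function name, the `boost_share` parameter, and the seed are all hypothetical; this is not the production code.

```python
import random


def random_boost_scores(job_ids, boost_share, seed=None):
    """Return {job_id: score}: a random `boost_share` of jobs get 100, the rest 0."""
    rng = random.Random(seed)  # seeded so a day's draw can be reproduced
    job_ids = list(job_ids)
    n_boost = round(len(job_ids) * boost_share)
    boosted = set(rng.sample(job_ids, n_boost))  # sampling without replacement
    return {job_id: (100 if job_id in boosted else 0) for job_id in job_ids}


job_ids = [f"job-{i}" for i in range(200)]  # hypothetical IDs
scores = random_boost_scores(job_ids, boost_share=0.10, seed=7)
n_boosted = sum(1 for s in scores.values() if s == 100)
print(n_boosted)  # 20
```

Because the boosted set is drawn at random, boosted and unboosted jobs should look alike on average in everything except page position, which is the whole point.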

The Plan: Subtle but Important Details

We know we can’t boost every job. Anytime I put a job at the top of the page, I bump all other jobs down the page (classic example of a “spillover”).

  • The spillover gets worse as I boost more and more jobs: I impose a greater and greater punishment on all other jobs by pushing them down in the sort (including other boosted jobs).
  • With little exception, nursing jobs are in-person and local, so any boosting spillovers will be limited to other nearby jobs. This is important.

How do we choose an initial wave of daily S vectors? (final answer) We’ll give every job a score of 0 (default score), and choose a random subset of jobs to boost to 100. The size of the random subset will vary across geographies.

  • We create 4 groups of distinct geographies with roughly the same amount of web traffic in each group. Each group is balanced along the key dimensions we think are important. We randomly boost a different percentage of jobs in each group.
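
One way to build groups like these is blocked randomization on traffic, sketched below. This is an assumption about technique, not a description of what we actually ran: it balances on web traffic only, whereas the real groups were balanced on several dimensions. The geography names, traffic numbers, and the 15% rate are placeholders (the post only quotes results at 5%, 10%, and 25%).

```python
import random


def assign_geo_groups(traffic_by_geo, n_groups=4, seed=None):
    """Blocked randomization: rank geographies by traffic, then shuffle
    the group labels within each consecutive block of `n_groups`."""
    rng = random.Random(seed)
    ranked = sorted(traffic_by_geo, key=traffic_by_geo.get, reverse=True)
    assignment = {}
    for start in range(0, len(ranked), n_groups):
        block = ranked[start:start + n_groups]
        labels = list(range(n_groups))
        rng.shuffle(labels)  # random label order inside the block
        for geo, label in zip(block, labels):
            assignment[geo] = label
    return assignment


# Placeholder boost rates per group; only 5%, 10%, and 25% appear in the post.
BOOST_SHARE_BY_GROUP = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.25}

# Hypothetical geographies with made-up traffic counts.
traffic_by_geo = {f"geo-{i:02d}": 1000 - 60 * i for i in range(12)}
groups = assign_geo_groups(traffic_by_geo, seed=11)
group_sizes = [sum(1 for g in groups.values() if g == k) for k in range(4)]
print(group_sizes)  # [3, 3, 3, 3]

# Every job in a geography is then boosted at that geography's group rate.
boost_share_by_geo = {geo: BOOST_SHARE_BY_GROUP[g] for geo, g in groups.items()}
```

Each block of four similarly sized geographies contributes one geography to each group, so total traffic per group ends up close without anyone hand-picking which geography goes where.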

Here’s how it looked…

Daily Applies for boosted vs unboosted Jobs. Note how boosted jobs do better when there are fewer of them. (image by author)
  • Each black circle represents a different geography. Its elevation shows the difference in applies-per-job between boosted jobs and all other jobs (measured as a percent).
  • While groups are balanced in aggregate, the individual geographies vary considerably. The balance is still important though. Otherwise, what you see in the chart could be an artifact of the mix of urban/rural or large/small geographies in each group. As it is, we’re confident the results come from our relevancy scores.
  • A brazen interpretation of this chart is something like, “the 5% of jobs at the top of the page have ~26% more applies per day than the 95% of jobs placed below. The 10% of jobs at the top of the page have ~21% more applies per day than the 90% of jobs underneath…” and so on. I would never be so bold as to say that in real life, but in a perfect-experiment world it would be true.
  • By the time we boost 25% of jobs, the boost experience is entirely averaged out! We diluted the perks of premium placement to practically nothing for the median geography. “And when everyone is super, no one will be! <evil laugh>.” Can you imagine learning this the hard way?
  • There are many other layers to peel back. Perhaps dilution happens more quickly for nursing specialties with many pages of listings? What about states that overlap with our long-standing per-diem staffing business? These are all fine questions. We have answers for some of them, but they are more than I can include in this post.
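
The height of each circle in the chart is a simple ratio, and it needs nothing more than the two things we know on Day 0: which jobs were boosted and how many applies each job got. A minimal sketch, assuming one row per job per day; the row layout and the toy numbers are made up for illustration and are not experiment results.

```python
from collections import defaultdict
from statistics import median


def lift_by_geo(rows):
    """rows: iterable of (geo, is_boosted, applies), one per job-day.
    Returns {geo: percent difference in applies-per-job, boosted vs. other}."""
    # Per geography and arm, accumulate [total applies, number of job-days].
    totals = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for geo, is_boosted, applies in rows:
        cell = totals[geo][bool(is_boosted)]
        cell[0] += applies
        cell[1] += 1
    lifts = {}
    for geo, cells in totals.items():
        (b_applies, b_n), (u_applies, u_n) = cells[True], cells[False]
        if b_n == 0 or u_n == 0 or u_applies == 0:
            continue  # undefined without both arms and a nonzero baseline
        lifts[geo] = 100.0 * ((b_applies / b_n) / (u_applies / u_n) - 1.0)
    return lifts


# Toy data only.
rows = [
    ("geo-A", True, 3), ("geo-A", False, 2), ("geo-A", False, 2),
    ("geo-B", True, 5), ("geo-B", False, 4), ("geo-B", False, 4),
]
lifts = lift_by_geo(rows)
print(lifts)                   # {'geo-A': 50.0, 'geo-B': 25.0}
print(median(lifts.values()))  # 37.5
```

Taking the median across the geographies in a group gives one summary number per boost percentage, which is the comparison the bullets above are making.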

What comes next? Day 1 is when the real fun begins! 🎉

  • We now have guardrails against diluting our premium experience (super important), but what is the best ~10% of jobs to boost each day? Obviously our paying customers have priority, but then what?
  • Does boost help some jobs more than others? The randomly-generated data from our experiment is well suited to answer this and many other questions. We’ll save those questions for future posts.
  • Once we have a strategy for boosting, is our objective really to maximize the total number of applies? Or do we only care about the applies for boosted jobs? 🤔 (Sometimes I miss the Day 0 days when all the jobs were equally relevant. Might be time to revisit those equations at the top of the post.)

Key Takeaways for Those Who Made It This Far

  • By being thoughtful about how we generated our initial data, we quickly found a convincing answer to our question, set ourselves up to answer many future questions, and saved ourselves a ton of time trying to build an uplift model on non-existent historical data.
  • For a while I worked with Sean Taylor on product questions like these. His advice was almost always some variation of, “just try it and use the right kind of random assignment.” If you execute well, you can see the results clearly in a chart and avoid all the statistics (obligatory xkcd reference). [hmm, maybe *most* of the statistics. I still love a good regression table.]
  • Spillovers are everywhere. Sometimes varying the treatment across an aggregated group can help like it did here. That can quickly axe your sample-size, but I find it better to have a small data set with meaning than a big data set that’s hot garbage.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Bonus: We ran this experiment in 2023. How are things now?

At the time of our little geo-randomized experiment, you can see in the chart that our premium job openings performed ~25% better than regular jobs (meaning they had 25% more applies on average).

Why it matters: We’ve since had over a year to grow and iterate on our product to ensure our premium listings deliver the best possible experience. Looking at some recent numbers… (literally running the queries as I write this)

  • Boosted job openings receive 425% more applies than regular openings
  • Boosted jobs are 450% more likely to receive at least one apply compared to regular openings

Not bad! This isn’t randomized, so that 425% includes all sorts of selection bias, additional product work, a crack SEO team, and a successful email operation, all in addition to the incremental effects from premium page position. Importantly, all the extra product and marketing work is focused on a small number of jobs as our initial testing recommends. 🏆


How We Optimized Premium Listings on Our Nursing Job Board was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
