Estimating shelter data: The importance of knowing what we don’t know
Last month, we published our 2025 national shelter data report. We’ve published these reports annually for 10 years now. Naturally, the focus is — as it should be — on the animals. What’s the save rate? Are adoptions keeping up with shelter admissions?
Although it’s easy to get lost in the numbers, it’s important to remember that behind the accounting are the lives of millions of sentient beings. That’s why we put so much effort into getting the numbers right. And every year, we’re doing a better job of it.
Reporting the previous year’s data — for nearly 4,000 U.S. shelters — in April is an enormous undertaking. And this quick turnaround is the source of one of the most common questions we receive: Do you really have data from every shelter in the country?
Not exactly. We can’t wait until we’ve collected data from every last shelter. At some point, we report what we know based on the data we do have — and estimate the rest.
This isn’t unique to shelter data. Filling in missing data with estimates is standard practice in a wide variety of contexts.
When the U.S. Census Bureau “counts” the number of people living in the country every 10 years, they collect as much data as possible via questionnaires and door-knocking. But they don’t hear from every resident; as a result, their population figures are estimates.
In this way, the census is like most surveys: The results are based, to some degree, on incomplete data. This doesn’t mean the results are inaccurate; their accuracy is dependent on how well survey participants represent the general population and how well missing data can be estimated.
Imagine if you wanted to calculate the pet ownership rate for the state of Utah. If you surveyed only the residents of Kanab — where Best Friends Animal Sanctuary is located — your estimate for the state would be too high. Pet ownership in Kanab simply isn’t an accurate reflection of pet ownership across the entire state.
Now, if you expanded your survey into the rest of Kane County, your estimate would be a little more accurate. That’s because the people you’re contacting are a better reflection of Utah residents more generally. Put another way, the high rate of pet ownership in Kanab is offset by the lower rate in the rest of the county. But to really know what Utah’s pet ownership rate is, you’d need to expand your survey much farther.
Would you need to survey every resident? No. In fact, you wouldn’t even need to survey every county if you had enough information to accurately estimate pet ownership rates in the counties omitted from your survey.
What kind of information would help estimate pet ownership? Well, one factor to consider is whether people with pets live in more urban or rural communities. Dog ownership rates, in particular, are much higher among rural residents. Home ownership is another factor. Data from the U.S. Census Bureau suggests that 56.6% of homeowners have pets whereas only 37.1% of renters do.
With enough information, you could predict — with good accuracy, too — the pet ownership rates in the parts of Utah you didn’t survey. Combine those predictions with the results from the parts of the state you did survey, and you could be confident that the resulting statewide pet ownership rate was reasonably accurate.
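For instance, using the census figures quoted above, predicting a county’s pet ownership rate could be as simple as a weighted average. The 70/30 homeowner-renter split below is a made-up illustration, not a real county:

```python
# Hypothetical county: 70% homeowners, 30% renters (assumed split).
# Pet ownership rates are the U.S. Census Bureau figures quoted above.
homeowner_pet_rate = 0.566
renter_pet_rate = 0.371
homeowner_share = 0.70

# Weighted average: each group's pet rate, weighted by its share of households
expected_pet_rate = (homeowner_share * homeowner_pet_rate
                     + (1 - homeowner_share) * renter_pet_rate)
print(f"{expected_pet_rate:.3f}")  # about 51% pet ownership
```

A real model would fold in more factors (urban vs. rural, for one), but the logic is the same: known relationships let you predict rates for places you never surveyed.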
This is essentially what we do with shelter data. Of the roughly 3,900 brick-and-mortar shelters in the country, we obtain data from more than 80%. The missing data is then estimated based on factors we know are good predictors of a shelter’s numbers. Maybe we have the previous year’s data from the shelter; that’s obviously helpful. Other factors we consider include whether the shelter is municipal or private and the population of the county where the shelter is located.
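A minimal sketch of this kind of gap-filling, assuming a simple growth-factor method and made-up figures (this is not Best Friends’ actual model):

```python
# Illustrative sketch: estimate a non-reporting shelter's annual intake
# from two predictors named above: its prior-year data and its shelter
# type (municipal vs. private). All numbers are hypothetical.

def group_growth_factors(reported):
    """Year-over-year intake growth among shelters that reported both
    years, computed separately for each shelter type."""
    factors = {}
    for shelter_type in ("municipal", "private"):
        pairs = [(s["prev"], s["curr"]) for s in reported
                 if s["type"] == shelter_type]
        factors[shelter_type] = (sum(curr for _, curr in pairs)
                                 / sum(prev for prev, _ in pairs))
    return factors

def estimate_intake(shelter, factors):
    """Scale a missing shelter's prior-year intake by its group's growth."""
    return shelter["prev"] * factors[shelter["type"]]

# Shelters that did report this year (hypothetical intake counts)
reported = [
    {"type": "municipal", "prev": 1000, "curr": 1050},
    {"type": "municipal", "prev": 2000, "curr": 2100},
    {"type": "private",   "prev": 500,  "curr": 480},
]
factors = group_growth_factors(reported)

# A shelter that reported last year but not this year
missing = {"type": "municipal", "prev": 800}
print(round(estimate_intake(missing, factors)))  # 800 * 1.05 = 840
```

The real estimation uses more predictors and more data, but the principle is identical: shelters that did report tell you what to expect from the similar shelters that didn’t.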
Although there’s no substitute for having a shelter’s data, our testing shows that these estimates are impressively accurate. To validate our model, we trained it on 80% of the data we collected and then tested how well that model predicted the values for the remaining 20%. (Sound familiar? This is how AI models are trained and evaluated.)
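That holdout validation can be sketched like this, with synthetic data and a deliberately simple one-parameter model standing in for the real one:

```python
import random

# Illustrative holdout validation: train on 80% of shelters, test on the
# held-out 20%. The data is synthetic: current-year intake runs about 5%
# above prior-year, plus small deterministic "noise."
shelters = [(prev, prev * 1.05 + (i % 7 - 3) * 5)
            for i, prev in enumerate(range(100, 2100, 20))]

random.seed(0)
random.shuffle(shelters)
split = int(len(shelters) * 0.8)
train, test = shelters[:split], shelters[split:]

# "Training": learn a single growth factor from the training shelters.
factor = sum(curr for _, curr in train) / sum(prev for prev, _ in train)

# "Testing": compare predictions against the held-out actual values.
errors = [abs(prev * factor - curr) / curr for prev, curr in test]
mean_pct_error = sum(errors) / len(errors)
print(f"growth factor: {factor:.3f}, mean error: {mean_pct_error:.1%}")
```

Because the test shelters were never shown to the model, the error on them is an honest measure of how the model would perform on shelters with genuinely missing data.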
Examining shelter admissions, for example, we found a very close match — a correlation coefficient of 0.93 between estimated and collected values. In technical terms, that means 86% of the variation in the collected data could be explained by our estimates. (In the world of predictive modeling, that’s very impressive indeed!)
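The arithmetic connecting those two numbers: the share of variance explained (the coefficient of determination, R²) is the square of the correlation coefficient.

```python
r = 0.93            # correlation between estimated and collected values
r_squared = r ** 2  # coefficient of determination: share of variance explained
print(round(r_squared * 100))  # 86 (percent)
```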
In less technical terms, our estimates are more than sufficient to supplement the small share of data missing from the overall U.S. shelter dataset. This isn’t just our opinion, either; we had the process reviewed by two experts, and they agreed.
And this is a point worth emphasizing: Although the estimations are for shelter-level data, they are used only to fill in the missing pieces of the larger picture. We incorporate the estimates when we report aggregated data — shelter admissions for the entire country, for example, or adoptions in a particular state. But we don’t report estimated data for a particular shelter; at the shelter level, the only data we report is data we’ve collected.
To bring this back to the pet ownership example, estimates would be used in reporting Utah’s pet ownership rate. Anything more granular than that would be dicey. Imagine, for example, reporting pet ownership rates for individual households you didn’t survey. That just doesn’t make sense.
To be clear: We’d rather have data from every shelter in the country — and we’re making progress toward that goal. In the meantime, though, our estimates are more than sufficient to guide our work.