Data-driven modeling has impacted a variety of industries; why hasn’t publishing embraced the trend?
In his 2003 book, Moneyball, Michael Lewis describes the efforts of Oakland Athletics general manager Billy Beane to use analytical models, an approach known as sabermetrics, to manage the team. Prior to sabermetrics, baseball teams relied on the personal experience, knowledge, and instinct of their staff to make decisions. While they took some numbers into account, like batting average and stolen bases, they didn’t do any serious modeling with them. Sabermetrics went deeper into the data, with greater emphasis on metrics like on-base percentage and slugging percentage, and created more comprehensive models around team success, not player success.
This change is nothing compared to the quant revolution on Wall St. Traditionally, stockbrokers and analysts would focus on fundamentals, basic metrics like the balance sheet and cash flow. They’d also apply their industry expertise and instinct to judge how a company might be impacted by micro- or macro-economic conditions. The well-suited white men who ran Wall St. in the twentieth century have been replaced by the quants over the past few decades. Mathematicians and software developers now create complex models using billions of pieces of data to predict what to trade and when.
It’s not just finance and baseball; economics, supply chains, and even farming now use big data and advanced modeling to predict and plan. Netflix famously committed to House of Cards because the data told them it would be a success. (Unfortunately, big data can’t predict edge cases like an actor’s unethical behavior.) So why don’t publishers follow suit?
Like the baseball scouts of old, editors and agents are not entirely without data. Publishers can and do use easily accessible data like demographics and social trends. But also like those scouts, the data they have access to and use often aren’t very reliable, as noted in this NY Times article, “Millions of Followers? For Book Sales, ‘It’s Unreliable.’” Both the scouts and the editors suffer from the streetlamp effect: using easily accessible data and believing that it gives them more insight than it really does.
Building more accurate models isn’t easy or cheap. Data scientists, the people who build such models, can cost hundreds of thousands of dollars a year. And these models aren’t one-and-done. Like a game of cat and mouse, as “the market” (other players in the industry, the consumers who react to new products, and macro-social and economic conditions as a whole) changes, the model needs to evolve continually. The data feeds on which the model works can also cost well into six figures per year or more.
Financial services companies can spend tens of millions or more to build such models because being marginally better, even by a few percentage points, can yield hundreds of millions or even billions of dollars. Baseball teams can spend millions of dollars on models; their budgets range from tens of millions to hundreds of millions a year. Do these numbers work for publishing?
When it comes to the Big Five publishers, their EBITDA is in the hundreds of millions. There’s no question that investing a few million in better models can yield a return. To be fair, they will likely need multiple models. A hedge fund focusing on commodities builds a model for commodities, and that model won’t help them trade a stock like Disney, which has little relation to commodities. Likewise, a hedge fund with models for the media sector couldn’t accurately predict the price of tea in China. Diverse publishers may need separate models for summer romance, children’s books, personal finance, sci-fi, etc. Still, the economics will likely work.
What may hold them up, however, is tradition. (Hold my red editing pen . . .)
Who, every day, must pore over manuscripts,
Pick the most promising, pitch the committee?
Who selects the books, gets them across the line,
To give the public what they want!
Editors, Editors! Tradition!
Editors, Editors! Tradition!
Sabermetrics has been around since the 1970s. It took nearly 30 years before it caught on, and even then, only once the Oakland Athletics had proven that it works. The rise of the quants on Wall St. also took decades. In both cases there was an old guard who opposed it. Senior acquisition editors took years to reach their roles; now that they’ve made it, asking them to voluntarily step aside and let a computer decide for them isn’t an easy pill to swallow. (Would a computer recognize that we just mixed metaphors and that this prose needs serious work!?)
It’s going to take an outsider to do it, someone who hasn’t made it. Billy Beane may have been in the majors, but his team was at the bottom of the majors. The Yankees didn’t feel compelled to change what worked for them until they saw that they had no choice. If you work at a major publisher, you’re probably not an outsider (unless you’re an up-and-comer trying to make a name for yourself).
As for small and mid-sized publishers, it’s just not cost-effective. These are the outsiders; they can’t regularly compete against the big houses and win without an advantage. These outsiders are also the ones who can’t invest six or seven figures. It would take someone like Marc Benioff, the Salesforce CEO who acquired Time magazine, to have the foresight and deep enough pockets to commit a mid-sized publisher to that path. The one possible alternative is for a coalition to develop such models. Just as trade groups allow small companies to work together to create a larger reach, so, too, could industry consortiums form, allowing their members to pool their resources into a shared model.
You might object that a shared model would erase the very thing that differentiates one publisher from another: its selections. What’s the difference between NBC and CBS? The shows on the stations; otherwise, they’re pretty similar from a consumer’s standpoint. Wouldn’t giving them the same models mean they bid for the exact same shows? But knowing the trends and acting on them are very different things. Burberry and Versace can see the same trends and still create totally different fashion lines for the same audience. NBC and CBS both know they need a police drama (or six), but there’s no shortage from which to choose. This is different from a hedge fund that doesn’t want another institution to front-run them, or a baseball team trying to recruit the only player in the draft with a certain set of metrics. Two publishers can both be interested in a certain subgenre for a certain demographic, and both find plenty of books in the space. To be clear, in most fields the model doesn’t do all the work; it just advises the experts, in this case the acquisition editors, who make the final call.
The only downside to a shared model is the sameness. If the model predicts teenage vampires will be hot this summer, everyone is going to come out with a teenage vampire book, saturating the market (while the highly underrated senior citizen werewolf stories will continue to languish in the slush pile). Still, as modeling tools continue to democratize, their costs will go down and mid-sized publishers can start to deviate from the standard model, bringing back variety.
There’s a tidal wave of data coming in every industry. You can learn to ride the wave or get caught up in the undertow. Moneybooks, books selected for publication based on predictive modeling, are coming to our industry eventually; place your bets on which outsider will be first to market with them.