The Streetlight Effect and What You “Know That Ain’t So”
Anyone working in investment or economic analytics knows The Big Short, which quotes a line from Tolstoy:
“…the simplest thing cannot be made clear to the most intelligent man if he is firmly persuaded that he knows already, without a shadow of a doubt, what is laid before him.”
That line is in the book. The movie uses a folksier version, attributed to various people, including Mark Twain and, perhaps most famously, Will Rogers:
“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”
We ignore this too often in analytics.
Moneyball and The Streetlight Effect
Another analytics movie is Moneyball. No other story has done more to bring number crunching into popular culture. CEOs around the world want to emulate Billy Beane and use analytics for competitive advantage.
Sadly, Moneyball fails to teach important lessons about data quality. Characters in the movie argue about which numbers matter most in baseball, but they never debate quality. Baseball data is high-quality stuff: Major League Baseball has detailed records of every play and every player for thousands of games, going back decades.
This is not so elsewhere. “Where can I get data?” is the most common question in most analytics projects, in business and academia alike. The data in The Big Short was far from the quality data Billy Beane used. Wall Street quants used bad data the wrong way. Sadly, this is not rare.
This is due to the “Streetlight Effect,” named for a joke about what is easy versus what is right. The joke goes like this…
A cop finds a drunk under a streetlight looking for his keys. The cop wants to find the keys too, so he can keep the drunk from driving. So he joins the search. After several futile minutes of searching, the officer asks…
Cop: “Are you sure this is where you dropped the keys?”
Drunk: “No, I dropped them down the street.”
Cop: “Then why are we looking here?”
Drunk: “The light is better!”
Seduced by easy-to-obtain (but bad) data, we live in an age of bad analytics built on things that are simply untrue.
China and the Virus
Two current examples of easy data that “just ain’t so” are Covid-19 data and nearly anything officially reported by China. Of course, these topics are connected.
The Wall Street Journal recently reported an example of silly data from China. According to Chinese credit raters, 57% of Chinese corporate debt is rated Triple-A. To put this in context, only two U.S. firms have been able to obtain this rating, and S&P Global Ratings rates exactly zero Chinese firms Triple-A. An interesting coincidence: most of the high ratings are awarded to state-owned enterprises and well-connected large firms favored by the Chinese Communist Party.
We’ve written before about epidemic data quality issues. But these problems don’t stop researchers from applying their favorite neural network or other fancy technique. The results are usually meaningless, because the data either didn’t mean anything or meant something other than what the researchers assumed.
These topics converge in Chinese reporting on the Wuhan Virus. Early in the pandemic, a respected epidemiologist said, “no responsible researcher is using China’s numbers.” He was right, of course, but irresponsible researchers and media outlets are still using Chinese “facts which ain’t so.”
Lone Star’s Automated Intelligent Analytics Solutions
You might think that only humans fall into this trap. Sadly, the Streetlight Effect is also hard at work in the Industrial Internet of Things, or Industry 4.0.
It’s nearly impossible to avoid stories about the zettabytes of data supposedly streaming from machines. The implication is that we will soon enjoy deep knowledge of every asset. But reality is much harsher.
Many sensors were installed when the asset was built, probably with little thought given to operational data needs. So, yes, we might be generating a great deal of data, but it may be far from what we want. And to train a failure classifier, we would have to let many machines fail while we collect information. Your preventive maintenance organization probably thinks this is a bad idea. Even if they agreed, the data collected may not be the right data. We may just be under the streetlight again.
Lone Star prefers to use our Hybrid AI methods to generate Digital Twins, an approach that is particularly powerful for the “Failure Twins” needed for condition-based maintenance.
TruPredict / Competitive Analytics & Pricing Solutions
Lone Star’s work in strategic pricing is another area where we see risks from the Streetlight Effect. Our Competitive Analytics & Pricing Solutions organization uses data quality control processes and data collection methodologies designed to “search where the keys are” and avoid the Streetlight Effect.
The disciplined processes in our TruPredict pricing software help users avoid these issues, too.
A key to this work is acknowledging uncertainty. In pricing analysis, a vain attempt at perfect estimates is very often fatal. What we do know is that we aren’t certain about many things, so when we try to pin down a single perfect number, the result is something that “ain’t so.”
Lone Star provides a range of analytic solutions. We are often called on for a second opinion or to fix a flawed assessment. The Streetlight Effect, in all its various forms, is among the most common problems we find. It has been around at least since Tolstoy and Twain, so it probably won’t go away anytime soon.
If you would like to learn more about our solutions, please contact us to chat.