We see the world not as it is, but as we are. In the domain of data, economists need to rethink what information they use to portray ground truth, and to reimagine what truth they wish to record. The field suffers from a “déformation professionnelle”: its practitioners view the economy through the lens of the “small data” world they have long known. But in a “big data” universe, where the variety, frequency, and granularity of data sources (and of features to measure) are vastly greater, a new mindset is required.
To get a flavor of what such a collision between more information and traditional thinking looks like, consider a bit of history from the field of health care.
In 1990 General Electric released an update to the software for its Signa magnetic resonance imaging (MRI) machines, used for medical scans. Engineers had uncovered a flaw in the system that compressed how it showed tissue containing lipids, or fat. But when the more accurate images became available, many radiologists rebelled. They were unaccustomed to seeing the better scans and felt more comfortable assessing the older ones. There were fears that the new images would lead to misdiagnoses. GE was forced to add a feature to the MRI machines that let radiologists see the old scans, labeled “classic” in a nod and a wink to the debacle over the launch of “new Coke” a few years earlier.
An MRI scan is pictorial, informational. It’s not the thing itself. In this way, it’s a bit like economic data on growth, unemployment, inflation, and the like. The radiologists in the 1990s preferred the information that was less accurate because they had become accustomed to using compressed scans; their skills were largely honed to work within those constraints. They resisted better images. Is there a risk that today’s economists are vulnerable to the same mental trap?
Galaxy of data
Consider the galaxy of data and AI all around us today, and how novel it is. A quarter-century ago, most things in life did not have a computer chip or connect to a network. It was a bygone age of letters, subway tokens, travel alarm clocks, and credit card transactions that required a signature on a carbon paper form after going through an imprinter, known as a zip-zap machine. Your sleep and exercise weren’t tracked by your wristwatch. Your cordless phone didn’t recognize your face; your bank didn’t verify your voice signature. Cars without satnav systems meant drivers relied on badly refolded maps. Don’t be wistful: The point is that the digitalization of society means that activities that could never be easily rendered into data now are.
This offers the possibility of understanding the economy more accurately, in ways that better reflect ground truth, the actual thing being measured. Reporting can happen much faster, perhaps in quasi real time, and at a finer grain, down to small segments or even individuals. Older methods were incapable of this and instead compressed information, like a pre-1990 MRI scan. Accuracy, speed, and detail all improve. Moreover, what gets measured can itself change, leading to new ways to understand the world (and, by doing so, hopefully to improve it).
Yet the entities compiling the information will come from the private sector, since it generates the data in the course of its operations. For example, satellite imagery can track farm yields. Job posting sites can identify which urban areas are growing faster than others, while home sale sites can show which are in decline. In many instances, firms find themselves in the middle of data flows from others’ operations. The payroll processor ADP handles pay for one in six US workers, and its monthly jobs report is used by economists to supplement data from the US Bureau of Labor Statistics.
Alternative indicators
Such alternative indicators (or “alt-data”) may not be compiled using the academically rigorous methods of state statistical agencies. Harnessing the data will require a shift in thinking by today’s practitioners—who may need to reconceive their responsibility, from generating information to working with the private sector to bolster and validate the data’s integrity so that it can be used for broader purposes. It is an echo of the field’s origins.
The term statistics derives from the German “Statistik,” coined in the mid-1700s to mean the “science of the state.” Statistical metrics may be based on inference: generalizing from what is easily measurable to reach conclusions about what is hard to learn. Because it was often expensive or impossible to count the things themselves, the accepted practice was to find proxies and extrapolate. This approach characterized the earliest days of statistics. “The city of Dublin in Ireland appears to have more chimneys than Bristol, and consequently more people,” wrote William Petty in the 1680s, at the start of an essay on “political arithmetick” that set out to estimate populations.
Today, developed economies spend billions of dollars a year to produce reliable economic and social indicators. To the high priests and priestesses of official metrics, it is a holy calling, a mark of civilization. “Knowledge is power: Statistics is democracy,” famously stated Olavi Niitamo, who led Statistics Finland from 1979 to 1992.
Data is only a simulacrum of what it aims to quantify, qualify, and record. It is an abstraction, never the thing itself, just as a map is not the territory and a weather simulation won’t get you wet. Data contains an “information quotient” of what it depicts. As the world changes, so too must the statistics with which social scientists take the measure of man. Although the worldly philosophers have embraced more rigorous methods to establish their dismal science, informal proxies and extrapolations are still used.
Anecdata
Alan Greenspan, the Federal Reserve chairman from 1987 to 2006, is known for embracing “anecdata,” a cross between anecdote and data, to get a leg up on official indicators. As a young economist, he scrutinized, among other data, sales of men’s underwear. In his thinking, it is an economic bellwether: the sort of thing people cut back on when belts tighten.
His successors at the Fed followed his lead. At the start of the financial crisis in 2008, just days after Lehman Brothers’ collapse, Janet Yellen, then president of the San Francisco Federal Reserve Bank, warned of a nasty economic downturn during a Federal Open Market Committee meeting. “East Bay plastic surgeons and dentists note that patients are deferring elective procedures,” she reported, according to transcripts released five years later. “Reservations are no longer necessary at many high-end restaurants.” Her colleagues laughed.
How did the statistical agency do? For the fourth quarter of 2008, the first figure released for the US was a decline in GDP of 3.8 percent at an annualized rate. That was revised a month later to a drop of 6.2 percent. In the final revision, in July 2011, it was recalculated as having fallen by 8.9 percent, the largest downward revision of GDP on record and more than twice as bad as first reported. Perhaps alternative indicators would have helped.
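For readers who want to check the arithmetic behind "more than twice as bad," the short Python sketch below compares each of the three figures quoted above with the first release. It is only an illustration: the function name `revision_summary` and the labels are hypothetical, and the only inputs are the three percentages cited in the text.

```python
# Q4 2008 US GDP declines as quoted in the text, in percent.
# The labels are illustrative, not official release names.
estimates = [
    ("first release", 3.8),
    ("revision one month later", 6.2),
    ("final revision, July 2011", 8.9),
]


def revision_summary(estimates):
    """Compare each estimate with the first one.

    Returns a list of tuples:
    (label, decline, gap from first release in percentage points,
     ratio to first release).
    """
    first = estimates[0][1]
    return [
        (label, decline, round(decline - first, 1), round(decline / first, 2))
        for label, decline in estimates
    ]


for label, decline, gap, ratio in revision_summary(estimates):
    print(f"{label}: -{decline}% "
          f"(gap {gap:+.1f} points, {ratio:.2f}x the first release)")
```

On these figures the final revision is 5.1 percentage points worse than the first release, or about 2.3 times its size, which is consistent with the claim in the text.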
The new data sources might have done a faster and better job than existing indicators, and with more detail. For example, ADP, the payroll firm, could have spotted a decline in new employees and a slowdown in pay raises. Google searches related to home purchases may have slowed precipitously. Likewise, professional job listing sites like LinkedIn and Indeed have a lens on recruitment ads—not only those that are posted, but those that are pulled. (That data is used by investors since it’s an early predictor of business wobbles and analyst downgrades, and thus stock prices.)