It’s Time for Data Science to Be Simple, Fast, and Easily Understandable for Everyone.

I am a passionate listener of scientific podcasts. Since the beginning of the Covid-19 pandemic, my podcast consumption has nearly doubled. One of my favorites is econtalk.org, hosted by the fabulous Russ Roberts from Stanford, and one of my favorite EconTalk episodes is the one with the psychologist, data scientist, and author Gerd Gigerenzer.

Dr. Gigerenzer, who is a former Director of the Center for Adaptive Behavior and Cognition (ABC) at the Max Planck Institute for Human Development and a former Professor of Psychology at the University of Chicago, studies how humans make decisions in complex environments. His main idea is that any decision-making procedure should be simple enough to be interpretable by the decision-maker.

I agree 100 percent, and data science’s failure to adhere to this Gigerenzer principle is what has held back our contributions in so many instances.

Intuition is our strongest guide in a world full of measurements, and the purpose of any statistical tool is only to enhance that intuition. It appears to me that many of my colleagues in the data science profession disregard this basic truth. We develop ever more complex models, often justified only by the race to produce more complex models. Software makers respond by developing more and more versatile programs, able to incorporate thousands of models and model options. At the same time, ordinary users of statistical software struggle to navigate this sea of models. And if they are lucky enough to find what they want, they still have to make a substantial effort to actually run the model. Learning how to load the data, choosing the right option to get heteroscedasticity-robust standard errors, or simply extracting the desired information from the code’s output – all of this requires understanding the sometimes rather idiosyncratic logic of the software developer. After 15 years of coding and debugging in 5 different statistical software programs, I know what I am talking about.
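
To give a flavor of what I mean, consider what the robust-standard-errors step alone looks like in R. This is a minimal sketch with invented names (sales, ctr, and my_data are placeholders, not taken from any real analysis), and it assumes the add-on packages sandwich and lmtest are installed:

    library(sandwich)  # provides vcovHC()
    library(lmtest)    # provides coeftest()

    # Fit a simple sales regression (variable names are hypothetical).
    fit <- lm(sales ~ ctr, data = my_data)

    # Heteroscedasticity-robust standard errors require knowing both
    # the right packages and the right estimator flag to pass.
    coeftest(fit, vcov = vcovHC(fit, type = "HC1"))

None of this is hard for a trained statistician, but each line embodies a convention the casual user has to discover first.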

With this background, it came as a pleasant surprise when a data scientist recently approached me with three screenshots of code outputs: one produced with R, one with Stata, and one with Proof Analytics’ new platform. All three outputs resulted from the same statistical procedure run on the same dataset. In particular, this was an evaluation of the impact of a variety of firm-specific factors – such as the click-through rate – on the firm’s total sales.

This type of exercise is at the core of Marketing Mix Modeling (MMM) analysis, a major use case for Proof’s software. Here I must say that, in my experience, the average firm has a hard time performing timely, high-quality MMM analysis, both because of the lengthy update process and because of the complexity of the statistical procedure itself. At the same time, real-time analysis is crucial for a quick and flexible marketing strategy. Therefore, any reduction in the complexity of the evaluation is potentially of enormous value to the user.
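
For readers who have never seen one, the statistical core of such an exercise is, in its simplest form, a regression of sales on the candidate marketing drivers. Here is a minimal sketch in R, with a hypothetical file and column names standing in for the real data:

    # Hypothetical dataset: periodic sales plus candidate marketing drivers.
    mmm <- read.csv("marketing_data.csv")

    # Core MMM step: estimate each driver's multiplier on total sales.
    fit <- lm(sales ~ ctr + ad_spend + price, data = mmm)
    summary(fit)  # multipliers, their volatility, R-squared, F-statistic

A production MMM adds further layers on top of this, which is precisely why automating the pipeline matters.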

I had a careful look at the screenshots. The outputs were identical: the estimated coefficients (multipliers, in the MMM language) were identical, the underlying uncertainty estimates (the volatility) were identical too, and so were all other model diagnostics such as R² and the F-statistic. This is not particularly surprising and only implies that all three outputs are based on the same (correctly specified) statistical algorithm. What really struck me, however, is that while all three tools can be used to solve the same MMM task, Proof’s software tremendously reduces the complexity for the analyst and the business user. No cumbersome data import, no statistical programming language needed to get what you want, no degree in fine arts necessary to produce an elegant and parsimonious graph. With only a few clicks, all calculations are performed and the user gets a comprehensive yet intuitive summary of the results. Importantly, the complex steps of model choice are performed automatically in the background. Thus, the user is spared the tedious technical details of the algorithm, and only the aspects that enhance business intuition are presented.
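
For contrast, here is what digging those same diagnostics out of R’s output looks like, continuing the hypothetical fit from the sketch above:

    s <- summary(fit)
    coef(s)        # coefficient table: estimates and standard errors
    s$r.squared    # R-squared
    s$fstatistic   # F-statistic with its degrees of freedom

Each of these accessors is documented, but a business user has no reason to know them; a tool that surfaces the same numbers automatically removes exactly this layer of friction.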

Is Proof’s tool the perfect tool for every data scientist? Proof’s strategy has been to strike a balance between convenience and flexibility. Some might find it restrictive because it focuses on a relatively small number of statistical techniques, while options like Stata, R, and Python allow you to implement thousands of approaches. However, any simple-to-use tool must constrain its users in some ways – that is the essence of simplicity. And if the seemingly endless complexity of data science – the “how many angels can dance on the head of a pin” argument – blocks us from effectively helping the audience of decision-makers, then I’d say overweening complexity does not advance the profession.

In closing, I would argue that constraining complexity is in fact a big advantage, and that platforms like Proof are a huge step towards making data science far, far more intuitive, cost-effective, operationally relevant, and mainstream in its value and impact.

Dr. Petyo Bonev is a professor of applied and theoretical econometrics and AI at the University of St. Gallen in Switzerland.