DECISION SCIENCES

# Opening the Black Box

BOCCONI PROFESSORS SANDRA FORTINI AND SONIA PETRONE PROPOSE A NEW METHODOLOGY TO SHED LIGHT ON BLACK BOX ALGORITHMS BY USING THE PREDICTIVE APPROACH THAT IS TYPICAL OF BAYESIAN STATISTICS

The recent hype about machine learning has been accompanied by calls for caution from statisticians. Machine learning algorithms are often described as “black boxes”, meaning that the mechanism by which their output is obtained from the input is not transparent. Moreover, such output is typically returned without any quantification of the uncertainty about it. By contrast, transparency and uncertainty quantification are among the flagship features of statistical procedures. Still, machine learning has delivered remarkable performance (especially on big, complex and streaming data), which cannot be overlooked.

Another difference between machine learning and statistics is that machine learning focuses on prediction, while statistics is typically more interested in inferring the parameters of the assumed probabilistic model. This, however, is an incomplete picture of statistics: the predictive approach is also central to Bayesian statistics. Named after Thomas Bayes and his celebrated theorem, this statistical approach combines prior information with data in order to provide not only “posterior” inference on the model parameters, but also effective prediction, both accompanied by a principled uncertainty quantification.
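To make the idea of combining prior information with data concrete, here is a minimal sketch using the textbook Beta-Bernoulli model. It is an illustration only and does not come from the paper discussed here: the uniform prior, the made-up data, and the function names are all assumptions.

```python
from fractions import Fraction


def beta_bernoulli_update(a, b, data):
    """Return the posterior Beta(a, b) parameters after observing 0/1 outcomes."""
    successes = sum(data)
    failures = len(data) - successes
    return a + successes, b + failures


def predictive_prob_success(a, b):
    """Probability that the next outcome is 1 under a Beta(a, b) distribution."""
    return Fraction(a, a + b)


def beta_variance(a, b):
    """Variance of Beta(a, b): a simple measure of remaining uncertainty."""
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))


# Assumptions for illustration: a uniform Beta(1, 1) prior and made-up data.
prior_a, prior_b = 1, 1
data = [1, 0, 1, 1, 0, 1, 1, 1]

post_a, post_b = beta_bernoulli_update(prior_a, prior_b, data)
prediction = predictive_prob_success(post_a, post_b)
uncertainty = beta_variance(post_a, post_b)
```

With six successes and two failures, the posterior is Beta(7, 3). The same posterior yields both the prediction for the next observation (7/10) and a measure of how uncertain the parameter still is.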

In a recent paper (see below), Sandra Fortini and Sonia Petrone, both Professors at the Bocconi Department of Decision Sciences, employed Bayesian statistics to investigate the functioning of Newton’s algorithm. This is a recursive procedure for classifying streaming observations into different “populations” (e.g. pattern types, or signal sources), with no feedback about the correctness of previous classifications (in this sense, the classification task is said to be “unsupervised”). The algorithm owes its success to the fact that it can be applied recursively, “re-using” previous computations whenever a new observation becomes available. This is crucial for streaming data (data that are continuously generated). Before the work of Fortini and Petrone, it was not clear whether Newton’s algorithm, which is computationally efficient, was an approximation of an exact (but more computationally expensive) rigorous Bayesian procedure.
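The article does not spell out the update rule, so the following is only a rough sketch of how such a recursion can look. It follows the form in which Newton’s recursive rule is commonly stated, restricted to a finite grid of candidate populations. The Gaussian kernel, the grid values, the weight sequence 1/(n+1), the sample stream, and all function names are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, theta, sigma=1.0):
    """Density of observation x under a population centred at theta (assumed Gaussian)."""
    z = (x - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def newton_step(weights, grid, x, alpha, sigma=1.0):
    """One recursive update of the population weights given a new observation x."""
    likelihood = [gaussian_kernel(x, theta, sigma) for theta in grid]
    # Predictive density of x under the current weights.
    predictive = sum(w * l for w, l in zip(weights, likelihood))
    # Probabilities that x belongs to each population (the "classification").
    membership = [w * l / predictive for w, l in zip(weights, likelihood)]
    # Blend the old weights with the new membership probabilities.
    updated = [(1.0 - alpha) * w + alpha * m for w, m in zip(weights, membership)]
    return updated, predictive


def run_newton(stream, grid, sigma=1.0):
    """Process observations one at a time, re-using the previous weights."""
    weights = [1.0 / len(grid)] * len(grid)  # uniform initial guess (assumption)
    for n, x in enumerate(stream, start=1):
        alpha = 1.0 / (n + 1)  # decreasing learning weights (assumption)
        weights, _ = newton_step(weights, grid, x, alpha, sigma)
    return weights


# Hypothetical stream drawn around two populations near -2 and 2.
grid = [-2.0, 0.0, 2.0]
stream = [-2.1, 1.9, -1.8, 2.2, -2.3, 2.0, -1.9, 2.1]
final_weights = run_newton(stream, grid)
```

Each step touches only the current weights and the newest observation, which is what makes the recursion suitable for streaming data. In this toy run the weight of the middle candidate shrinks, since no observation falls near it.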

"As is often the case," explains Fortini, "this algorithm does not explicitly state a probabilistic model. However, since it relies on a predictive rule for the next observation, the tools of Bayesian statistics can be used to unveil the underlying model that is implicitly assumed. This use of the Bayesian predictive approach is not limited to Newton’s algorithm. On the contrary, it can be extended to any algorithm that relies on a predictive rule."

"This line of research," adds Petrone, "demonstrates that the Bayesian predictive approach is more than a philosophical choice. It can concretely help to shed light on algorithms whose functioning would otherwise remain obscure. And again, this is not just a scientific curiosity, because when forecasts are needed to support decisions on matters of life or death (as, for example, in the recent pandemic) we cannot trust algorithms on blind faith. Combining the speed of algorithms with the principled uncertainty quantification of Bayesian statistics can take the best of these two worlds."

Find out more

Fortini, S. and Petrone, S. (2020). “Quasi-Bayes properties of a procedure for sequential learning in mixture models.” Journal of the Royal Statistical Society, Series B, 82, 1087–1114. DOI: https://doi.org/10.1111/rssb.12385.

Breiman, L. (2001). “Statistical modeling: The two cultures (with comments and a rejoinder by the author).” Statistical Science, 16(3), 199–231. DOI: https://doi.org/10.1214/ss/1009213726.

by Sirio Legramanti
