Big Data as Gold. A Bocconi Professor Contributes to Faster SimulatorsSIMULATORS ARE INCREASINGLY CRUCIAL IN DECISION MAKING, BUT THEY CAN BE TIME AND MEMORY CONSUMING, HENCE THEY ARE OFTEN SUBSTITUTED BY EMULATORS. EMANUELE BORGONOVO HAS DEVELOPED A FASTER EMULATION ALGORITHM IMPROVING A TECHNIQUE PIONEERED BY A SOUTH AFRICAN STATISTICIAN AND MINING ENGINEER
Continuous advancements in computing capabilities have made it possible to simulate complex systems even on a laptop. These computer experiments can definitely help decision makers, but are often demanding, both in running time and in required memory, which limits their applicability. One option is to substitute these time-consuming simulators with faster emulators, among which one of the most used is kriging. Emanuele Borgonovo, Full Professor at Bocconi Department of Decision Sciences, has developed a new kriging algorithm that significantly reduces running time and memory usage without sacrificing accuracy. This allows to apply kriging to more complex models and train it on larger datasets.
Kriging takes its name from D. G. Krige (1919-2013), a South African statistician and mining engineer who pioneered this technique since his Master’s thesis. Krige was trying to estimate the most likely distribution of gold based on samples from a few boreholes within the Witwatersrand reef complex in South Africa. Kriging, which is also known as Gaussian process regression among statisticians, allows to perform such an interpolation, not only predicting values at unobserved locations but also quantifying the prediction uncertainty. The underlying assumptions of Gaussianity allows to derive closed-form expressions for both the prediction expectation and uncertainty. This has contributed to the success of kriging in various fields including spatial statistics and computer experiments.
However, such closed-form expressions require the inversion of a matrix with n rows and n columns, where n is the number of training points, and the computational cost of this operation rapidly grows with n. One way to reduce the computational cost is by using a suitably selected subset of the training set and making the prediction locally dependent on the closest training points. In particular, Borgonovo and his co-authors– Xuefei Lu (PhD in Statistics at Bocconi in 2019 and now Assistant Professor at the University of Edinburgh), Alessandro Rudi and Lorenzo Rosasco – resorted to Nyström regularization, a method which selects a number m of Nyström centers (usually a subset of the training set), with m much smaller than the size n of the whole training set. They also quantify the discrepancy between the classical kriging estimator and their proposed approximation. Such a discrepancy decays as m grows, while running time and memory requirements grow with m, thus highlighting the trade-off between accuracy and computational gains. The provided analytical expressions for the approximation error are key in allowing the analyst to choose the desired balance between accuracy and available computational resources.
«Kriging is a method with a long history», says Borgonovo, «but it is still a lively research topic, standing at the crossroads of computer experiments and statistical machine learning, two rapidly growing research fields whose advances are starting to cross the disciplinary barriers. Leveraging such cross-fertilization, we have proposed a new kriging algorithm that reduces running time and memory requirements, thus allowing to handle much bigger training sets and more complex models. Before this work, the state of the art for kriging was datasets of less than 100 points and models with less than 100 variables, while we were successful with a dataset of thousands of points and models with 40,000 variables. This is a considerable step forward in the use of kriging for big data».
Xuefei Lu, Alessandro Rudi, Emanuele Borgonovo, Lorenzo Rosasco (2020) “Faster Kriging: Facing High-Dimensional Simulators”. Operations Research 68(1):233-249.
by Sirio Legramanti
Translated by Alex Foti