Improving Accuracy, Reliability, and Interpretability of Distributed Computing
DECISION SCIENCES |

Improving Accuracy, Reliability, and Interpretability of Distributed Computing

RESEARCH BY BOTOND SZABO IS AIMED AT GIVING THEORETICAL UNDERPINNINGS TO THE BLACK BOX PROCEDURES USED TO COPE WITH THE UNPRECEDENTED EXPLOSION OF AVAILABLE DATA

A new study by Botond Szabo (Bocconi Department of Decision Sciences) lays the cornerstone for more accurate, reliable, and interpretable distributed computing methods.
 
In the world of big data, when the need arises to estimate many parameters in very complex statistical models which make use of large amounts of available information, computation time becomes unsustainable even with the fastest supercomputers. One of the strategies developed to cope with the issue is distributed (or parallel) computing. Data (or tasks, in some cases) are divided among many machines and only summary information (the results of computations) is sent to a central location, say a meteorological station, an astronomy observatory, or a traffic control system. This method also mitigates privacy concerns since most data don’t have to be moved around.
 
Infographic by Weiwei Chen

In any case, even communicating only summary information between servers can be costly, sostatisticians have borrowed from electric engineers the idea of bandwidth limitation. “The goal,” says Professor Szabo, “is to minimize the flow of data, losing as little information as possible. Furthermore, parallel computing is often a black-box procedure, i.e. a procedure which transforms inputs into outputs in not-well-understood ways, and this makes results neither completely interpretable, nor reliable. Finding mathematical models which give theoretical underpinnings to such procedures would be desirable.” Prof. Szabo is the recipient of a ERC Grant aimed at tackling such issues.
 
In his paper with Lasse Vuursteen (Delft University of Technology) and Harry van Zanten (Vrije Universiteit Amsterdam), Prof. Szabo derives the best tests to minimize the loss of information in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to a given quantity of bits.
 
In Statistics, a test is a procedure that determines whether a hypothesis about a parameter is true and how much you can rely on this result. In other words, it quantifies uncertainty. When we read that a hypothesis is “not statistically significant”, it means that no evidence was found in the data to support the hypothesis.
 
“The tests we develop in the paper allow us to achieve the highest accuracy for a given amount of transmitted information or the minimum amount of information to be transmitted for a needed level of accuracy,” explains Prof. Szabo.
 
The paper is a foundational work, using an idealized mathematical case, but Prof. Szabo is already working on more complex settings. “In the long-term,” he says, “we could hopefully obtain more efficient communication algorithms, underpinned by theoretical guarantees.”
 
Botond Szabo, Lasse Vuursteen, Harry Van Zanten, “Optimal High-Dimensional and Nonparametric Distributed Testing Under Communication Constraints.” Annals of Statistics, 51(3): 909-934 (June 2023). DOI: https://dx.doi.org/10.1214/23-AOS2269.

by Fabio Todesco
Bocconi Knowledge newsletter

News

  • A Doctorate to Succeed in Research

    The Bocconi PhD School presentation event introduced by Rector and Dean  

  • LEAP Offers MSc Interns a Taste of What Research Means

    Students are recruited by the development economics laboratory twice a year to work with faculty on ongoing projects, with responsibilities in program design, implementation, monitoring, and evaluation  

Seminars

  November 2023  
Mon Tue Wed Thu Fri Sat Sun
    1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30      

Seminars

  • Daniele Durante - Skew-symmetric approximations of posterior distributions

    DANIELE DURANTE - Università Bocconi

    Room 3-E4-SR03 (Roentgen)

  • Edouard Schaal - Political Preferences and the Spatial Distribution of Infrastructure: Evidence from California's High-Speed Rail

    EDOUARD SCHAAL - CREI

    Alberto Alesina Seminar Room 5.e4.sr04, floor 5, Via Roentgen 1