Finding the Perfect Number

Francesco Grossetti has released a software package that determines the number of topics that best describes the content of text corpora too large to be managed manually

If you were tasked with determining what a corpus of 200,000 pages of documents is about, you would face two challenges. The first, of course, is identifying the topics covered in a volume of pages too large to handle manually. The second is deciding how many topics to identify so that the answer is neither reductive (three topics, for example, are unlikely to give even a minimally exhaustive picture of what 200,000 pages cover) nor unmanageable (3,000 topics would probably be exhaustive, but hard to interpret).
 
One of the best solutions to the problem of topic identification is Latent Dirichlet Allocation (LDA), a technique developed in 2003. Building on it, Francesco Grossetti (Department of Accounting) and Craig Lewis (Vanderbilt University) now propose a way to identify the optimal number of topics, presented in a scientific paper ("A Statistical Approach for Optimal Topic Model Identification", preprint) and implemented in OpTop, a software package.
 
“What we present,” Grossetti says, “is a statistical test that works irrespective of the context and meaning of the topics. In technical terms, each topic is an ordered list of all the words in the corpus, ranked by their importance within that topic. It is up to the researcher using this tool to interpret the results, assigning a label to each topic and, where appropriate, merging topics that are very close in meaning.”
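To make the quoted description of a topic more concrete, here is a minimal Python sketch. It assumes a made-up six-word vocabulary and invented weights (not output from LDA or OpTop). In LDA a topic assigns a weight to every word in the vocabulary, and sorting the words by that weight yields the ordered list Grossetti describes. The helper names `rank_words` and `top_words` are hypothetical and chosen for illustration only.

```python
# Hypothetical toy topic: invented weights over a tiny vocabulary
# (illustrative numbers only, not output from LDA or OpTop).
# In LDA a topic gives every vocabulary word a probability; they sum to 1.
topic_weights: dict[str, float] = {
    "risk": 0.30,
    "debt": 0.25,
    "litigation": 0.20,
    "market": 0.15,
    "growth": 0.08,
    "weather": 0.02,
}


def rank_words(weights: dict[str, float]) -> list[str]:
    """Return every vocabulary word, ordered from most to least important."""
    return sorted(weights, key=lambda word: weights[word], reverse=True)


def top_words(weights: dict[str, float], n: int) -> list[str]:
    """Return the n highest-weighted words, the ones a researcher reads to label a topic."""
    return rank_words(weights)[:n]


if __name__ == "__main__":
    # Sanity check: the toy weights form a probability distribution.
    assert abs(sum(topic_weights.values()) - 1.0) < 1e-9
    print(rank_words(topic_weights))
    print(top_words(topic_weights, 3))
```

The few top-ranked words are what a researcher typically inspects when assigning a label to a topic. This sketch only illustrates how a topic is represented. It says nothing about how OpTop's statistical test chooses the number of topics.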
 
For his part, Grossetti has already applied the technique, together with the interpretive judgment it requires, in a paper on financial disclosure that identifies the risk factors companies disclose in their financial statements.
 


by Fabio Todesco
Bocconi Knowledge newsletter
