Finding the Perfect Number

FRANCESCO GROSSETTI HAS RELEASED A SOFTWARE PACKAGE THAT FINDS THE NUMBER OF TOPICS THAT BEST DESCRIBES THE CONTENT OF TEXT CORPORA TOO LARGE TO BE MANAGED MANUALLY

If you were tasked with determining what a corpus of 200,000 pages of documents is about, you would face two challenges. The first, of course, is to identify the topics covered in a volume of text too large to read manually. The second is to decide how many topics to identify, so that the answer is neither reductive (three topics, for example, are unlikely to give even a minimally adequate picture of 200,000 pages) nor unmanageable (3,000 topics would probably be exhaustive, but too many to interpret).
 
One of the best solutions to the problem of topic identification is the Latent Dirichlet Allocation (LDA) technique, developed in 2003. Building on it, Francesco Grossetti (Department of Accounting) and Craig Lewis (Vanderbilt University) now propose a way to identify the optimal number of topics, presented in a scientific paper (“A Statistical Approach for Optimal Topic Model Identification”, preprint) and implemented in OpTop, a software package.
 
“What we present,” Grossetti says, “is a statistical test, which works irrespective of the context and meaning of topics. In technical terms, each topic is an ordered collection of all the words contained in the corpus, whose order represents their importance within a particular topic. It’s up to the researcher who uses this tool to interpret the answers, assigning a label to each topic and choosing to merge topics that are very close in meaning, if appropriate.”
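To make the idea of a topic as an ordered collection of words concrete, here is a minimal Python sketch. It is not taken from OpTop or from the paper: the three toy topics, their weights, and the helper names `ranked_words` and `cosine_similarity` are all invented for illustration. Each topic assigns a weight to every word in a tiny vocabulary, sorting by weight gives the order of importance, and a simple similarity score hints at which topics a researcher might consider merging.

```python
from math import sqrt

# Hypothetical toy topics: each maps every word in a tiny vocabulary to a weight
# (the weights within a topic sum to 1). All numbers are invented for illustration.
topics = {
    "topic_0": {"risk": 0.40, "debt": 0.30, "market": 0.20, "patent": 0.07, "drug": 0.03},
    "topic_1": {"risk": 0.35, "debt": 0.33, "market": 0.22, "patent": 0.06, "drug": 0.04},
    "topic_2": {"risk": 0.05, "debt": 0.03, "market": 0.12, "patent": 0.35, "drug": 0.45},
}


def ranked_words(topic):
    """Return all words of a topic, most important first."""
    return sorted(topic, key=topic.get, reverse=True)


def cosine_similarity(a, b):
    """Cosine similarity between two topics defined over the same vocabulary."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = sqrt(sum(weight * weight for weight in a.values()))
    norm_b = sqrt(sum(weight * weight for weight in b.values()))
    return dot / (norm_a * norm_b)


# Print each topic as its ordered list of words.
for name, topic in topics.items():
    print(name, ranked_words(topic))

# Compare topic pairs: a high score suggests two topics may overlap in meaning.
sim_01 = cosine_similarity(topics["topic_0"], topics["topic_1"])
sim_02 = cosine_similarity(topics["topic_0"], topics["topic_2"])
print(f"topic_0 vs topic_1: {sim_01:.3f}")
print(f"topic_0 vs topic_2: {sim_02:.3f}")
```

In this toy data, topic_0 and topic_1 rank the same financial words at the top and score as nearly identical, while topic_2 is dominated by different words. A score like this is only a rough aid. Labeling topics and deciding whether two of them really mean the same thing remains the interpretive step Grossetti describes.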
 
For his part, Grossetti has already applied the technique, along with the interpretive judgment it calls for, in a paper on financial disclosure that identifies the risk factors companies make explicit in their financial statements.
 


by Fabio Todesco
Bocconi Knowledge newsletter
