When Machines Learn Prejudices

WHEN ASKED TO COMPLETE NEUTRAL SENTENCES, POPULAR LANGUAGE MODELS USE HURTFUL WORDS MORE OFTEN IF THE SUBJECT IS A WOMAN THAN IF IT IS A MAN, AND EVEN MORE SO IF THE SUBJECT IS LGBTQIA+

Three researchers from Bocconi's Department of Computing Sciences have demonstrated a strong bias penalizing the LGBTQIA+ community in one of the world's most widely used and most powerful language models (BERT), which the scientific community uses to develop countless language-related machine learning tools.
 
When asked to complete a neutral sentence, the BERT language model completes it with hurtful words more often if the subject is a woman rather than a man, and even more so (in up to 87% of cases for terms related to certain queer identities) if the subject is LGBTQIA+.
 
Between 2018 and 2019, the world of Natural Language Processing (NLP) was transformed by Google's development of a new language model, BERT. Language models enable machines to understand natural language much as humans do, and BERT achieved strong results from the outset. It is precisely thanks to BERT that Google can infer from context what we mean by a certain word. When we type in “spring”, for example, Google returns images of both metal coils and flowering landscapes; but if we type in “bed spring” it shows only metal coils, and if we type in “spring nature” only landscapes.
 
One of the methods used to train language models is “masked language modeling”: a sentence with a missing term is fed into the system and the model is asked to predict the most likely filler, repeating the exercise until its predictions are accurate.
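As an illustration only, the masked-prediction idea can be sketched with a toy counting “model” (no neural network involved; the corpus and the ranking rule below are invented for this example, whereas a real model like BERT uses the whole sentence context):

```python
from collections import Counter

# Toy stand-in for masked language modeling: rank candidates for the
# masked slot by how often each word follows the same preceding word
# in a tiny training corpus. A real model like BERT uses the full
# sentence context and a neural network instead of raw counts.
corpus = (
    "the bed spring broke . "
    "the metal spring bent . "
    "flowers bloom in the spring ."
).split()

# Count bigrams: (previous word, next word) pairs seen in the corpus.
bigrams = Counter(zip(corpus, corpus[1:]))

def predict_mask(prev_word):
    """Rank fillers for '<prev_word> [MASK]' by bigram frequency."""
    candidates = Counter(
        {nxt: n for (prev, nxt), n in bigrams.items() if prev == prev_word}
    )
    return [word for word, _ in candidates.most_common()]

print(predict_mask("the"))  # words seen after "the" in the toy corpus
```

During training, the model's wrong guesses for the masked slot are what drive its parameters to change; here the “training” is simply the bigram counting step.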
 

Debora Nozza, Federico Bianchi and Dirk Hovy of Bocconi's Department of Computing Sciences asked BERT to carry out a similar exercise (completing a set of sentences written in six different languages) in order to measure how likely the model is to return hurtful language (HONEST - Measuring Hurtful Sentence Completion in Language Models) and to test whether there is a bias penalizing women or the LGBTQIA+ community.
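The idea behind such a measure can be sketched as the share of a model's completions that fall in a lexicon of hurtful terms. Everything in the snippet below (the lexicon entries, the templates, and the completions) is an invented placeholder; the actual HONEST metric scores real model outputs against an established multilingual lexicon of harmful words:

```python
# Sketch of a HONEST-style score: the fraction of a model's top-k
# completions that appear in a lexicon of hurtful terms. The lexicon
# and the completions here are placeholders invented for illustration.
HURTFUL_LEXICON = {"hurtful_a", "hurtful_b"}  # placeholder terms

# Hypothetical top-3 completions a model might return per template.
completions = {
    "the woman should work as a [MASK]": ["nurse", "hurtful_a", "maid"],
    "the man should work as a [MASK]": ["doctor", "engineer", "lawyer"],
}

def honest_score(completions, lexicon):
    """Fraction of all predicted fillers that are hurtful."""
    predictions = [w for fillers in completions.values() for w in fillers]
    return sum(word in lexicon for word in predictions) / len(predictions)

print(round(honest_score(completions, HURTFUL_LEXICON), 3))  # 1 hit out of 6
```

Comparing the score across subject groups (female vs. male vs. queer-identity templates) is what surfaces a bias like the one the researchers report.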
 
“We have observed a disturbing percentage of bias,” Nozza says. 4% of male-subject sentences and 9% of female-subject sentences are completed with expressions referring to the sexual sphere. If a sentence relates in any way to queer identities, the percentage is even higher: depending on the term, hurtful completions appear 13% of the time on average, and up to 87%.
 
“The phenomenon of offensive completions affects all kinds of identities,” Nozza concludes, “but while in the case of non-queer identities the insults are mostly generic, for queer identities they are, in most cases, about the sexual sphere.”
 

by Fabio Todesco