A Dataset for Studying Gender Bias in Translation

Posted by Romina Stella, Product Manager, Google Translate

Advances in neural machine translation (NMT) have enabled more natural and fluid translations, but they can still reflect the societal biases and stereotypes of the data they’re trained on. As such, it is an ongoing goal at Google to develop innovative techniques to reduce gender bias in machine translation, in alignment with our AI Principles.

One research area has been using context from surrounding sentences or passages to improve gender accuracy. This is challenging because traditional NMT methods translate sentences individually, while gendered information is not always stated explicitly in each sentence. For example, in the following Spanish passage (Spanish is a language where subjects are often omitted), the first sentence explicitly names Marie Curie as the subject, but the second sentence does not mention the subject at all. In isolation, this second sentence could refer to a person of any gender. When translating to English, however, a pronoun must be chosen, and the information needed for an accurate translation is in the first sentence.

Spanish text:
Marie Curie nació en Varsovia. Fue la primera persona en recibir dos premios Nobel en distintas especialidades.

Translation to English:
Marie Curie was born in Warsaw. She was the first person to receive two Nobel Prizes in different specialties.
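A model that translates one sentence at a time sees only "Fue la primera persona..." and has no way to recover the subject. One common idea in document-level NMT research (not necessarily the approach used in Google Translate) is to give the model the preceding sentence(s) as extra input. The sketch below shows only that input construction, assuming a hypothetical `build_context_inputs` helper and a hypothetical `<sep>` separator token; it does not perform any translation.

```python
from typing import List


def build_context_inputs(sentences: List[str], window: int = 1, sep: str = " <sep> ") -> List[str]:
    """Prefix each sentence with up to `window` preceding sentences.

    Hypothetical helper: the `<sep>` token is an illustrative marker a
    context-aware model could learn to treat as a sentence boundary.
    """
    inputs = []
    for i, sentence in enumerate(sentences):
        # Take at most `window` sentences immediately before the current one.
        context = sentences[max(0, i - window):i]
        inputs.append(sep.join(context + [sentence]))
    return inputs


passage = [
    "Marie Curie nació en Varsovia.",
    "Fue la primera persona en recibir dos premios Nobel en distintas especialidades.",
]
for model_input in build_context_inputs(passage):
    print(model_input)
```

With a window of one sentence, the second input now carries "Marie Curie" alongside the subjectless sentence, which is the information needed to choose "She" in English.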

Advancing translation techniques beyond single sentences requires new metrics for measuring progress and new datasets that capture the most common context-related errors. Adding to this challenge is the fact that translation errors related to gender (such as picking the correct pronoun or maintaining gender agreement) are particularly sensitive because they may directly refer to people and how they self-identify.
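As an illustration of what such a metric might look like, here is a minimal sketch of a pronoun-level gender accuracy score. It is a hypothetical example, not the metric Google uses: it assumes each translated passage comes with an expected gender taken from the source (for instance, the subject of a biography), and it relies on a small, hypothetical English pronoun lexicon that ignores non-binary pronouns, names, and gendered nouns.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, deliberately small lexicon of English gendered pronouns.
PRONOUNS: Dict[str, str] = {
    "he": "male", "him": "male", "his": "male", "himself": "male",
    "she": "female", "her": "female", "hers": "female", "herself": "female",
}


def pronoun_gender_accuracy(examples: List[Tuple[str, str]]) -> float:
    """Return the fraction of gendered pronouns that match the expected gender.

    Each example is (translated_text, expected_gender). Returns 0.0 when no
    gendered pronouns are found.
    """
    matches = 0
    total = 0
    for text, expected in examples:
        # Whole-word tokens only, so "different" is never read as a pronoun.
        for token in re.findall(r"[a-z]+", text.lower()):
            gender = PRONOUNS.get(token)
            if gender is not None:
                total += 1
                if gender == expected:
                    matches += 1
    return matches / total if total else 0.0


examples = [
    ("Marie Curie was born in Warsaw. She was the first person to receive two Nobel Prizes in different specialties.", "female"),
    ("Marie Curie was born in Warsaw. He was the first person to receive two Nobel Prizes in different specialties.", "female"),
]
print(pronoun_gender_accuracy(examples))  # 0.5
```

A sentence-by-sentence system that defaults to "He" for the dropped subject would be penalized here, while a context-aware system that carries the gender over from the first sentence would not.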

To help facilitate progress on common challenges in contextual translation (e.g., pronoun drop, gender agreement, and accurate possessives), we are releasing the Translated Wikipedia Biographies dataset, which can be used


The post A Dataset for Studying Gender Bias in Translation appeared first on Google AI Blog.
