This is the 20th day of my participation in the First Challenge 2022

Notes sharing, based mainly on Andrew Ng’s deep learning course. 1

Stereotypes exist

Word embeddings have an important impact on the generalization of our models, so we also need to ensure that they are not affected by undesirable forms of bias, such as sexism, racism, or religious discrimination.

Of course, I think calling it “discrimination” is a bit strong; here we can interpret it as stereotyping.

Here’s an example:

My father is a doctor and my mother is _______.

My father is a company employee and my mother is _______.

Boys like _______. Girls like _______.

The first blank, of course, is probably filled with “nurse”. The second is probably “a housewife”. The third is probably “Transformers”, and the fourth “Barbie”.

What is this? This is called gender stereotyping. All of these stereotypes are related to socioeconomic status.

The learning algorithm itself has no stereotypes, but the human-written text it is trained on does, and word embeddings learn these stereotypes all too “well”.
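To see how an embedding can “learn” a stereotype, here is a minimal sketch of my own (not code from the course) that completes an analogy by vector arithmetic. The dictionary `word_to_vec` is a hypothetical mapping from words to 300-dimensional vectors, e.g. loaded from pre-trained GloVe embeddings.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(word_a, word_b, word_c, word_to_vec):
    """Find the word d that best completes 'a : b :: c : d'
    by maximizing cos(e_b - e_a, e_d - e_c)."""
    e_a, e_b, e_c = (word_to_vec[w] for w in (word_a, word_b, word_c))
    best_word, best_score = None, -np.inf
    for w, e_w in word_to_vec.items():
        if w in (word_a, word_b, word_c):
            continue
        score = cosine_similarity(e_b - e_a, e_w - e_c)
        if score > best_score:
            best_word, best_score = w, score
    return best_word

# On a biased embedding, a query like the one below tends to return
# a stereotyped completion such as "nurse":
# print(complete_analogy("man", "doctor", "woman", word_to_vec))
```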

Therefore, we need to modify the learning algorithm to reduce, or ideally eliminate, these undesirable types of bias as much as possible.

Over many centuries, I think humanity has made progress in reducing these types of bias. And fortunately, I think we actually have better ideas for quickly reducing bias in AI than for quickly reducing bias in the human race. That said, we are by no means done with AI either, and there is still a lot of research and hard work to be done to reduce these types of bias in our learning algorithms.


Eliminate word embedding stereotypes

We use the method proposed in arXiv:1607.06520. 2

It is mainly divided into the following three steps:

  1. Identify bias direction.
  2. Neutralize: For every word that is not definitional, project to get rid of bias.
  3. Equalize pairs.

Suppose that now we have a learned word embedding.

I’m going to stick with what we did before: take the 300-dimensional embedding features and map them onto a two-dimensional plane. The distribution of these words on the plane is shown in the figure.
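One way to reproduce this kind of two-dimensional picture (not necessarily the method used for the figure in the course) is PCA. A minimal sketch with scikit-learn and matplotlib, again assuming the hypothetical `word_to_vec` dictionary from above:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_words_2d(words, word_to_vec):
    """Project 300-dimensional word embeddings onto their first two
    principal components and scatter-plot them with labels."""
    X = np.stack([word_to_vec[w] for w in words])   # shape (n_words, 300)
    coords = PCA(n_components=2).fit_transform(X)
    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), w in zip(coords, words):
        plt.annotate(w, (x, y))
    plt.show()

# plot_words_2d(["he", "she", "boy", "girl", "doctor", "nurse",
#                "grandmother", "grandfather"], word_to_vec)
```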

1. Identify the bias direction

To find the main direction along which a stereotype lies, we use a method we already mentioned when discussing the features of word embeddings: subtract two word vectors to find the main dimension in which they differ.


$$e_{doctor} - e_{nurse}$$

$$e_{boy} - e_{girl}$$

$$e_{he} - e_{she}$$

$$e_{grandmother} - e_{grandfather}$$

After computing these differences, we find that they lie mainly along the gender dimension.

We then average these difference vectors.

We can get the following result:

This gives us the direction along which the stereotype bias predominates (the bias direction). We can also identify the remaining directions, which are uncorrelated with this particular bias.

Note: Here we treat the bias direction “gender” as a one-dimensional subspace, and the remaining, unrelated directions as a 299-dimensional subspace. This is a simplification compared with the original paper; for details, please refer to the reference provided at the end of the article.
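A minimal sketch of this step under that simplification (still assuming the hypothetical `word_to_vec` dictionary): average the difference vectors of a few definitional pairs and normalize the result to get a single unit-length bias direction `g`. The original paper instead takes a principal component of these differences.

```python
import numpy as np

def bias_direction(pairs, word_to_vec):
    """Average the difference vectors of definitional word pairs
    (e.g. ("he", "she")) and normalize them into a unit bias axis g."""
    diffs = [word_to_vec[a] - word_to_vec[b] for a, b in pairs]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

definitional_pairs = [("he", "she"), ("boy", "girl"),
                      ("grandfather", "grandmother")]
# g = bias_direction(definitional_pairs, word_to_vec)
```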

2. Neutralization

Some words are intrinsically gender-specific, while others are supposed to be gender-neutral and treated fairly.

Gendered words include grandmother and grandfather; non-gendered words include nurse and doctor. It is the non-gendered words that we neutralize, that is, we remove their component along the (horizontal) bias direction.
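A sketch of the neutralization step, using the standard projection formula and assuming `g` is the unit-length bias direction from the previous sketch: subtract from the embedding its component along `g`, leaving only the part orthogonal to the bias direction.

```python
import numpy as np

def neutralize(e, g):
    """Remove the component of embedding e that lies along the
    unit-length bias direction g; the result is orthogonal to g."""
    e_bias_component = np.dot(e, g) * g   # projection of e onto g
    return e - e_bias_component

# e_nurse_debiased = neutralize(word_to_vec["nurse"], g)
```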

3. Equalize pairs

The second step dealt with words that should be gender-neutral. So what is the problem with the gender-specific words themselves?

We can see it clearly in the figure above: the word nurse is noticeably closer to girl than to boy. If generated text mentions a nurse, “girl” is then more likely to appear. So we need to equalize the distances.

After equalization, the gendered pair is shifted so that boy and girl are equidistant from the non-bias axis, and therefore equidistant from every neutralized word such as nurse.
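Finally, a sketch of the equalization step for a single gendered pair, following one common formulation (the one used in the course’s programming assignment), assuming roughly unit-length embeddings and the unit bias direction `g`: the pair keeps its shared component in the non-bias subspace, and the two bias components are rescaled so that both words end up equidistant from that subspace.

```python
import numpy as np

def equalize(pair, g, word_to_vec):
    """Equalize a gendered pair (e.g. ("girl", "boy")) so that both words
    are equidistant from the bias-free subspace orthogonal to g."""
    e_w1, e_w2 = word_to_vec[pair[0]], word_to_vec[pair[1]]

    mu = (e_w1 + e_w2) / 2
    mu_B = np.dot(mu, g) * g        # midpoint's component along the bias axis
    mu_orth = mu - mu_B             # shared, bias-free part kept by both words

    # Rescale each word's bias component to the same magnitude.
    e_w1B = np.dot(e_w1, g) * g
    e_w2B = np.dot(e_w2, g) * g
    scale = np.sqrt(np.abs(1 - np.linalg.norm(mu_orth) ** 2))
    corrected_e_w1B = scale * (e_w1B - mu_B) / np.linalg.norm(e_w1 - mu_orth - mu_B)
    corrected_e_w2B = scale * (e_w2B - mu_B) / np.linalg.norm(e_w2 - mu_orth - mu_B)

    return corrected_e_w1B + mu_orth, corrected_e_w2B + mu_orth

# e_girl_eq, e_boy_eq = equalize(("girl", "boy"), g, word_to_vec)
```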


  1. DeepLearning.AI China – the world’s leading online AI education and practice platform (deeplearningai.net)
  2. [1607.06520v1] Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings (arxiv.org)