Debiasing Pre-trained Contextualised Embeddings

TL;DR
Introduction
  Why contextual models?
  Why use this approach?
  How is debiasing performance evaluated?
Debiasing Contextualised Embeddings
  Research direction
  Method / procedure
    Define attribute words / target words
    Extract sentences containing attribute words or target words
    Loss definition
    Layer-wise experiments
Experiments and results
  Evaluation datasets
  HyperParams
  Debiasing vs Preserving Info
  SEAT Test
  Original vs Debiased

TL;DR

A method for debiasing pretrained contextualised word embeddings

Debiasing is done at the token level or at the sentence level
Effectively removes discriminative gender-related bias while preserving semantic info
A fine-tuning-based debiasing approach that works better on downstream tasks than prior work

Applicable to most pretrained language models (PLMs)

Tested on BERT, RoBERTa, ALBERT, DistilBERT, and ELECTRA
Not tested on GPT-family models; applied only to encoder-type models.

Trade-off: accurate vs unbiased contextualised embedding model

Some models lose performance after the debiasing procedure (RoBERTa)
The size of the performance drop after debiasing varies widely across PLMs (BERT loses almost nothing)

Applying token-level debiasing to all tokens and all layers gives the strongest debiasing

Introduction

Why contextual models?

Contextual embeddings have improved performance across NLP

However, just as bias shows up in non-contextual embeddings, it also shows up in contextual ones

This is an issue because the bias propagates through NLP systems

→ I plan to read one more paper that frames the problem this way

Gender, racial, and religious biases also appear in static word embeddings

Methods for debiasing static word embeddings include:

Projection based methods
Adversarial Methods
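As a rough illustration of the projection-based family for static embeddings, the sketch below removes the component of a word vector that lies along a bias direction. The vectors are made-up toy values and `remove_direction` is a hypothetical helper; this is not the method of the paper under discussion.

```python
import numpy as np


def remove_direction(vec, direction):
    """Return `vec` with its component along `direction` projected out."""
    unit = direction / np.linalg.norm(direction)
    return vec - np.dot(vec, unit) * unit


# Toy 3-d static embeddings (invented values, for illustration only).
he = np.array([1.0, 0.0, 0.2])
she = np.array([-1.0, 0.0, 0.2])
doctor = np.array([0.4, 0.9, 0.1])

# Assumption: the difference of a gendered word pair approximates the bias direction.
gender_direction = he - she
doctor_debiased = remove_direction(doctor, gender_direction)
```

This works for static embeddings because each word has exactly one vector to project.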

Debiasing contextual embedding models is still under-researched

And debiasing contextual models is harder!

Large number of parameters
Embedding vectors are not fixed
Because of this, simple projection methods cannot be used

In a contextual model, a word's representation is determined not only by the word itself but also by its context

The same term is unbiased in some contexts and used in a biased way in others

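Why use this approach?

Retraining a model from scratch in order to debias it is too expensive 💲

That is, rebalancing the training dataset itself and then retraining is very costly

e.g., balancing gender terms via data augmentation and then training the PLM

For this reason, debiasing by fine-tuning is the realistic option

How is debiasing performance evaluated?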

SEAT dataset

MNLI dataset

This study demonstrates the method with gender bias as the debiasing target

Debiasing Contextualised Embeddings

Research direction

A method for debiasing contextualised embeddings in a fine-tuning setting

Still preserve semantic information
Remove discriminative gender bias

Orthogonal projection in intermediate(hidden) layers by token(or sentence) level

The debiasing method does not depend on the model architecture!

However, only Transformer encoder (BERT-family) models were tested

Method / procedure

Define attribute words / target words

Attribute words

Feminine: she, woman, her...
Masculine: he, man, him...

Target words

Occupations: doctor, nurse, professor...
Words that are (expected to be) gender-neutral

Extract sentences containing attribute words or target words

Sentences containing more than one attribute/target word are excluded (to avoid ambiguity)
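The extraction step can be sketched as follows. The word lists are just the examples given above, the tokenizer is a simple regular expression, and the rule "keep a sentence only if it contains exactly one listed word" is an assumed reading of the exclusion criterion. `collect_sentences` is a hypothetical helper name.

```python
import re

ATTRIBUTE_WORDS = {"she", "woman", "her", "he", "man", "him"}
TARGET_WORDS = {"doctor", "nurse", "professor"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def collect_sentences(sentences, vocabulary):
    """Map each vocabulary word to the sentences where it is the only listed word."""
    listed = ATTRIBUTE_WORDS | TARGET_WORDS
    found = {word: [] for word in vocabulary}
    for sentence in sentences:
        hits = [tok for tok in tokenize(sentence) if tok in listed]
        # Assumption: sentences with more than one listed word are ambiguous and skipped.
        if len(hits) == 1 and hits[0] in vocabulary:
            found[hits[0]].append(sentence)
    return found


corpus = [
    "She went home.",
    "The doctor arrived late.",
    "He is a doctor.",          # two listed words, so it is excluded
    "Nothing to see here.",
]
attribute_sentences = collect_sentences(corpus, ATTRIBUTE_WORDS)
target_sentences = collect_sentences(corpus, TARGET_WORDS)
```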

Set of sentences containing a given word w: Ω(w)

A: the set of sentences containing attribute words

T: the set of sentences containing target words

The goal is to preserve the semantic info in A while removing the discriminative bias in T

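Loss definition

Take the inner product between the non-contextualised word embedding of an attribute word and the vector of token t at the i-th layer of the debiased model

Summing these inner products over tokens, words, and sentences gives the debiasing loss L_i

Here, v_i(a) is the non-contextualised word vector of a given attribute word a

It is the average (mean) vector of the word's i-th layer representations across all sentences
If a word is split into sub-tokens, the sub-token vectors obtained this way are averaged into a single vector for the word

Minimising L_i = training the PLM's parameters so that its embeddings become orthogonal to protected attributes such as gender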
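A minimal NumPy sketch of this loss, using toy 2-d vectors in place of real layer outputs. Squaring each inner product is an assumption made here so that the loss reaches its minimum (zero) exactly when the vectors are orthogonal. All function names are hypothetical.

```python
import numpy as np


def attribute_vector(layer_embeddings):
    """Average a word's layer-i embeddings over all sentences that contain it."""
    return np.mean(layer_embeddings, axis=0)


def word_embedding(subtoken_embeddings):
    """Average sub-token embeddings into one vector for the whole word."""
    return np.mean(subtoken_embeddings, axis=0)


def debias_loss(attribute_vectors, target_embeddings):
    """Sum of squared inner products over every (attribute, target) pair."""
    inner = np.asarray(attribute_vectors) @ np.asarray(target_embeddings).T
    return float(np.sum(inner ** 2))


# Toy data: "she" seen in two sentences; one target word split into two sub-tokens.
v_she = attribute_vector(np.array([[1.0, 0.0], [3.0, 0.0]]))   # mean is [2, 0]
orthogonal_target = np.array([0.0, 5.0])
biased_target = word_embedding(np.array([[1.0, 4.0], [1.0, 6.0]]))  # mean is [1, 5]

loss_orthogonal = debias_loss([v_she], [orthogonal_target])
loss_biased = debias_loss([v_she], [biased_target])
```

In actual fine-tuning the same quantity would be computed on differentiable tensors so that gradients reach the model parameters; NumPy is used here only to show the arithmetic.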

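But doing only this destroys the meaning (semantic info) of the original contextual embeddings

To preserve the PLM's semantic info, the L2 distance between the debiased model and the original model is used as a regulariser

Minimise the squared L2 distance between the debiased and original models over all sentences, all words, and all layers

The final loss of the model is the sum of the two terms above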
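Written out, the objective described above might look like the following. This is a reconstruction from the prose, not a copy of the paper's equations. The symbols are assumptions: V_t and V_a for the target and attribute vocabularies, Ω(w) for the sentences containing w, E_i(w, x; θ) for the layer-i embedding of w in sentence x, θ_e and θ_pre for the debiased and original parameters, N for the number of layers, and α, β for the weights. The squared inner product and the choice to sum the regulariser over the sentences in A are also assumptions.

```latex
\begin{align}
L_i &= \sum_{t \in \mathcal{V}_t} \sum_{x \in \Omega(t)} \sum_{a \in \mathcal{V}_a}
       \left( \boldsymbol{v}_i(a)^{\top} E_i(t, x; \theta_e) \right)^2 \\
\boldsymbol{v}_i(a) &= \frac{1}{|\Omega(a)|} \sum_{x \in \Omega(a)} E_i(a, x; \theta_e) \\
L_{\mathrm{reg}} &= \sum_{x \in \mathcal{A}} \sum_{w \in x} \sum_{i=1}^{N}
       \left\lVert E_i(w, x; \theta_e) - E_i(w, x; \theta_{\mathrm{pre}}) \right\rVert^2 \\
L &= \alpha L_i + \beta L_{\mathrm{reg}}
\end{align}
```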

The weighting of the two terms that reportedly performed best was found by grid search
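A small sketch of the regulariser and the combined objective on toy arrays. The weights `alpha` and `beta` are placeholders, not the values selected by grid search, and the function names are hypothetical.

```python
import numpy as np


def regulariser(debiased_layers, original_layers):
    """Squared L2 distance between the two models, summed over layers and tokens.

    Both inputs have shape (num_layers, num_tokens, dim).
    """
    diff = np.asarray(debiased_layers) - np.asarray(original_layers)
    return float(np.sum(diff ** 2))


def total_loss(debias_term, reg_term, alpha, beta):
    """Weighted sum of the debiasing loss and the regulariser."""
    return alpha * debias_term + beta * reg_term


# Toy hidden states: 2 layers, 3 tokens, 4 dimensions.
original = np.zeros((2, 3, 4))
debiased = original + 0.5          # every coordinate moved by 0.5

reg = regulariser(debiased, original)          # 24 coordinates * 0.25
loss = total_loss(debias_term=4.0, reg_term=reg, alpha=0.5, beta=0.5)
```

A larger `beta` keeps the debiased model closer to the original embeddings, while a larger `alpha` pushes harder towards orthogonality.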

Layer-wise experiments

First layer only vs Last layer only vs ALL layers

Specific target words only vs all words

The former is token-level debiasing
The latter is sentence-level debiasing
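The experimental settings above amount to choosing which embeddings enter the debiasing loss. The sketch below shows one way to express that choice, assuming the hidden states are available as an array of shape (num_layers, seq_len, dim) and that a boolean mask marks the target-word positions. `select_embeddings` and its argument names are hypothetical.

```python
import numpy as np


def select_embeddings(hidden_states, target_mask, layers="all", level="token"):
    """Return the embeddings the debiasing loss is applied to, as (n, dim).

    layers: "first", "last", or "all"
    level:  "token" keeps only target-word positions,
            "sentence" keeps every position in the sentence
    """
    hidden_states = np.asarray(hidden_states)
    target_mask = np.asarray(target_mask, dtype=bool)

    if layers == "first":
        chosen = hidden_states[:1]
    elif layers == "last":
        chosen = hidden_states[-1:]
    elif layers == "all":
        chosen = hidden_states
    else:
        raise ValueError(f"unknown layers setting: {layers}")

    if level == "token":
        chosen = chosen[:, target_mask]
    elif level != "sentence":
        raise ValueError(f"unknown level setting: {level}")

    return chosen.reshape(-1, hidden_states.shape[-1])


# Toy input: 2 layers, 3 tokens, 4 dimensions; the middle token is the target word.
hidden = np.arange(24, dtype=float).reshape(2, 3, 4)
mask = [False, True, False]

token_all = select_embeddings(hidden, mask, layers="all", level="token")
sentence_last = select_embeddings(hidden, mask, layers="last", level="sentence")
```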

Experiments and results

Evaluation datasets

Gender Bias

SEAT 6,7,8
MNLI
Sentences of the form "The [Subject] [verb] a/an [object]" are generated

The word lists are taken from the paper Gender bias in contextualized word embeddings (NAACL, 2019)

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang. 2019. Gender bias in contextualized word embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 629–634, Minneapolis, Minnesota. Association for Computational Linguistics.
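Filling the "The [Subject] [verb] a/an [object]" template can be sketched as below. The subjects, verbs, and objects are placeholder words rather than the lists from the cited paper, and the article rule is a simple vowel check that ignores pronunciation exceptions.

```python
from itertools import product


def article_for(noun):
    """Pick "an" before a vowel letter, otherwise "a" (spelling-based heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def fill_template(subject, verb, obj):
    """Build one sentence of the form 'The [Subject] [verb] a/an [object]'."""
    return f"The {subject} {verb} {article_for(obj)} {obj}"


# Placeholder word lists (assumptions for illustration).
subjects = ["doctor", "nurse"]
verbs = ["bought"]
objects = ["apple", "coat"]

sentences = [fill_template(s, v, o) for s, v, o in product(subjects, verbs, objects)]
```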