Today is the first day of the Lunar New Year, and the past year has been a new beginning for us. In some ways, public enthusiasm for science fiction has never been as strong as it was in 2023. I remembered that I had once written a science fiction (fantasy) novel titled "The Door". It was written in 2015; besides me there were several other authors at the time, but I can no longer remember who they were, for which I am truly sorry. I want to post the novel here today, both as part of my annual summary and as the first post of the new year. Since most of the content was written by me, and no one cares about it anyway, there should be no copyright issues.
Two weeks ago I posted a summary of static word embeddings; this time it is a summary of contextualized word embeddings, including some pretrained models. The slides (PPT) can be downloaded here.
Preface
An important problem in NLP is how to represent words. In the past, there were static word embedding models based on the distributional hypothesis, such as Word2vec and GloVe, which exploit the distribution of each word's contexts to learn a static embedding, ultimately producing one fixed vector per word. The drawbacks of such models are obvious. First, they cannot capture polysemy: although some methods learn multiple embeddings per word based on context, these representations are still static and cannot fundamentally handle the rich variation of word senses. Second, to apply static word embeddings to downstream tasks (e.g. Named Entity Recognition, Sentiment Analysis), one still has to design an additional task-specific model, using an RNN or multi-layer CNN to capture the current context again.
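The polysemy problem can be made concrete with a minimal sketch. The toy vectors below are hypothetical (not from any trained model); the point is only that a static lookup table returns the same vector for a word no matter what sentence it appears in:

```python
# Hypothetical toy embeddings (not trained); a static table maps each
# word to exactly one fixed vector, so word senses cannot be separated.
embeddings = {
    "bank":  [0.2, -0.5, 0.7],   # one vector for every sense of "bank"
    "river": [0.9,  0.1, 0.3],
    "money": [-0.4, 0.8, 0.2],
}

def embed(sentence):
    """Look up each token's static vector; the context is ignored entirely."""
    return [embeddings[w] for w in sentence if w in embeddings]

v1 = embed(["river", "bank"])[-1]   # "bank" as a riverside
v2 = embed(["money", "bank"])[-1]   # "bank" as a financial institution
print(v1 == v2)  # True: the two senses are indistinguishable
```

However the surrounding words change, the vector for "bank" is identical, which is exactly the first drawback described above.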
Contextual word embedding models no longer train a representation for each word in isolation. Instead, they operate on whole contexts: the complete context is fed in each time, and the model outputs representations of the words under that particular context, so the same word receives different representations in different contexts. On the one hand this solves the polysemy problem; on the other hand there is usually no need to design an additional downstream model, because the contextual word embedding model already handles full-context input and output.
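To contrast with the static case, here is a deliberately crude sketch of contextualization: the toy vectors are hypothetical, and a simple mixing with the sentence mean stands in for a real encoder such as ELMo or BERT. The only point it demonstrates is that the output vector for a word now depends on its context:

```python
# Hypothetical toy input vectors; a crude "encoder" that mixes each
# token with the sentence mean stands in for a real contextual model.
embeddings = {
    "bank":  [0.2, -0.5, 0.7],
    "river": [0.9,  0.1, 0.3],
    "money": [-0.4, 0.8, 0.2],
}

def contextualize(sentence):
    """Return one vector per token, each blended with the sentence mean."""
    vecs = [embeddings[w] for w in sentence]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [[(a + b) / 2 for a, b in zip(v, mean)] for v in vecs]

b1 = contextualize(["river", "bank"])[-1]  # "bank" near "river"
b2 = contextualize(["money", "bank"])[-1]  # "bank" near "money"
print(b1 == b2)  # False: the same word gets context-dependent vectors
```

Real models replace the naive averaging with deep self-attention or bidirectional LSTMs, but the input/output shape is the same: a full context in, one context-dependent vector per token out.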
This post first introduces Transformer (Vaswani et al., 2017), a commonly used feature extraction model, then describes how contextual word embeddings are trained, along with their evaluation methods and results.
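Before diving in, the core operation of the Transformer, scaled dot-product attention, can be sketched in a few lines. This is a self-contained toy with 2-dimensional queries, keys, and values and no learned projection matrices; it computes Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V as defined in Vaswani et al. (2017):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # output is the attention-weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                      # one query
K = [[1.0, 0.0], [0.0, 1.0]]          # two keys
V = [[1.0, 2.0], [3.0, 4.0]]          # two values
print(attention(Q, K, V))             # weighted mix of the two value rows
```

The query matches the first key more strongly, so the output lies closer to the first value row; a full Transformer adds learned Q/K/V projections, multiple heads, and feed-forward layers on top of this primitive.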