EMNLP2020 | Recent Must-Read Multilingual Papers

The **AMiner platform**, developed by the Department of Computer Science at Tsinghua University, is fully independent Chinese intellectual property. The platform contains a science and technology knowledge graph of more than 230 million academic papers/patents and 136 million researchers, and provides professional scientific-intelligence services such as researcher evaluation, expert finding, intelligent reviewer assignment, and academic maps. Online since 2006, the system has attracted visits from more than 10 million unique IPs across 220 countries/regions, with 2.3 million data downloads and over 11 million annual visits, making it an important data and experimentation platform for research on academic search and social network mining.
AMiner platform: https://www.aminer.cn

Introduction: Research on multilingual and cross-lingual methods is of great significance in natural language processing. Machine translation is widely used in daily life and has become an indispensable task, and deep learning models that work across many languages are highly valuable for all kinds of NLP applications. Current research in this area focuses mainly on building resources (such as open-source tools and datasets) and on developing pretrained language models that work across many languages.
Judging from the AMiner-EMNLP2020 word cloud and the accepted papers, there is a good deal of notable multilingual and cross-lingual work at this conference. Below we take a look at papers on the multilingual theme.


1. Paper: LAReQA: Language-agnostic answer retrieval from a multilingual pool

Link: https://www.aminer.cn/pub/5e96db3891e01129d1a03f1a?conf=emnlp2020

Authors: Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Summary:

  • Recent progress in self-supervised pretraining for language understanding has enabled training large multilingual models on 100+ languages at the same time, as in multilingual BERT ("mBERT") and XLM-R.
  • These models, despite being trained without any explicit objective of cross-lingual alignment, are surprisingly effective for cross-lingual transfer, suggesting that the models may have learned to “factor out” language and embed inputs into a language-agnostic space.

  • LAReQA is a challenging new benchmark testing answer retrieval from a multilingual candidate pool.

  • To achieve strong cross-lingual alignment, this model sacrifices performance on both retrieval from a monolingual pool and retrieval of same-language candidates.
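As a rough illustration of the retrieval setup, the sketch below ranks a multilingual candidate pool by cosine similarity against a question embedding. All vectors here are toy stand-ins invented for illustration; a real system would obtain them from a multilingual encoder such as mBERT or XLM-R.

```python
import numpy as np

def cosine_retrieve(question_vec, candidate_vecs):
    """Rank a candidate pool by cosine similarity to the question (best first)."""
    q = question_vec / np.linalg.norm(question_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))

# Toy pool: answers in different languages embedded in one shared space.
pool = np.array([
    [0.90, 0.10, 0.00],  # English answer, relevant
    [0.10, 0.90, 0.00],  # German answer, irrelevant
    [0.85, 0.15, 0.10],  # Hindi answer, same meaning as the English one
])
question = np.array([1.0, 0.0, 0.0])
ranking = cosine_retrieve(question, pool)  # relevant answers should rank first
```

In a language-agnostic space, the cross-language duplicate (candidate 2) outranks the irrelevant same-language candidate (candidate 1), which is exactly the behavior LAReQA is designed to measure.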


2. Paper: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

Link: https://www.aminer.cn/pub/5ea16b2b91e011fa08b8f878?conf=emnlp2020

Authors: Nils Reimers, Iryna Gurevych

Summary:

  • Mapping sentences or short text paragraphs to a dense vector space, such that similar sentences are close, has wide applications in NLP.

  • The authors presented a method to make a monolingual sentence embedding method multilingual, with aligned vector spaces between the languages.

  • The authors demonstrated that this approach successfully transfers properties from the source language vector space to various target languages.

  • This decoupling significantly simplifies the training procedure compared to previous approaches.
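The distillation objective can be sketched as follows: the student is trained so that both the source sentence and its translation land near the monolingual teacher's embedding of the source sentence. The vectors below are toy values standing in for real encoder outputs.

```python
import numpy as np

def distillation_loss(teacher_src, student_src, student_tgt):
    """MSE pulling the student's source-sentence and translated-sentence
    embeddings toward the (monolingual) teacher's source embedding."""
    return (np.mean((teacher_src - student_src) ** 2)
            + np.mean((teacher_src - student_tgt) ** 2))

t_en = np.array([1.0, 0.0])  # teacher("Hello"), the fixed target
s_en = np.array([0.9, 0.1])  # student("Hello")
s_de = np.array([0.8, 0.2])  # student("Hallo"), the German translation
loss = distillation_loss(t_en, s_en, s_de)
```

Minimizing this over parallel data aligns the student's vector spaces across languages while inheriting the properties of the teacher's monolingual space.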


3. Paper: XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f6897336e?conf=emnlp2020

Authors: Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad Taher Pilehvar

Summary:

  • One of the desirable properties of contextualized models, such as BERT and its derivatives, lies in their ability to associate dynamic representations to words, i.e., embeddings that can change depending on the context.
  • In this paper the authors have introduced XL-WiC, a large benchmark for evaluating context-sensitive models.

  • XL-WiC extends WiC to new languages, providing an evaluation framework for contextualized models in those languages, and for experimentation in a cross-lingual transfer setting.

  • Even though current language models are effective performers in the zero-shot cross-lingual setting, there is still room for improvement, especially for distant languages such as Japanese or Korean.
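The WiC-style task behind the benchmark can be sketched with a toy decision rule: compare the contextualized embeddings of the target word in the two contexts and predict "same sense" when they are similar enough. The vectors and threshold below are invented for illustration; real systems tune the decision threshold on development data.

```python
import numpy as np

def same_sense(vec1, vec2, threshold=0.5):
    """Predict whether a target word keeps the same sense in two contexts,
    based on cosine similarity of its contextualized embeddings."""
    cos = vec1 @ vec2 / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
    return bool(cos >= threshold)

# Toy contextual embeddings of "bank" in two sentences:
river_bank = np.array([1.0, 0.1])  # "She sat on the bank of the river."
money_bank = np.array([0.1, 1.0])  # "He deposited cash at the bank."
pred = same_sense(river_bank, money_bank)  # different senses expected here
```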


4. Paper: Multilingual Offensive Language Identification with Cross-lingual Embeddings

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689733c5?conf=emnlp2020

Authors: Tharindu Ranasinghe, Marcos Zampieri

Summary:

  • Offensive posts on social media result in a number of undesired consequences to users.

  • This paper is the first study to apply cross-lingual contextual word embeddings to offensive language identification, projecting predictions from English to other languages using benchmarked datasets from shared tasks on Bengali, Hindi, and Spanish.

  • The authors would like to further evaluate the models using SOLID, a novel large English dataset with over 9 million tweets, along with datasets in four other languages that were made available for the second edition of OffensEval.
  • These datasets were collected using the same methodology and were annotated according to OLID’s guidelines.
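The zero-shot projection idea can be sketched with a toy classifier over a shared cross-lingual embedding space: train on English vectors only, then classify posts in another language embedded into the same space. The nearest-centroid classifier and all vectors below are illustrative stand-ins for the transformer-based models used in the paper.

```python
import numpy as np

def train_centroids(X, y):
    """Fit one centroid per class; a toy stand-in for a real classifier."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda label: np.linalg.norm(centroids[label] - x))

# English training posts embedded in a (toy) cross-lingual space.
X_en = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_en = np.array([1, 1, 0, 0])  # 1 = offensive, 0 = not offensive
model = train_centroids(X_en, y_en)

# A post in another language, embedded into the same space, is labeled zero-shot.
x_hi = np.array([0.95, 0.05])
pred = int(predict(model, x_hi))
```

Because the embedding space is shared across languages, no target-language labels are needed at training time, which is the projection setup the paper evaluates.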


5. Paper: From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689731f4?conf=emnlp2020

Authors: Anne Lauscher, Vinit Ravishankar, Ivan Vulić, Goran Glavaš

Summary:

  • Labeled datasets of sufficient size support supervised learning and development in NLP.

  • While additional fine-tuning of MMTs on a small number of target-language instances is computationally cheap, the potential bottleneck lies in possibly expensive data annotation.
  • This is, especially for low-resource languages, potentially a major issue and deserves further analysis.

  • The authors have shown that the MMT potential on distant and lower-resource target languages can be quickly unlocked if they are offered a handful of annotated target-language instances.
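The few-shot setting described above amounts to sampling a handful of annotated target-language instances and running a short extra fine-tuning step on them; the sampling half can be sketched as below (the data and helper are illustrative, not from the paper).

```python
import random

def make_few_shot_set(target_pool, k, seed=0):
    """Sample k annotated target-language instances for the cheap
    extra fine-tuning step after English-only training."""
    rng = random.Random(seed)
    return rng.sample(target_pool, k)

# Toy annotated pool of (sentence, label) pairs in the target language.
pool = [("sentence %d" % i, i % 2) for i in range(100)]
shots = make_few_shot_set(pool, k=10)  # only these k items need human annotation
```

The point of the paper is that annotating even a set this small can recover much of the performance lost in pure zero-shot transfer to distant languages.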

For more EMNLP2020 papers, follow our official account or go straight to the EMNLP2020 topic page via the link below, where the most cutting-edge research directions and the most complete paper data await you~
EMNLP2020:https://www.aminer.cn/conf/emnlp2020


Original: https://blog.csdn.net/AI_Conf/article/details/109719791
Author: AMiner学术搜索和科技情报挖掘
Title: EMNLP2020 | Recent Must-Read Multilingual Papers

