EMNLP2020 | Recent Must-Read Multilingual Papers

The **AMiner platform**, developed by the Department of Computer Science at Tsinghua University, is fully independent Chinese intellectual property. The platform contains a science and technology knowledge graph of more than 230 million academic papers/patents and 136 million researchers, and provides professional scientific-intelligence services such as researcher evaluation, expert finding, intelligent reviewer assignment, and academic maps. Online since 2006, the system has attracted visits from more than 10 million unique IPs across 220 countries/regions, with 2.3 million data downloads and over 11 million annual visits, making it an important data and experimentation platform for research on academic search and social network mining.
AMiner platform: https://www.aminer.cn

Introduction: Research on multilingual and cross-lingual methods is of great significance in natural language processing. Machine translation is widely used in daily life and has become an indispensable task, and deep learning models that work across many languages are highly valuable for all kinds of NLP applications. Current research in this area focuses mainly on building resources (such as open-source tools and datasets) and on developing pretrained language models that work across many languages.
Judging from the AMiner-EMNLP2020 word cloud and the accepted papers, there is a good deal of notable multilingual and cross-lingual work at this conference. Below we take a look at papers on the multilingual theme.


1. Paper: LAReQA: Language-agnostic answer retrieval from a multilingual pool

Link: https://www.aminer.cn/pub/5e96db3891e01129d1a03f1a?conf=emnlp2020

Authors: Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Summary:

  • Recent progress in self-supervised pretraining for language understanding has enabled training large multilingual models on 100+ languages at the same time, as in multilingual BERT ("mBERT") and XLM-R.
  • These models, despite being trained without any explicit objective of cross-lingual alignment, are surprisingly effective for cross-lingual transfer, suggesting that the models may have learned to “factor out” language and embed inputs into a language-agnostic space.

  • LAReQA is a challenging new benchmark testing answer retrieval from a multilingual candidate pool.

  • To achieve strong cross-lingual alignment, this model sacrifices performance on both retrieval from a monolingual pool and retrieval of same-language candidates.
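As a rough illustration of the retrieval setup, the sketch below ranks a multilingual candidate pool by cosine similarity against a question embedding. All vectors here are toy stand-ins invented for illustration; a real system would obtain them from a multilingual encoder such as mBERT or XLM-R.

```python
import numpy as np

def cosine_retrieve(question_vec, candidate_vecs):
    """Rank a candidate pool by cosine similarity to the question (best first)."""
    q = question_vec / np.linalg.norm(question_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))

# Toy pool: answers in different languages embedded in one shared space.
pool = np.array([
    [0.90, 0.10, 0.00],  # English answer, relevant
    [0.10, 0.90, 0.00],  # German answer, irrelevant
    [0.85, 0.15, 0.10],  # Hindi answer, same meaning as the English one
])
question = np.array([1.0, 0.0, 0.0])
ranking = cosine_retrieve(question, pool)  # relevant answers should rank first
```

In a language-agnostic space, the cross-language duplicate (candidate 2) outranks the irrelevant same-language candidate (candidate 1), which is exactly the behavior LAReQA is designed to measure.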


2. Paper: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

Link: https://www.aminer.cn/pub/5ea16b2b91e011fa08b8f878?conf=emnlp2020

Authors: Nils Reimers, Iryna Gurevych

Summary:

  • Mapping sentences or short text paragraphs to a dense vector space, such that similar sentences are close, has wide applications in NLP.

  • The authors presented a method to make a monolingual sentence embedding method multilingual, with aligned vector spaces between the languages.

  • The authors demonstrated that this approach successfully transfers properties from the source language vector space to various target languages.

  • This decoupling significantly simplifies the training procedure compared to previous approaches.
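The distillation objective can be sketched as follows: the student is trained so that both the source sentence and its translation land near the monolingual teacher's embedding of the source sentence. The vectors below are toy values standing in for real encoder outputs.

```python
import numpy as np

def distillation_loss(teacher_src, student_src, student_tgt):
    """MSE pulling the student's source-sentence and translated-sentence
    embeddings toward the (monolingual) teacher's source embedding."""
    return (np.mean((teacher_src - student_src) ** 2)
            + np.mean((teacher_src - student_tgt) ** 2))

t_en = np.array([1.0, 0.0])  # teacher("Hello"), the fixed target
s_en = np.array([0.9, 0.1])  # student("Hello")
s_de = np.array([0.8, 0.2])  # student("Hallo"), the German translation
loss = distillation_loss(t_en, s_en, s_de)
```

Minimizing this over parallel data aligns the student's vector spaces across languages while inheriting the properties of the teacher's monolingual space.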


3. Paper: XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f6897336e?conf=emnlp2020

Authors: Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad Taher Pilehvar

Summary:

  • One of the desirable properties of contextualized models, such as BERT and its derivatives, lies in their ability to associate dynamic representations to words, i.e., embeddings that can change depending on the context.
  • In this paper the authors have introduced XL-WiC, a large benchmark for evaluating context-sensitive models.

  • XL-WiC extends WiC to new languages, providing an evaluation framework for contextualized models in those languages, and for experimentation in a cross-lingual transfer setting.

  • Even though current language models are effective performers in the zero-shot cross-lingual setting, there is still room for improvement, especially for distant languages such as Japanese or Korean.
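The WiC-style task behind the benchmark can be sketched with a toy decision rule: compare the contextualized embeddings of the target word in the two contexts and predict "same sense" when they are similar enough. The vectors and threshold below are invented for illustration; real systems tune the decision threshold on development data.

```python
import numpy as np

def same_sense(vec1, vec2, threshold=0.5):
    """Predict whether a target word keeps the same sense in two contexts,
    based on cosine similarity of its contextualized embeddings."""
    cos = vec1 @ vec2 / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
    return bool(cos >= threshold)

# Toy contextual embeddings of "bank" in two sentences:
river_bank = np.array([1.0, 0.1])  # "She sat on the bank of the river."
money_bank = np.array([0.1, 1.0])  # "He deposited cash at the bank."
pred = same_sense(river_bank, money_bank)  # different senses expected here
```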


4. Paper: Multilingual Offensive Language Identification with Cross-lingual Embeddings

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689733c5?conf=emnlp2020

Authors: Tharindu Ranasinghe, Marcos Zampieri

Summary:

  • Offensive posts on social media result in a number of undesired consequences to users.

  • This paper is the first study to apply cross-lingual contextual word embeddings to offensive language identification, projecting predictions from English to other languages using benchmarked datasets from shared tasks on Bengali, Hindi, and Spanish.

  • The authors would like to further evaluate the models using SOLID, a novel large English dataset with over 9 million tweets, along with datasets in four other languages that were made available for the second edition of OffensEval.
  • These datasets were collected using the same methodology and were annotated according to OLID’s guidelines.
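The zero-shot projection idea can be sketched with a toy classifier over a shared cross-lingual embedding space: train on English vectors only, then classify posts in another language embedded into the same space. The nearest-centroid classifier and all vectors below are illustrative stand-ins for the transformer-based models used in the paper.

```python
import numpy as np

def train_centroids(X, y):
    """Fit one centroid per class; a toy stand-in for a real classifier."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda label: np.linalg.norm(centroids[label] - x))

# English training posts embedded in a (toy) cross-lingual space.
X_en = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_en = np.array([1, 1, 0, 0])  # 1 = offensive, 0 = not offensive
model = train_centroids(X_en, y_en)

# A post in another language, embedded into the same space, is labeled zero-shot.
x_hi = np.array([0.95, 0.05])
pred = int(predict(model, x_hi))
```

Because the embedding space is shared across languages, no target-language labels are needed at training time, which is the projection setup the paper evaluates.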


5. Paper: From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689731f4?conf=emnlp2020

Authors: Anne Lauscher, Vinit Ravishankar, Ivan Vulić, Goran Glavaš

Summary:

  • Labeled datasets of sufficient size support supervised learning and development in NLP.

  • While additional fine-tuning of MMTs on a small number of target-language instances is computationally cheap, the potential bottleneck lies in possibly expensive data annotation.
  • This is, especially for low-resource languages, potentially a major issue and deserves further analysis.

  • The authors have shown that the MMT potential on distant and lower-resource target languages can be quickly unlocked if they are offered a handful of annotated target-language instances.
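The few-shot setting described above amounts to sampling a handful of annotated target-language instances and running a short extra fine-tuning step on them; the sampling half can be sketched as below (the data and helper are illustrative, not from the paper).

```python
import random

def make_few_shot_set(target_pool, k, seed=0):
    """Sample k annotated target-language instances for the cheap
    extra fine-tuning step after English-only training."""
    rng = random.Random(seed)
    return rng.sample(target_pool, k)

# Toy annotated pool of (sentence, label) pairs in the target language.
pool = [("sentence %d" % i, i % 2) for i in range(100)]
shots = make_few_shot_set(pool, k=10)  # only these k items need human annotation
```

The point of the paper is that annotating even a set this small can recover much of the performance lost in pure zero-shot transfer to distant languages.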

For more EMNLP2020 papers, follow our official account or go straight to the EMNLP2020 topic page via the link below, where the most cutting-edge research directions and the most complete paper data await you~
EMNLP2020:https://www.aminer.cn/conf/emnlp2020


Original: https://blog.csdn.net/AI_Conf/article/details/109719791
Author: AMiner学术搜索和科技情报挖掘
Title: EMNLP2020 | Recent Must-Read Multilingual Papers

