# Chinese NER Part 2: Multi-Task and Adversarial Transfer Learning, Explained with Code

## Multi-Task Learning

1. Introduce extra information: auxiliary tasks help the model learn features that are hard to extract from the main task alone.
2. Learn more general text features: only information needed by multiple tasks is truly general. This can also be viewed as a form of regularization, or as different tasks injecting noise and playing a role similar to bagging.

MTL comes in many model structures; below we will mainly use the first three: hard sharing, asymmetric sharing, and customized sharing. Let's look at the concrete ways MTL is used in NER tasks.
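As a framework-agnostic illustration of the first two sharing schemes (all shapes, names, and the `bilstm_stub` helper here are made up), hard sharing gives both tasks the same bottom layer, while asymmetric sharing additionally feeds task 1's private representation into task 2:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, emb, hidden = 4, 10, 16, 8

def bilstm_stub(x, w):
    # stand-in for a task-private BiLSTM over the shared embeddings
    return np.tanh(x @ w)

embedding = rng.normal(size=(batch, seq_len, emb))  # hard-shared layer (e.g. BERT)
out1 = bilstm_stub(embedding, rng.normal(size=(emb, hidden)))  # task 1 private encoder
out2 = bilstm_stub(embedding, rng.normal(size=(emb, hidden)))  # task 2 private encoder

asymmetry = True
if asymmetry:
    # asymmetric sharing: task 2 additionally consumes task 1's output
    out2 = np.concatenate([out1, out2], axis=-1)

assert out1.shape == (4, 10, 8)
assert out2.shape == (4, 10, 16)
```

Hard sharing is the `asymmetry = False` case: the tasks interact only through the shared embedding layer.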

### Word Boundary Enhancement: NER + CWS

paper: Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning, 2016

### Cross-Domain Semi-Supervised Learning: NER + NER

paper: A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media, 2017

- cross-entropy: compute the entropy of x under a target-domain n-gram language model
- Gaussian: average all target-domain text embeddings to build $v_{IN}$, then use the Euclidean distance between $v_x$ and $v_{IN}$
- Polynomial Kernel: the cosine distance between $v_x$ and $v_{IN}$
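The two embedding-based similarity functions can be sketched in a few lines of numpy (function names, shapes, and the toy data below are made up for illustration):

```python
import numpy as np

def gaussian_distance(v_x, in_domain_embs):
    # v_IN: mean of all target-domain text embeddings
    v_in = in_domain_embs.mean(axis=0)
    return np.linalg.norm(v_x - v_in)  # Euclidean distance

def cosine_distance(v_x, in_domain_embs):
    v_in = in_domain_embs.mean(axis=0)
    cos = (v_x @ v_in) / (np.linalg.norm(v_x) * np.linalg.norm(v_in))
    return 1.0 - cos  # in [0, 2]; smaller means closer to the target domain

rng = np.random.default_rng(1)
in_domain = rng.normal(size=(100, 32))  # toy target-domain sentence embeddings
v = rng.normal(size=32)                 # embedding of the sentence x
d_euc = gaussian_distance(v, in_domain)
d_cos = cosine_distance(v, in_domain)
assert d_euc >= 0.0
assert -1e-9 <= d_cos <= 2.0 + 1e-9
```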

$$confid(x) = \frac{y_{max}(x) - y_{2nd}(x)}{y_{max}(x)}$$

$$weight(x,t) = \begin{cases} 1.0 & \text{x is in-domain} \\ func(x, IN) & \text{x is out-of-domain} \\ confid(x,t) & \text{x is unlabeled} \end{cases}$$
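The confidence score $confid(x)$ is just the normalized margin between the model's top two label scores; a minimal sketch (the helper name is made up):

```python
def confid(scores):
    # scores: the model's label scores y(x) for one example,
    # e.g. softmax probabilities over labels
    s = sorted(scores, reverse=True)
    y_max, y_2nd = s[0], s[1]
    return (y_max - y_2nd) / y_max

print(confid([0.7, 0.2, 0.1]))  # (0.7 - 0.2) / 0.7 ≈ 0.714
```

A confident prediction (large gap between the best and second-best label) gets weight close to 1, while an ambiguous unlabeled sample is down-weighted.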

### Model Implementation

```python
def build_graph(features, labels, params, is_training):
    input_ids = features['token_ids']
    label_ids = features['label_ids']
    segment_ids = features['segment_ids']
    seq_len = features['seq_len']
    task_ids = features['task_ids']  # 0 = task 1, 1 = task 2
    # padding mask derived from the true sequence lengths
    input_mask = tf.sequence_mask(seq_len, maxlen=tf.shape(input_ids)[1], dtype=tf.int32)
    batch_size = tf.shape(input_ids)[0]

    embedding = pretrain_bert_embedding(input_ids, input_mask, segment_ids, params['pretrain_dir'],
                                        params['embedding_dropout'], is_training)

    # task 1: private BiLSTM -> dense -> CRF
    with tf.variable_scope('task1'):
        lstm_output1 = bilstm(embedding, params['cell_type'], params['rnn_activation'],
                              params['hidden_units_list'], params['keep_prob_list'],
                              params['cell_size'], params['dtype'], is_training)
        logits1 = tf.layers.dense(lstm_output1, units=params['task1_label_size'], activation=None,
                                  use_bias=True, name='logits')  # per-task label-size key assumed
        trans1, loglikelihood1 = crf_layer(logits1, label_ids, seq_len, params['task1_label_size'], is_training)
        pred_ids1, _ = tf.contrib.crf.crf_decode(logits1, trans1, seq_len)
        loss1 = -tf.reduce_sum(loglikelihood1)
        tf.summary.scalar('loss1', loss1)

    # task 2: private BiLSTM -> dense -> CRF
    with tf.variable_scope('task2'):
        lstm_output2 = bilstm(embedding, params['cell_type'], params['rnn_activation'],
                              params['hidden_units_list'], params['keep_prob_list'],
                              params['cell_size'], params['dtype'], is_training)
        if params['asymmetry']:
            # asymmetric sharing: task 2 also consumes task 1's representation
            lstm_output2 = tf.concat([lstm_output1, lstm_output2], axis=-1)
        logits2 = tf.layers.dense(lstm_output2, units=params['task2_label_size'], activation=None,
                                  use_bias=True, name='logits')
        trans2, loglikelihood2 = crf_layer(logits2, label_ids, seq_len, params['task2_label_size'], is_training)
        pred_ids2, _ = tf.contrib.crf.crf_decode(logits2, trans2, seq_len)
        loss2 = -tf.reduce_sum(loglikelihood2)
        tf.summary.scalar('loss2', loss2)

    loss = (loss1 + loss2) / tf.cast(batch_size, dtype=params['dtype'])
    # at inference each batch holds a single task, so all pred_ids belong to that task
    pred_ids = tf.where(tf.equal(task_ids, 0), pred_ids1, pred_ids2)
    return loss, pred_ids
```


## Adversarial Transfer Learning

### Gradient Reversal Layer (GRL)

paper: Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism, 2018

$$\begin{align} s &= \mathrm{MaxPooling}(s) \\ D(s) &= \mathrm{softmax}(Ws+b) \end{align}$$
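The gradient reversal layer sits between the shared encoder and the task discriminator $D(s)$: it is the identity in the forward pass and multiplies the gradient by $-\lambda$ in the backward pass, so minimizing the discriminator loss pushes the shared features toward task-invariance. A framework-agnostic sketch (the class and method names are made up for illustration):

```python
class GradientReversal:
    """Identity forward; backward scales the incoming gradient by -lambda."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        # the reversed gradient is what reaches the shared encoder
        return [-self.lam * g for g in grad_output]

grl = GradientReversal(lam=0.5)
x = [1.0, 2.0, 3.0]
assert grl.forward(x) == x
assert grl.backward([1.0, 1.0, 1.0]) == [-0.5, -0.5, -0.5]
```

In TensorFlow 1.x this is typically implemented as a custom op whose registered gradient negates the incoming gradient (e.g. via `tf.custom_gradient`).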

### Model Implementation

```python
def build_graph(features, labels, params, is_training):
    input_ids = features['token_ids']
    label_ids = features['label_ids']
    segment_ids = features['segment_ids']
    seq_len = features['seq_len']
    task_ids = features['task_ids']  # 0 = task 1, 1 = task 2
    input_mask = tf.sequence_mask(seq_len, maxlen=tf.shape(input_ids)[1], dtype=tf.int32)
    batch_size = tf.shape(input_ids)[0]

    embedding = pretrain_bert_embedding(input_ids, input_mask, segment_ids, params['pretrain_dir'],
                                        params['embedding_dropout'], is_training)

    # shared BiLSTM encoder used by both tasks
    share_output = bilstm(embedding, params['cell_type'], params['rnn_activation'],
                          params['hidden_units_list'], params['keep_prob_list'],
                          params['cell_size'], params['dtype'], is_training)  # batch * max_seq * (2*hidden)
    share_max_pool = tf.reduce_max(share_output, axis=1, name='share_max_pool')  # batch * (2*hidden): most significant feature per dim
    # gradient reversal (GRL): identity forward, negated gradient backward, so the
    # shared encoder is trained to fool the task discriminator
    # (flip_gradient is assumed to be defined elsewhere, e.g. via tf.custom_gradient)
    share_max_pool = flip_gradient(share_max_pool)
    share_max_pool = tf.layers.dropout(share_max_pool, rate=params['share_dropout'],
                                       seed=1234, training=is_training)
    # task discriminator D(s) = softmax(Ws + b): predicts which task each sample came from
    disc_logits = tf.layers.dense(share_max_pool, units=2, name='discriminator')  # 2 tasks
    adv_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=task_ids, logits=disc_logits))

    # task 1: private BiLSTM concatenated with the shared encoder -> dense -> CRF
    with tf.variable_scope('task1'):
        lstm_output = bilstm(embedding, params['cell_type'], params['rnn_activation'],
                             params['hidden_units_list'], params['keep_prob_list'],
                             params['cell_size'], params['dtype'], is_training)
        lstm_output = tf.concat([share_output, lstm_output], axis=-1)  # batch * max_seq * (4*hidden)
        logits1 = tf.layers.dense(lstm_output, units=params['task1_label_size'], activation=None,
                                  use_bias=True, name='logits')  # per-task label-size key assumed
        trans1, loglikelihood1 = crf_layer(logits1, label_ids, seq_len, params['task1_label_size'], is_training)
        pred_ids1, _ = tf.contrib.crf.crf_decode(logits1, trans1, seq_len)
        loss1 = -tf.reduce_sum(loglikelihood1)
        tf.summary.scalar('loss1', loss1)

    # task 2: same structure with its own private BiLSTM
    with tf.variable_scope('task2'):
        lstm_output = bilstm(embedding, params['cell_type'], params['rnn_activation'],
                             params['hidden_units_list'], params['keep_prob_list'],
                             params['cell_size'], params['dtype'], is_training)
        lstm_output = tf.concat([share_output, lstm_output], axis=-1)  # batch * max_seq * (4*hidden)
        logits2 = tf.layers.dense(lstm_output, units=params['task2_label_size'], activation=None,
                                  use_bias=True, name='logits')
        trans2, loglikelihood2 = crf_layer(logits2, label_ids, seq_len, params['task2_label_size'], is_training)
        pred_ids2, _ = tf.contrib.crf.crf_decode(logits2, trans2, seq_len)
        loss2 = -tf.reduce_sum(loglikelihood2)
        tf.summary.scalar('loss2', loss2)

    loss = (loss1 + loss2) / tf.cast(batch_size, dtype=params['dtype']) + adv_loss * params['lambda']
    pred_ids = tf.where(tf.equal(task_ids, 0), pred_ids1, pred_ids2)
    return loss, pred_ids
```
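The adversarial term `adv_loss` added to the total loss is just the discriminator's cross-entropy over task labels, computed on the max-pooled shared features. A numpy sketch of that piece in isolation (function names and toy shapes are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adv_loss(pooled, task_ids, W, b):
    # D(s) = softmax(Ws + b): probability each sample came from each task
    probs = softmax(pooled @ W + b)
    # cross-entropy of the true task label, averaged over the batch
    return -np.mean(np.log(probs[np.arange(len(task_ids)), task_ids]))

rng = np.random.default_rng(2)
pooled = rng.normal(size=(8, 16))   # max-pooled shared features, batch of 8
task_ids = np.array([0, 1] * 4)     # which task each sample belongs to
W, b = rng.normal(size=(16, 2)), np.zeros(2)
loss = adv_loss(pooled, task_ids, W, b)
assert loss > 0.0
```

With the gradient reversal in place, the discriminator weights descend on this loss while the shared encoder ascends on it, driving the shared features toward being indistinguishable across tasks.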



### References

1. 【CWS+NER MTL】Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning, 2016
2. 【Cross-Domain LR Adjust】A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media, 2017
3. 【MTL】Multi-Task Learning for Sequence Tagging: An Empirical Study, 2018
4. 【CWS+NER Adv MTL】Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism, 2018
5. Dual Adversarial Neural Transfer for Low-Resource Named Entity Recognition, 2019
6. 【GRL】Unsupervised Domain Adaptation by Backpropagation, 2015
7. 【GRL】Domain-Adversarial Training of Neural Networks, 2016
8. https://www.zhihu.com/question/266710153

Original: https://www.cnblogs.com/gogoSandy/p/14773792.html
Author: 风雨中的小七
Title: 中文NER的那些事儿2. 多任务，对抗迁移学习详解&代码实现
