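- 1. Task
- 1.1. Background
- 1.2. Task Definition
- 1.3. Datasets
- 1.4. Evaluation Metrics
- 2. Method Summary
- 2.1. Unsupervised Attribute Extraction Methods
- 2.2. Dependency-Based Semi-Supervised Slot Filling Methods
- 2.3. Deep-Learning-Based Sequence Labeling Methods
- 2.4. Meta-Pattern-Based Attribute Extraction Methods
- 3. Paper List
- 3.1. Paper List
- 4. Related Links
- 5. Reference Resources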
1. Task
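Attribute Extraction: the goal of attribute extraction is to collect the attribute information of a specific entity from different information sources. For a person entity, for example, the birthday, gender, and nationality are all attributes. By extracting attributes from multiple data sources, we can describe an entity fairly completely through its rich attribute information.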
- Accuracy
- Precision
- F1
2. Method Summary
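* Method: Attributes are extracted for the person scenario with manually written rules. Because hand-crafting rule templates is laborious, the rules can be generated with the Bootstrapping method. The rules are generated as follows:
  1. Manually define seed rules as the initial rule seed set Spatter, together with the attribute value set Sattr.
  2. Using the rule seed set Spatter, scan the text and match attribute values to obtain the candidate attribute value set h.
  3. Compute the confidence of each attribute value in h and add the three values with the highest confidence to the seed attribute value set Sattr. If it has converged, the algorithm ends; otherwise go to step 4.
  4. Using the attribute value set Sattr, scan the text and generate the candidate template set h' from the two context words around each matched attribute value.
  5. Compute the confidence of each candidate template in h' and add the three templates with the highest confidence to the rule seed set Spatter. If Spatter has converged, the algorithm ends; otherwise go back to step 2.

  Steps 2-5 are repeated until a set number of iterations is reached.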
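The loop above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the original implementation: a pattern is taken to be the pair (left word, right word) around a value, the "confidence" of a candidate is simply its match count, and the function name `bootstrap` and the toy sentences are hypothetical. For simplicity the sketch checks convergence once per round, when neither set has grown.

```python
from collections import Counter


def bootstrap(sentences, seed_patterns, seed_values, top_k=3, max_iters=10):
    """Alternately grow a pattern set (Spatter) and a value set (Sattr)."""
    patterns, values = set(seed_patterns), set(seed_values)
    for _ in range(max_iters):
        # Step 2: match the current patterns against the text to collect candidate values.
        cand_values = Counter()
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if (tokens[i - 1], tokens[i + 1]) in patterns:
                    cand_values[tokens[i]] += 1
        # Step 3: keep the top_k new values; confidence = match count (assumption).
        new_values = [v for v, _ in cand_values.most_common() if v not in values][:top_k]
        values.update(new_values)

        # Step 4: the words around each known value become candidate patterns.
        cand_patterns = Counter()
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in values:
                    cand_patterns[(tokens[i - 1], tokens[i + 1])] += 1
        # Step 5: keep the top_k new patterns, again ranked by count (assumption).
        new_patterns = [p for p, _ in cand_patterns.most_common() if p not in patterns][:top_k]
        patterns.update(new_patterns)

        # Convergence: stop when neither set grew in this round.
        if not new_values and not new_patterns:
            break
    return patterns, values


# Toy corpus (hypothetical), already tokenized on whitespace.
sentences = [s.split() for s in [
    "Alice was born in 1990 .",
    "Bob was born in 1985 .",
    "Carol , born 1985 , lives here .",
    "Dave , born 1972 , moved away .",
]]
patterns, values = bootstrap(sentences, {("in", ".")}, set())
print(sorted(values))
print(sorted(patterns))
```

Starting from a single seed pattern, the value "1985" links the two phrasings, so the second pattern is induced and later reaches a value the seed could not. A real system would need a stronger confidence measure to limit semantic drift.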
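* Scenario: product attribute extraction
* Paper: "An Unsupervised Approach to Product Attribute Extraction"
* Method:
  1. Data preprocessing: identify qualifying phrases and noun phrases; the paper argues that attributes generally appear in such phrases.
  2. Cluster the nouns selected in the previous step and delete the clusters that contain few words.
  3. Extract attributes from the clusters: compute unigrams, bigrams, and trigrams and score them with the attribute score function defined by the authors; the high-scoring n-grams are taken as attributes.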
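Step 3 can be illustrated with a short Python sketch that enumerates unigrams, bigrams, and trigrams inside one cluster and ranks them. The paper's actual attribute score function is not given in these notes, so the score below (raw frequency, with ties broken in favour of longer n-grams) is only a stand-in, and the names `ngrams` and `rank_candidates` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rank_candidates(cluster_phrases, max_n=3, top_k=3):
    """Count 1- to max_n-grams over the phrases of one cluster and rank them."""
    counts = Counter()
    for phrase in cluster_phrases:
        tokens = phrase.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    # Stand-in score (assumption): frequency first, then longer n-grams, then alphabetical.
    ranked = sorted(counts, key=lambda g: (-counts[g], -len(g), g))
    return [(" ".join(g), counts[g]) for g in ranked[:top_k]]


# One hypothetical cluster of noun phrases taken from product descriptions.
cluster = ["battery life", "long battery life", "battery life rating"]
print(rank_candidates(cluster))
```

With this toy cluster the bigram "battery life" ranks first, which is the kind of multi-word attribute name the n-gram step is meant to surface.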
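* Method: Dependency relations: in natural language processing, the framework that describes linguistic structure through the dependency relations between words is called dependency grammar, also known as subordination grammar. Syntactic parsing based on dependency grammar is also one of the key techniques of natural language understanding (from Wikipedia). The steps for extracting person attributes with this method are:
  1. Build a trigger word list for each attribute.
  2. Based on the characteristics of each attribute slot, identify all possible candidate attributes in the sentence; for example, a birthplace carries the NER tag LOC. In practice this seems to amount to writing rules that match candidate attributes.
  3. Use the dependency structure of the sentence to confirm the relation between each candidate attribute and the subject entity (here, a person). The dependency tree is treated as an undirected graph whose vertices play the role of web pages in the PageRank algorithm, and PageRank is used to compute the syntactic relatedness of two words.
  4. Compute ...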
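The idea in step 3 can be sketched as below. The notes do not say which PageRank variant is used, so this sketch assumes a personalized PageRank that restarts at the subject entity and reads the resulting score of each candidate as its relatedness to that entity. The dependency edges are written by hand for illustration and are not real parser output; the function name `personalized_pagerank` is hypothetical.

```python
def personalized_pagerank(edges, source, damping=0.85, iters=50):
    """Power iteration on an undirected graph, restarting at `source`."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    rank = {node: 0.0 for node in neighbors}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in neighbors}
        for node, score in rank.items():
            # Each node spreads its damped score evenly over its neighbors.
            share = damping * score / len(neighbors[node])
            for other in neighbors[node]:
                nxt[other] += share
        # The remaining probability mass jumps back to the subject entity.
        nxt[source] += 1.0 - damping
        rank = nxt
    return rank


# Hand-written toy dependency tree (assumption) for:
# "Obama, born in Hawaii, met a reporter from Texas."
edges = [
    ("met", "Obama"), ("Obama", "born"), ("born", "in"), ("in", "Hawaii"),
    ("met", "reporter"), ("reporter", "a"), ("reporter", "from"), ("from", "Texas"),
]
scores = personalized_pagerank(edges, source="Obama")
for candidate in ("Hawaii", "Texas"):
    print(candidate, round(scores[candidate], 4))
```

In this toy tree "Hawaii" is syntactically closer to "Obama" than "Texas" and receives the higher score, so it would be preferred as the birthplace candidate.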
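* Papers: this approach is used in "基于弱监督的属性关系抽取方法" (Attribute Relation Extraction Based on Weak Supervision), "面向非结构化文本的开放式实体属性抽取" (Open Entity Attribute Extraction from Unstructured Text), "实体—属性抽取的GRU+CRF方法" (A GRU+CRF Method for Entity-Attribute Extraction), "Personal Attributes Extraction in Chinese Text Based on Distant-Supervision and LSTM", and "Bootstrapped Named Entity Recognition for Product Attribute Extraction", among others.
* Method: Attribute extraction is treated as a sequence labeling problem. Labeling requires some manual effort, but in some scenarios, such as person attribute extraction, the structured infoboxes of encyclopedia entries such as Baidu Baike can be used to produce the labels, which reduces the manual annotation cost. The Bootstrap method can also be used during labeling to discover more potential attribute values from seeds. This is described in "Bootstrapped Named Entity Recognition for Product Attribute Extraction" and resembles the acronym expansion algorithm proposed by Pakhomov (2002), which learns the contexts that associate an acronym with its correct expansion. The authors argue that a classifier trained on a labeled training set of known brands can learn the contextual patterns that distinguish the intended sense. Models commonly used for sequence labeling include CRF models and neural models such as BI-GRU+CRF.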
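The infobox-based labeling described above can be sketched as a small distant-supervision routine that turns infobox values into BIO tags for a sentence. This is an illustrative sketch only: the attribute names, the whitespace tokenization, the exact-match rule, and the function name `bio_tag` are all assumptions, and Chinese text would first need a word segmenter or character-level tagging.

```python
def bio_tag(tokens, infobox):
    """Label tokens with B-/I-<attribute> wherever an infobox value matches exactly."""
    tags = ["O"] * len(tokens)
    for attr, value in infobox.items():
        value_tokens = value.split()
        n = len(value_tokens)
        if n == 0:
            continue  # skip empty infobox fields
        for i in range(len(tokens) - n + 1):
            span_is_free = all(tag == "O" for tag in tags[i:i + n])
            if tokens[i:i + n] == value_tokens and span_is_free:
                tags[i] = "B-" + attr
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + attr
    return tags


# Hypothetical infobox entry and a sentence from the same encyclopedia article.
infobox = {"birth_place": "Warsaw", "birth_date": "7 November 1867"}
tokens = "Marie Curie was born in Warsaw on 7 November 1867".split()
for token, tag in zip(tokens, bio_tag(tokens, infobox)):
    print(token, tag)
```

The tagged sentences can then serve as training data for a CRF or BI-GRU+CRF tagger. Exact matching is noisy (a value may appear in a sentence that does not express the attribute), which is the usual trade-off of distant supervision.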
* Paper: "MetaPAD-Meta Pattern Discovery from Massive Text Corpora"
3. Paper List
| Venue / Year | Paper | Link |
| --- | --- | --- |
| ACM SIGKDD International Conference 2017 | MetaPAD-Meta Pattern Discovery from Massive Text Corpora | |
4. Related Links
5. Reference Resources
Original: https://blog.csdn.net/ox180x/article/details/124095596
Author: ox180x
Title: 属性抽取调研-工业界 (Attribute Extraction Survey: Industry)