A Survey of Attribute Extraction in Industry

1. Task

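Attribute extraction: the goal of attribute extraction is to collect the attribute information of a specific entity from different information sources. For a person entity, for example, birthday, gender, and nationality are all attributes. By extracting attributes from multiple data sources, we can describe an entity fairly completely through its rich attribute information.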

  • Accuracy
  • Precision
  • F1
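
As a rough illustration of how the metrics listed above could be computed for extracted attributes, here is a minimal sketch. It assumes that predictions and gold labels are represented as sets of (entity, attribute, value) triples, and that accuracy is measured per attribute slot; the function names, the data layout, and the toy data are all assumptions made for illustration and do not come from the original post.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 over (entity, attribute, value) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def slot_accuracy(predicted_slots, gold_slots):
    """Fraction of gold slots whose predicted value matches exactly."""
    if not gold_slots:
        return 0.0
    correct = sum(1 for slot, value in gold_slots.items()
                  if predicted_slots.get(slot) == value)
    return correct / len(gold_slots)


# Toy data (hypothetical entity and values).
gold = {("Alice", "birth_year", "1990"),
        ("Alice", "nationality", "Canadian"),
        ("Alice", "gender", "female")}
pred = {("Alice", "birth_year", "1990"),
        ("Alice", "nationality", "French")}
precision, recall, f1 = precision_recall_f1(pred, gold)
print(precision, recall, f1)
```

Recall is computed here only because F1 is defined as the harmonic mean of precision and recall.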

2. Summary of Methods

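* Method: Write rules by hand to extract attributes in the person scenario. Because constructing rule templates manually is laborious, a Bootstrapping method can be used to generate the rules. The steps are:
  1. Manually define rule seeds as the initial rule seed set Spatter, together with an attribute value set Sattr.
  2. Use the rule seed set Spatter to traverse the text and match attribute values, obtaining a candidate attribute value set h.
  3. Compute the confidence of each attribute value in h and add the three values with the highest confidence to the seed attribute value set Sattr. If it has converged, the algorithm ends; otherwise go to step 4.
  4. Use the attribute value set Sattr to traverse the text and generate a candidate template set h' from the two context words of each matched attribute value.
  5. Compute the confidence of each candidate template in h' and add the three templates with the highest confidence to the rule seed set Spatter. If Spatter has converged, the algorithm ends; otherwise go to step 2.
  Steps 2-5 are repeated until a set number of iterations is reached.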
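
The bootstrapping loop described above can be sketched roughly as follows. This is only an illustration under several assumptions that the original post does not specify: a rule template is modeled as a (previous word, next word) pair, the text is whitespace-tokenized, "confidence" is approximated by a plain match count, and the loop stops when neither set grows. The function name and the toy sentences are hypothetical.

```python
from collections import Counter


def bootstrap(sentences, seed_patterns, seed_values, top_k=3, max_rounds=5):
    patterns = set(seed_patterns)  # Spatter: (previous word, next word) templates
    values = set(seed_values)      # Sattr: known attribute values
    tokenized = [s.split() for s in sentences]

    for _ in range(max_rounds):
        # Step 2: match templates against the text to collect candidate values h.
        candidate_values = Counter()
        for tokens in tokenized:
            for i in range(1, len(tokens) - 1):
                context = (tokens[i - 1], tokens[i + 1])
                if context in patterns and tokens[i] not in values:
                    candidate_values[tokens[i]] += 1
        # Step 3: keep the top-k candidates (match count stands in for confidence).
        new_values = [v for v, _ in candidate_values.most_common(top_k)]
        values.update(new_values)

        # Step 4: use known values to collect candidate templates h'.
        candidate_patterns = Counter()
        for tokens in tokenized:
            for i in range(1, len(tokens) - 1):
                context = (tokens[i - 1], tokens[i + 1])
                if tokens[i] in values and context not in patterns:
                    candidate_patterns[context] += 1
        # Step 5: keep the top-k templates.
        new_patterns = [p for p, _ in candidate_patterns.most_common(top_k)]
        patterns.update(new_patterns)

        # Simplified convergence check: stop when nothing new was learned.
        if not new_values and not new_patterns:
            break
    return patterns, values


sentences = [
    "Alice was born in Paris and grew up there",
    "Bob was born in Berlin and later moved",
    "Carol lives in Berlin today",
    "Dave lives in Madrid today",
]
patterns, values = bootstrap(sentences, {("in", "and")}, {"Paris"})
print(sorted(values))
print(sorted(patterns))
```

In this toy run the seed template finds "Berlin", "Berlin" yields a second template, and that template in turn finds "Madrid". A real system would need a better confidence estimate than raw counts, since frequent but wrong matches would otherwise drift into the seed sets.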

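* Scenario: product attribute extraction
* Paper: "An Unsupervised Approach to Product Attribute Extraction"
* Method:
  1. Data preprocessing: identify qualifying phrases and noun phrases; the paper argues that attributes generally appear in such phrases.
  2. Cluster the nouns selected in the previous step and delete the clusters that contain few words.
  3. Extract attributes from the clusters: compute unigrams, bigrams, and trigrams, score them with the attribute score function defined by the authors, and take the high-scoring ones as attributes.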
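
Step 3 above can be illustrated with a small sketch that enumerates unigrams, bigrams, and trigrams inside one cluster of noun phrases and ranks them. The paper's actual attribute score function is not reproduced in this post, so the score used here (the fraction of phrases in the cluster that contain the n-gram) is only a placeholder assumption; the function names and the example cluster are hypothetical as well.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rank_attribute_candidates(cluster_phrases, max_n=3, top_k=5):
    counts = Counter()
    for phrase in cluster_phrases:
        tokens = phrase.lower().split()
        seen = set()
        for n in range(1, max_n + 1):
            seen.update(ngrams(tokens, n))
        counts.update(seen)  # count each n-gram at most once per phrase

    # Placeholder score (NOT the paper's function): share of phrases containing the n-gram.
    total = len(cluster_phrases)
    scored = {" ".join(gram): count / total for gram, count in counts.items()}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))[:top_k]


cluster = ["battery life", "long battery life", "battery life rating", "screen size"]
for candidate, score in rank_attribute_candidates(cluster, top_k=3):
    print(candidate, score)
```

With this placeholder score, a unigram can never rank below a longer n-gram that contains it, so a real scoring function would need some way to prefer the complete phrase ("battery life") over its fragments.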

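* Method: Dependency relations: in natural language processing, the framework that describes language structure through the dependency relations between words is called dependency grammar (依存语法, also known as 从属关系语法). Syntactic analysis based on dependency syntax is also one of the important techniques in natural language understanding (from Wikipedia). The steps for extracting person attributes with this method are:
  1. Build a trigger word list for each attribute.
  2. Based on the characteristics of each attribute slot, identify all possible candidate attributes in the sentence; for example, the NER tag for place of birth is LOC. My impression is that this amounts to matching attributes with some hand-set rules.
  3. Use the dependency structure of the sentence to confirm the relation between each candidate attribute and the subject entity (here, a person). Treat the dependency tree as an undirected graph whose vertices correspond to the web pages in the PageRank algorithm, and use PageRank to compute the syntactic relatedness of two words.
  4. Compute ...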
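
To make step 3 more concrete, the sketch below treats a dependency tree as an undirected graph and scores how closely each word is tied to the person entity. It uses personalized PageRank (random walks that restart at the entity), which is one possible reading of "using PageRank to compute relatedness" and is an assumption rather than the method confirmed by the source. The dependency edges are written by hand for a toy sentence instead of coming from a real parser, and all names are hypothetical.

```python
from collections import defaultdict


def personalized_pagerank(edges, source, damping=0.85, iterations=50):
    """Power iteration on an undirected graph, restarting at `source`."""
    neighbors = defaultdict(set)
    for head, dependent in edges:
        neighbors[head].add(dependent)
        neighbors[dependent].add(head)
    nodes = list(neighbors)

    rank = {node: 1.0 if node == source else 0.0 for node in nodes}
    for _ in range(iterations):
        # Restart mass goes only to the source node.
        new_rank = {node: (1.0 - damping) if node == source else 0.0 for node in nodes}
        for node in nodes:
            share = damping * rank[node] / len(neighbors[node])
            for other in neighbors[node]:
                new_rank[other] += share
        rank = new_rank
    return rank


# Hand-written edges for: "Alice was born in Paris and Bob lives in London"
edges = [
    ("born", "Alice"), ("born", "was"), ("born", "Paris"), ("Paris", "in_1"),
    ("born", "lives"), ("lives", "and"), ("lives", "Bob"),
    ("lives", "London"), ("London", "in_2"),
]
rank = personalized_pagerank(edges, source="Alice")
candidates = ["Paris", "London"]  # words tagged LOC in step 2
print(max(candidates, key=rank.get))
```

In this toy graph "Paris" is syntactically closer to "Alice" than "London" is, so it receives the higher score and would be preferred as the place-of-birth candidate.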

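* Papers: 《基于弱监督的属性关系抽取方法》, 《面向非结构化文本的开放式实体属性抽取》, 《实体—属性抽取的GRU+CRF方法》, "Personal Attributes Extraction in Chinese Text Based on Distant-Supervision and LSTM", "Bootstrapped Named Entity Recognition for Product Attribute Extraction", and other papers all use this approach for extraction.
* Method: Treat attribute extraction as a sequence labeling problem. Labeling requires some manual effort. In some scenarios, such as person attribute extraction, the structured infoboxes of encyclopedia entries such as Baidu Baike can be used for labeling, which reduces the manual labeling cost to some extent. During labeling, a Bootstrap method can also be used to discover more potential attribute values from seeds. This method is described in the paper "Bootstrapped Named Entity Recognition for Product Attribute Extraction" and is similar to the acronym expansion algorithm proposed by Pakhomov (2002), which learns the contexts that associate an acronym with its correct expansion. The authors argue that a classifier trained on a labeled training set of known brands can learn contextual patterns that distinguish the current sense. Models commonly used for sequence labeling include CRF models and neural network models such as BI-GRU+CRF.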
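
The infobox-based labeling idea can be sketched as follows: attribute values taken from a structured infobox are matched against the tokens of a sentence to produce BIO tags, which could then serve as training data for a sequence labeling model. This is only a minimal sketch under assumptions: the infobox is a plain dict of attribute name to value string, matching is exact on whitespace tokens, and the function name, attribute names, and example sentence are hypothetical. Training the CRF or BI-GRU+CRF model itself is not shown.

```python
def bio_tag(tokens, infobox):
    """Label tokens with B-/I-/O tags by exact matching of infobox values."""
    tags = ["O"] * len(tokens)
    # Match longer values first so shorter overlapping values cannot shadow them.
    items = sorted(infobox.items(), key=lambda item: -len(item[1].split()))
    for attribute, value in items:
        value_tokens = value.split()
        length = len(value_tokens)
        if length == 0:
            continue
        for start in range(len(tokens) - length + 1):
            span_is_free = all(tag == "O" for tag in tags[start:start + length])
            if tokens[start:start + length] == value_tokens and span_is_free:
                tags[start] = f"B-{attribute}"
                for position in range(start + 1, start + length):
                    tags[position] = f"I-{attribute}"
    return tags


tokens = "Alice Smith was born in New York in 1990".split()
infobox = {"birth_place": "New York", "birth_year": "1990"}
for token, tag in zip(tokens, bio_tag(tokens, infobox)):
    print(token, tag)
```

Exact matching of this kind produces noisy labels (a value can appear in a sentence without expressing the attribute), which is the usual trade-off of distant supervision against manual annotation.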

* Paper: "MetaPAD-Meta Pattern Discovery from Massive Text Corpora"

3. Paper List

| Conference/Year | Paper | Link |
| --- | --- | --- |
| ACM SIGKDD International Conference 2017 | MetaPAD-Meta Pattern Discovery from Massive Text Corpora | |

4. Related Links

5. References

Original: https://blog.csdn.net/ox180x/article/details/124095596
Author: ox180x
Title: 属性抽取调研-工业界
