Node Classification with DGL

July 13, 2023, 3:22 AM • Artificial Intelligence


Overview of the Node Classification Task

Node classification is one of the most common tasks on graph data. A model must predict which category each node belongs to.

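Before graph neural networks, methods for node classification fell into two broad categories:

methods that use only connectivity (such as DeepWalk or node2vec);
methods that simply combine connectivity with each node's own features.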

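In contrast, a GNN builds each node's representation by combining the connectivity and the features of its local neighborhood. Here "neighborhood" is meant broadly and includes the node itself.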
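The following minimal sketch makes the phrase "neighborhood including the node itself" concrete. It assumes a hypothetical 4-node path graph given as a dense adjacency matrix, hypothetical scalar features, and a hypothetical helper name `mean_aggregate`. Each node's new value is the mean of its own feature and its neighbors' features. A real GNN layer would also apply a learned transformation.

```python
import torch

def mean_aggregate(adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Average each node's features with its neighbors' features (self included)."""
    # Add self-loops so every node also aggregates its own features.
    adj_hat = adj + torch.eye(adj.shape[0])
    # Row sums = neighborhood size, counting the node itself.
    deg = adj_hat.sum(dim=1, keepdim=True)
    return adj_hat @ x / deg

# Hypothetical undirected path graph 0-1-2-3 with one scalar feature per node.
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
x = torch.tensor([[1.], [2.], [3.], [4.]])
h = mean_aggregate(adj, x)
print(h.squeeze(1))  # tensor([1.5000, 2.0000, 3.0000, 3.5000])
```

Node 0 averages itself (1) with node 1 (2), which gives 1.5. Node 1 averages 1, 2 and 3, which gives 2.0, and so on.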

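Kipf et al. formulated node classification as a semi-supervised task. A graph neural network needs labels for only a small fraction of the nodes and can still predict the classes of the remaining nodes with good accuracy.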
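This minimal sketch shows what "labels for only a small fraction of the nodes" means in a training loss. It assumes random stand-in logits and labels for 10 nodes and a hypothetical split in which only 3 nodes are labeled. The cross-entropy is computed on the masked, labeled nodes only, yet every node still receives a prediction.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_nodes, num_classes = 10, 3
# Stand-in for the GNN's output: one row of class scores (logits) per node.
logits = torch.randn(num_nodes, num_classes, requires_grad=True)
labels = torch.randint(0, num_classes, (num_nodes,))

# Hypothetical split: only the first 3 nodes are labeled.
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[:3] = True

# The loss is computed on labeled nodes only.
loss = F.cross_entropy(logits[train_mask], labels[train_mask])
loss.backward()

# Here the logits are free tensors, so unlabeled rows receive no gradient.
# In a real GNN the logits come from shared weights applied to neighborhoods,
# so fitting the labeled nodes also changes predictions for unlabeled ones.
print(logits.grad[~train_mask].abs().sum())  # tensor(0.)

# Every node still gets a predicted class.
pred = logits.argmax(dim=1)
print(pred.shape)  # torch.Size([10])
```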

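This article shows how to build a GNN for semi-supervised node classification on the Cora dataset, using only a few labels. Cora is a citation network in which nodes are papers and edges are citations. The task is to predict the category of a given paper. Each paper node carries a word count vector as its feature, normalized so that its entries sum to 1 (see Section 5.2 of the paper).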
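The normalization described above, where each word count vector is scaled so that it sums to 1, is plain row normalization. Below is a minimal sketch that assumes a hypothetical 3×4 count matrix. All-zero rows are kept as zeros so that the code never divides by zero.

```python
import torch

def row_normalize(counts: torch.Tensor) -> torch.Tensor:
    """Scale each row so that its entries sum to 1; all-zero rows stay zero."""
    row_sum = counts.sum(dim=1, keepdim=True)
    # clamp avoids 0/0 for a node whose count vector is empty
    return counts / row_sum.clamp(min=1e-12)

# Hypothetical word counts for 3 papers over a 4-word vocabulary.
counts = torch.tensor([[1., 0., 3., 0.],
                       [0., 2., 0., 2.],
                       [0., 0., 0., 0.]])
feat = row_normalize(counts)
print(feat[0])  # tensor([0.2500, 0.0000, 0.7500, 0.0000])
```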

Node Classification with DGL

Import the required packages

import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.data

Using backend: pytorch

Load the dataset

dataset = dgl.data.CoraGraphDataset()

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.

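A DGL dataset object can contain one or more graphs. In general, graph classification datasets contain many graphs, whereas link prediction and node classification datasets usually contain a single graph. Cora, used here for node classification, contains exactly one graph, which is retrieved with dataset[0].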

g = dataset[0]

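A DGL graph stores node features and edge features in two dictionary-like attributes, ndata and edata. Cora has no edge features, so edata is empty, as the output below shows. Its ndata contains the following node features, and other datasets are similar: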

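train_mask: a boolean tensor indicating whether a node is in the training set.
val_mask: a boolean tensor indicating whether a node is in the validation set.
test_mask: a boolean tensor indicating whether a node is in the test set.
label: the node's class.
feat: the node's features.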
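train_mask, val_mask and test_mask are boolean tensors with one entry per node, so indexing with them selects the nodes of each split. The minimal sketch below uses hypothetical 8-node masks and labels in place of g.ndata. On Cora, the mask sums should be 140, 500 and 1000, matching NumTrainingSamples, NumValidationSamples and NumTestSamples printed when the dataset loads.

```python
import torch

# Hypothetical stand-ins for g.ndata["label"], g.ndata["train_mask"], etc.
label = torch.tensor([3, 4, 4, 0, 1, 2, 3, 3])
train_mask = torch.tensor([True, True, False, False, False, False, False, False])
val_mask = torch.tensor([False, False, True, True, False, False, False, False])
test_mask = torch.tensor([False, False, False, False, True, True, True, True])

# Number of nodes in each split.
sizes = {name: int(m.sum()) for name, m in
         [("train", train_mask), ("val", val_mask), ("test", test_mask)]}
print(sizes)  # {'train': 2, 'val': 2, 'test': 4}

# Boolean indexing picks out the labels of the training nodes only.
print(label[train_mask])  # tensor([3, 4])

# The three splits should not overlap.
assert not (train_mask & val_mask).any()
assert not (train_mask & test_mask).any()
assert not (val_mask & test_mask).any()
```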

print("Node feature")
print(g.ndata)

print("Edge feature")
print(g.edata)

Node feature
{'train_mask': tensor([ True,  True,  True,  ..., False, False, False]), 'label': tensor([3, 4, 4,  ..., 3, 3, 3]), 'val_mask': tensor([False, False, False,  ..., False, False, False]), 'test_mask': tensor([False, False, False,  ...,  True,  True,  True]), 'feat': tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])}
Edge feature
{}

Defining a Graph Convolutional Network (GCN)

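This article builds a two-layer graph convolutional network (GCN). Each layer computes new node representations by aggregating information from neighboring nodes. To build a multi-layer GCN, simply stack dgl.nn.GraphConv modules, which inherit from torch.nn.Module (assuming DGL uses PyTorch as its backend).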
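For intuition about what each layer computes, here is a dense-matrix sketch of the propagation rule from Kipf et al.: H' = D̃^(-1/2) (A + I) D̃^(-1/2) H W, where D̃ is the degree matrix of A + I. This illustrates the rule under stated assumptions and is not DGL's implementation. DGL's GraphConv uses sparse message passing with a similar symmetric normalization by default (norm='both'). It also does not add self-loops automatically; its documentation recommends dgl.add_self_loop(g) for graphs with zero-in-degree nodes. The toy graph, the sizes and the class names DenseGCNLayer and DenseGCN are hypothetical.

```python
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    """One GCN layer on a dense adjacency matrix: D^-1/2 (A + I) D^-1/2 H W + b."""
    def __init__(self, in_feats: int, out_feats: int):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats, bias=False)  # W
        self.bias = nn.Parameter(torch.zeros(out_feats))          # b, added after aggregation

    def forward(self, adj: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.shape[0])        # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)      # degrees of A + I are >= 1
        norm_adj = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]  # symmetric normalization
        return norm_adj @ self.linear(h) + self.bias

class DenseGCN(nn.Module):
    """Two stacked layers with a ReLU in between, mirroring the GCN class above."""
    def __init__(self, in_feats: int, h_feats: int, num_classes: int):
        super().__init__()
        self.layer1 = DenseGCNLayer(in_feats, h_feats)
        self.layer2 = DenseGCNLayer(h_feats, num_classes)

    def forward(self, adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.layer1(adj, x))
        return self.layer2(adj, h)

torch.manual_seed(0)
# Hypothetical undirected 4-node graph with 5 input features per node.
adj = torch.tensor([[0., 1., 1., 0.],
                    [1., 0., 1., 0.],
                    [1., 1., 0., 1.],
                    [0., 0., 1., 0.]])
x = torch.randn(4, 5)
model = DenseGCN(in_feats=5, h_feats=16, num_classes=7)
logits = model(adj, x)
print(logits.shape)  # torch.Size([4, 7])
```

With no edges, the normalized matrix reduces to the identity, and the layer becomes an ordinary linear layer. Stacking two layers lets each node see its 2-hop neighborhood.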

from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_class):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_class)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

in_feats = g.ndata["feat"].shape[1]
h_feats = 16
num_class = (torch.max(g.ndata["label"]) + 1).item()

model = GCN(in_feats, h_feats, num_class)
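As a quick check on model size, assume GraphConv's defaults (weight=True, bias=True), so each layer has a weight matrix plus a bias vector. The two-layer model above then has in_feats×h_feats + h_feats + h_feats×num_class + num_class parameters. The arithmetic sketch below uses the Cora sizes printed earlier (1433 features, 7 classes) and h_feats = 16; the helper name is hypothetical. On the real model, sum(p.numel() for p in model.parameters()) should give the same total.

```python
def gcn_param_count(in_feats: int, h_feats: int, num_classes: int) -> int:
    """Parameters of two GraphConv-style layers, each with a weight matrix and a bias."""
    layer1 = in_feats * h_feats + h_feats          # 1433*16 + 16 on Cora
    layer2 = h_feats * num_classes + num_classes   # 16*7 + 7 on Cora
    return layer1 + layer2

print(gcn_param_count(1433, 16, 7))  # 23063
```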

DGL provides implementations of many popular neighbor aggregation modules, and each can be instantiated with a single line of code.

Training the GCN

Training a GCN works the same way as training any other PyTorch neural network.

def train(g, model, learning_rate=0.01, num_epoch=100):
    optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata["feat"]
    labels = g.ndata["label"]
    train_mask = g.ndata["train_mask"]
    test_mask = g.ndata["test_mask"]
    val_mask = g.ndata["val_mask"]

    for epoch in range(num_epoch):
        result = model(g, features)
        pred = result.argmax(1)

        loss = F.cross_entropy(result[train_mask], labels[train_mask])

        train_acc = (pred[train_mask]==labels[train_mask]).float().mean()
        val_acc  = (pred[val_mask]==labels[val_mask]).float().mean()
        test_acc  = (pred[test_mask]==labels[test_mask]).float().mean()

        if best_val_acc < val_acc:
            best_val_acc, best_test_acc = val_acc, test_acc
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if epoch % 5 == 0:
            print('In epoch {}, loss: {}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})'.format(
                epoch, loss, val_acc, best_val_acc, test_acc, best_test_acc))

if __name__ == "__main__":
    train(g, model, num_epoch=200, learning_rate=0.002)

In epoch 0, loss: 1.0601081612549024e-06, val acc: 0.764 (best 0.764), test acc: 0.764 (best 0.764)
In epoch 5, loss: 9.979492006095825e-07, val acc: 0.760 (best 0.764), test acc: 0.764 (best 0.764)
In epoch 10, loss: 9.494142432231456e-07, val acc: 0.762 (best 0.764), test acc: 0.764 (best 0.764)
In epoch 15, loss: 9.017308570946625e-07, val acc: 0.764 (best 0.764), test acc: 0.764 (best 0.764)
In epoch 20, loss: 8.557504429518303e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 25, loss: 8.157304023370671e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 30, loss: 7.71452903336467e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 35, loss: 7.322842634494009e-07, val acc: 0.764 (best 0.764), test acc: 0.764 (best 0.764)
In epoch 40, loss: 6.948185955479858e-07, val acc: 0.764 (best 0.764), test acc: 0.764 (best 0.764)
In epoch 45, loss: 6.624618436035234e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 50, loss: 6.292536340879451e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 55, loss: 6.028573125149705e-07, val acc: 0.764 (best 0.764), test acc: 0.764 (best 0.764)
In epoch 60, loss: 5.807185630146705e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 65, loss: 5.534708407139988e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 70, loss: 5.381440359997214e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 75, loss: 5.117477144267468e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 80, loss: 4.913119937555166e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 85, loss: 4.759851037761109e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 90, loss: 4.5640075541086844e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 95, loss: 4.368164354673354e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 100, loss: 4.2319251747358066e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 105, loss: 4.07865627494175e-07, val acc: 0.764 (best 0.764), test acc: 0.766 (best 0.764)
In epoch 110, loss: 3.993507107225014e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 115, loss: 3.840238207430957e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 120, loss: 3.755089039714221e-07, val acc: 0.762 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 125, loss: 3.6358795796331833e-07, val acc: 0.762 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 130, loss: 3.5081561122751737e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 135, loss: 3.414492084630183e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 140, loss: 3.363402356626466e-07, val acc: 0.762 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 145, loss: 3.218648316760664e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 150, loss: 3.159043444611598e-07, val acc: 0.762 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 155, loss: 3.0568645570383524e-07, val acc: 0.762 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 160, loss: 2.988745109178126e-07, val acc: 0.764 (best 0.764), test acc: 0.765 (best 0.764)
In epoch 165, loss: 2.895080797316041e-07, val acc: 0.766 (best 0.766), test acc: 0.765 (best 0.765)
In epoch 170, loss: 2.792901625525701e-07, val acc: 0.766 (best 0.766), test acc: 0.765 (best 0.765)
In epoch 175, loss: 2.733296753376635e-07, val acc: 0.766 (best 0.766), test acc: 0.765 (best 0.765)
In epoch 180, loss: 2.673692165444663e-07, val acc: 0.764 (best 0.766), test acc: 0.765 (best 0.765)
In epoch 185, loss: 2.614087861729786e-07, val acc: 0.762 (best 0.766), test acc: 0.765 (best 0.765)
In epoch 190, loss: 2.53745326972421e-07, val acc: 0.762 (best 0.766), test acc: 0.765 (best 0.765)
In epoch 195, loss: 2.486363541720493e-07, val acc: 0.762 (best 0.766), test acc: 0.765 (best 0.765)
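The "(best …)" values in the log follow a simple model-selection rule. The reported test accuracy is the one recorded at the epoch with the highest validation accuracy so far, which keeps the test set out of model selection. The comparison in train() is a strict <, so a later epoch that only ties the best validation accuracy does not replace it. That is why the log can show "test acc: 0.765 (best 0.764)". Below is a minimal pure-Python sketch of the rule, using hypothetical per-epoch accuracies and a hypothetical function name.

```python
def select_by_validation(val_accs, test_accs):
    """Return (best_val, test_at_best_val), mirroring the update rule in train()."""
    best_val, best_test = 0.0, 0.0
    for val_acc, test_acc in zip(val_accs, test_accs):
        if best_val < val_acc:  # strict: a tie keeps the earlier epoch
            best_val, best_test = val_acc, test_acc
    return best_val, best_test

# Hypothetical accuracies: epoch 2 ties epoch 1 on validation, with higher test accuracy.
val_accs = [0.70, 0.76, 0.76, 0.74]
test_accs = [0.69, 0.75, 0.78, 0.77]
print(select_by_validation(val_accs, test_accs))  # (0.76, 0.75)
```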

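The complete code is as follows. Note that the log above already shows a loss of about 1e-6 and a validation accuracy of 0.764 at epoch 0, which indicates that the model had been trained before that run. A freshly initialized model starts near ln 7 ≈ 1.946 for 7 classes, as the log of the complete script below shows.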

import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl.data import CoraGraphDataset
from dgl.nn import GraphConv

class GCN(nn.Module):
    """
    Two-layer GCN: GraphConv -> ReLU -> GraphConv.
    """
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

def train(g, model, num_epoch=100, learning_rate=0.001):
    """
    Full-graph training loop; reports the test accuracy at the best validation accuracy.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    best_val_accurate = 0
    best_test_accurate = 0

    features = g.ndata["feat"]
    labels = g.ndata["label"]
    train_mask = g.ndata["train_mask"]
    test_mask = g.ndata["test_mask"]
    val_mask = g.ndata["val_mask"]

    for e in range(num_epoch):

        result = model(g, features)

        pred = result.argmax(dim=1)

        loss = F.cross_entropy(result[train_mask], labels[train_mask])

        train_accurate = (pred[train_mask]==labels[train_mask]).float().mean()
        test_accurate = (pred[test_mask]==labels[test_mask]).float().mean()
        val_accurate = (pred[val_mask]==labels[val_mask]).float().mean()
        if best_val_accurate < val_accurate:
            best_val_accurate, best_test_accurate = val_accurate, test_accurate

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if e % 5 == 0:
            print('In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})'.format(
                e, loss, val_accurate, best_val_accurate, test_accurate, best_test_accurate))

def main():
    dataset = CoraGraphDataset()
    g = dataset[0]

    in_feats = g.ndata["feat"].shape[1]
    h_feats = 16
    num_classes = dataset.num_classes

    model = GCN(in_feats, h_feats, num_classes)
    train(g, model)

if __name__ == "__main__":
    main()

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.

In epoch 0, loss: 1.946, val acc: 0.104 (best 0.104), test acc: 0.114 (best 0.114)
In epoch 5, loss: 1.942, val acc: 0.276 (best 0.276), test acc: 0.314 (best 0.314)
In epoch 10, loss: 1.936, val acc: 0.452 (best 0.452), test acc: 0.452 (best 0.452)
In epoch 15, loss: 1.929, val acc: 0.546 (best 0.546), test acc: 0.549 (best 0.549)
In epoch 20, loss: 1.921, val acc: 0.612 (best 0.612), test acc: 0.631 (best 0.631)
In epoch 25, loss: 1.913, val acc: 0.640 (best 0.640), test acc: 0.647 (best 0.647)
In epoch 30, loss: 1.904, val acc: 0.654 (best 0.654), test acc: 0.670 (best 0.670)
In epoch 35, loss: 1.895, val acc: 0.684 (best 0.684), test acc: 0.692 (best 0.692)
In epoch 40, loss: 1.886, val acc: 0.690 (best 0.692), test acc: 0.695 (best 0.693)
In epoch 45, loss: 1.876, val acc: 0.700 (best 0.700), test acc: 0.694 (best 0.694)
In epoch 50, loss: 1.866, val acc: 0.706 (best 0.708), test acc: 0.701 (best 0.699)
In epoch 55, loss: 1.855, val acc: 0.710 (best 0.710), test acc: 0.698 (best 0.698)
In epoch 60, loss: 1.844, val acc: 0.708 (best 0.712), test acc: 0.702 (best 0.699)
In epoch 65, loss: 1.833, val acc: 0.704 (best 0.712), test acc: 0.702 (best 0.699)
In epoch 70, loss: 1.821, val acc: 0.702 (best 0.712), test acc: 0.704 (best 0.699)
In epoch 75, loss: 1.809, val acc: 0.704 (best 0.712), test acc: 0.705 (best 0.699)
In epoch 80, loss: 1.796, val acc: 0.706 (best 0.712), test acc: 0.704 (best 0.699)
In epoch 85, loss: 1.783, val acc: 0.702 (best 0.712), test acc: 0.706 (best 0.699)
In epoch 90, loss: 1.769, val acc: 0.694 (best 0.712), test acc: 0.703 (best 0.699)
In epoch 95, loss: 1.755, val acc: 0.692 (best 0.712), test acc: 0.706 (best 0.699)

Training on a GPU

To train on a GPU, use the to method to move both the model and the graph onto the GPU, just as with any other PyTorch model.

g = g.to('cuda')
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')
train(g, model)
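The snippet above assumes that a CUDA GPU is present. The sketch below lets the same script fall back to the CPU when none is available. It picks the device once and then moves both the data and the model to it. A plain feature tensor and a toy nn.Linear stand in for the graph and the GCN. In the tutorial, g.to(device) plays the same role for the DGL graph and also carries its ndata along.

```python
import torch
import torch.nn as nn

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

features = torch.randn(5, 3)   # stand-in for g.ndata["feat"]
model = nn.Linear(3, 2)        # stand-in for the GCN model

features = features.to(device)  # tensors: .to() returns a tensor on the target device
model = model.to(device)        # modules: parameters are moved in place and self is returned

out = model(features)
print(out.device)
```

Keeping a single device variable avoids a mismatch where the graph is on the GPU but the model is still on the CPU, or the other way round. That mismatch raises an error at the first forward pass.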

References

Translated and adapted from the DGL tutorial Node Classification with DGL

Original: https://blog.csdn.net/huanghelouzi/article/details/116430387
Author: huanghelouzi
Title: Node Classification with DGL (使用DGL完成节点分类任务)
