RNA 13. SCI 文章中加权基因共表达网络分析之 WGCNA

2023年7月18日上午3:37 • 人工智能 • 阅读 60

WGCNA 分析流程 2008 年发表在 BMC 之后的影响力还是很高的，先后在各大期刊都能看到，但是就其分析的过程来看，还是需要有一定 R 语言的基础才能完整的复现出来文章中的结果，这期就搞出来供大家参考，因为我自己尝试了一下结直肠癌的数据集，但是有些地方不太理想，故还是有软件包自带的数据来运行，终结起来就是输入两个文件，其他就是根据自己的数据集进行过滤，筛选，最终确定一个满意的基因集。

前言

相关网络在生物信息学中的应用日益广泛。例如，加权基因共表达网络分析是一种描述微阵列样本间基因相关性模式的系统生物学方法。加权相关网络分析(Weighted correlation network analysis, WGCNA)可用于寻找高度相关基因的聚类(模块)，利用模块特征基因或模块内的中心基因来总结这些聚类，利用特征基因网络方法(eigengene network methodology,WGCNA)将模块之间和模块与外部样本性状进行关联，以及用于计算模块成员度量。相关网络促进了基于网络的基因筛选方法，可用于识别候选生物标记物或治疗靶点。这些方法已成功应用于各种生物学背景，如癌症、小鼠遗传学、酵母遗传学和脑成像数据分析。虽然相关网络方法的部分内容已在单独的出版物中进行了描述，但仍需要提供一种用户友好的、全面的、一致的软件实现和辅助工具。

WGCNA软件包是一个R函数的综合集合，用于进行加权相关网络分析的各个方面。该软件包包括网络构建、模块检测、基因选择、拓扑性质计算、数据仿真、可视化以及与外部软件的接口等功能。除了R包，我们还提供了R软件教程。虽然这些方法的开发是由基因表达数据驱动的，但底层的数据挖掘方法可以应用于各种不同的设置。

WGCNA软件包为加权相关网络分析提供了R函数，如基因表达数据的共表达网络分析。计算方法较为复杂，这里略去，我们根据文献中给出来的WGCNA方法概述一步一步进行实际操作。构建基因共表达网络：使用加权的表达相关性。识别基因集：基于加权相关性，进行层级聚类分析，并根据设定标准切分聚类结果，获得不同的基因模块，用聚类树的分枝和不同颜色表示。如果有表型信息，计算基因模块与表型的相关性，鉴定性状相关的模块。研究模型之间的关系，从系统层面查看不同模型的互作网络。从关键模型中选择感兴趣的驱动基因，或根据模型中已知基因的功能推测未知基因的功能。导出TOM矩阵，绘制相关性图。

WGCNA 分析流程如下：

1.构建基因共表达网络：使用加权的表达相关性；

2.识别基因集：基于加权相关性，进行层级聚类分析，并根据设定标准切分聚类结果，获得不同的基因模块，用聚类树的分枝和不同颜色表示；

3.如果有表型信息，计算基因模块与表型的相关性，鉴定性状相关的模块；

4.研究模型之间的关系，从系统层面查看不同模型的互作网络；

5.从关键模型中选择感兴趣的驱动基因，或根据模型中已知基因的功能推测未知基因的功能；

6.导出TOM矩阵，绘制相关性图。

; 实例解析

WGCNA 整个流程的输入文件为 RNA-seq 测序结果计算获得的 Count 矩阵,通常我们需要转置输入数据，这样 rows = treatments and columns = gene probes，WGCNA 输出的是聚类基因列表和加权基因相关网络文件。WGCNA 数据过滤分析流程如下：

1. Data input and cleaning

软件包安装及加载，如下：

if(!require(WGCNA)){
    BiocManager::install("WGCNA")
}
library(WGCNA)

第一步，读取表达数据，并进行预处理成适合于网络分析的格式，并通过删除明显的离群样本、基因和缺失条目过多的样本来过滤数据。表达式数据包含在R包中，文件名为 LiverFemale3600.csv。

The following setting is important, do not omit.

options(stringsAsFactors = FALSE)
Read in the female liver data set
femData = read.csv("./FemaleLiver-Data/LiverFemale3600.csv")
Take a quick look at what is in the data set:
dim(femData)
## [1] 3600  143
head(names(femData))
## [1] "substanceBXH"   "gene_symbol"    "LocusLinkID"    "ProteomeID"
## [5] "cytogeneticLoc" "CHROMOSOME"
datExpr0 = as.data.frame(t(femData[, -c(1:8)]))
names(datExpr0) = femData$substanceBXH
rownames(datExpr0) = names(femData)[-c(1:8)]
gsg = goodSamplesGenes(datExpr0, verbose = 3)
##  Flagging genes and samples with too many missing values...

##   ..step 1
gsg$allOK
## [1] TRUE

If the last statement returns TRUE, all genes have passed the cuts. If not,
we #remove the offending genes and samples from the data:
if (!gsg$allOK) {
    # Optionally, print the gene and sample names that were removed:
    if (sum(!gsg$goodGenes) > 0)
        printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes],
            collapse = ", ")))
    if (sum(!gsg$goodSamples) > 0)
        printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples],
            collapse = ", ")))
    # Remove the offending genes and samples from the data:
    datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
Next we cluster the samples (in contrast to clustering genes that will come
later) to see if there are any obvious outliers.

sampleTree = hclust(dist(datExpr0), method = "average")
Plot the sample tree: Open a graphic output window of size 12 by 9 inches The
user should change the dimensions if the window is too large or too small.

sizeGrWindow(12,9) pdf(file = 'Plots/sampleClustering.pdf', width = 12,
height = 9);
par(mar = c(0, 4, 2, 0), cex = 0.6)
plot(sampleTree, main = "Sample clustering to detect outliers", sub = "", xlab = "",
    cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)
Plot a line to show the cut
abline(h = 15, col = "red")

Figure 1: Clustering dendrogram of samples based on their Euclidean distance.

添加表型信息，如下：

Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 10)
table(clust)
## clust
##   0   1
##   1 134
clust 1 contains the samples we want to keep.

keepSamples = (clust == 1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
traitData = read.csv("./FemaleLiver-Data/ClinicalTraits.csv")
dim(traitData)
## [1] 361  38
names(traitData)
##  [1] "X"                  "Mice"               "Number"
##  [4] "Mouse_ID"           "Strain"             "sex"
##  [7] "DOB"                "parents"            "Western_Diet"
## [10] "Sac_Date"           "weight_g"           "length_cm"
## [13] "ab_fat"             "other_fat"          "total_fat"
## [16] "comments"           "X100xfat_weight"    "Trigly"
## [19] "Total_Chol"         "HDL_Chol"           "UC"
## [22] "FFA"                "Glucose"            "LDL_plus_VLDL"
## [25] "MCP_1_phys"         "Insulin_ug_l"       "Glucose_Insulin"
## [28] "Leptin_pg_ml"       "Adiponectin"        "Aortic.lesions"
## [31] "Note"               "Aneurysm"           "Aortic_cal_M"
## [34] "Aortic_cal_L"       "CoronaryArtery_Cal" "Myocardial_cal"
## [37] "BMD_all_limbs"      "BMD_femurs_only"
remove columns that hold information we do not need.

allTraits = traitData[, -c(31, 16)]
allTraits = allTraits[, c(2, 11:36)]
dim(allTraits)
## [1] 361  27
names(allTraits)
##  [1] "Mice"               "weight_g"           "length_cm"
##  [4] "ab_fat"             "other_fat"          "total_fat"
##  [7] "X100xfat_weight"    "Trigly"             "Total_Chol"
## [10] "HDL_Chol"           "UC"                 "FFA"
## [13] "Glucose"            "LDL_plus_VLDL"      "MCP_1_phys"
## [16] "Insulin_ug_l"       "Glucose_Insulin"    "Leptin_pg_ml"
## [19] "Adiponectin"        "Aortic.lesions"     "Aneurysm"
## [22] "Aortic_cal_M"       "Aortic_cal_L"       "CoronaryArtery_Cal"
## [25] "Myocardial_cal"     "BMD_all_limbs"      "BMD_femurs_only"
Form a data frame analogous to expression data that will hold the clinical
traits.

femaleSamples = rownames(datExpr)
traitRows = match(femaleSamples, allTraits$Mice)
datTraits = allTraits[traitRows, -1]
rownames(datTraits) = allTraits[traitRows, 1]
collectGarbage()
Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")
Convert traits to a color representation: white means low, red means high,
grey means missing entry
traitColors = numbers2colors(datTraits, signed = FALSE)
Plot the sample dendrogram and the colors underneath.

plotDendroAndColors(sampleTree2, traitColors, groupLabels = names(datTraits), main = "Sample dendrogram and trait heatmap")

Figure 2: Clustering dendrogram of samples based on their Euclidean distance.

save(datExpr, datTraits, file = "FemaleLiver-01-dataInput.RData")

2. Network construction and module detection

这一步是使用WGCNA方法进行的所有网络分析的基石。有三种不同的方式构建网络并识别模块，包括：a.采用第一步网络建设和模块检测功能，适合希望到达的用户以最小的努力获得结果; b.为想要尝试定制/替代方法的用户提供分步网络建设和模块检测; c.为希望分析数据的用户提供一种全数据块网络构建和模块检测自动检测方法集太大，无法集中分析。

首先选择软阈值能力，网络拓扑分析，如下：

Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to = 20, by = 2))
Call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
## pickSoftThreshold: will use block size 3600.

##  pickSoftThreshold: calculating connectivity for given powers...

##    ..working on genes 1 through 3600 of 3600
## Warning: executing %dopar% sequentially: no parallel backend registered
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.

## 1      1   0.0278  0.345          0.456  747.00  762.0000 1210.0
## 2      2   0.1260 -0.597          0.843  254.00  251.0000  574.0
## 3      3   0.3400 -1.030          0.972  111.00  102.0000  324.0
## 4      4   0.5060 -1.420          0.973   56.50   47.2000  202.0
## 5      5   0.6810 -1.720          0.940   32.20   25.1000  134.0
## 6      6   0.9020 -1.500          0.962   19.90   14.5000   94.8
## 7      7   0.9210 -1.670          0.917   13.20    8.6800   84.1
## 8      8   0.9040 -1.720          0.876    9.25    5.3900   76.3
## 9      9   0.8590 -1.700          0.836    6.80    3.5600   70.5
## 10    10   0.8330 -1.660          0.831    5.19    2.3800   65.8
## 11    12   0.8530 -1.480          0.911    3.33    1.1500   58.1
## 12    14   0.8760 -1.380          0.949    2.35    0.5740   51.9
## 13    16   0.9070 -1.300          0.970    1.77    0.3090   46.8
## 14    18   0.9120 -1.240          0.973    1.39    0.1670   42.5
## 15    20   0.9310 -1.210          0.977    1.14    0.0951   38.7
Plot the results: sizeGrWindow(9, 5)
par(mfrow = c(1, 2))
cex1 = 0.9
Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[, 1], -sign(sft$fitIndices[, 3]) * sft$fitIndices[, 2], xlab = "Soft Threshold (power)",
    ylab = "Scale Free Topology Model Fit,signed R^2", type = "n", main = paste("Scale independence"))
text(sft$fitIndices[, 1], -sign(sft$fitIndices[, 3]) * sft$fitIndices[, 2], labels = powers,
    cex = cex1, col = "red")
this line corresponds to using an R^2 cut-off of h
abline(h = 0.9, col = "red")
Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[, 1], sft$fitIndices[, 5], xlab = "Soft Threshold (power)", ylab = "Mean Connectivity",
    type = "n", main = paste("Mean connectivity"))
text(sft$fitIndices[, 1], sft$fitIndices[, 5], labels = powers, cex = cex1, col = "red")

Figure 3: Analysis of network topology for various soft-thresholding powers.

softPower = 6
adjacency = adjacency(datExpr, power = softPower)
Turn adjacency into topological overlap
TOM = TOMsimilarity(adjacency)
## ..connectivity..

## ..matrix multiplication (system BLAS)..

## ..normalization..

## ..done.

dissTOM = 1 - TOM
Call the hierarchical clustering function
geneTree = hclust(as.dist(dissTOM), method = "average")
Plot the resulting clustering tree (dendrogram) sizeGrWindow(12,9)
plot(geneTree, xlab = "", sub = "", main = "Gene clustering on TOM-based dissimilarity",
    labels = FALSE, hang = 0.04)

Figure 4: Clustering dendrogram of genes, with dissimilarity based on topological overlap.

We like large modules, so we set the minimum module size relatively high:
minModuleSize = 30
Module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM, deepSplit = 2, pamRespectsDendro = FALSE,
    minClusterSize = minModuleSize)
##  ..cutHeight not given, setting it to 0.996  ===>  99% of the (truncated) height range in dendro.

##  ..done.

table(dynamicMods)
## dynamicMods
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
##  88 614 316 311 257 235 225 212 158 153 121 106 102 100  94  91  78  76  65  58
##  20  21  22
##  58  48  34

Convert numeric lables into colors
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
## dynamicColors
##        black         blue        brown         cyan    darkgreen      darkred
##          212          316          311           94           34           48
##        green  greenyellow         grey       grey60    lightcyan   lightgreen
##          235          106           88           76           78           65
##  lightyellow      magenta midnightblue         pink       purple          red
##           58          153           91          158          121          225
##    royalblue       salmon          tan    turquoise       yellow
##           58          100          102          614          257
Plot the dendrogram and colors underneath sizeGrWindow(8,6)
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut", dendroLabels = FALSE,
    hang = 0.03, addGuide = TRUE, guideHang = 0.05, main = "Gene dendrogram and module colors")

Figure 5: Clustering dendrogram of genes, with dissimilarity based on topological overlap, together with assigned module colors.

Calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
Calculate dissimilarity of module eigengenes
MEDiss = 1 - cor(MEs)
Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average")
Plot the result sizeGrWindow(7, 6)
plot(METree, main = "Clustering of module eigengenes", xlab = "", sub = "")
MEDissThres = 0.25
Plot the cut line into the dendrogram
abline(h = MEDissThres, col = "red")

Figure 6: Clustering dendrogram of genes, with dissimilarity based on topological overlap, together with assigned merged module colors and the original module colors.

Call an automatic merging function
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)

##  mergeCloseModules: Merging modules whose distance is less than 0.25
##    multiSetMEs: Calculating module MEs.

##      Working on set 1 ...

##      moduleEigengenes: Calculating 23 module eigengenes in given set.

##    multiSetMEs: Calculating module MEs.

##      Working on set 1 ...

##      moduleEigengenes: Calculating 19 module eigengenes in given set.

##    Calculating new MEs...

##    multiSetMEs: Calculating module MEs.

##      Working on set 1 ...

##      moduleEigengenes: Calculating 19 module eigengenes in given set.

The merged module colors
mergedColors = merge$colors
Eigengenes of the new merged modules:
mergedMEs = merge$newMEs
sizeGrWindow(12, 9) pdf(file = 'Plots/geneDendro-3.pdf', wi = 9, he = 6)
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors), c("Dynamic Tree Cut",
    "Merged dynamic"), dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)

Figure 7: Clustering dendrogram of genes, with dissimilarity based on topological overlap

3. Relating modules to external clinical traits and identifying important genes

将模块与外部临床特征相关联，如下：

Rename to moduleColors
moduleColors = mergedColors
Construct numerical labels corresponding to the colors
colorOrder = c("grey", standardColors(50))
moduleLabels = match(moduleColors, colorOrder) - 1
MEs = mergedMEs
Save module colors and labels for use in subsequent parts
save(MEs, moduleLabels, moduleColors, geneTree, file = "FemaleLiver-02-networkConstruction-stepByStep.RData")
Define numbers of genes and samples
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
Recalculate MEs with color labels
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p")
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)
sizeGrWindow(10,6) Will display correlations and their p-values
textMatrix = paste(signif(moduleTraitCor, 2), "\n(", signif(moduleTraitPvalue, 1),
    ")", sep = "")
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3))
Display the correlation values within a heatmap plot
labeledHeatmap(Matrix = moduleTraitCor, xLabels = names(datTraits), yLabels = names(MEs),
    ySymbols = names(MEs), colorLabels = FALSE, colors = greenWhiteRed(50), textMatrix = textMatrix,
    setStdMargins = FALSE, cex.text = 0.3, zlim = c(-1, 1), main = paste("Module-trait relationships"))
## Warning in greenWhiteRed(50): WGCNA::greenWhiteRed: this palette is not suitable for people
## with green-red color blindness (the most common kind of color blindness).

## Consider using the function blueWhiteRed instead.

Figure 8: Module-trait associations.

模内分析：鉴定高 GS 和 MM 基因

Define variable weight containing the weight column of datTrait
weight = as.data.frame(datTraits$weight_g)
names(weight) = "weight"
names (colors) of the modules
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"))
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples))
names(geneModuleMembership) = paste("MM", modNames, sep = "")
names(MMPvalue) = paste("p.MM", modNames, sep = "")
geneTraitSignificance = as.data.frame(cor(datExpr, weight, use = "p"))
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples))
names(geneTraitSignificance) = paste("GS.", names(weight), sep = "")
names(GSPvalue) = paste("p.GS.", names(weight), sep = "")
module = "brown"
column = match(module, modNames)
moduleGenes = moduleColors == module
sizeGrWindow(7, 7);
par(mfrow = c(1, 1))
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]), abs(geneTraitSignificance[moduleGenes,
    1]), xlab = paste("Module Membership in", module, "module"), ylab = "Gene significance for body weight",
    main = paste("Module membership vs. gene significance\n"), cex.main = 1.2, cex.lab = 1.2,
    cex.axis = 1.2, col = module)

Figure 9: A scatterplot of Gene Significance (GS) for weight vs. Module Membership (MM) in the brown module

head(names(datExpr)[moduleColors == "brown"])
## [1] "MMT00000149" "MMT00000283" "MMT00000550" "MMT00000701" "MMT00001190"
## [6] "MMT00001555"
annot = read.csv(file = "./FemaleLiver-Data/GeneAnnotation.csv")
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
The following is the number or probes without annotation:
sum(is.na(probes2annot))
## [1] 0
Should return 0.  Create the starting data frame
geneInfo0 = data.frame(substanceBXH = probes, geneSymbol = annot$gene_symbol[probes2annot],
    LocusLinkID = annot$LocusLinkID[probes2annot], moduleColor = moduleColors, geneTraitSignificance,
    GSPvalue)
Order modules by their significance for weight
modOrder = order(-abs(cor(MEs, weight, use = "p")))
Add module membership information in the chosen order
for (mod in1:ncol(geneModuleMembership)) {
    oldNames = names(geneInfo0)
    geneInfo0 = data.frame(geneInfo0, geneModuleMembership[, modOrder[mod]], MMPvalue[,
        modOrder[mod]])
    names(geneInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep = ""),
        paste("p.MM.", modNames[modOrder[mod]], sep = ""))
}
Order the genes in the geneInfo variable first by module color, then by
geneTraitSignificance
geneOrder = order(geneInfo0$moduleColor, -abs(geneInfo0$GS.weight))
geneInfo = geneInfo0[geneOrder, ]
write.csv(geneInfo, file = "geneInfo.csv")

4. Interfacing network analysis with other data such as functional annotation and gene ontology

我们之前的分析已经确定了几个高度相关的模块（标记为 brown, red, and salmon）与重量。为了便于生物学解释，我们想知道基因的基因本体模块，它们是否在某些功能类别中显着丰富等。

Read in the probe annotation annot = read.csv(file =
'./FemaleLiver-Data/GeneAnnotation.csv'); Match probes in the data set to the
probe IDs in the annotation file
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
Get the corresponding Locuis Link IDs
allLLIDs = annot$LocusLinkID[probes2annot]
$ Choose interesting modules
intModules = c("brown", "red", "salmon")
for (module in intModules) {
    # Select module probes
    modGenes = (moduleColors == module)
    # Get their entrez ID codes
    modLLIDs = allLLIDs[modGenes]
    # Write them into a file
    fileName = paste("LocusLinkIDs-", module, ".txt", sep = "")
    write.table(as.data.frame(modLLIDs), file = fileName, row.names = FALSE, col.names = FALSE)
}
As background in the enrichment analysis, we will use all probes in the
analysis.

fileName = paste("LocusLinkIDs-all.txt", sep = "")
write.table(as.data.frame(allLLIDs), file = fileName, row.names = FALSE, col.names = FALSE)

GO 富集分析，如下：

GOenr = GOenrichmentAnalysis(moduleColors, allLLIDs, organism = "mouse", nBestP = 10)
## Warning in GOenrichmentAnalysis(moduleColors, allLLIDs, organism = "mouse", : This function is deprecated and will be removed in the near future.

## We suggest using the replacement function enrichmentAnalysis
## in R package anRichment, available from the following URL:
## https://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/GeneAnnotation/
##  GOenrichmentAnalysis: loading annotation data...

##   ..of the 3038  Entrez identifiers submitted, 2843 are mapped in current GO categories.

##   ..will use 2843 background genes for enrichment calculations.

##   ..preparing term lists (this may take a while)..

##   ..working on label set 1 ..

##     ..calculating enrichments (this may also take a while)..

##     ..putting together terms with highest enrichment significance..

tab = GOenr$bestPTerms[[4]]$enrichment
write.table(tab, file = "GOEnrichmentTable.csv", sep = ",", quote = TRUE, row.names = FALSE)
keepCols = c(1, 2, 5, 6, 7, 12, 13)
screenTab = tab[, keepCols]
Round the numeric columns to 2 decimal places:
numCols = c(3, 4)
screenTab[, numCols] = signif(apply(screenTab[, numCols], 2, as.numeric), 2)
Truncate the the term name to at most 40 characters
screenTab[, 7] = substring(screenTab[, 7], 1, 40)
Shorten the column names:
colnames(screenTab) = c("module", "size", "p-val", "Bonf", "nInTerm", "ont", "term name")
rownames(screenTab) = NULL
Set the width of R&#x2019;s output. The reader should play with this number to
obtain satisfactory output.

options(width = 95)
Finally, display the enrichment table:
head(screenTab)
##   module size   p-val Bonf nInTerm ont                             term name
## 1  black  167 5.9e-05    1      15  MF signaling receptor activator activity
## 2  black  167 5.9e-05    1      15  MF              receptor ligand activity
## 3  black  167 1.2e-04    1      15  MF signaling receptor regulator activity
## 4  black  167 6.8e-04    1       5  BP                        mRNA transport
## 5  black  167 7.1e-04    1       4  BP                    dopamine transport
## 6  black  167 1.7e-03    1       6  BP                       amine transport

5. Network visualization using WGCNA functions

可视化基因网络，对所有基因做图，如下：

nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
Calculate topological overlap anew: this could be done more efficiently by saving the TOM
calculated during module detection, but let us do it again here.

dissTOM = 1 - TOMsimilarityFromExpr(datExpr, power = 6)
## TOM calculation: adjacency..

## ..will not use multithreading.

##  Fraction of slow calculations: 0.396405
## ..connectivity..

## ..matrix multiplication (system BLAS)..

## ..normalization..

## ..done.

Transform dissTOM with a power to make moderately strong connections more visible in the
heatmap
plotTOM = dissTOM^7
Set diagonal to NA for a nicer plot
diag(plotTOM) = NA
Call the plot function sizeGrWindow(9,9)
TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes")

Figure 10: Visualizing the gene network using a heatmap plot.

对所有基因进行筛选，在绘制热图，如下：

nSelect = 400
For reproducibility, we set the random seed
set.seed(10)
select = sample(nGenes, size = nSelect)
selectTOM = dissTOM[select, select]
There&#x2019;s no simple way of restricting a clustering tree to a subset of genes, so we must
re-cluster.

selectTree = hclust(as.dist(selectTOM), method = "average")
selectColors = moduleColors[select]
Open a graphical window sizeGrWindow(9,9) Taking the dissimilarity to a power, say 10, makes
the plot more informative by effectively changing the color palette; setting the diagonal to
NA also improves the clarity of the plot
plotDiss = selectTOM^7
diag(plotDiss) = NA
TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes")

Figure 11: Visualizing the gene network using a heatmap plot

可视化特征基因网络，如下：

Recalculate module eigengenes
MEs = moduleEigengenes(datExpr, moduleColors)$eigengenes
Isolate weight from the clinical traits
weight = as.data.frame(datTraits$weight_g)
names(weight) = "weight"
Add the weight to existing module eigengenes
MET = orderMEs(cbind(MEs, weight))
Plot the relationships among the eigengenes and the trait sizeGrWindow(5,7.5);
par(cex = 1)
plotEigengeneNetworks(MET, "Eigengene dendrogram", marDendro = c(0, 4, 2, 0), plotHeatmaps = FALSE)

Figure 12: Visualization of the eigengene network representing the relationships among the modules and the clinical trait weight.

Plot the heatmap matrix (note: this plot will overwrite the dendrogram plot)
par(cex = 1)
plotEigengeneNetworks(MET, "Eigengene adjacency heatmap", marHeatmap = c(3, 4, 2, 2), plotDendrograms = FALSE,
    xLabelsAngle = 90)

Figure 13: Visualization of the eigengene network representing the relationships among the modules and the clinical trait weight.

6. Export of networks to external software

将网络数据导出到网络可视化软件中，有两种方式： 1.VisANT
2.Cytoscape
但是每种软件有自己的输入文件要求，根据代码生成就可以了，如下： 1. 输入到 VisANT 软件进行基因网络的互作绘图。

Recalculate topological overlap
TOM = TOMsimilarityFromExpr(datExpr, power = 6)
## TOM calculation: adjacency..

## ..will not use multithreading.

##  Fraction of slow calculations: 0.396405
## ..connectivity..

## ..matrix multiplication (system BLAS)..

## ..normalization..

## ..done.

Read in the annotation file annot = read.csv(file =
'./FemaleLiver-Data/GeneAnnotation.csv'); Select module
module = "brown"
Select module probes
probes = names(datExpr)
inModule = (moduleColors == module)
modProbes = probes[inModule]
Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes)
Export the network into an edge list file VisANT can read
vis = exportNetworkToVisANT(modTOM, file = paste("VisANTInput-", module, ".txt", sep = ""), weighted = TRUE,
    threshold = 0, probeToGene = data.frame(annot$substanceBXH, annot$gene_symbol))
nTop = 30
IMConn = softConnectivity(datExpr[, modProbes])
##  softConnectivity: FYI: connecitivty of genes with less than 45 valid samples will be returned as NA.

##  ..calculating connectivities..

top = (rank(-IMConn) <= ntop) vis="exportNetworkToVisANT(modTOM[top," top], file="paste("VisANTInput-"," module, "-top30.txt", sep ), weighted="TRUE," threshold="0," probetogene="data.frame(annot$substanceBXH," annot$gene_symbol)) < code></=>

输入到 Cytoscape 软件进行基因网络的互作绘图。

Recalculate topological overlap if needed
TOM = TOMsimilarityFromExpr(datExpr, power = 6)
## TOM calculation: adjacency..

## ..will not use multithreading.

##  Fraction of slow calculations: 0.396405
## ..connectivity..

## ..matrix multiplication (system BLAS)..

## ..normalization..

## ..done.

Read in the annotation file annot = read.csv(file =
'./FemaleLiver-Data/GeneAnnotation.csv'); Select modules
modules = c("brown", "red")
Select module probes
probes = names(datExpr)
inModule = is.finite(match(moduleColors, modules))
modProbes = probes[inModule]
modGenes = annot$gene_symbol[match(modProbes, annot$substanceBXH)]
Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes)
Export the network into edge and node list files Cytoscape can read
cyt = exportNetworkToCytoscape(modTOM, edgeFile = paste("CytoscapeInput-edges-", paste(modules,
    collapse = "-"), ".txt", sep = ""), nodeFile = paste("CytoscapeInput-nodes-", paste(modules,
    collapse = "-"), ".txt", sep = ""), weighted = TRUE, threshold = 0.02, nodeNames = modProbes,
    altNodeNames = modGenes, nodeAttr = moduleColors[inModule])

Figure 14: Visualization of the network connections among the most connected genes in the brown module, generated by the VisANT software.

这个软件包个人感觉还需要一定改进，后期修改一下便于代码运用不太好的小伙伴们，实现一键生成图片，这样可以更好的方便大家使用！

关注公众号，每日更新，扫码进群交流不停歇，马上就出视频版，关注我，您最佳的选择！

桓峰基因

生物信息分析，SCI文章撰写及生物信息基础知识学习：R语言学习，perl基础编程，linux系统命令，Python遇见更好的你

50篇原创内容

公众号

References:

Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. Published 2008 Dec 29. doi:10.1186/1471-2105-9-559
Zhang, B. and Horvath, S., 2005. A general framework for weighted gene co-expression network analysis. Statistical applications in genetics and molecular biology, 4(1).
Zhenjun Hu, Evan S. Snitkin, and Charles DeLisi. VisANT: an integrative framework for networks in systems biology. Brief Bioinform, 9(4):317–325, 2008.
Paul Shannon, Andrew Markiel, Owen Ozier, Nitin S. Baliga, Jonathan T. Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, and Trey Ideker. Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research, 13(11):2498–2504, 200

Original: https://blog.csdn.net/weixin_41368414/article/details/123470994
Author: 桓峰基因
Title: RNA 13. SCI 文章中加权基因共表达网络分析之 WGCNA

原创文章受到原创版权保护。转载请注明出处：https://www.johngo689.com/700055/

转载文章受原作者版权保护。转载请注明原作者出处！

人工智能

【自取】最近整理的，有需要可以领取学习：

Linux核心资料大放送~

全栈面试题汇总（持续更新&可下载）

一个提高学习100%效率的工具！

【超详细】深度学习面试题目！

LeetCode Python刷题答案下载！

LeetCode Java版刷题答案下载！

LeetCode C++ 版本，抓紧保存！

LeetCode GO语言刷题答案下载！

一文带你了解「图数据库」Nebula 的存储设计和思考

本文首发于 Nebula Graph Community 公众号在上次的 nebula-storage on nLive 直播中，来自 Nebula 存储团队的负责人王玉珏（四王…

人工智能 2023年6月1日
00148
GDAL 安装教程（Python）

引言本文介绍 GDAL（Geospatial Data Abstraction Library，空间数据抽象库）的 Python 版本安装教程。简介 GDAL 是用于栅格和矢量…

人工智能 2023年7月5日
0062
Ubuntu20.04安装k8s v1.21.0

禁用swap分区, 修改网络配置 sudo vim /etc/fstab把有swap的那一行注释掉即可，如下：然后执行如下命令 cat <<EOF | sudo tee…

人工智能 2023年6月28日
0089
KNN算法实现鸢尾花数据集分类

KNN算法实现鸢尾花数据集分类作者介绍数据集介绍 KNN算法介绍用KNN实现鸢尾花分类作者介绍乔冠华，女，西安工程大学电子信息学院，2020级硕士研究生，张宏伟人工智能课…

人工智能 2023年7月1日
0097
OpenCV（25）轮廓检测（轮廓提取、属性、近似轮廓、外接矩形和外接圆）

目录一、轮廓检测基础理论 1、轮廓概述 2、API介绍 1、cv.findContours函数（查找轮廓） 2、cv.drawContours函数（画出轮廓）检测轮廓并画出：（…

人工智能 2023年6月26日
0067
新技术的成熟、商业模式的完备，产业互联网的落地提供了土壤

仅仅只是站在互联网的角度来看待产业互联网，只会把产业互联网带入到互联网的发展怪圈之中。这是我们看到如此多的产业互联网玩家投身其中，却一直并未有所突破的关键原因。纵然是那些头部的互联…

人工智能 2023年5月30日
0095
保姆级教程 – atlas500部署yolov3-tiny检测实时视频流 [2] – yolov3-tiny模型转换到om模型

保姆级教程 – atlas500部署yolov3-tiny检测实时视频流 [2] – yolov3-tiny模型转换到om模型接上文 -> 内网环境…

人工智能 2023年7月9日
0088
pytorch torchvision pip本地安装,可快速安装

pip 快速安装pytorch torchvision1.先在命令窗口查看自己的cuda对应版本，如果有独立显卡，可使用cuda加速器处理，输入指令为nvidia-smi,我的版本…

人工智能 2023年7月21日
0098
快速傅里叶变换（FFT）基础

参考博文频谱分析是一种将复杂信号分解为较简单信号的技术。首先来很清楚几个概念： FT（Fourier Transformation）: 傅里叶变换。其时域信号、频域信号都是连续…

人工智能 2023年7月5日
00109
解决调用torch_geometric报错No module named ‘torch_sparse‘等问题，以及torch_sparse torch_scatter等的安装问题

出现的问题：torch_geometric报错会出现为torch_sparse torch_scatter等的问题最近又开始搞图神经网络方面的东西，要用到 torch_geom…

人工智能 2023年7月23日
0085
VONR介绍

1 VoNR信令流程 5G的网络架构其实承袭自4G，只支持分组交换，不支持电路交换，也就是说自身的5GC核心网是没法支撑语音业务的，必须依赖于一个叫做IMS的系统。 IMS又叫IP…

人工智能 2023年5月27日
00101
论文阅读 AutoSF: Searching Scoring Functions for Knowledge Graph Embedding

AutoSF: Searching Scoring Functions for Knowledge Graph Embedding AutoSF:知识图嵌入的搜索评分函数发表期刊…

人工智能 2023年6月1日
0096
美颜磨皮算法之保边（双边&引导）滤波器原理及 Python 实现

保边滤波是对图像操作后，不会模糊边缘的部分（如下图所示），属于非线性的滤波方法，常见的保边滤波有双边滤波和引导滤波，典型应用场景是去噪，磨皮，扣图本文介绍两种保边滤波器，分别是…

人工智能 2023年6月20日
00113
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

基本简介论文下载地址：https://arxiv.org/abs/1612.00593代码开源地址：https://github.com/charlesq34/pointnet作…

人工智能 2023年6月30日
0089
ICCV21 – 无监督语义分割《Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals》

文章目录 * – 原文地址 – 初识 – 相知 – 回顾原文地址点我跳转到原文初识在无监督设置下，学习密集语义表征(dens…

人工智能 2023年6月15日
00129
torchvision.transforms.RandomResizedCrop方法解读

(1)Crop：随机大小和随机宽高比的裁剪，且随机的范围可以指定。 (2)Resize：Resize到指定的大小。先进行随机大小和随机宽高比的Crop操作，再对Crop出来的区域…

人工智能 2023年7月21日
0061

2024 年 5 月
一	二	三	四	五	六	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31