Learning Single-Cell Transcriptome Analysis from Cell (12): Transcription Factor Analysis

July 15, 2023, 8:23 PM

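Transcription factor analysis reveals the heterogeneity in gene regulatory networks that underlies cellular heterogeneity, and it is a routine part of single-cell transcriptome analysis. In R the usual tool is the SCENIC package; for the underlying method, see two papers: (1) "SCENIC: single-cell regulatory network inference and clustering" and (2) "A scalable SCENIC workflow for single-cell gene regulatory network analysis". A warning up front: SCENIC is extremely compute-heavy and consumes a lot of memory and time, so unless you really have to, do not attempt it on an ordinary computer. Run it on a server, reduce the number of cells, or use the Python version of SCENIC (pySCENIC). What follows is only a demonstration: the genes and cells were deliberately reduced, so the data carry no biological meaning, and even so the run takes a long time. The point is to see the workflow.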
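One way to reduce the number of cells is to subsample within each cell type before running SCENIC. The following is only a minimal sketch in base R, assuming a genes x cells matrix and one cell type label per cell; the toy data, the `subsample_cells` helper and its `n_per_type` parameter are hypothetical and not part of SCENIC.

```r
# Toy genes x cells matrix standing in for a real expression matrix (hypothetical data)
set.seed(1)
expr <- matrix(rpois(20 * 100, lambda = 2), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("cell", 1:100)))
cell_type <- rep(c("T", "B", "Mono", "NK"), times = c(50, 30, 15, 5))
names(cell_type) <- colnames(expr)

# Keep at most n_per_type cells from each cell type (hypothetical helper)
subsample_cells <- function(cell_type, n_per_type) {
  picked <- lapply(split(names(cell_type), cell_type), function(cells) {
    if (length(cells) <= n_per_type) cells else sample(cells, n_per_type)
  })
  unlist(picked, use.names = FALSE)
}

kept <- subsample_cells(cell_type, n_per_type = 10)
expr_small <- expr[, kept, drop = FALSE]
dim(expr_small)           # genes x retained cells
table(cell_type[kept])    # cells retained per type
```

Capping each type separately, rather than sampling cells at random, keeps rare populations in the reduced data set.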

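Before starting, two things are needed. The first, of course, is to install and load the R packages; there are quite a few, so install any that are missing. The second is to download the accompanying annotation databases (the cisTarget motif rankings).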

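library(Seurat)
library(tidyverse)
library(foreach)
library(RcisTarget)
library(doParallel)
library(SCopeLoomR)
library(AUCell)
# Install the remaining dependencies only if they are missing
if (!requireNamespace("doRNG", quietly = TRUE)) BiocManager::install(c("doMC", "doRNG"))
library(doRNG)
if (!requireNamespace("GENIE3", quietly = TRUE)) BiocManager::install("GENIE3")
library(GENIE3)
if (!requireNamespace("SCENIC", quietly = TRUE)) {
  if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
  devtools::install_github("aertslab/SCENIC")
}
packageVersion("SCENIC")
library(SCENIC)
# Download the human (hg19) databases here
# The directory of the second URL was lost in the source and is assumed to match the first
dbFiles <- c(
  "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather",
  "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather"
)
for (featherURL in dbFiles) {
  # mode = "wb" keeps the binary feather files intact on Windows
  download.file(featherURL, destfile = basename(featherURL), mode = "wb")
}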
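Because the database files are large, it can help to skip files that are already on disk when the script is re-run. This is a minimal sketch, not part of SCENIC or RcisTarget: the `missing_downloads` helper, the `example.org` URLs and the temporary directory are all hypothetical placeholders.

```r
# Return the URLs whose file (matched by basename) is not yet present in dest_dir
# (hypothetical helper)
missing_downloads <- function(urls, dest_dir = ".") {
  urls[!file.exists(file.path(dest_dir, basename(urls)))]
}

# Demonstration with placeholder URLs and a temporary directory
demo_dir <- tempfile("cisTarget_demo_")
dir.create(demo_dir)
demo_urls <- c("https://example.org/db/a.feather",
               "https://example.org/db/b.feather")
file.create(file.path(demo_dir, "a.feather"))   # pretend the first file already exists

todo <- missing_downloads(demo_urls, demo_dir)
print(todo)

# The remaining files could then be fetched, for example:
# for (u in todo) download.file(u, destfile = file.path(demo_dir, basename(u)), mode = "wb")
```

The check only looks at file names, so a partially downloaded file would still count as present; comparing file sizes or checksums would be a stricter test.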

Next, build the input data for the analysis.


# Build the analysis data
exprMat <- as.matrix(immune@assays$RNA@data)  # expression matrix
exprMat[1:4, 1:4]  # inspect the data
# nFeature_RNA is the number of genes and nCount_RNA the number of UMIs,
# so select the columns in the order that matches the new names below
cellInfo <- immune@meta.data[, c("celltype", "nFeature_RNA", "nCount_RNA")]
colnames(cellInfo) <- c('CellType', 'nGene', 'nUMI')
head(cellInfo)
table(cellInfo$CellType)
# Build the scenicOptions object; every later SCENIC step reads its settings from this object
scenicOptions <- initializeScenic(org = "hgnc",
                                  dbDir = "F:/cisTarget_databases",
                                  nCores = 1)
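Before going further it is worth confirming that the cell metadata and the expression matrix describe the same cells in the same order. The sketch below uses toy data in base R; the object names `expr` and `info` are hypothetical stand-ins for `exprMat` and `cellInfo`.

```r
# Toy expression matrix (genes x cells) and metadata in a different cell order
set.seed(2)
expr <- matrix(rpois(12, lambda = 3), nrow = 3,
               dimnames = list(paste0("g", 1:3), paste0("c", 1:4)))
info <- data.frame(CellType = c("T", "B", "T", "NK"),
                   row.names = paste0("c", c(3, 1, 4, 2)))

# Every cell in the matrix must have a metadata row
stopifnot(all(colnames(expr) %in% rownames(info)))

# Reorder the metadata so that row i describes column i of the matrix
info <- info[colnames(expr), , drop = FALSE]
stopifnot(identical(rownames(info), colnames(expr)))
info
```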

Build the co-expression network. The last step (GENIE3) is very time-consuming.


# Co-expression network
genesKept <- geneFiltering(exprMat, scenicOptions)
exprMat.filtered <- exprMat[genesKept, ]
exprMat.filtered[1:4, 1:4]
runCorrelation(exprMat.filtered, scenicOptions)
exprMat.filtered.log <- log2(exprMat.filtered + 1)
runGenie3(exprMat.filtered.log, scenicOptions)
# Using ... TFs as potential regulators...
# Running GENIE3 part 1
# ...
# Running GENIE3 part 10
# Finished running GENIE3.
# Warning message:
# In runGenie3(...) :
#   Only 676 (37%) of the 1839 TFs in the database were found in the dataset. Do they use the same gene IDs?
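To make the filtering step and the warning above more concrete, here is a minimal base R sketch on toy data. The two thresholds are assumptions that mirror what I understand to be the `geneFiltering` defaults (total counts above 3 x 1% of the cells, detection in more than 1% of the cells); the real function additionally keeps only genes present in the RcisTarget databases, which is not reproduced here.

```r
# Toy count matrix: 45 barely expressed genes and 5 well-expressed genes (hypothetical data)
set.seed(3)
n_cells <- 200
expr <- matrix(rpois(50 * n_cells, lambda = 0.01), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:n_cells)))
expr[1:5, ] <- rpois(5 * n_cells, lambda = 3)

# Assumed thresholds, scaled by the number of cells
min_counts  <- 3 * 0.01 * n_cells   # total counts a gene must exceed
min_samples <- 0.01 * n_cells       # number of cells in which it must be detected

total_counts   <- rowSums(expr)
cells_detected <- rowSums(expr > 0)
kept <- rownames(expr)[total_counts > min_counts & cells_detected > min_samples]
length(kept)

# Share of database TFs found in the data, using the numbers from the warning above
pct_found <- round(100 * 676 / 1839)
pct_found   # 37
```

A low share of matched TFs is expected here because the genes were deliberately reduced for the demonstration; on a full data set it would more likely point to mismatched gene identifiers.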

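Build the gene regulatory network (GRN) and score it with AUCell. This is also time-consuming. The output of these steps is the full result of the analysis; filter it according to your own goals.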


# Build and score the GRN
scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions)
scenicOptions <- runSCENIC_2_createRegulons(scenicOptions)
exprMat_log <- log2(exprMat + 1)
scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_log)
scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
saveRDS(scenicOptions, file = "int/scenicOptions.Rds")
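Step 3 scores every regulon in every cell with AUCell, which ranks the genes of a cell by expression and measures how strongly the regulon's genes accumulate near the top of that ranking (the area under a recovery curve). The sketch below is a simplified illustration of that idea in base R, not the AUCell implementation: the `recovery_auc` function, its normalization and its tie handling are my own assumptions.

```r
# Simplified recovery-curve AUC for one cell (illustrative only)
recovery_auc <- function(expr_cell, gene_set, max_rank) {
  # Rank genes from highest to lowest expression (rank 1 = most expressed)
  ranks <- rank(-expr_cell, ties.method = "first")
  set_ranks <- ranks[intersect(gene_set, names(expr_cell))]
  set_ranks <- set_ranks[set_ranks <= max_rank]
  # A gene found at rank r raises the curve by 1 from r up to max_rank
  area <- sum(max_rank + 1 - set_ranks)
  # Best case: all genes of the set sit at the very top of the ranking
  n_best <- min(length(gene_set), max_rank)
  best <- sum(max_rank + 1 - seq_len(n_best))
  area / best
}

# One toy cell with six genes (hypothetical values)
cell <- c(g1 = 10, g2 = 8, g3 = 0, g4 = 5, g5 = 1, g6 = 0)

auc_top   <- recovery_auc(cell, c("g1", "g2"), max_rank = 3)  # both genes at the top
auc_mixed <- recovery_auc(cell, c("g2", "g5"), max_rank = 3)  # one gene within the top 3
auc_low   <- recovery_auc(cell, c("g3", "g6"), max_rank = 3)  # neither gene within the top 3
c(top = auc_top, mixed = auc_mixed, low = auc_low)
```

Only the top `max_rank` genes contribute, which corresponds to the "Using only the first ... genes (aucMaxRank)" message that AUCell prints during scoring.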

Below is the run log.


> scenicOptions <- runSCENIC_2_createRegulons(scenicOptions)
13:21 Step 2. Identifying regulons
tfModulesSummary:
  top5perTargetAndTop3sd ..., top5perTargetAndTop50 ..., top1sdAndTop10perTarget ..., top50perTargetAndTop1sd ...,
  top50AndTop10perTarget ..., w0.005 ..., w0.005AndTop1sd ..., top50perTarget ..., top50AndTop3sd ..., top3sd ...,
  top50 ..., w0.005AndTop50perTarget ..., top1sd ..., top5perTarget ..., top10perTarget ..., w0.001 ...
RcisTarget: Calculating AUC
Scoring database: [Source file: hg19-500bp-upstream-7species.mc9nr.feather]
Scoring database: [Source file: hg19-tss-centered-10kb-7species.mc9nr.feather]
17:17 RcisTarget: Adding motif annotation
Using BiocParallel...
Number of motifs in the initial enrichment: ...
Number of motifs annotated to the matching TF: ...
17:38 RcisTarget: Pruning targets
19:04 Number of motifs that support the regulons: ...
Preview of motif enrichment saved as: output/Step2_MotifEnrichment_preview.html
There were ... warnings (use warnings() to see them)
> exprMat_log <- log2(exprMat + 1)
> scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_log)
19:06 Step 3. Analyzing the network activity in each individual cell
Number of regulons to evaluate on cells: ...
Biggest (non-extended) regulons:
  ELF1 (1760g), ETS1 (1734g), FLI1 (1604g), ELK3 (1493g), POLR2A (1453g),
  CHD2 (1251g), ETV3 (1249g), ELK4 (1148g), TAF1 (974g), ERG (956g)
Quantiles for the number of genes detected by cell:
(Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
   min     1%     5%    10%    50%   100%
205.00 224.76 276.90 321.40 695.00 997.00
Warning in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores, :
  Using only the first ... genes (aucMaxRank) to calculate the AUC.
19:07 Finished running AUCell.
Plotting heatmap...
Plotting t-SNEs...
Warning message:
In max(densCurve$y[nextMaxs]) : no non-missing arguments to max; returning -Inf
> scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
Binary regulon activity: ... TF regulons x 439 cells.
(299 regulons including 'extended' versions)
... regulons are active in more than 1% (4.39) cells.
> saveRDS(scenicOptions, file = "int/scenicOptions.Rds")
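Step 4 turns the continuous AUC scores into an on/off matrix and then reports how many regulons are active in more than 1% of the cells. The following base R sketch illustrates that logic on toy data. The cell count of 439 is inferred from the "1% (4.39)" figure in the log; the regulon names, AUC values and fixed thresholds are hypothetical (AUCell derives a threshold per regulon from the shape of its AUC distribution).

```r
# Toy AUC matrix: regulons x cells (hypothetical values)
set.seed(4)
n_cells <- 439
auc <- rbind(
  REG_A = c(runif(400, 0, 0.05), runif(39, 0.2, 0.4)),  # active in a clear subpopulation
  REG_B = runif(n_cells, 0, 0.05)                        # never clearly active
)

# Hypothetical fixed thresholds, one per regulon
thresholds <- c(REG_A = 0.1, REG_B = 0.1)

# A regulon is "on" in a cell when its AUC exceeds its threshold
binary <- (auc > thresholds[rownames(auc)]) * 1

# Keep regulons that are active in more than 1% of the cells
min_cells <- 0.01 * n_cells   # 4.39
active <- rownames(binary)[rowSums(binary) > min_cells]
active
```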

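SCENIC automatically saves the results of every step in the int and output folders it creates. Next, visualize the results. The regulons shown here were picked at random and carry no biological meaning; in practice, choose them according to your research question.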

1. Visualize transcription factor activity together with the Seurat cell clusters.


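# Regulon AUC
AUCmatrix <- readRDS("int/3.4_regulonAUC.Rds")
AUCmatrix <- AUCmatrix@assays@data@listData$AUC
AUCmatrix <- data.frame(t(AUCmatrix), check.names = FALSE)
# Turn names such as "CEBPB_extended (1035g)" into "CEBPB_extended_1035g"
RegulonName_AUC <- colnames(AUCmatrix)
RegulonName_AUC <- gsub(' \\(', '_', RegulonName_AUC)
RegulonName_AUC <- gsub('\\)', '', RegulonName_AUC)
colnames(AUCmatrix) <- RegulonName_AUC
scRNAauc <- AddMetaData(immune, AUCmatrix)
scRNAauc@assays$integrated <- NULL
saveRDS(scRNAauc, 'immuneauc.rds')
# Binarized regulon activity
BINmatrix <- readRDS("int/4.1_binaryRegulonActivity.Rds")
BINmatrix <- data.frame(t(BINmatrix), check.names = FALSE)
RegulonName_BIN <- colnames(BINmatrix)
RegulonName_BIN <- gsub(' \\(', '_', RegulonName_BIN)
RegulonName_BIN <- gsub('\\)', '', RegulonName_BIN)
colnames(BINmatrix) <- RegulonName_BIN
scRNAbin <- AddMetaData(immune, BINmatrix)
scRNAbin@assays$integrated <- NULL
saveRDS(scRNAbin, 'immunebin.rds')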
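The two `gsub` calls in the code above rewrite the regulon names so that they contain no spaces or parentheses and can be used as metadata column names. A small self-contained illustration, with the renaming wrapped in a hypothetical helper `clean_regulon_names`:

```r
# Replace " (" with "_" and drop the closing ")" (hypothetical helper)
clean_regulon_names <- function(x) {
  x <- gsub(" \\(", "_", x)
  gsub("\\)", "", x)
}

raw_names <- c("CEBPB_extended (1035g)", "ELF1 (1760g)")
clean_names <- clean_regulon_names(raw_names)
print(clean_names)
```

The cleaned form is what `FeaturePlot` expects in its `features` argument, for example `CEBPB_extended_1035g`.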

Plot with Seurat's FeaturePlot function; violin plots work as well.


FeaturePlot(scRNAauc, features='CEBPB_extended_1035g', label=T, reduction = 'umap')
FeaturePlot(scRNAbin, features='CEBPB_extended_1035g', label=T, reduction = 'umap')

2. The most common plot, a heatmap: choose the regulons you want to visualize.

library(pheatmap)
celltype = subset(cellInfo,select = 'CellType')
AUCmatrix <- t(AUCmatrix)
BINmatrix <- t(BINmatrix)
regulons <- c('CEBPB_extended_1035g', 'RUNX1_extended_200g',
              'FOXC1_extended_100g', 'MYBL1_extended_112g',
              'IRF1_extended_1785g',
              'ELF1_1760g', 'ELF1_extended_2165g',
              'IRF1_extended_1765g', 'ETS1_extended_2906g',
              'YY1_extended_1453g', 'ATF3_extended_1022g',
              'E2F4_extended_637g',
              'KLF2_12g', 'HES1_13g',
              'GATA3_20g', 'HOXB2_extended_362g',
              'SOX4_extended_10g',
              'RUNX3_extended_170g', 'EGR3_extended_23g',
              'MXD4_extended_182g', 'HOXD9_extended_25g')
AUCmatrix <- AUCmatrix[rownames(AUCmatrix) %in% regulons, ]
BINmatrix <- BINmatrix[rownames(BINmatrix) %in% regulons, ]
pheatmap(AUCmatrix, show_colnames = FALSE, annotation_col = celltype,
         width = 6, height = 5)
pheatmap(BINmatrix, show_colnames = FALSE, annotation_col = celltype,
         color = colorRampPalette(colors = c("white", "black"))(100),
         width = 6, height = 5)
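With many cells, a per-cell heatmap can be hard to read. A common alternative is to average the regulon activity within each cell type and plot the scaled averages. The sketch below shows that summary on toy data in base R; the matrix `auc`, the labels in `cell_type` and the regulon names are hypothetical.

```r
# Toy AUC matrix: regulons x cells, plus one cell type label per cell (hypothetical data)
set.seed(5)
auc <- matrix(runif(3 * 8), nrow = 3,
              dimnames = list(c("REG_A", "REG_B", "REG_C"), paste0("cell", 1:8)))
cell_type <- c("T", "T", "B", "B", "B", "NK", "NK", "T")

# Mean AUC of each regulon within each cell type (regulons x cell types)
mean_by_type <- sapply(split(seq_len(ncol(auc)), cell_type), function(idx) {
  rowMeans(auc[, idx, drop = FALSE])
})

# Center and scale each regulon across cell types so that rows are comparable
scaled <- t(scale(t(mean_by_type)))
round(scaled, 2)
```

The resulting `scaled` matrix could be passed to `pheatmap` in the same way as the matrices above.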

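That's it: the above is a basic demonstration of the workflow. For how to use and interpret the results, consult the relevant high-impact papers, understand the principles behind the analysis, and relate them to your own research.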

Original: https://blog.csdn.net/qq_42090739/article/details/123701891
Author: TS的美梦
Title: Learning Single-Cell Transcriptome Analysis from Cell (12): Transcription Factor Analysis
