跟着Cell学单细胞转录组分析(十三):单细胞GSVA分析|这个包涵盖大多数物种

之前我们发过GSVA分析(有了这个包,猪的GSEA和GSVA分析也不在话下(第一集)【后续来了】有了这个包,猪的GSEA和GSVA分析也不在话下(第二集)),接着单细胞系列,重新说一下GSVA分析。

首先是数据集的问题,通常只做人和小鼠的,想要做其他物种的,苦于没有数据集,不过这里说的这个包,即使猪的GSVA分析也都可以做。

这里我们以小鼠为示例,也是常见物种的GSVA分析,结合单细胞的数据!GSVA其实就是对功能富集的量化,然后进行差异分析,寻找感兴趣的通路在样本中的变化。不同于常规的GO、KEGG受差异基因阈值的影响,GSEA受实验分组的影响,GSVA能够对通路量化,看感兴趣通路在多组之间的变化!

首先加载和安装必要的包并加载单细胞数据。

library(Seurat)
#source("http://bioconductor.org/biocLite.R")
#biocLite("GSVA")
library(GSVA)
library(tidyverse)
library(ggplot2)
library(clusterProfiler)
library(org.Mm.eg.db)
library(dplyr)

immuneT <- subset(immune, celltype="="T" cells")#提取我们需要分析的细胞类型 immunet <- as.matrix(immunet@assays$rna@counts)#提取count矩阵 meta immunet@meta.data[,c("orig.ident", "sex", "age", "stim", "samples")]#分组信息,为了后续作图 < code></->

之前一直苦于MSigDB数据库只有人的数据集,没有小鼠和其他物种的,网上也有教程如何根据基因同源性进行转化的,但是很麻烦,也不一定成功。还好有一个新的数据包被发现了,简直是福音—msigdbr包,可以做GSEA和GSVA。


#install.packages("msigdbr")
library(msigdbr)
msigdbr_species() #&#x53EF;&#x4EE5;&#x770B;&#x5230;&#xFF0C;&#x8FD9;&#x4E2A;&#x5305;&#x6DB5;&#x76D6;&#x4E86;20&#x4E2A;&#x7269;&#x79CD;
A tibble: 20 x 2
   species_name                 species_common_name
   <chr>                        <chr>
 1 Anolis carolinensis          Carolina anole, green anole
 2 Bos taurus                   bovine, cattle, cow, dairy cow, domestic cattle, dome~
 3 Caenorhabditis elegans       roundworm
 4 Canis lupus familiaris       dog, dogs
 5 Danio rerio                  leopard danio, zebra danio, zebra fish, zebrafish
 6 Drosophila melanogaster      fruit fly
 7 Equus caballus               domestic horse, equine, horse
 8 Felis catus                  cat, cats, domestic cat
 9 Gallus gallus                bantam, chicken, chickens, Gallus domesticus
10 Homo sapiens                 human
11 Macaca mulatta               rhesus macaque, rhesus macaques, Rhesus monkey, rhesu~
12 Monodelphis domestica        gray short-tailed opossum
13 Mus musculus                 house mouse, mouse
14 Ornithorhynchus anatinus     duck-billed platypus, duckbill platypus, platypus
15 Pan troglodytes              chimpanzee
16 Rattus norvegicus            brown rat, Norway rat, rat, rats
17 Saccharomyces cerevisiae     baker's yeast, brewer's yeast, S. cerevisiae
18 Schizosaccharomyces pombe 9~ NA
19 Sus scrofa                   pig, pigs, swine, wild boar
20 Xenopus tropicalis           tropical clawed frog, western clawed frog
&#x67E5;&#x770B;&#x4E0B;&#x5C0F;&#x9F20;&#x7684;&#x57FA;&#x56E0;&#x96C6;&#xFF0C;&#x662F;&#x5426;&#x4E0E;MSigDB&#x6570;&#x636E;&#x5E93;&#x4E00;&#x6837;

mouse <- 5 msigdbr(species="Mus musculus" ) mouse[1:5,1:5] # a tibble: x gs_cat gs_subcat gs_name gene_symbol entrez_gene <chr>  <chr>          <chr>          <chr>             <int>
1 C3     MIR:MIR_Legacy AAACCAC_MIR140 Abcc4            239273
2 C3     MIR:MIR_Legacy AAACCAC_MIR140 Abraxas2         109359
3 C3     MIR:MIR_Legacy AAACCAC_MIR140 Actn4             60595
4 C3     MIR:MIR_Legacy AAACCAC_MIR140 Acvr1             11477
5 C3     MIR:MIR_Legacy AAACCAC_MIR140 Adam9             11502
table(mouse$gs_cat) #&#x67E5;&#x770B;&#x76EE;&#x5F55;&#xFF0C;&#x4E0E;MSigDB&#x4E00;&#x6837;&#xFF0C;&#x5305;&#x542B;9&#x4E2A;&#x6570;&#x636E;&#x96C6;
###C1      C2      C3      C4      C5      C6      C7      C8       H
  20049  533767  795972   92353 1248327   30556  988358  109328    7411
</int></chr></chr></chr></-></chr></chr>

本例中,我们要分析GO,因为mouse文件包含了所有的基因集,所以要查看GO在哪里,然后将需要的文件提出来。

table(mouse$gs_subcat)
  CGN             CGP              CM              CP
         167344           42770          376981           49583            3881
    CP:BIOCARTA         CP:KEGG          CP:PID     CP:REACTOME CP:WIKIPATHWAYS
           4860           13694            8196           98232           27923
          GO:BP           GO:CC           GO:MF             HPO     IMMUNESIGDB
         660368          100991          105717          381251          944068
 MIR:MIR_Legacy       MIR:MIRDB        TFT:GTRD  TFT:TFT_Legacy             VAX
          34118          372658          235886          153310           44290
mouse_GO_bp = msigdbr(species = "Mus musculus",
                      category = "C5", #GO&#x5728;C5
                      subcategory = "GO:BP") %>%
                      dplyr::select(gs_name,gene_symbol)#&#x8FD9;&#x91CC;&#x53EF;&#x4EE5;&#x9009;&#x62E9;gene symbol&#xFF0C;&#x4E5F;&#x53EF;&#x4EE5;&#x9009;&#x62E9;ID&#xFF0C;&#x6839;&#x636E;&#x81EA;&#x5DF1;&#x6570;&#x636E;&#x9700;&#x6C42;&#x6765;&#xFF0C;&#x4E3B;&#x8981;&#x4E3A;&#x4E86;&#x65B9;&#x4FBF;
mouse_GO_bp_Set = mouse_GO_bp %>% split(x = .$gene_symbol, f = .$gs_name)#&#x540E;&#x7EED;gsva&#x8981;&#x6C42;&#x662F;list&#xFF0C;&#x6240;&#x4EE5;&#x5C06;&#x4ED6;&#x8F6C;&#x5316;&#x4E3A;list

表达矩阵数据有了,通路信息有了,就可以进行GEVA分析了,代码简单就一句!保存结果!

T_gsva <- gsva(expr="immuneT," gset.idx.list="mouse_GO_bp_Set," kcdf="Poisson" , #查看帮助函数选择合适的kcdf方法 parallel.sz="5)" write.table(t_gsva, 't_gsva.xls', row.names="T," col.names="NA," sep="\t" ) < code></->

接着差异分析可以用limma包,类似于转录组芯片数据分析流程。

group <- c(rep("control", 50), rep("test", 71)) %>% as.factor()#&#x8BBE;&#x7F6E;&#x5206;&#x7EC4;&#xFF0C;&#x5BF9;&#x7167;&#x5728;&#x524D;
desigN <- 0 model.matrix(~ + group) #构建比较矩阵 colnames(design) <- levels(group) fit="lmFit(test_control," design) fit2 ebayes(fit) diff="topTable(fit2,adjust='fdr',coef=2,number=Inf)#&#x6821;&#x51C6;&#x6309;&#x7167;&#x9700;&#x6C42;&#x4FEE;&#x6539;" ?toptable write.csv(diff, file="Diff.csv" ) < code></-></->

最后对差异的感兴趣的通路进行可视化:


up <- c("gobp_egg_activation", "gobp_tendon_development", "gobp_somite_specification", "gobp_threonine_catabolic_process", "gobp_regulation_of_glutamate_receptor_clustering", "gobp_negative_chemotaxis", "gobp_negative_regulation_of_fat_cell_proliferation", "gobp_regulation_of_t_helper_17_cell_lineage_commitment", "gobp_regulation_of_antimicrobial_humoral_response") down <- c("gobp_determination_of_pancreatic_left_right_asymmetry", "gobp_mitotic_dna_replication", "gobp_eosinophil_chemotaxis", "gobp_neutrophil_mediated_cytotoxicity", "gobp_potassium_ion_export_across_plasma_membrane", "gobp_regulation_of_leukocyte_mediated_cytotoxicity", "gobp_regulation_of_sequestering_of_zinc_ion", "gobp_endothelin_receptor_signaling_pathway", "gobp_pre_replicative_complex_assembly_involved_in_cell_cycle_dna_replication", "gobp_establishment_of_planar_polarity_of_embryonic_epithelium") test c(up,down) diff$id rownames(diff) q diff[test,] group1 c(rep("treat", 9), rep("control", 10)) df data.frame(id="Q$ID," score="Q$t,group=group1" ) # 按照t score排序 sortdf df[order(df$score),] sortdf$id factor(sortdf$id, levels="sortdf$ID)#&#x589E;&#x52A0;&#x901A;&#x8DEF;ID&#x90A3;&#x4E00;&#x5217;" ggplot(sortdf, aes(id, score,fill="group))" + geom_bar(stat="identity" ,alpha="0.7)" coord_flip() theme_bw() #去除背景色 theme(panel.grid="element_blank())+" theme(panel.border="element_rect(size" = 0.6))+ labs(x , y="t value of GSVA score" )+ scale_fill_manual(values="c("#008020","#08519C"))#&#x8BBE;&#x7F6E;&#x989C;&#x8272;" < code></->

跟着Cell学单细胞转录组分析(十三):单细胞GSVA分析|这个包涵盖大多数物种

整个流程就结束了,希望对你们的研究能有启发,GO、KEGG做多了,可以换着做一下GSVA分析!

Original: https://blog.csdn.net/qq_42090739/article/details/124005072
Author: TS的美梦
Title: 跟着Cell学单细胞转录组分析(十三):单细胞GSVA分析|这个包涵盖大多数物种

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/640354/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

亲爱的 Coder【最近整理,可免费获取】👉 最新必读书单  | 👏 面试题下载  | 🌎 免费的AI知识星球