For example, there are 205 mutations in gene p53 across 514 tumors, while 96 stage IV tumors have 86 mutations. If stage IV tumors followed the overall rate, we would expect them to have 96 x 205 / 514 ≈ 38 mutations, but we observed 86. Is that significantly different from the general mutation pattern?
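As a quick check of the arithmetic above, the expected count can be computed directly in R. This is only a small sketch; the variable names are made up for illustration and do not come from the original script.

```r
# Counts taken from the paragraph above (variable names are illustrative)
total_tumors    <- 514
total_mutations <- 205
stage4_tumors   <- 96
stage4_observed <- 86

# Expected mutations in stage IV tumors if they followed the overall rate
stage4_expected <- stage4_tumors * total_mutations / total_tumors

# Rounded to a whole number of mutations, as used in the 2 x 2 table
round(stage4_expected)
```

The rounded value is the 38 that goes into the table next to the observed 86.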
> sam <- matrix(c(86,96,38,96),nrow=2,ncol=2)
> sam
     [,1] [,2]
[1,]   86   38
[2,]   96   96
> chisq.test(sam)
        Pearson's Chi-squared test with Yates' continuity correction

data:  sam
X-squared = 10.7773, df = 1, p-value = 0.001028
> chisq.test(sam)$p.value
[1] 0.001027552
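To see where these numbers come from, the statistic can be rebuilt by hand. The following is a minimal sketch, assuming the usual Yates-corrected formula for a 2 x 2 table; the names expected, yates, stat and pval are illustrative only.

```r
sam <- matrix(c(86, 96, 38, 96), nrow = 2, ncol = 2)

# Expected cell counts under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(sam), colSums(sam)) / sum(sam)

# Yates' correction shrinks each |observed - expected| by 0.5
# (or by the difference itself when that is smaller than 0.5)
yates <- min(0.5, abs(sam - expected))
stat  <- sum((abs(sam - expected) - yates)^2 / expected)

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
pval <- pchisq(stat, df = 1, lower.tail = FALSE)

stat
pval
```

Both values should agree with the X-squared and p-value reported by chisq.test(sam) above.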
The following example reads the counts from a csv file, runs the test on each row, and writes the p-values to an output file.
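# Read the input table: column 1 is the gene name, columns 2-5 are the counts
x <- read.csv("chisq.csv", header=TRUE, sep=",", dec=".")
zz <- file("out_chisq.txt", "w")

# Write the header line: the original column names plus the p-value column
title <- names(x)
writeLines(paste(title[1],title[2],title[3],title[4],title[5],
                 "Chisq P Value",sep=","), con=zz, sep="\n")

xR <- nrow(x)
sam <- array(dim=c(2,2))
for (i in 1:xR) {
  # Build the 2 x 2 table for this row and run the test
  sam[1,] <- c(x[i,2],x[i,3])
  sam[2,] <- c(x[i,4],x[i,5])
  pv <- chisq.test(sam)$p.value
  writeLines(paste(x[i,1],x[i,2],x[i,3],x[i,4],x[i,5],pv,sep=","),
             con=zz, sep="\n")
}
close(zz)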
The content of the output file is:
Gene,Unique.observed,Unique.expected,duplicated.observed,duplicate.expected,Chisq P Value
TTN,27,33,60,54,0.425175749168081
GATA3,38,20,17,35,0.00116789922038592
HLA-DRB6,18,15,24,27,0.655008761576397
MUC16,13,15,28,26,0.815855072976336
NR1H2,11,15,29,25,0.473920420172139
GPRIN2,12,14,27,25,0.810181236410474
MAP3K1,15,14,24,25,1
GPRIN1,13,14,25,24,1
MLL3,12,14,26,24,0.808944275014528
MAP3K4,8,14,29,23,0.203492032204285
CDH1,17,12,17,22,0.326688384050414
ENSG00000245549,15,12,18,21,0.616574005797083
ZNF384,12,12,20,20,0.796253414737639
FRG1B,11,11,20,20,0.790676108831151
AKD1,9,11,21,19,0.784191229401619
OBSCN,12,11,17,18,1
NCOA3,8,10,20,18,0.77477725929156
USH2A,8,10,20,18,0.77477725929156
ENSG00000198786,12,10,15,17,0.781814003488769
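The per-row loop could also be written with apply(). The sketch below is an alternative, not the original source code: it re-enters the first few rows of the output above by hand so that it runs without chisq.csv, and the helper chisq_p and the output file name are hypothetical.

```r
# Three rows copied from the output above so the sketch is self-contained
x <- data.frame(
  Gene                = c("TTN", "GATA3", "MAP3K1"),
  Unique.observed     = c(27, 38, 15),
  Unique.expected     = c(33, 20, 14),
  duplicated.observed = c(60, 17, 24),
  duplicate.expected  = c(54, 35, 25)
)

# chisq_p is a hypothetical helper: it builds the same 2 x 2 table as the loop
# (columns 2-3 form row 1, columns 4-5 form row 2) and returns the p-value
chisq_p <- function(counts) {
  sam <- matrix(counts, nrow = 2, byrow = TRUE)
  chisq.test(sam)$p.value
}

# Apply the helper to the four count columns of every row
x$Chisq.P.Value <- apply(x[, 2:5], 1, chisq_p)
x

# To save the result (file name is an assumption):
# write.csv(x, "out_chisq.csv", row.names = FALSE)
```

The p-values for these three genes should match the corresponding lines of the output file listed above.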
Download the csv file and the R source code:
Data File
R Source Code File