Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Selecting the number of leaf nodes of the dendrogram in heatmap.2 in R

In MATLAB you can set the number of leaf nodes to plot directly in the dendrogram function: dendrogram(tree,P) generates a dendrogram plot with no more than P leaf nodes.

My attempts to do the same with heatmap.2 in R have failed. Posts on Stack Overflow and Biostars suggest using cutree, but heatmap.2 fails when I pass the cutree result to its Rowv option as those posts describe. Here "TAD" is the data matrix, 831 rows by 8 columns.

# cluster it
hr <- hclust(dist(TAD, method="manhattan"), method="average")

# draw the heat map
heatmap.2(TAD, main="Hierarchical Cluster",
          Rowv=as.dendrogram(cutree(hr, k=5)),
          Colv=NA, dendrogram="row", col=my_palette, density.info="none", trace="none")

returns the message:

Error in UseMethod("as.dendrogram") : 
  no applicable method for 'as.dendrogram' applied to an object of class "c('integer', 'numeric')"

Is cutree the right approach for plotting a restricted dendrogram? Is there an easier way to do this, akin to MATLAB?
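The error message above can be reproduced and explained with a small sketch. It assumes a made-up 10 x 4 matrix called toy (not the real TAD data): cutree() returns a plain integer vector of cluster labels, one per row, so there is no tree structure left for as.dendrogram() to convert, whereas the hclust object itself converts fine.

```r
# Hypothetical stand-in for TAD: 10 rows, 4 columns of random values.
set.seed(1)
toy <- matrix(runif(40), nrow = 10)

hr <- hclust(dist(toy, method = "manhattan"), method = "average")

# cutree() gives cluster membership only: an integer vector of length nrow(toy).
groups <- cutree(hr, k = 5)
class(groups)

# The tree itself lives in the hclust object, which as.dendrogram() accepts.
dend <- as.dendrogram(hr)
class(dend)
```

Passing groups to as.dendrogram() is what triggers the "no applicable method" error, because no as.dendrogram method exists for integer vectors.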


1 Reply


Just to clarify and provide some data: I do not want to drop any of the rows. Instead of plotting and interpreting 831 branches, I would like to interpret 3 branches. The row dendrogram should therefore be cut at height 150, leaving 3 branches, and the heatmap should still show all 831 rows, grouped under those 3 upper branches of the original dendrogram.

# Here is a random n=10 subset of my data: for each of 10 observed fish,
# the % of time it spent within each depth bin (Bin1-Bin8)

zz <- "ID Bin1 Bin2 Bin3 Bin4 Bin5 Bin6 Bin7 Bin8
1    0    0    0    0    0  0.0   0.0 100.0
2    0    0    0    0    0  0.0   0.0 100.0
3    0    0    0    0    0  0.0   0.0 100.0
4    0    0    0    0    0 70.8  29.2   0.0
5    0    0    0  100    0  0.0   0.0   0.0
6    0    0    0    0    0  0.0  93.3   6.7
7    0    0    0    0    0 27.5  72.5   0.0
8    0    0    0    0    0 53.5  46.5   0.0
9    0    0    0    0    0  0.0 100.0   0.0
10    0    0    0    0    0  0.0  72.1  27.9 "

TAD <- read.table(text=zz, header = TRUE)
IDnames <- TAD[,1]
x <- data.matrix(TAD[, 2:ncol(TAD)])
rownames(x) <- IDnames

Leaving the heatmap aside for now, the distance matrix and hierarchical clustering are computed on the numeric matrix x:

TAD.dist <- dist(x, method="manhattan", diag=FALSE, upper=FALSE)
TAD.cluster <- hclust(TAD.dist, method="average", members=NULL)

A plot of the resulting dendrogram shows all ten branches:

plot(TAD.cluster)

but cutting at a height of 150 restricts it to only 3 branches:

hcd <- as.dendrogram(TAD.cluster)
rowDend <- cut(hcd, h = 150)$upper
plot(rowDend)
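As a sanity check on the cut above, here is a minimal sketch that rebuilds the same 10-fish sample (typed in as a matrix rather than read from text, which is an assumption of convenience) and counts the branches left at height 150. It also shows that cutree() at the same height assigns every row to one of those branches. The names parts, n_branches and groups are illustrative only.

```r
# Rebuild the 10-fish sample from the post (rows = fish, columns = depth bins).
x <- matrix(0, nrow = 10, ncol = 8,
            dimnames = list(as.character(1:10), paste0("Bin", 1:8)))
x[1:3, "Bin8"] <- 100
x[4, c("Bin6", "Bin7")] <- c(70.8, 29.2)
x[5, "Bin4"] <- 100
x[6, c("Bin7", "Bin8")] <- c(93.3, 6.7)
x[7, c("Bin6", "Bin7")] <- c(27.5, 72.5)
x[8, c("Bin6", "Bin7")] <- c(53.5, 46.5)
x[9, "Bin7"] <- 100
x[10, c("Bin7", "Bin8")] <- c(72.1, 27.9)

TAD.cluster <- hclust(dist(x, method = "manhattan"), method = "average")

# cut() splits the tree at a height: $upper is the truncated tree,
# $lower is a list holding one sub-dendrogram per branch below the cut.
parts <- cut(as.dendrogram(TAD.cluster), h = 150)
n_branches <- length(parts$lower)

# cutree() at the same height labels each row with the branch it falls under.
groups <- cutree(TAD.cluster, h = 150)
table(groups)
```

The upper tree has one leaf per branch, not one per row, which is worth keeping in mind before handing it to a function that expects a dendrogram covering every row of x.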

The dendrogram drawn by plot(rowDend) is what I would like to see as the row dendrogram of the following heatmap:

library(gplots)  # provides heatmap.2

heatmap.2(x,
          distfun = function(x) dist(x, method = "manhattan", diag = FALSE, upper = FALSE),
          hclustfun = function(x) hclust(x, method = "average"),
          dendrogram = "row",
          # Rowv = rowDend,  # this is where I thought I could restrict the row dendrogram
          Colv = NA,         # NA (not the string "NA") turns off column reordering
          trace = "none")

But I cannot find any way to restrict the row dendrogram in heatmap.2 to the desired number of interpretable branches. Plotting all 831 branches is extremely messy.
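One possible workaround, offered as a sketch and not as a way to truncate the dendrogram itself: keep the full row dendrogram but mark the 3 branches with a colour strip, using the RowSideColors argument of heatmap.2 and the cutree() groups at height 150. The variable branch_cols and the rainbow palette are assumptions for illustration, and the heatmap call is skipped if gplots is not installed.

```r
# Same 10-fish sample as in the post (rows = fish, columns = depth bins).
x <- matrix(0, nrow = 10, ncol = 8,
            dimnames = list(as.character(1:10), paste0("Bin", 1:8)))
x[1:3, "Bin8"] <- 100
x[4, c("Bin6", "Bin7")] <- c(70.8, 29.2)
x[5, "Bin4"] <- 100
x[6, c("Bin7", "Bin8")] <- c(93.3, 6.7)
x[7, c("Bin6", "Bin7")] <- c(27.5, 72.5)
x[8, c("Bin6", "Bin7")] <- c(53.5, 46.5)
x[9, "Bin7"] <- 100
x[10, c("Bin7", "Bin8")] <- c(72.1, 27.9)

TAD.cluster <- hclust(dist(x, method = "manhattan"), method = "average")

# One branch label per row, taken at the same height used for the cut.
groups <- cutree(TAD.cluster, h = 150)

# One colour per branch, expanded to one colour per row (in row order of x).
branch_cols <- rainbow(max(groups))[groups]

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(x,
                    Rowv = as.dendrogram(TAD.cluster),  # full tree, all rows kept
                    Colv = NA, dendrogram = "row",
                    RowSideColors = branch_cols,        # strip marking the branches
                    trace = "none", density.info = "none")
}
```

With the full 831-row matrix the individual leaves would still be drawn, but the colour strip lets the three top-level branches be read at a glance.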

