Cluster Analysis of Microarray Data


Slide 1: Cluster Analysis of Microarray Data
4/13/2009. Copyright 2009 Dan Nettleton.

Slide 2: Clustering
- Group objects that are similar to one another together in a cluster.
- Separate objects that are dissimilar from each other into different clusters.
- The similarity or dissimilarity of two objects is determined by comparing the objects with respect to one or more attributes that can be measured for each object.

Slide 3: Data for Clustering

                   attribute
  object     1     2     3    ...    m
    1       4.7   3.8   5.9   ...   1.3
    2       5.2   6.9   3.8   ...   2.9
    3       5.8   4.2   3.9   ...   4.4
   ...      ...   ...   ...   ...   ...
    n       6.3   1.6   4.7   ...   2.0

Slides 4-7: Microarray Data for Clustering
The same n-by-m table of estimated expression levels arises in several ways:
- objects = genes, attributes = time points (Slide 4);
- objects = genes, attributes = tissue types (Slide 5);
- objects = genes, attributes = treatment conditions (Slide 6);
- objects = samples, attributes = genes (Slide 7).

Slide 8: Clustering: An Example Experiment
- Researchers were interested in studying gene expression patterns in developing soybean seeds.
- Seeds were harvested from soybean plants at 25, 30, 40, 45, and 50 days after flowering (daf).
- One RNA sample was obtained for each level of daf.

Slide 9: An Example Experiment (continued)
- Each of the 5 samples was measured on two two-color cDNA microarray slides using a loop design.
- The entire process was repeated on a second occasion to obtain a total of two independent biological replications.

Slide 10: Diagram Illustrating the Experimental Design
[Figure: loop design connecting daf 25, 30, 40, 45, and 50, for Rep 1 and Rep 2.]

Slide 11: Normalized Data for One Example Gene
The daf means estimated for each gene from a mixed linear model analysis provide a useful summary of the data for cluster analysis.
[Figure: normalized log signal vs. daf, and estimated means +/- 1 SE.]

Slide 12: 400 genes exhibited significant evidence of differential expression across time (p-value = ...).

[Slides 13-55 are missing from the extracted text; the surviving fragment "... G(4)-SE" is the tail of the gap statistic selection rule that leads into the next slide.]

Slide 56: The Gap Statistic Suggests K=3 Clusters
[Figure.]

Slide 57: Gap Analysis for Two-Color Array Data (N=100)
[Figure, two panels: log Wk and log Wk* vs. k (+ or - 1 standard error), and G(k) = log Wk* - log Wk vs. k, where k = number of clusters.]

Slide 58: Gap Analysis for Two-Color Array Data (N=100)
["Zoomed in" version of the previous plot; the gap analysis estimates K=11 clusters.]

Slides 59-69: [Cluster plots; no text recovered.]

Slide 70: Plot of Cluster Medoids
[Figure.]
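The medoid clustering and gap analysis above can be sketched in R with the cluster package. This is a minimal illustration, not the lecture's own code: the simulated matrix stands in for the 400-by-5 matrix of estimated daf means, and the object names (x, pam1, gap, fit) are ours.

```r
# Minimal sketch: gap-statistic choice of K followed by K-medoid clustering.
library(cluster)

set.seed(1)
x <- matrix(rnorm(400 * 5), nrow = 400)  # placeholder for the 400 x 5 daf means

# clusGap() expects a clustering function returning a component named
# "cluster"; a wrapper of this form appears in the clusGap help page.
pam1 <- function(x, k) list(cluster = pam(x, k, cluster.only = TRUE))
gap  <- clusGap(x, FUNcluster = pam1, K.max = 15, B = 50)

# Tibshirani et al. (2001) rule: smallest k with G(k) >= G(k+1) - SE(k+1).
k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "Tibs2001SEmax")

# K-medoid clustering with the selected k; pam() returns the medoids,
# which can be plotted as on the "Plot of Cluster Medoids" slide.
fit <- pam(x, k = k)
matplot(t(fit$medoids), type = "l",
        xlab = "daf index", ylab = "estimated mean",
        main = "Cluster medoids")
```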
Slide 71: Principal Components
Principal components can be useful for providing low-dimensional views of high-dimensional data. The data matrix (or data set) X has n rows (observations, or objects) and m columns (variables, or attributes):

              1    2   ...   m
        1  | x11  x12  ...  x1m |
        2  | x21  x22  ...  x2m |
  X  =  .  |  .    .         .  |
        .  |  .    .         .  |
        n  | xn1  xn2  ...  xnm |

Slide 72: Principal Components (continued)
- Each principal component of a data set is a variable obtained by taking a linear combination of the original variables in the data set.
- A linear combination of m variables x1, x2, ..., xm is given by c1x1 + c2x2 + ... + cmxm.
- For the purpose of constructing principal components, the vector of coefficients is restricted to have unit length, i.e., c1^2 + c2^2 + ... + cm^2 = 1.

Slide 73: Principal Components (continued)
- The first principal component is the linear combination of the variables that has maximum variation across the observations in the data set.
- The jth principal component is the linear combination of the variables that has maximum variation across the observations in the data set, subject to the constraint that its vector of coefficients be orthogonal to the coefficient vectors for principal components 1, ..., j-1.

Slide 74: The Simple Data Example
[Figure: scatterplot of x1 vs. x2.]

Slide 75: The First Principal Component Axis
[Figure.]

Slide 76: The First Principal Components
[Figure. The 1st PC for a point is the signed distance between its projection onto the 1st PC axis and the origin.]

Slide 77: The Second Principal Component Axis
[Figure.]

Slide 78: The Second Principal Component
[Figure. The 2nd PC for a point is the signed distance between its projection onto the 2nd PC axis and the origin.]

Slide 79: Plot of PC1 vs. PC2
[Figure.]

Slide 80: Compare the PC plot to the plot of the original data.
[Figure. Because there are only two variables here, the plot of PC2 vs. PC1 is just a rotation of the original plot.]

Slide 81:
- There is more to be gained when the number of variables is greater than 2.
- Consider the principal components for the 400 significant genes from our two-color microarray experiment. Our data matrix has n=400 rows and m=5 columns.
- We have looked at this data using parallel coordinate plots. What would it look like if we projected the data points to 2 dimensions?

Slides 82-85: Projection of Two-Color Array Data with 11-Medoid Clustering
[Figures: pairwise plots of the leading principal components (PC1 vs. PC2, PC1 vs. PC3, ...), with points labeled by cluster: a=1, b=2, c=3, d=4, e=5, f=6, g=7, h=8, i=9, j=10, k=11.]
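A minimal sketch of these projections with base R's prcomp(), reusing the placeholder x and the pam fit from the previous sketch (again, the object names are illustrative, not from the lecture):

```r
# Project the data onto the first principal components and label points
# by medoid cluster, as in the "Projection ..." slides.
pc <- prcomp(x)

# Each column of pc$rotation is a coefficient vector satisfying the
# constraints above: unit length, mutually orthogonal.
colSums(pc$rotation^2)  # all equal to 1

# PC scores (pc$x) are the signed distances of the projected points;
# letters a, b, c, ... mark clusters 1, 2, 3, ... as in the slide legend.
plot(pc$x[, 1], pc$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pc$x[, 1], pc$x[, 2], labels = letters[fit$clustering])
```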
Slide 86: Hierarchical Clustering Methods
Hierarchical clustering methods build a nested sequence of clusters that can be displayed using a dendrogram. We will begin with some simple illustrations and then move on to a more general discussion.

Slide 87: The Simple Example Data with Observation Numbers
[Figure: scatterplot of x1 vs. x2 with numbered points.]

Slide 88: Dendrogram for the Simple Example Data
[Figure: tree structure showing the root node, a parent node and its daughter nodes (daughter nodes with the same parent are sister nodes), and terminal nodes or leaves corresponding to objects.]

Slide 89: A Hierarchical Clustering of the Simple Example Data
[Figure: scatterplot of the data alongside its dendrogram, showing clusters within clusters within clusters.]

Slide 90: Dendrogram for the Simple Example Data
The height of a node represents the dissimilarity between the two clusters merged together at the node. [In the figure, the two highlighted clusters have a dissimilarity of about 1.75.]

Slide 91: The appearance of a dendrogram is not unique.
Any two sister nodes could trade places without changing the meaning of the dendrogram. Thus 14 appearing next to 7 does not imply that these objects are similar.

Slide 92: The appearance of a dendrogram is not unique.
By convention, R dendrograms show the lower sister node on the left. Ties are broken by observation number (e.g., 13 is drawn to the left of 14).

Slide 93: The appearance of a dendrogram is not unique.
The lengths of the branches leading to terminal nodes have no particular meaning in R dendrograms.

Slides 94-97: Cutting the tree at a given height corresponds to a partitioning of the data into k clusters.
[Figures: cuts yielding k=2, k=3, k=4, and k=10 clusters.]

Slide 98: Agglomerative (Bottom-Up) Hierarchical Clustering
- Define a measure of distance between any two clusters. (An individual object is considered a cluster of size one.)
- Find the two nearest clusters and merge them together to form a new cluster.
- Repeat until all objects have been merged into a single cluster.

Slide 99: Common Measures of Between-Cluster Distance
- Single linkage, a.k.a. nearest neighbor: the distance between any two clusters A and B is the minimum of all distances from an object in cluster A to an object in cluster B.
- Complete linkage, a.k.a. farthest neighbor: the distance between any two clusters A and B is the maximum of all distances from an object in cluster A to an object in cluster B.

Slide 100: Common Measures of Between-Cluster Distance (continued)
- Average linkage: the distance between any two clusters A and B is the average of all distances from an object in cluster A to an object in cluster B.
- Centroid linkage: the distance between any two clusters A and B is the distance between the centroids of clusters A and B. (The centroid of a cluster is the componentwise average of the objects in the cluster.)

Slide 101: Agglomerative Clustering Using Average Linkage for the Simple Example Data Set
[Figure: scatterplot of the data and dendrogram, with merges labeled A through P.]

Slide 102: Agglomerative Clustering Using Average Linkage for the Simple Example Data Set
Order of merges: A. 1-2; B. 9-10; C. 3-4; D. 5-6; E. 7-(5,6); F. 13-14; G. 11-12; H. (1,2)-(3,4); I. (9,10)-(11,12); and so on through P.
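The linkages above map directly onto base R's hclust(), whose merge component records the merge order just illustrated. A minimal sketch with the placeholder x from the earlier sketches:

```r
# Agglomerative hierarchical clustering with interchangeable linkages.
d  <- dist(x)                          # Euclidean distance by default
hc <- hclust(d, method = "average")    # also "single", "complete", "centroid"
                                       # (per ?hclust, "centroid" expects
                                       #  squared Euclidean distances)

# Dendrogram; by convention R draws the lower sister node on the left.
plot(hc, labels = FALSE)

# hc$merge lists merges in order, analogous to the A, B, C, ... labels
# above; cutting the tree gives a k-cluster partition.
head(hc$merge)
groups <- cutree(hc, k = 3)
table(groups)
```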
Slide 103: Agglomerative Clustering Using Single Linkage for the Simple Example Data Set
[Figure.]

Slide 104: Agglomerative Clustering Using Complete Linkage for the Simple Example Data Set
[Figure.]

Slide 105: Agglomerative Clustering Using Centroid Linkage for the Simple Example Data Set
Centroid linkage is not monotone, in the sense that later cluster merges can involve clusters that are more similar to each other than earlier merges.

Slide 106: Agglomerative Clustering Using Centroid Linkage for the Simple Example Data Set
The merge between 4 and (1,2,3,5) creates a cluster whose centroid is closer to the (6,7) centroid than 4 was to the centroid of (1,2,3,5).

Slides 107-110: Agglomerative Clustering for the Two-Color Microarray Data Set
[Figures: dendrograms using single, complete, average, and centroid linkage.]

Slide 111: Which Between-Cluster Distance is Best?
- It depends, of course, on what is meant by "best".
- Single linkage tends to produce "long stringy" clusters.
- Complete linkage produces compact spherical clusters but might result in some objects that are closer to objects in clusters other than their own. (See the next example.)
- Average linkage is a compromise between single and complete linkage.
- Centroid linkage is not monotone.

Slide 112: Exercise
1. Conduct agglomerative hierarchical clustering for this data using Euclidean distance and complete linkage.
2. Display your results using a dendrogram.
3. Identify the k=2 clustering using your results.

Slide 113: Results of Complete-Linkage Clustering
[Figure: dendrogram and the resulting k=2 clusters.]

Slide 114: Divisive (Top-Down) Hierarchical Clustering
- Start with all data in one cluster and divide it into two clusters (using, e.g., 2-means or 2-medoids clustering).
- At each subsequent step, choose one of the existing clusters and divide it into two clusters.
- Repeat until there are n clusters, each containing a single object.

Slide 115: Potential Problem with Divisive Clustering
[Figure.]

Slide 116: Macnaughton-Smith et al. (1965)
1. Start with all objects in one cluster A.
2. Find the object with the largest average dissimilarity to all other objects in A and move that object to a new cluster B.
3. Find the object in cluster A whose average dissimilarity to other objects in cluster A, minus its average dissimilarity to objects in cluster B, is maximum. If this difference is positive, move the object to cluster B.
4. Repeat step 3 until no objects satisfying step 3 are found.
5. Apply steps 1 through 4 to one of the existing clusters (e.g., the one with the largest average within-cluster dissimilarity) until n clusters of 1 object each are obtained.
(A code sketch of this splitting step appears at the end of this section.)

Slides 117-121: Macnaughton-Smith Divisive Clustering
[Figures: successive steps of one split, with objects moving from cluster A to cluster B. Next, continue to split each of these clusters until each object is in a cluster by itself.]

Slide 122: Dendrogram for the Macnaughton-Smith Approach
[Figure.]

Slide 123: Agglomerative vs. Divisive Clustering
- Divisive clustering has not been studied as extensively as agglomerative clustering.
- Divisive clustering may be preferred if only a small number of large clusters is desired.
- Agglomerative clustering may be preferred if a large number of small clusters is desired.
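The splitting procedure of steps 1-4 on the Macnaughton-Smith slide is short enough to sketch directly. The function below is our illustrative translation of those steps, not code from the lecture; for a complete divisive method built on the same idea, see diana() in the cluster package.

```r
# One Macnaughton-Smith split (steps 1-4 above).
# D: symmetric dissimilarity matrix with zero diagonal.
ms_split <- function(D) {
  n <- nrow(D)
  A <- seq_len(n)
  # Step 2: seed B with the object having the largest average dissimilarity
  # to all other objects (diagonal is 0, so dividing by n - 1 averages
  # over the others).
  B <- which.max(rowSums(D) / (n - 1))
  A <- setdiff(A, B)
  # Steps 3-4: keep moving the object whose average dissimilarity within A,
  # minus its average dissimilarity to B, is largest, while that
  # difference is positive.
  while (length(A) > 1) {
    a <- sapply(A, function(i) sum(D[i, A]) / (length(A) - 1))
    b <- sapply(A, function(i) mean(D[i, B]))
    j <- which.max(a - b)
    if (a[j] - b[j] <= 0) break
    B <- c(B, A[j])
    A <- A[-j]
  }
  list(A = A, B = B)
}

# First split of (a subset of) the placeholder data; repeated application
# to existing clusters (step 5) yields the full divisive hierarchy.
ms_split(as.matrix(dist(x))[1:20, 1:20])
```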
