CS 267: Applications of Parallel Computers
Lecture 19: Graph Partitioning, Part II
Kathy Yelick
http://www-inst.eecs.berkeley.edu/~cs267

Recap of Last Lecture
- Partitioning with nodal coordinates:
  - Inertial method
  - Projection onto a sphere
  - The algorithms are efficient, but rely on graphs having nodes connected (mostly) to "nearest neighbors" in space.
- Partitioning without nodal coordinates:
  - Breadth-first search: simple, but not a great partition
  - Kernighan-Lin: a good corrector, given a reasonable starting partition
  - Spectral method: good partitions, but slow
- Today: spectral methods revisited, and multilevel methods.

Basic Definitions
- Definition: the Laplacian matrix L(G) of a graph G(N,E) is an |N|-by-|N| symmetric matrix, with one row and column for each node. It is defined by:
  - L(G)(i,i) = degree of node i (number of incident edges)
  - L(G)(i,j) = -1 if i != j and there is an edge (i,j)
  - L(G)(i,j) = 0 otherwise
- Example: for the 5-node graph G with edges (1,2), (1,3), (2,3), (3,4), (3,5), and (4,5):

           2 -1 -1  0  0
          -1  2 -1  0  0
  L(G) =  -1 -1  4 -1 -1
           0  0 -1  2 -1
           0  0 -1 -1  2

Properties of the Laplacian Matrix
- Theorem 1: given G, L(G) has the following properties (proof on the web page):
  - L(G) is symmetric. This means the eigenvalues of L(G) are real, and its eigenvectors are real and orthogonal.
  - The rows of L sum to zero: let e = [1, ..., 1]^T, the column vector of all ones; then L(G)*e = 0.
  - The eigenvalues of L(G) are nonnegative: 0 = λ1 <= λ2 <= ... <= λn.
  - The number of connected components of G equals the number of λi that equal 0.
- Definition: λ2(L(G)) is the algebraic connectivity of G.
  - The magnitude of λ2 measures connectivity.
  - In particular, λ2 != 0 if and only if G is connected.
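To make the definition concrete, here is a minimal sketch in Python with NumPy (the helper name laplacian and the 0-indexed edge list are ours, not the slides') that builds L(G) for the example graph above and checks the properties in Theorem 1:

    import numpy as np

    def laplacian(n, edges):
        """Build the |N|-by-|N| Laplacian: degrees on the diagonal, -1 per edge."""
        L = np.zeros((n, n))
        for i, j in edges:
            L[i, j] = L[j, i] = -1.0
            L[i, i] += 1.0
            L[j, j] += 1.0
        return L

    # Edges of the example graph (0-indexed); they match the 5x5 matrix above.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
    L = laplacian(5, edges)

    assert np.allclose(L, L.T)                 # symmetric
    assert np.allclose(L @ np.ones(5), 0)      # rows sum to zero
    lam = np.linalg.eigvalsh(L)                # real eigenvalues, ascending
    assert lam[0] > -1e-12                     # eigenvalues nonnegative
    print("algebraic connectivity lambda_2 =", lam[1])  # > 0: G is connected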

Spectral Bisection Algorithm
- Compute the eigenvector v2 corresponding to λ2(L(G)).
- For each node n of G:
  - if v2(n) < 0, put node n in partition N-
  - else put node n in partition N+
- Why does this make sense? Recall that λ2(L(G)) is the algebraic connectivity of G.
- Theorem (Fiedler): let G1(N,E1) be a subgraph of G(N,E), so that G1 is "less connected" than G. Then λ2(L(G1)) <= λ2(L(G)), i.e., the algebraic connectivity of G1 is less than or equal to the algebraic connectivity of G. (Proof on the web page.)
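A sketch of the bisection step itself, reusing L from the previous snippet; a dense eigendecomposition is fine for a toy graph, though not for the large meshes the lecture targets:

    import numpy as np

    def spectral_bisection(L):
        """Split nodes by the sign of the Fiedler vector v2."""
        lam, V = np.linalg.eigh(L)    # eigenvalues ascending; columns of V are eigenvectors
        v2 = V[:, 1]                  # eigenvector for lambda_2
        n_minus = np.where(v2 < 0)[0]
        n_plus = np.where(v2 >= 0)[0]
        return n_minus, n_plus

    n_minus, n_plus = spectral_bisection(L)
    print("N- =", n_minus, " N+ =", n_plus)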

Motivation for Spectral Bisection
- A vibrating string has modes of vibration, or harmonics.
- The modes are computable as follows:
  - Model the string as masses connected by springs (a 1D mesh).
  - Write down F = ma for the coupled system and get a matrix A.
  - The eigenvalues and eigenvectors of A are the frequencies and shapes of the modes.
  - Label each node by whether the mode is - or + there to get N- and N+.
- The same idea works for other graphs (e.g., a planar graph as a trampoline).

Eigenvectors of L (1D mesh) [figure-only slide: plots of eigenvector 1 (all ones), eigenvector 2, and eigenvector 3]

2nd Eigenvector of L (planar mesh) [figure-only slide]

Computing v2 and λ2 of L(G) Using Lanczos
- Given any n-by-n symmetric matrix A (such as L(G)), Lanczos computes a k-by-k "approximation" T by doing k matrix-vector products, with k << n.
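For large sparse Laplacians, v2 is computed with a Lanczos-type iterative solver rather than a dense factorization. A sketch using SciPy's eigsh, which wraps ARPACK's Lanczos-style iteration (the solver options shown are illustrative; production codes would use shift-invert or a preconditioned method):

    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    # Reuse the Laplacian L from the earlier sketch, now stored sparsely.
    Ls = sp.csr_matrix(L)
    # Ask for the two smallest-magnitude eigenpairs; adequate for a sketch.
    lam, V = eigsh(Ls, k=2, which="SM")
    v2 = V[:, 1]                  # Fiedler vector; lam[1] is the algebraic connectivity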

[…]

- … (>= 2) of nodes
- Bipartite model: use a bipartite graph for a directed graph.
- Multi-object, multi-constraint model: use when a single structure may involve multiple computations with differing costs.

Myth 3: Partition Quality is Paramount
- When the structure is changing dynamically during a simulation, you need to partition dynamically:
  - Speed may be more important than quality.
  - The partitioner must run fast in parallel.
  - The partition should be incremental, changing minimally relative to the prior one.
  - It must not use too much memory.
- Example from Touheed, Selwood, Jimack, and Berzins: 1M elements with adaptive refinement on an SGI Origin. Timing data for different partitioning algorithms:
  - Repartition time: 3.0 to 15.2 secs
  - Migration time: 17.8 to 37.8 secs
  - Solve time: 2.54 to 3.11 secs

Load Balancing in General
- In some communities, load balancing is equated with graph partitioning, but some load balancing problems do not fit this model.
- Graph partitioning made several assumptions about the problem:
  - Task costs (node weights) are known.
  - Communication volumes (edge weights) are known.
  - Dependencies are known. (For the basic partitioning techniques covered in class, the only dependencies were between iterations.)
- What if we have less information?
- Spectrum of solutions:
  - Static: all information available before starting
  - Semi-static: some information available before starting
  - Dynamic: little or no information before starting
- Survey of solutions: how each one works, theoretical bounds (if any), and when to use it.

- There is an enormous and diverse literature on load balancing:
  - Computer science systems: operating systems, compilers, distributed computing
  - Computer science theory
  - Operations research (IEOR)
  - The application domains themselves

Understanding Load Balancing Problems
- Load balancing problems differ in:
  - Task costs: do all tasks have equal cost? If not, when are the costs known: before starting, when a task is created, or only when it ends?
  - Task dependencies: can all tasks be run in any order (including in parallel)? If not, when are the dependencies known: before starting, when a task is created, or only when it ends?
  - Locality: is it important for some tasks to be scheduled on the same processor (or nearby) to reduce communication cost? When is the information about communication between tasks known?

Task Cost Spectrum [figure-only slide]

Task Dependency Spectrum [figure-only slide]

Task Locality Spectrum (Data Dependencies) [figure-only slide]

Spectrum of Solutions
- One of the key questions is when certain information about the load balancing problem is known. This leads to a spectrum of solutions:
  - Static scheduling: all information is available to the scheduling algorithm, which runs before any real computation starts (offline algorithms).
  - Semi-static scheduling: information may be known at program startup, at the beginning of each timestep, or at other well-defined points. Offline algorithms may be used even though the problem is dynamic.
  - Dynamic scheduling: information is not known until mid-execution (online algorithms).

Approaches
- Static load balancing
- Semi-static load balancing
- Self-scheduling
- Distributed task queues
- Diffusion-based load balancing
- DAG scheduling
- Mixed parallelism
These are not all-inclusive, but they represent some of the problems for which good solutions exist.

Static Load Balancing
- Static load balancing is used when all information is available in advance. Common cases:
  - Dense matrix algorithms, such as LU factorization: done using a blocked/cyclic layout (blocked for locality, cyclic for load balance; a sketch of the layout follows this list).
  - Most computations on a regular mesh, e.g., FFT: done using a cyclic+transpose+blocked layout for 1D, and similarly for higher dimensions (i.e., with a transpose).
  - Sparse matrix-vector multiplication: use graph partitioning. This assumes the graph does not change over time (or at least not within a timestep during an iterative solve).
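A minimal sketch of the owner computation behind such layouts (the helper name and parameters are illustrative, in the style of ScaLAPACK's 2D block-cyclic distribution):

    def block_cyclic_owner(i, j, b, pr, pc):
        """Owner of matrix entry (i, j) under a 2D block-cyclic layout:
        b-by-b blocks dealt out cyclically over a pr-by-pc processor grid."""
        return ((i // b) % pr, (j // b) % pc)

    # With 2x2 blocks on a 2x3 grid: block row i//b cycles over 2 processor
    # rows, block column j//b cycles over 3 processor columns.
    assert block_cyclic_owner(0, 0, 2, 2, 3) == (0, 0)
    assert block_cyclic_owner(5, 9, 2, 2, 3) == (0, 1)

Blocking keeps neighboring entries together for locality; dealing the blocks out cyclically keeps every processor busy even when the active region of the matrix shrinks, as it does in LU.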

Semi-Static Load Balance
- If the domain changes slowly over time and locality is important:
  - Use a static algorithm.
  - Do some computation (usually one or more timesteps), allowing some load imbalance.
  - On later steps, recompute a new load balance using the static algorithm.
- Often used in:
  - Particle simulations and particle-in-cell (PIC) methods, where poor locality may be more of a problem than load imbalance as particles move from one grid partition to another.
  - Tree-structured computations (Barnes-Hut, etc.).
  - Grid computations with a dynamically changing grid, as long as it changes slowly.

Self-Scheduling
- Keep a centralized pool of tasks that are available to run (a sketch follows below).
- When a processor completes its current task, it looks at the pool.
- If the computation of one task generates more tasks, add them to the pool.
- Originally used for scheduling loops by a compiler (really by the runtime system); the original paper is Tang and Yew, ICPP 1986.
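A minimal sketch of such a centralized pool, assuming shared memory (Python threads; run_task is a hypothetical callable that returns any newly generated tasks):

    import queue
    import threading

    def self_schedule(tasks, run_task, n_workers=4):
        """Central task pool: each worker grabs one task at a time; tasks
        generated during execution go back into the pool."""
        pool = queue.Queue()
        for t in tasks:
            pool.put(t)

        def worker():
            while True:
                try:
                    t = pool.get_nowait()
                except queue.Empty:
                    # Note: a real version must track in-flight tasks, or a
                    # worker may quit while others are still generating work.
                    return
                for new_task in run_task(t):
                    pool.put(new_task)

        threads = [threading.Thread(target=worker) for _ in range(n_workers)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()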

When is Self-Scheduling a Good Idea?
- Useful when:
  - There is a batch (or set) of tasks without dependencies. (It can also be used with dependencies, but most analysis has only been done for task sets without dependencies.)
  - The cost of each task is unknown.
  - Locality is not important.
  - You are using a shared-memory multiprocessor, so a centralized pool of tasks is fine.

Variations on Self-Scheduling
- Typically you don't want to grab the smallest unit of parallel work; instead, grab a chunk of K tasks:
  - If K is large, the access overhead for the task queue is small.
  - If K is small, we are likely to have even finish times (good load balance).
- Four variations (there are more):
  1. Fixed chunk size
  2. Guided self-scheduling
  3. Tapering
  4. Weighted factoring

Variation 1: Fixed Chunk Size
- Kruskal and Weiss give a technique for computing the optimal chunk size.
- It requires a lot of information about the problem characteristics (e.g., task costs and the number of tasks), and it results in an offline algorithm, so it is not very useful in practice.
- For use in a compiler, for example, the compiler would have to estimate the cost of each task, and all tasks must be known in advance.

Variation 2: Guided Self-Scheduling
- Idea: use larger chunks at the beginning to avoid excessive overhead, and smaller chunks near the end to even out the finish times.
- The chunk size Ki at the i-th access to the task pool is ceiling(Ri/p), where Ri is the total number of tasks remaining and p is the number of processors (a small sketch follows).
- See Polychronopoulos, "Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers," IEEE Transactions on Computers, Dec. 1987.
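A few lines suffice to show how the chunk sizes taper off (the function name is ours):

    import math

    def gss_chunks(n_tasks, p):
        """Guided self-scheduling: the i-th grab takes ceil(remaining/p) tasks."""
        chunks, remaining = [], n_tasks
        while remaining > 0:
            k = math.ceil(remaining / p)
            chunks.append(k)
            remaining -= k
        return chunks

    # 100 tasks on 4 processors: big chunks first, single tasks at the end.
    print(gss_chunks(100, 4))
    # -> [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]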

Variation 3: Tapering
- Idea: the chunk size Ki is a function not only of the remaining work, but also of the task cost variance.
- The variance is estimated using history information: high variance means a small chunk size should be used; low variance means larger chunks are OK.
- See S. Lucco, "Adaptive Parallel Programs," PhD thesis, UCB, CSD-95-864, 1994. It gives an analysis (based on the workload distribution) and experimental results: tapering always works at least as well as GSS, although the difference is often small.

Variation 4: Weighted Factoring
- Idea: similar to self-scheduling, but divide the task cost by the computational power of the requesting node.
- Useful for heterogeneous systems, and for shared-resource NOWs (networks of workstations), e.g., built using all the machines in a building.

- As with tapering, historical information is used to predict future "speed"; the speed may depend on the other loads currently on a given processor.
- See Hummel, Schmidt, Uma, and Wein, SPAA '96, which includes experimental data and analysis.
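A rough sketch of the allocation idea; the exact rule in the paper differs, and this only captures the two ingredients named on the slide, namely that allocations shrink over time and faster nodes get proportionally more (the function name and the halve-each-round rule are illustrative):

    def weighted_round(remaining, speeds):
        """One allocation round: hand out half the remaining tasks,
        split across nodes in proportion to their measured speeds."""
        batch = remaining / 2.0
        total = sum(speeds)
        return [max(1, round(batch * s / total)) for s in speeds]

    # Four workstations, one twice as fast as the others:
    print(weighted_round(100, [2.0, 1.0, 1.0, 1.0]))  # [20, 10, 10, 10]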

Distributed Task Queues
- The obvious extension of self-scheduling to distributed memory is a distributed task queue (or bag).
- When are these a good idea?
  - On distributed-memory multiprocessors, or on shared memory with significant synchronization overhead.
  - When locality is not (very) important.
  - For tasks that are either known in advance (e.g., a bag of independent ones) or have dependencies, i.e., are being computed on the fly.
  - When the costs of tasks are not known in advance.

Theoretical Results
- Main result: a simple randomized algorithm is optimal with high probability.
- Adler et al. '95 show this for independent, equal-sized tasks: "throw balls into random bins." They give tight bounds on load imbalance, and show that p log p tasks lead to "good" balance.
- Karp and Zhang '88 show this for a tree of unit-cost (equal-size) tasks: a parent must be done before its children, and the tree unfolds at runtime; children are "pushed" to random processors.
- Blumofe and Leiserson '94 show this for a fixed task tree of variable-cost tasks. Their algorithm uses task pulling (stealing) instead of pushing, which is good for locality: when a processor becomes idle, it steals from a random processor. They also give (loose) bounds on the total memory required.
- Chakrabarti et al. '94 show this for a dynamic tree of variable-cost tasks: it works for branch and bound, i.e., the tree structure can depend on the execution order; it uses randomized pushing of tasks instead of pulling, so locality is worse.
- Open problem: does task pulling provably work well for dynamic trees?

Engineering Distributed Task Queues
- There are many papers on engineering these systems on various machines, and on their applications.
- If nothing is known about task costs when they are created:
  - Organize local tasks as a stack (push/pop from the top).
  - Steal from the bottom of the stack (as if it were a queue), because old tasks are likely to cost more. (A sketch follows below.)
- If something is known about task costs and communication costs, it can be used as hints (see Wen, UCB PhD thesis, 1996; part of Multipol, cs.berkeley.edu/projects/multipol): try to push tasks with a high ratio of compute cost to push cost. Example: for matmul, the ratio is 2n^3 * cost(flop) / 2n^2 * cost(send a word), which grows like n.
- Goldstein, Rogers, Grunwald, and others (in independent work) have all shown the advantages of integrating task queues into the language framework: very lightweight thread creation. See Cilk (Leiserson et al., supertech.lcs.mit.edu/cilk).
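A minimal sketch of that stack-with-stealing discipline (Python, with a lock for clarity; the class name is ours): the owner pushes and pops at the top, while thieves take from the bottom, where the older and typically larger tasks sit.

    import collections
    import threading

    class WorkStealingDeque:
        """Owner works on the newest tasks; thieves steal the oldest."""
        def __init__(self):
            self._dq = collections.deque()
            self._lock = threading.Lock()

        def push(self, task):     # owner: push on top
            with self._lock:
                self._dq.append(task)

        def pop(self):            # owner: pop from top (LIFO)
            with self._lock:
                return self._dq.pop() if self._dq else None

        def steal(self):          # thief: take from the bottom (FIFO end)
            with self._lock:
                return self._dq.popleft() if self._dq else None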

Diffusion-Based Load Balancing
- In the randomized schemes above, the machine is treated as fully connected. Diffusion-based load balancing takes the topology into account.
- Its locality properties are better than the prior work, but the load balancing is somewhat slower than the randomized schemes.
- The cost of tasks must be known at creation time, and there can be no dependencies between tasks.
- The machine is modeled as a graph. At each step, we compute the weight of the tasks remaining on each processor (simply their number, if they are unit-cost tasks). Each processor compares its weight with its neighbors and performs some averaging. The analysis uses Markov chains.
- See Ghosh et al., SPAA '96, for a second-order diffusive load balancing algorithm: it takes into account the amount of work sent last time, avoiding some of the oscillation of first-order schemes.
- Note: locality is still not a major concern, although balancing with neighbors may be better than balancing with random processors.
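A toy first-order diffusion step over a processor graph (the damping factor alpha and the helper name are illustrative; real schemes pick the coefficients from the graph's degrees):

    def diffusion_step(load, edges, alpha=0.25):
        """One first-order diffusion step over an undirected processor graph."""
        new_load = list(load)
        for u, v in edges:                      # each undirected edge once
            flow = alpha * (load[u] - load[v])  # > 0 means u sends work to v
            new_load[u] -= flow
            new_load[v] += flow
        return new_load

    # A 4-processor ring with all the load on processor 0:
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(diffusion_step([8.0, 0.0, 0.0, 0.0], ring))  # [4.0, 2.0, 0.0, 2.0]

Repeating the step spreads the load toward the uniform distribution while conserving the total amount of work.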

DAG Scheduling
- For some problems, you have a directed acyclic graph (DAG) of tasks:
  - Nodes represent computation (may be weighted).
  - Edges represent orderings and usually communication (may also be weighted).
  - It is not that common to have the DAG in advance.
- Two application domains where DAGs are known:
  - Digital signal processing computations.
  - Sparse direct solvers (mainly Cholesky, since it doesn't require pivoting); more on this in another lecture.
- The basic offline strategy: partition the DAG to minimize communication and keep all processors busy. This is NP-complete, so approximations are needed. It is different from graph partitioning, which was for tasks with communication but no dependencies.
- See Gerasoulis and Yang, IEEE Transactions on Parallel and Distributed Systems, June 1993.
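A common heuristic family for this problem is list scheduling: process tasks in a topological order and greedily place each on the processor where it can start earliest. A minimal sketch (only an illustration, not the cited algorithm; it also ignores the edge communication weights):

    from collections import deque

    def list_schedule(deps, cost, p):
        """Greedy list scheduling of a task DAG on p processors.
        deps: {task: set of predecessors}; cost: {task: execution time}."""
        # Kahn's algorithm for a topological order.
        indeg = {t: len(ps) for t, ps in deps.items()}
        succs = {t: [] for t in deps}
        for t, ps in deps.items():
            for q in ps:
                succs[q].append(t)
        ready = deque(t for t, d in indeg.items() if d == 0)
        proc_free = [0.0] * p    # when each processor next becomes idle
        finish, placement = {}, {}
        while ready:
            t = ready.popleft()
            earliest = max((finish[q] for q in deps[t]), default=0.0)
            j = min(range(p), key=lambda i: max(proc_free[i], earliest))
            start = max(proc_free[j], earliest)
            finish[t] = start + cost[t]
            proc_free[j] = finish[t]
            placement[t] = j
            for s in succs[t]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
        return placement, finish

    # Diamond DAG a -> {b, c} -> d on 2 processors:
    deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    cost = {t: 1.0 for t in deps}
    print(list_schedule(deps, cost, 2))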

Mixed Parallelism
- As another variation, consider a problem with two levels of parallelism:
  - Coarse-grained task parallelism: good when there are many tasks, bad if there are few.
  - Fine-grained data parallelism: good when there is much parallelism within a task, bad if there is little.
- Appears in:
  - Adaptive mesh refinement
  - Discrete event simulation, e.g., circuit simulation
  - Database query processing
  - Sparse matrix direct solvers

Mixed Parallelism Strategies [figure-only slide]

Which Strategy to Use [figure-only slide]

Switch Parallelism: A Special Case [figure-only slide]

Simple Performance Model for Data Parallelism [two figure-only slides]

Values of Sigma (Problem Size for Half Peak) [figure-only slide]

Modeling Performance
- To predict performance, make assumptions about the task tree:
  - It is a complete tree with branching factor d >= 2.
  - The d child tasks of a parent of size N are all of size N/c, with c > 1.
  - The work to do a task of size N is O(N^a), with a >= 1.
- Example: a sign-function-based eigenvalue routine has d = 2, c = 4 (on average), and a = 1.5.
- Example: sparse Cholesky on a 2D mesh has d = 4, c = 4, and a = 1.5.
- Combine these assumptions with the model of data parallelism (a small sketch of the resulting work per level follows).
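Under these assumptions, level k of the tree has d^k tasks of size N/c^k, so the total work per level is d^k * (N/c^k)^a = N^a * (d/c^a)^k, a geometric series. A small sketch tabulating it (symbols as defined above; the function name is ours):

    def work_per_level(N, d, c, a, levels):
        """Total work at each level of the assumed task tree:
        d^k tasks of size N/c^k, each costing (N/c^k)^a."""
        return [(d ** k) * (N / c ** k) ** a for k in range(levels)]

    # Sign-function eigensolver: d=2, c=4, a=1.5, so d/c^a = 2/8 = 0.25;
    # the work shrinks geometrically and the top of the tree dominates.
    print(work_per_level(1024.0, 2, 4, 1.5, 4))  # [32768.0, 8192.0, 2048.0, 512.0]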

Simulated Efficiency of Eigensolver [figure-only slide]
- Starred lines are optimal mixed parallelism; solid lines are data parallelism; dashed lines are switched parallelism.

Simulated Efficiency of Sparse Cholesky [figure-only slide]
- Same legend: starred lines are optimal mixed parallelism, solid lines data parallelism, dashed lines switched parallelism.

Actual Speed of Sign Function Eigensolver [figure-only slide]
- Starred lines are optimal mixed parallelism; solid lines are data parallelism; dashed lines are switched parallelism.
- Measured on an Intel Paragon, built on ScaLAPACK.
- Switched parallelism is worthwhile!
