Distributed Systems: Graduate Course Slides
Distributed Systems: Architectures
Yingchi Mao

Outline
- Architectural styles
- System architectures
- Architectures versus middleware
- Self-management in distributed systems

What is a Distributed System?
- A distributed system is a collection of independent computers that appears to its users as a single coherent system.

Definition of a Distributed System (II)
- Independent hardware installations
- Uniform software layer (middleware)
- Note: the middleware layer extends over multiple machines.

Middleware
- General structure of a distributed system as middleware.

Middleware and Interoperability
- Interoperability is provided by:
  - Protocols used by each middleware layer
  - Interfaces offered to applications
- Independent hardware installations
- Uniform software layer (middleware)

Architectural Styles
- Distributed systems are complex pieces of software. To master complexity: good organization.
- There are different ways to look at the organization of distributed systems. Two obvious ones:
  - Software architecture: the logical organization of software components, i.e., how the various software components are organized and how they should interact
  - System architecture: their physical realization, i.e., the instantiation of software components on real machines

Architectural Styles
- An architectural style is formulated in terms of components, the way that components are connected to each other, the data exchanged between components, and finally how these elements are jointly configured into a system.
- A component is a modular unit with well-defined required and provided interfaces that is replaceable within its environment.
- A connector is generally described as a mechanism that mediates communication, coordination, or cooperation among components, e.g., remote procedure call, message passing, or streaming data.
- Using components and connectors, different architectural styles can be formed.

Several Architecture Styles
- Using components and connectors, we can arrive at various configurations, which in turn have been classified into architectural styles.
Layered
architectures, object-based architectures, data-centered architectures, event-based architectures

Architectural styles (1/4): Layered style
- Observation: The layered style is used for client-server systems.

Architectural styles (2/4): Object-based
- Basic idea: Organize into logically different components, and subsequently distribute those components over the various machines.
- Observation: The object-based style is used for distributed object systems. In essence, each object corresponds to what we have defined as a component, and these components are connected through a (remote) procedure call mechanism.

Architectural styles (3/4): Data-centered
- Basic idea: Processes communicate through a common (passive or active) repository.
- As important as the layered and object-based architectures. E.g., a wealth of networked applications have been developed that rely on a shared distributed file system in which virtually all communication takes place through files. Likewise for Web-based distributed systems.

Architectural styles (3/4): Data-centered
- The shared data-space architectural style.

Architectural Styles (4/4): Event-based
- Observation: Decoupling processes in space ("anonymous") and also in time ("asynchronous") has led to alternative styles:
  (a) Publish/subscribe: decoupled in space
  (b) Shared data space: decoupled in space and time

Shared data spaces
- Many shared data spaces use an SQL-like interface to the shared repository.
  - Data can be accessed using a description rather than an explicit reference, as with files.
- Google Sawzall: Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories.
- Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

Outline
- Architectural styles
- System
architectures
- Architectures versus middleware
- Self-management in distributed systems

System architecture
- Deciding on software components, their interaction, and their placement leads to an instance of a software architecture, also called a system architecture.

System architecture: vertical distribution
- Basic client/server model
  - Server processes offer services used by client processes
  - Clients follow the request/reply model in using services
  - Clients and servers can be distributed across different machines
- Traditional three-layered view
  - User-interface layer: an application's user interface
  - Processing layer: the application, i.e., without specific data
  - Data layer: the data to manipulate through the application

Centralized Architectures
Basic Client-Server Model:
- Server: a process implementing a certain service
- Client: uses the service by sending a request and waiting for the reply
- Clients and servers can be distributed across different machines
- Clients follow the request-reply model with respect to using services
Main problem to deal with: unreliable communication
Note: a process often plays both roles simultaneously for different services

Delivery Failures
- How can a client tell that a request message was lost? A timeout is one approach.
- How can a client tell the difference between a request message that was lost and a reply message that was lost? There is no great answer; usually a system can offer only "at most once" service or "at least once" service.
- Does using a connection-oriented protocol like TCP help? The book is misleading.
- TCP provides guarantees only in the absence of faults. Packets can be lost, but this can be thought of as "normal" operation.
- If you want to make sure that the data actually got there and got processed, you need to wait for an application-level acknowledgement from the receiver.
  - Why doesn't TCP do this for you? Because it requires too much application knowledge. Do you want the ack when the data gets to the app, when it is written to disk, when it reaches the RDBMS, etc.?

Idempotency
- Can you sort these into two categories?
  - Read my account balance.
  - Transfer $100 from savings
    to checking.
  - Change block 100 of file A to read: "abcdef".
  - Copy block 100 of file A to block 200.

Idempotent
- If an operation can be repeated multiple times without harm, it is said to be idempotent.
- Since some requests are idempotent and others are not, it should be clear that there is no single solution for dealing with lost messages.

3-Tier Architectures
- Client-server is somewhat simplistic. A three-tier architecture has emerged:
  - User interface
  - Processing (business logic)
  - Data (database)
- What are examples of each of these layers?

Application Layering (1/2)
- Traditional three-layered view:
  - User-interface layer: contains units for an application's user interface
  - Processing layer: contains the functions of an application, i.e., without specific data
  - Data layer: contains the data that a client wants to manipulate through the application components
- Observation: This layering is found in many distributed information systems, using traditional database technology and accompanying applications.

Internet Search Engine
- The simplified organization of an Internet search engine into three different layers.
- Other examples:
  - Stock brokerage decision support: user interface, analysis, financial database
- The data level is typically an RDBMS, so it will include replication and consistency functionality.

Logical Architecture vs. Physical Architecture
- The physical architecture may or may not match the logical architecture.
- Could have just two types:
  - Client machine containing the interface
  - Server machine running everything else
- Or could have other partitions.

2.2 System architecture
Multi-Tiered Architectures
- The simplest organization is to have only two types of machines:
  - A client machine containing only the programs implementing (part of) the user-interface level
  - A server machine containing the rest, i.e., the programs implementing the processing and data levels

Multi-Tiered Architectures
- Logically:
  - Single-tiered: dumb terminal/mainframe configuration
  - Two-tiered: client/single server configuration
  - Three-tiered: each layer on a separate machine
- Physically:
  - Distributing
    components into client and server machines
  - Traditional two-tiered configurations:
- Examples: (a) the server side has some control over the UI; (c) form checking; (d) a banking application just uploads transactions; (e) local cache.
- What's good about moving things out to desktop machines? What's bad? Thin clients are popular. Why? Less management.

Physical 3-Tiered
- Observation: Server-side solutions are becoming increasingly distributed as a single server is replaced by multiple servers running on different machines. A server may sometimes need to act as a client.
- An example of a server acting as a client: Web server, TPM.

Another Description of 3-Tier Architecture
[Figure]

3-Tier Example: Web Proxy Server
[Figure: clients, a proxy server, and Web servers; the legend distinguishes processes from computers]

3-Tier Example: Clients Invoke Individual Servers
[Figure: clients send invocations to servers and receive results; the legend distinguishes processes from computers]

Horizontal vs. Vertical Distribution
- Previously, we looked at what is known as vertical distribution.
  - The different tiers correspond directly with the logical organization of applications.
  - Multitiered client-server architectures are a direct consequence of dividing applications into a user-interface level, processing components, and a data level.
  - Compare vertical fragmentation as used in distributed relational databases.
- We can also have horizontal distribution. What is that?
  - A client or server may be physically split up into logically equivalent parts, but each part operates on its own share of the complete data set, thus balancing the load.
  - Peer-to-peer systems are a class of modern architectures that support horizontal distribution.
  - Things like replication and clusters.
- An example of horizontal distribution of a Web service.
- Horizontally distributed servers may talk to each other.

Peer-to-Peer
- How does it differ from the previous architectures?
- Can all apps be done as P2P?
- P2P systems are generally built on an overlay network.
- What is an overlay network?
  - An overlay network is a logical network.
  - Are neighbors in the overlay network connected by a real
    link?
  - Are nodes that are close in the overlay network close in the physical network?
  - An overlay network is a network in which the nodes are formed by processes and the links represent the possible communication channels (which are usually realized as TCP connections). In general, a process cannot communicate directly with an arbitrary other process, but is required to send messages through the available communication channels.

Distributed Hash Tables (1/2)
- Let's say that you have a lot of data items that you want to distribute over a P2P network. Assume that each data object has an associated integer key.
- How do you find something? It's on some node out there somewhere.
  - Basic operation: map a key to a node.

Distributed Hash Tables (2/2)
- In a DHT-based system, data items are assigned a random key from a large identifier space, such as a 128-bit identifier. By far the most-used procedure is to organize the processes through a DHT.
- The nodes are logically organized in a ring such that a data item with key k is mapped to the node with the smallest identifier id >= k. This node is referred to as the successor of key k and denoted succ(k).

Decentralized Architectures
- In the last couple of years we have seen an impressive growth in P2P systems:
  - Structured (DHT-based) P2P: nodes are organized following a specific distributed data structure
  - Unstructured P2P: nodes have randomly selected neighbors
  - Hybrid P2P: some nodes are appointed special functions in a well-organized fashion
- Note: In virtually all cases, we are dealing with overlay networks: data is routed over connections set up between the nodes (cf. application-level multicasting).

Structured P2P systems
- Basic idea: Organize the nodes in a structured overlay network such as a logical ring, and make specific nodes responsible for services based only on their ID.
- The system provides an operation LOOKUP(key) to route the lookup request to the associated node.

Chord
[Figure]

Membership Management: Chord
- How nodes organize themselves into an
  overlay network.
- Joining the system:
  - Generate a random identifier id.
  - Contact succ(id) and its predecessor, and insert itself into the ring.
  - Each data item whose key is now associated with node id is transferred from succ(id).
- Leaving the system:
  - Node id informs its predecessor and successor of its departure, and transfers its data items to succ(id).

Structured P2P Systems: Content Addressable Network (CAN)
- Other example: Organize nodes in a d-dimensional space and let every node take responsibility for data in a specific region. When a node joins, a region is split.

Membership Management: CAN
- How nodes organize themselves into an overlay network.
- A node P wants to join the system:
  - Pick an arbitrary point from the coordinate space.
  - Contact the node Q in whose region that point falls.
  - Q splits its region into two halves, and one half is assigned to node P.
- Leaving the system:
  - The departing node's region is assigned to one of its neighbors.
  - A background process is periodically started to repartition the entire space.

Unstructured P2P Systems
- Observation: Many unstructured P2P systems attempt to maintain a random graph.
- Basic principle: Each node is required to be able to contact a randomly selected other node:
  - Let each peer maintain a partial view of the network, consisting of c other nodes.
  - Each node P periodically selects a node Q from its partial view.
  - P and Q exchange information and exchange members from their respective partial views.
- Observation: It turns out that, depending on the exchange, randomness, but also robustness of the network can be maintained.
- In general, it is much easier to leave or join the network.

Hybrid Approaches
- Basic idea: Distinguish two layers: (1) maintain random partial views in the lowest layer; (2) be selective about whom you keep in the higher-layer partial view.
- Note: The lower layer feeds the upper layer with random nodes; the upper layer is selective when it comes to keeping references.
- Interesting behaviors: nodes on a grid. Each node maintains a list of nearest neighbors, using the Manhattan distance. Initially, the links are random.
- Completely
  different ranking functions can be used, such as those based on semantic distance, to form semantic overlay networks.

Superpeers
- Observation: Sometimes it helps to select a few nodes to do specific work: superpeers.
- Some obvious examples:
  - Transience: pick the most stable ones
  - Search: have them keep the indexes for scalable searches
  - Organization: have them monitor the state of the network
- Superpeers can be static, or selected dynamically from the other peers.
- How do you pick a superpeer?
- Can leader election be used?

Hybrid Architectures (1/2)
- Observation: In many cases, client-server architectures are combined with peer-to-peer solutions.
- Example: Edge-server architectures, which are often used for Content Delivery Networks:
  - Edge servers are placed at the edge of the network and are responsible for caching, filtering, transcoding, and so on.
  - Clients connect through the edge server.

Hybrid Architectures (1/2)
- Viewing the Internet as consisting of a collection of edge servers.

Hybrid architectures (2/2)
- Example: Combining a P2P download protocol with a client-server architecture for controlling the downloads: BitTorrent.
  - Client/server to connect to the swarm, and P2P from then on
  - Files are split into chunks; peers swap chunks within a swarm
  - Get a torrent from a web site
  - Contact the tracker listed in the torrent
  - Get a set of peers from the tracker and connect to the swarm

Hybrid architectures (2/2)
- Basic idea: Once a node has identified where to download a file from, it joins a swarm of downloaders who in parallel get file chunks from the source, but also distribute these chunks amongst each other.

Architectures versus middleware
- We have talked about the physical architecture.
- Does middleware also have an architectural style?
  If it does, how does it affect
  flexibility and extensibility?
  - Sometimes, the "native" style may not be optimal.
  - Can we build messaging over RPC?
  - Can we build RPC over messaging?

Architectures versus middleware
- A key goal for middleware is to provide distribution transparency.
- Typically, however, middleware adopts particular architectural styles:
  - This makes it simpler to develop applications for that style.
  - It makes it hard or inefficient to do so with any other!
- Two alternatives: build different versions, or make them easy to adapt dynamically.
- Interceptors: intercept the usual flow of control when invoking a remote object.

Interceptors
- Request-level interceptors could handle replication.
- Message-level interceptors could handle fragmentation.

Collaborative Distributed Systems
- Components of the Globule collaborative content distribution network:
  - A component that can redirect client requests to other servers.
  - A component for analyzing access patterns.
  - A component for managing the replication of Web pages.

Adaptive middleware
- To deal with changing environments and demands: adaptive middleware.
- Techniques to facilitate software adaptation:
  - Separation of concerns: Try to separate extra functionalities and later weave them together into a single implementation. Only toy examples so far.
  - Computational reflection: Let a program inspect itself at runtime and adapt or change its settings dynamically if necessary. Mostly at language level, and applicability unclear.
  - Component-based design: Organize a distributed application through components that can be dynamically replaced when needed. Highly complex, with many intercomponent dependencies.
- Observation: Do we need adaptive software at all, or is the issue adaptive systems?

Self-managing Distributed Systems
- Observation: The distinction between system and software architectures blurs when automatic adaptivity needs to be taken into account:
  - Self-configuration
  - Self-managing
  - Self-healing
  - Self-optimizing
  - Self-*
- Note: There is a lot of hype going on in
  this field of autonomic computing.

Feedback Control Model
- Observation: In many cases, self-* systems are organized as a feedback control system:
  - The system needs to be monitored.
  - Collected measurements must be analyzed to decide on adaptation.
  - Different mechanisms must be used to enact changes.
  - Not unlike manual management.

Example: Systems Monitoring with Astrolabe
- Data collection and information aggregation in Astrolabe.
- Each upper zone aggregates the lower zones.
- The most interesting part is how to query. An SQL model is adopted.
- For example, an average: SELECT AVG(procs) AS avg_procs FROM hostinfo
- Such a query would be running on a node.
- Information needs to be propagated. This is done through gossiping.

Example: Differentiating Replication Strategies in Globule
- Globule: A collaborative CDN that analyzes traces to decide where replicas of Web content should be placed. Decisions are driven by a general cost model:
  $cost = (w_1 \times m_1) + (w_2 \times m_2) + \cdots + (w_n \times m_n)$
- The Globule origin server collects traces and does what-if analysis by checking what would have happened if page P had been placed at edge server S. Many strategies are evaluated, and the best one is chosen.
- The dependency between prediction accuracy and trace length.

References on peer-to-peer
- Eng Keong Lua, Jon Crowcroft, Marcelo Pias, Ravi Sharma, and Steven Lim, "A survey and comparison of peer-to-peer overlay network schemes," IEEE Communications Surveys & Tutorials, 7(2):22-73, Apr. 2005. An excellent survey of modern peer-to-peer systems, covering structured as well as unstructured networks. This paper forms a good introduction for those who want to get deeper into the subject but do not really know where to start.
- S. Androutsellis-Theotokis and D. Spinellis, "A survey of peer-to-peer content distribution technologies," ACM Comput. Surv., vol. 36, pp. 335-371, 2004.
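
Worked example: succ(k) on a Chord-style ring
The Distributed Hash Tables and Chord membership slides map a key k to the node with the smallest identifier id >= k. The sketch below illustrates that mapping and the data hand-over when a node joins. It is a minimal illustration, not Chord's actual routing: the function names, the 5-bit identifier space, the node identifiers, and the wrap-around to the lowest identifier when no id >= k exists are all assumptions made for this example.

```python
from bisect import bisect_left


def succ(node_ids, key, id_bits=5):
    """Return the smallest node id >= key on the ring.

    Assumption: if no such id exists, wrap around to the lowest node id.
    """
    ring = sorted(node_ids)
    key %= 2 ** id_bits
    index = bisect_left(ring, key)  # first position whose id is >= key
    return ring[index % len(ring)]  # index == len(ring) wraps to ring[0]


def keys_moved_on_join(node_ids, new_id, stored_keys, id_bits=5):
    """Return the keys that move to new_id once it has joined the ring."""
    ring_after_join = list(node_ids) + [new_id]
    return [k for k in stored_keys if succ(ring_after_join, k, id_bits) == new_id]


# Hypothetical node identifiers in a 5-bit identifier space (0..31).
nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]
print(succ(nodes, 12))                              # key 12 is stored at node 14
print(succ(nodes, 29))                              # wraps around to node 1
print(keys_moved_on_join(nodes, 12, [12, 13, 14]))  # only key 12 moves to node 12
```

This version keeps the full sorted ring in one place, which a real DHT node does not have; it only shows which node is responsible for a key, not how the lookup request is routed.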
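
Worked example: retrying a non-idempotent request
The Delivery Failures and Idempotency slides note that a client cannot tell a lost request from a lost reply, so a retry may execute a transfer twice. One common way to make "at least once" delivery safe is for the server to remember request identifiers and replay the stored reply for duplicates. The sketch below assumes that technique; it is not taken from the slides, and the class name, function name, request identifier, and account balances are hypothetical. Reply loss is simulated with a counter rather than a real network.

```python
class Server:
    """Toy server that filters duplicate requests by request id (an assumed technique)."""

    def __init__(self):
        self.balance = {"savings": 500, "checking": 0}
        self.seen = {}  # request id -> reply already sent for that request

    def transfer(self, request_id, amount):
        # A duplicate request gets the cached reply instead of a second transfer.
        if request_id in self.seen:
            return self.seen[request_id]
        self.balance["savings"] -= amount
        self.balance["checking"] += amount
        reply = dict(self.balance)
        self.seen[request_id] = reply
        return reply


def call_with_retry(server, request_id, amount, lost_replies):
    """Resend the same request until a reply gets through.

    lost_replies is the number of replies the simulated network drops.
    """
    attempts = 0
    while True:
        attempts += 1
        reply = server.transfer(request_id, amount)  # the request itself always arrives here
        if attempts > lost_replies:
            return reply, attempts
        # Reply lost: the client times out and resends with the same request id.


server = Server()
reply, attempts = call_with_retry(server, "req-1", 100, lost_replies=2)
print(attempts, server.balance)  # three attempts, but the $100 moved only once
```

Without the `seen` table, the same three attempts would move $300, which is why the transfer on the Idempotency slide is not idempotent, while reading the balance is.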