《分布式系统》研究生教学课件
分布式系统研究生教学课件,分布式系统,分布式,系统,研究生,教学,课件
Distributed SystemsDistributed SystemsConsistency&ReplicationConsistency&ReplicationYingchi MaoOutlinevIntroduction(whats it all about)vData-centric consistencyvClient-centric consistencyvReplica managementvConsistency protocolsReplicationvReplication:Creating and using multiple copies of data or servicesvWhy replicate?Improve reliabilityData survivalAvailability Increase confidence:e.g.deal with byzantine failuresImprove performanceScalingReduce access timesData vs.ComputationvReplication:It could be data replication if the same data is stored on multiple storage devicesEg,Web site mirror,browser cache,DNSor computation replication if the same computing task is executed many times.What are the issues?vUpdating replicasConsistency(how to deal with updated data)(luckily)applications do not always require strict consistencyHow are updates distributed?vReplica managementHow many replicas?Where to place them?When to get rid of them?vRedirection/RoutingWhich replica should clients use?Scalability CONFILCT management overheadsExample:CostsvAs a scaling technique,may not always be applicable.PAccess replica N times per secondUpdate replica M times per secondvAs a scaling technique,may not always be applicable.What if N TLinearizability ConsistencyA data store is linearizable if it provides:vSequential consistency,and vWhen assuming some limited granularity global clock if TSop1(x)TSop2(y),then operation OP1(x)should precede OP2(y)Causal ConsistencyWrites that are potentially causally related must be seen by all processes in the same order.Concurrent writes may be seen in a different order by different processes.Causally related relationship:vA read is causally related to the write that provided the data the read got.vA write is causally related to a read that happened before this write in the same process.vIf write1 read,and read write2,then write1 write2.Concurrent not causally relatedCausal Consistency(Example)vThis sequence is allowed with a causally-consistent store,but not with sequentially or strictly consistent store.vNote:W1(x)a W2(x)b,but not W2(x)b W1(x)cCausal Consistency(More Examples)A correct sequence of events in a causally-consistent store.A violation of a causally-consistent store.FIFO ConsistencyCausally related writes done by a single process are seen by all other processes in the order in which they were issued,(but writes from different processes may be seen in a different order by different processes.)Grouping Operations vAs viewed by an external,data-centric process,what do locks do?They turn non-atomic operations into atomic ones(functionally).In other words,they group them.vSynchronization VariablesOperations are grouped via synchronization variables(locks).Each synchronization variable protects an associated data.Each kind of synchronization variable has some associated properties.Weak ConsistencyvIntuition:You dont care that reads and writes of a series of operations are immediately known to other processes.You just want the effect of the series itself to be known.vWeak consistency propertiesAccesses to synchronization variables associated with a data store are sequentially consistent.No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere.No read or write operation on data items are allowed to be performed until all previous operations to synchronization variables have been performed.Weak Consistency(Example)An invalid sequence for weak consistency.A valid sequence of events for weak consistency.Release ConsistencyvIntuition:Weak consistency uses the same primitive to synchronize before reading and writing the data introduce locking primitives acquire and releasevRules for release consistency:Before a read or write operation on shared data is performed,all previous acquires done by the process must have completed successfully.Before a release is allowed to be performed,all previous reads and writes by the process must have completedAccesses to synchronization variables are FIFO consistent(sequential consistency is not required).Release Consistency Example A valid event sequence for release consistency.Entry Consistency vIntuition:for release consistency data is synchronized when lock is released for all variables.vSolution:Variable specific synchronizationSynchronize at entry onlyvRules:An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process.Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process,no other process may hold the synchronization variable,not even in nonexclusive mode.After an exclusive mode access to a synchronization variable has been performed,any other processs next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variables owner.Entry Consistency(Example)A valid event sequence for entry consistency.Since P3 first does an acquire for y,it will read b when y is released by P1P2 will get a for x,but may get NIL when reading yStrictAbsolute time ordering of all shared accesses matters.LinearizabilityAll processes must see all shared accesses in the same order.Plus accesses are ordered according to a(nonunique)global timestampSequentialAll processes see all shared accesses in the same order.Accesses are not ordered in timeCausalAll processes see causally-related shared accesses in the same order.FIFOAll processes see writes from each other in the order they were issued.Nothing is guaranteed about writes from different processesWeakShared data can be counted on to be consistent only after a synchronisation is doneReleaseShared data are made consistent when a critical region is exited(or entered)EntryShared data pertaining to a critical region are made consistent when a critical region is entered.Consistency models not using explicit synchronization operations.Models with explicit synchronization operations.Summary data centric consistencyOutlinevIntroduction(whats it all about)vData-centric consistencyvClient-centric consistencyvReplica managementvConsistency protocolsClient-Centric Consistency ModelsvSystem modelvMonotonic readsvMonotonic writesvRead-your-writesvWrite-follows-readsvGoal:Show how we can perhaps avoid system wide consistency,by concentrating on what specific clients want,instead of what should be maintained by servers.Example:Consistency for Mobile UsersvExample:Consider a distributed database to which you have access through your notebook.Assume your notebook acts as a front end to the database.vAt location A you access the database doing reads and updates.vAt location B you continue your work,but unless you access the same server as the one at location A,you may detect inconsistencies:your updates at A may not have yet been propagated to Byou may be reading newer entries than the ones available at Ayour updates at B may eventually conflict with those at AvNote:The only thing you really want is that the entries you updated and/or read at A,are in B the way you left them in A.In that case,the database will appear to be consistent to you.vIdea:the database will appear to be consistent to the uservClient-centric consistency consistency for a single client,nothing about concurrent access by difference clientsExample:Basic ArchitecturevHow well does EC work for mobile clients?vClient-centric is for this.Consistent for a single client.Eventual ConsistencyvIf no updates take place for a long enough period time,all replicas will gradually(i.e.,eventually)become consistent.vSituations where eventual consistency models may make senseMostly read-only workloadsNo concurrent updatesvAdvantages/DrawbacksClient-centric ConsistencyvIdea:Guarantees some degree of data access consistency for a single client.vNotations:Xit Version of data item x at time t at local copy LiWS(xi t)all write operations at Li since initWS(xi t,xj t)indicates that it is known that WS(xi t)is part of WS(xj t)Monotonic-Read ConsistencyvDef:If a process reads the value of a data item x,any successive read operation on x by that process will always return that same or a more recent value.vIf youve seen a value of x at time t,youll never see anything older at a later time.vIntuition:Client“sees”only same or newer version of data.The set of write operations at L2 include those done at L1Monotonic reads ExamplesvAutomatically reading your personal calendar updates from different servers.Monotonic Reads guarantees that the user sees all updates,no matter from which server the automatic reading takes place.vReading(not modifying)incoming e-mail while you are on the move.Each time you connect to a different e-mail server,that server fetches(at least)all the updates from the server you previously visited.Monotonic-Write ConsistencyvDef:A write operation by a process on a data item x is completed before any successive write operation on x by the same process.vIntuition:Write happens on a copy only if its brought up to date with preceding write operations on same data(but possibly at different copies)Monotonic writes ExamplesvUpdating a program at server S2,and ensuring that all components on which compilation and linking depends,are also placed at S2.vMaintaining versions of replicated files in the correct order everywhere(propagate the previous version to the server where the newest version is installed).Read-Your-Writes ConsistencyvDef:The effect of a write operation by a process on data item x,will always be seen by a successive read operation on x by the same process.vIntuition:All previous writes are always completed before any successive readRead-Your-Writes-ExamplesvUpdating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.vPassword databaseWrites-Follow-Reads ConsistencyvDef:A write operation by a process on a data item x following a previous read operation on x by the same process,is guaranteed to take place on the same or a more recent value of x that was read.vIntuition:Any successive write operation on x will be performed on a copy of x that is same or more recent than the last read.Writes-Follow-Reads-ExamplesvExamples:news groupsSee reactions to posted articles only if you have the original posting(a read.pulls in.the corresponding write operation).Summary client centric consistencyvMain takeWe can avoid system-wide consistency,by concentrating on what specific clients want,instead of what should be maintained by servers.Relax consistency requirements even furthervOnly concerned about single client-viewOutlinevIntroduction(whats it all about)vData-centric consistencyvClient-centric consistencyvReplica managementvConsistency protocolsReplica ManagementvReplica server placementvContent replication and placementvUpdate distribution/propagationReplica Server PlacementvEssence:Figure out what the best K places are out of N possible locations.vSelect best location out of N-k for which the average distance to clients is minimal.Then choose the next best server.(Note:The first chosen location minimizes the average distance to all clients.)Computationally expensive.vSelect the k-th largest autonomous system and place a server at the best-connected host.Computationally expensive.vPosition nodes in a d-dimensional geometric space,where distance reflects latency.Identify the K regions with highest density and place a server in every one.Computationally cheap.ClusteringvOne idea,identify the K largest clusters,then put one server in each cluster.vHow do you find clusters?One way,divide space up into cells,pick K most populated ones.Replica-Server PlacementvChoosing a proper cell size for server placement.vTurns out that computing from average distance between two nodes and the number of replicas works well.Content replicationvModel:We consider objects(and dont worry whether they contain just data or code,or both)vDistinguish different processes:A process is capable of hosting a replica of an object or data:Permanent replicas:Process/machine always having a replicaServer-initiated replica:Process that can dynamically host a replica on request of another server in the data storeClient-initiated replica:Process that can dynamically host a replica on request of a client(client cache)Content Replication(contd.)vThe logical organization of different kinds of copies of a data store into three concentric rings.Content Replication:Server-Initiated ReplicasvKeep track of access counts per file,aggregated by considering server closest to requesting clientsvNumber of accesses threshold R replicate filevNumber of access between D and R (and more requesters at P than at Q)migrate file to PContent distributionvConsider only a client-server combination,Options:Propagate only notification/invalidation of update(often used for caches)Transfer data from one copy to another(distributed databases)Propagate the update operation to other copies(also called active replication)vNote:No single approach is the best,but depends highly on available bandwidth and read-to-write ratio at replicas.Propagating updates(1/3)IssuePush-basedPull-basedState of serverList of client replicas and cachesNoneMessages sentUpdate(and possibly fetch update later)Poll and updateResponse time at clientImmediate(or fetch-update time)Fetch-update timevPushing updates:server-initiated approach,in which update is propagated regardless whether target asked for it.vPulling updates:client-initiated approach,in which client requests to be updated.Propagating updates:Leases(2/3)vObservation:We can dynamically switch between pull and push using leases:Lease:A contract in which the server promises to push updates to the client until the lease expires.vIssue:Make lease expiration time dependent on systems behavior(adaptive leases):Age-based leases:An object that hasnt changed for a long time,will not change in the near future,so provide a long-lasting leaseRenewal-frequency based leases:The more often a client requests a specific object,the longer the expiration time for that client(for that object)will beState-based leases:The more loaded a server is,the shorter the expiration times becomevUnicasting or multicastingWith push-based,multicasting may be a good ideaWith pull-based,unicast is your only reasonable modelPropagating updates:Epidemic protocols(3/3)vProblem so far:server scalabilityServer needs to provide updates to all system participantsWhat if Internet-scale system(P2P)?vEpidemic protocols:Nodes periodically pair up and exchange state(or updates)Push and/or pull exchanges vIssuesGenerated traffic mapping on physical topologyWhen is an update distributed to everyone?Probabilistic guarantees FreeridingvUse:server-less environments,volatile nodes mobile networks,P2P systemsSummaryvIntroduction(whats it all about)vData-centric consistencyvClient-centric consistencyvReplica managementvConsistency protocolsConsistency ProtocolsvWe focus on those that enforce sequential consistencyvPrimary-Based ProtocolsRemote-write protocolLocal-write protocolvReplicated-Write ProtocolsActive replicationQuorum-based protocols Primary-Based Protocols(1/4)vPrimary-backup protocol with remote writes:all writes are blocking,forwarded to a fixed primary server;reads are localThe process that does the write may block for a long while;but this is fault tolerant and easy to implementA non-blocking approach trades fault tolerance for performancePrimary-Based Protocols(2/4)vPrimary-backup protocol with remote writes:all writes are blocking,forwarded to a fixed primary server;reads are localvExample:Traditionally applied in distributed databases and file systems that require a high degree of fault tolerance.Replicas are often placed on same LAN.Primary-Based Protocols(3/4)vPrimary-Backup protocol with local writes:a single copy is migrated between processes(fully distributed non-replicated version of the data store).Location information is the main problem in a widely distributed data store.Primary-Based Protocols(4/4)vPrimary-Backup protocol with local writes:Multiple successive writes can be done locallyvExample:Mobile computing in disconnected mode(ship all relevant files to user before disconnecting,and update later on).Replicated-Write ProtocolsvActive ReplicationvRequires a process,for each replica,that can perform the update on itvHow to enforce the update order?Totally-ordered multicast mechanism neededCan be implemented by Lamport timestampsCan be implemented by sequencervProblem of replicated invocationsIf an object A invokes another object B,all replicas of A will invoke B(multiple invocations)Active Replication(1/2)vEach replica has an associated process that carries out update operations.vUpdates are propagated by means of the write operation that causes the update Upgrades need to maintain operations order(Lamport timestamps or coordinator);vProblems of replicated invocations:multiple invocations of the same object can produce errorsActive Replication(2/2)vForwarding an invocation request from a replicated object.vReturning a reply to a replicated object.Active Replication Using MulticastvActive replicationFE multicasts request to each replicaImplementing Ordered MulticastvIncoming messages are held back in a queue until delivery guarantees can be metImplementing Ordered MulticastvIncoming messages are held back in a queue until delivery guarantees can be metvCoordination between machines needed to determine delivery orderFIFO-ordering:easy,use a separate sequence number for each processTotal-ordering:use Lamports timestampsCausal-ordering:use vector timestampsNetwork PartitionsvPrimary-based vReplicated-writevNeither works if network is partitioned since in both approaches coordination involves all RMsNetwork PartitionsvWell-known Solution:Quorum-based ProtocolsvIdea?Use MajorityvReadRetrieve number of replicas in read quorumSelect the one with the latest versionPerform a read on itvWriteRetrieve number of replicas in write quorumFind the latest version and increment itPerform a write on the entire write quorumQuorum-based protocols:vclients request and acquire permission of multiple server before accessing datavAssign a vote to each copy of a replicated object(say Vi)such that the sum of Vi equals to V.vEach operation has to obtain a read quorum(Vr)to read and a write quorum(Vw)to write an objectvThen the following rules have to be obeyed in determining the quorums:Vr+Vw V an object is not read and written by two transactions concurrently.Vw V/2 two write operations from two transactions cannot occur concurrently on the same objectReplicated-Write ProtocolsvQuorum-based protocols:Ensure that each operation is carried out in such a way that a majority vote is established:distinguish read quorum and write quorum:Cache Coherence ProtocolsvCaching can be analyzed according to different parametersv Coherence detection strategy(when)verification of consistency before cached data accessed no verification:data are assumed consistent verification after cached data used v Coherence enforcement strategy(how)no cached shared data(only at servers)servers send invalidation messages to all caches servers propagate updates vWrite-through cache clients modify cached data and forward updates to servers SummaryvAgain,we use replication for performance and reliabilityvReplication,however,introduces a few issuesThe problem of consistency,which we may pay in terms of performanceThe“details”of placement and management
收藏