《分布式系统》研究生教学课件
分布式系统研究生教学课件,分布式系统,分布式,系统,研究生,教学,课件
Distributed SystemsDistributed SystemsNamingNamingYingchi MaoOutlinevNames,Identifiers,and AddressesvFlat namingvStructured namingvAttributed-based namingNaming EntitiesvNames,identifiers,and addressesvName resolutionvName space implementationNamingvName:strings of bits of characters used to denote an entityWhat is an entity in a distributed system?Resources(hosts,printers,etc)processes,users,newsgroup,web pages,network connections,etc)Essence:Names are used to denote entities in a distributed system.vWhat properties are desired in name?Location independenceEasy to rememberAddressvSuppose I have the name of an entity.Can I now operate on it immediately?Suppose I have your name and want to access you?vTo operate on an entity,we need to access it at an access point.Access points are entities that are named by means of an address.Address:the name of an access pointvWhy not use the address of an entity as its name?An entity can offer more than one access points.An entity may change its access point.vNote:A location-independent name for an entity E,is independent from the addresses of the access points offered by E.vMachine readable:memory addresses:32/64 bit-string,48bits for Ethernet addresses)vsvUser-friendly names or human-friendly names(in Unix each file can have up to 255 bytes name).vNames unique?Permanent?vAddresses unique?Permanent?Names are valuable!NYT,August00IdentifiersvPure name:A name that has no meaning at all;it is just a random string.Pure names can be used for comparison only.vIdentifier:A name having the following properties:P1:Each identifier refers to at most one entityP2:Each entity is referred to by at most one identifierP3:An identifier always refers to the same entity(prohibits reusing an identifier)vObservation:An identifier need not necessarily be a pure name,i.e.,it may have content.vIDs are also a kind of name.What properties do they have?Refers to at most one entityEach entity has at most one IDID is permanentvExamples?What is your name?What is your address?What is your ID?SSN,phone number,passport number,street address,e-mail address.vCan we substitute one for the other?Use your phone number as your name?Use your SSN as your name?Use your name as your SSN?Phone number?IdentifiersCentral QuestionvHow to resolve names to addresses?Naming vs.LocatingvDNS works well for static situations,where addresses dont change very often.vWhat if we assume that it does?vSuppose we want to change ftp.cs.hhu.edu to ftp.ee.hhu.edu.How?Change the IP address for ftp.cs.hhu.edu.Make a“symbolic link”from ftp.cs.hhu.edu to ftp.ee.hhu.edu.In other words,put an entry in DNS that says ftp.cs.hhu.edu has been renamed to ftp.ee.hhu.edu.vCompare and contrast?If the first is over long distances,latency may be slow.Also,could be a bottleneck,since it is centralized.Adding indirection via a“symbolic link”can create very long chains.Chain management is an issue.vA better solution is to divide into separate naming and location service.How many levels of naming are used in typical Internet?Two-level mapping using identities and separate naming and location service.Direct,single level mapping between names and addresses.NamingserviceLocationserviceNaming vs.LocatingvA name is resolved when it is translated into data about the named resourcevBinding:the association between a name and the attributes of a resources,usually its addressvA name service stores a collection of one or more naming contexts(binding between names of resources and attributes of resources)vShould support:Name resolution:lookup attributes from a given nameAlso,create new bindings,deleting bindings etcvRequirements:Scale(number of names and administrative domains)A long lifetimeHigh availabilityFault isolationTolerance of mistrustOutlinevNames,Identifiers,and AddressesvFlat namingvStructured namingvAttributed-based namingFlat NamingvObservation:In many cases,identifiers are simply random bit strings,which we conveniently refer to as unstructured,or flat names.vProblem:Given an essentially unstructured,how can we locate its associated access point?Simple solutions(broadcasting)Forwarding PointersHome-based approachesDistributed Hash Tables(structured P2P)Hierarchical location serviceSimple SolutionsvBroadcasting:Simply broadcast the ID,requesting the entity to return its current address.Can never scale beyond local-area networks(think of ARP/RARP)Requires all processes to listen to incoming location requestsvForwarding pointers:Each time an entity moves,it leaves behind a pointer telling where it has gone to.Dereferencing can be made entirely transparent to clients by simply following the chain of pointersUpdate a clients reference as soon as present location has been foundGeographical scalability problems:Long chains are not fault tolerantIncreased network latency at dereferencingEssential to have separate chain reduction mechanismsvObject originally in P3,then moved to P4.P1 passes object reference to P2.Do we always need to go through the whole chain?Forwarding Pointers in SSPShortcutRedirecting a forwarding pointer,by storing a shortcut in a client stub.How to deal with broken chains?Use a home to solveHome-Based Approaches(1/3)vSingle-tiered scheme:Let the home location keep track where the entity isEntitys home address is registered at a naming service.The home registers the foreign address of the entityClients always contact the home first,and then continue with the foreign locationHome-Based Approaches(2/3)vLet a home keep track of where the entity is:An entitys home address is registered at a naming serviceThe home registers the foreign address of the entityClients always contact the home first,and then continues with the foreign locationHome-Based Approaches(3/3)vTwo-tiered scheme:Keep track of visiting entities:Check local visitor register firstFall back to home location if local lookup failsvProblems with home-based approachesHome address has to be supported for entitys lifetimeHome address is fixed=unnecessary burden when the entity permanently movesPoor geographical scalability(entity may be next to client)Question:How can we solve the“permanent move”problem?vExample:Consider the organization of many nodes into a logical ring(Chord)Each node is assigned a random m-bit identifier.Every entity is assigned a unique m-bit key.Entity with key k falls under jurisdiction of node with smallest id k(called its successor).vNonscalable Solution:Let node id keep track of succ(id)and start linear search along the ring.Distributed Hash TablesDHTs:Finger Tables(1/2)vPrinciple:Each node p maintains a finger table FTp with at most m entries:FTpi=succ(p+2i1)Note:FTpi points to the first node succeeding p by at least 2i1.vTo look up a key k,node p forwards the request to node with index j satisfyingq=FTpj k FTpj+1vIf p k FTp1,the request is also forwarded to FTp1DHTs:Finger Tables(2/2)Exploiting Network ProximityvProblem:The logical organization of nodes in the overlay may lead to erratic message transfers in the underlying Internet:node k and node succ(k+1)maybe very far apart.vTopology-aware node assignment:When assigning an ID to a node,make sure that nodes close in the ID space are also close in the network.Can be very difficult.vProximity routing:Maintain more than one possible successor,and forward to the closest.Example:in Chord FTpi points to first node in INT=p+2i1,p+2i 1.Node p can also store pointers to other nodes in INT.An additional advantage of having multiple successors for every table entry is that node failures failures need not immediately lead to failures of lookupsvProximity neighbor selection:When there is a choice of selecting who your neighbor will be(not in Chord),pick the closest one.Hierarchical Location Services(HLS)vBasic idea:Build a large-scale search tree for which the underlying network is divided into hierarchical domains.Each domain is represented by a separate directory node.vLeaf domains typically correspond to a local-area network or a cellvThe root(directory)node knows all the entitiesvEach entity currently in a domain D is represented by a location record in the directory node dir(D)which is the entitys current address or a pointerHLS:Tree OrganizationvThe address of an entity is stored in a leaf node,or in an intermediate nodevIntermediate nodes contain a pointer to a child if and only if the subtree rooted at the child stores an address of the entityvThe root knows about all entitiesvAn entity may have multiple addresses(e.g.,if it is replicated)HLS:Lookup OperationvBasic principles:Start lookup at local leaf nodeIf node knows about the entity,follow downward pointer,otherwise go one level upUpward lookup always stops at rootHLS:Insert OperationOutlinevNames,Identifiers,and AddressesvFlat namingvStructured namingvAttributed-based namingStructured NamingvName SpacevName ResolutionvThe Implementation of a Name SpacevExample:The Domain Name SystemName SpacevNames organized into a name spacevA name space a collection of valid names recognized by a particular servicevNames may have an internal structure that represents their position if a hierarchical name spacevMain advantages:Each part of a name is resolved relative to a separate contextPotentially infiniteDifferent contexts managed by different entitiesName SpacesvThe name space is the way that names in a particular system are organized.This also defines the set of all possible names.vExamples?Phone numbersCredit card numbersDNSHuman names in the USFiles in UNIX,WindowsURLsvNames are organized into what is commonly referred to as a name space.A name space can be represented as a labeled,directed graph with two types of nodes.Name Space(1/2)vA(hierarchical)name space can be represented as a labeled,directed graph(naming graph)with two types of nodes.a leaf node represents a(named)entity.A directory node refers to other nodes;nodes;stores a(directory)table of(edge label,node identifier)pairs.Name Space(2/2)vObservation:We can easily store all kinds of attributes in a node,describing aspects of the entity the node represents:Type of the entityAn identifier for that entityAddress of the entitys locationNicknames.vObservation:Directory nodes can also have attributes,besides just storing a directory table with(edge label,node identifier)pairs.Name ResolutionvLooking up a name(finding the“value”)is called name resolution.vProblem:To resolve a name we need a directory node.How do we actually find that(initial)node?vClosure mechanism(or where to start)The mechanism to select the implicit context from which to start name resolution.Examples,file systems,ZIP code,DNSwww.cs.vu.nl:start at a DNS name server/home/steen/mbox:start at the local NFS file server(possible recursive search)0031204447784:dial a phone number130.37.24.8:route to the VUs Web servervQuestion:Why are closure mechanisms always implicit?vObservation:A closure mechanism may also determine how name resolution should proceedName Linking(1/2)vHard link:What we have described so far as a path name:a name that is resolved by following a specific path in a naming graph from one node to another.vSoft link:Allow a node O to contain a name of another node:First resolve O s name(leading to O)Read the content of O,yielding nameName resolution continues with namevObservations:The name resolution process determines that we read the content of a node,in particular,the name in the other node that we need to go to.One way or the other,we know where and how to start name resolution given nameName Linking(2/2)vObservation:Node n5 has only one nameName Space ImplementationvBasic issue:Distribute the name resolution process as well as name space management across multiple machines,by distributing nodes of the naming graph.vConsider a hierarchical naming graph and distinguish three levels:Global level:Consists of the high-level directory nodes.Main aspect is that these directory nodes have to be jointly managed by different administrationsAdministrational level:Contains mid-level directory nodes that can be grouped in such a way that each group can be assigned to a separate administration.Managerial level:Consists of low-level directory nodes within a single administration.Main issue is effectively mapping directory nodes to local name servers.Name Space DistributionvAn example partitioning of the DNS name space,including Internet-accessible files,into three layers.Name Space DistributionName Server CharacteristicsvA comparison between name servers for implementing nodes from a large-scale name space partitioned into a global layer,as an administrational layer,and a managerial layer.ItemGlobalAdministrationalManagerialGeographical scale of networkWorldwideOrganizationDepartmentTotal number of nodesFewManyVast numbersResponsiveness to lookupsSecondsMillisecondsImmediateUpdate propagationLazyImmediateImmediateNumber of replicasManyNone or fewNoneIs client-side caching applied?YesYesSometimesIterative Name Resolutionvresolve(dir,name1,.,nameK)is sent to Server0 responsible for dirvServer0 resolves resolve(dir,name1)dir1,returning the identification(address)of Server1,which stores dir1.vClient sends resolve(dir1,name2,.,nameK)to Server1,etc.Recursive Name Resolutionvresolve(dir,name1,.,nameK)is sent to Server0 responsible for dirvServer0 resolves resolve(dir,name1)dir1,and sends resolve(dir1,name2,.,nameK)to Server1,which stores dir1.vServer0 waits for the result from Server1,and returns it to the client.Caching in Recursive Name ResolutionvRecursive name resolution of.Name servers cache intermediate results for subsequent lookups.vIs iterative or recursive better for caching?Server for nodeShould resolveLooks upPasses to childReceives and cachesReturns to requestercs#-#vu#ni#root#Interactive and recursive resolutionvInteractive client drives the resolutionCaching by clientsPotentially costly communicationsvRecursive the server doesHigher performance demand on serversMore effective cachingReduced communication costsScalability Issues(1/2)vSize scalability:We need to ensure that servers can handle a large number of requests per time unit high-level servers are in big trouble.vSolution:Assume(at least at global and administrational level)that content of nodes hardly ever changes.In that case,we can apply extensive replication by mapping nodes to multiple servers,and start name resolution at the nearest server.vObservation:It puts a higher performance demand on each name server.Name servers in the global layer of a name space support only iterative name resolution.vGeographical scalability:We need to ensure that the name resolution process scales across large geographical distances.vProblem:By mapping nodes to servers that may,in principle,be located anywhere,we introduce an implicit location dependency in our naming scheme.Scalability Issues(2/2)Example:Decentralized DNSvBasic idea:Take a full DNS name,hash into a key k,and use a DHT-based system to allow for key lookups.Main drawback:You cant ask for all nodes in a subdomain(but very few people were doing this anyway).vInformation in a node:Typically what you find in a DNS record,of which there are different kinds:Type of recordAssociated entityDescriptionSOAZoneHolds information on the represented zoneAHostContains an IP address of the host this node representsMXDomainRefers to a mail server to handle mail addressed to this nodeSRVDomainRefers to a server handling a specific serviceNSZoneRefers to a name server that implements the represented zonePTRNodeSymbolic link with the primary name of the represented nodeCNAMEHostContains the canonical name of a hostHINFOHostHolds information on the host this node representsTXTAny kindContains any entity-specific information considered usefulOutlinevNames,Identifiers,and AddressesvFlat namingvStructured namingvAttributed-based namingAttribute-Based NamingvObservation:In many cases,it is much more convenient to name,and look up entities by means of their attributes traditional directory services(a.k.a.,yellow pages).vProblem:Lookup operations can be extremely expensive,as they require to match requested attribute values,against actual attribute values inspect all entities(in principle).vSolution:Implement basic directory service as database,and combine with traditional structured naming system.X.500 Directory Servicevprovides directory service based on a description of properties instead of a full name(e.g.,yellow pages in telephone book)van X.500 directory entry is comparable to a resource record in DNS.Each record is made up of a collection of(attribute,value)pairsvCollection of all entries is a Directory Information Base(DIB)vEach naming attribute is a Relative Distinguished Name(RDN)vRDNs,in sequence,can be used to form a Directory Information Tree(DIT)X.500 Directory EntryvA simple example of a X.500 directory entry using X.500 naming conventions./C=NL/O=Hohai University/OU=School of CSIEis analogous to the DNS name AttributeAbbr.ValueCountryCCNLocalityLNanjingOrganizationOHohai UniversityOrganizationalUnitOUSchool of CSIECommonNameCNMain serverMail_Servers-202.119.112.48FTP_Server-220.250.64.25WWW_Server-58.240.39.2Example:LDAP
收藏