Data Storage Solutions


Introduction

This is a paper written by Rick Cattell. It discusses scalable data storage solutions for structured and unstructured data (including key-value, document-oriented, and column-oriented stores). (Note: NoSQL is key to supporting big-data applications. Strictly speaking, equating NoSQL with "unstructured" is not accurate, because the more common interpretation of NoSQL is "Not Only SQL". In other words, NoSQL does not stand in opposition to structured SQL; it can cover both structured and unstructured data.)

Paper Information

Scalable SQL and NoSQL Data Stores
Rick Cattell
Originally published in 2010, last revised December 2011

ABSTRACT

In this paper, we examine a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers. Originally motivated by Web 2.0 applications, these systems are designed to scale to thousands or millions of users doing updates as well as reads, in contrast to traditional DBMSs and data warehouses. We contrast the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions. These systems typically sacrifice some of these dimensions, e.g. database-wide transaction consistency, in order to achieve others, e.g. higher availability and scalability.

Note: Bibliographic references for systems are not listed, but URLs for more information can be found in the System References table at the end of this paper.

Caveat: Statements in this paper are based on sources and documentation that may not be reliable, and the systems described are "moving targets," so some statements may be incorrect. Verify through other sources before depending on information here. Nevertheless, we hope this comprehensive survey is useful! Check for future corrections on the author's web site cattell.net/datastores.

Disclosure: The author is on the technical advisory board of Schooner Technologies and has a consulting business advising on scalable
databases.

1. OVERVIEW

In recent years a number of new systems have been designed to provide good horizontal scalability for simple read/write database operations distributed over many servers. In contrast, traditional database products have comparatively little or no ability to scale horizontally on these applications. This paper examines and compares the various new systems. (Yol's note: in practice this is a comparison of the various NoSQL designs, for example Mongo versus HBase, with a classification and a short introduction to each.)

Many of the new systems are referred to as "NoSQL" data stores. The definition of NoSQL, which stands for "Not Only SQL" or "Not Relational", is not entirely agreed upon. (Translator's note: "Not Relational" here means weakly relational compared with a database such as MySQL.) For the purposes of this paper, NoSQL systems generally have six key features:

1. the ability to horizontally scale "simple operation" throughput over many servers,
2. the ability to replicate and to distribute (partition) data over many servers,
3. a simple call level interface or protocol (in contrast to a SQL binding),
4. a weaker concurrency model than the ACID transactions of most relational (SQL) database systems,
5. efficient use of distributed indexes and RAM for data storage, and
6. the ability to dynamically add new attributes to data records.

The systems differ in other ways, and in this paper we contrast those differences. They range in functionality from the simplest distributed hashing, as supported by the popular memcached open source cache, to highly scalable partitioned tables, as supported by Google's BigTable [1]. In fact, BigTable, memcached, and Amazon's Dynamo [2] provided a "proof of concept" that inspired many of the data stores we describe here:

Memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.

Dynamo pioneered the idea of eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.

BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.

A key feature of NoSQL systems is "shared nothing" horizontal scaling: replicating and partitioning data over many servers. This allows them to support a large number of simple read/write operations per second. This simple operation load is traditionally called OLTP (online transaction processing), but it is also common in modern web applications.

The NoSQL systems described here generally do not provide ACID transactional properties: updates are eventually propagated, but there are limited guarantees on the consistency of reads. Some authors suggest a "BASE" acronym in contrast to the "ACID" acronym:

BASE = Basically Available, Soft state, Eventually consistent
ACID = Atomicity, Consistency, Isolation, and Durability

The idea is that by giving up ACID constraints, one can achieve much higher performance and scalability.

However, the systems differ in how much they give up. For example, most of the systems call themselves "eventually consistent", meaning that updates are eventually propagated to all nodes, but many of them provide mechanisms for some degree of consistency, such as multi-version
concurrency control (MVCC).

Proponents of NoSQL often cite Eric Brewer's CAP theorem [4], which states that a system can have only two out of three of the following properties: consistency, availability, and partition-tolerance. The NoSQL systems generally give up consistency. However, the tradeoffs are complex, as we will see.

New relational DBMSs have also been introduced to provide better horizontal scaling for OLTP, when compared to traditional RDBMSs. After examining the NoSQL systems, we will look at these SQL systems and compare the strengths of the approaches. The SQL systems strive to provide horizontal scalability without abandoning SQL and ACID transactions. We will discuss the tradeoffs here.

In this paper, we will refer to both the new SQL and NoSQL systems as data stores, since the term "database system" is widely used to refer to traditional DBMSs. However, we will still use the term "database" to refer to the stored data in these systems. All of the data stores have some administrative unit that you would call a database: data may be stored in one file, or in a directory, or via some other mechanism that defines the scope of data used by a group of applications. Each database is an island unto itself, even if the database is partitioned and distributed over multiple machines: there is no "federated database" concept in these systems (as with some relational and object-oriented databases), allowing multiple separately-administered databases to appear as one. (Yol's note: that is, these systems do not let several separately administered databases be presented as one.) Most of the systems allow horizontal partitioning of data, storing records on different servers according to some key; this is called "sharding". Some of the systems also allow vertical partitioning, where parts of
a single record are stored on different servers.

1.1 Scope of this Paper

Before proceeding, some clarification is needed in defining "horizontal scalability" and "simple operations". These define the focus of this paper.

By "simple operations", we refer to key lookups, reads and writes of one record or a small number of records. This is in contrast to complex queries or joins, read-mostly access, or other application loads. With the advent of the web, especially Web 2.0 sites where millions of users may both read and write data, scalability for simple database operations has become more important. For example, applications may search and update multi-server databases of electronic mail, personal profiles, web postings, wikis, customer records, online dating records, classified ads, and many other kinds of data. These all generally fit the definition of "simple operation" applications: reading or writing a small number of related records in each operation.

The term "horizontal scalability" means the ability to distribute both the data and the load of these simple operations over many servers, with no RAM or disk shared among the servers. Horizontal scaling differs from "vertical" scaling, where a database system utilizes many cores and/or CPUs that share RAM and disks. Some of the systems we describe provide both vertical and horizontal
scalability, and the effective use of multiple cores is important, but our main focus is on horizontal scalability, because the number of cores that can share memory is limited, and horizontal scaling generally proves less expensive, using commodity servers. Note that horizontal and vertical partitioning are not related to horizontal and vertical scaling, except that they are both useful for horizontal scaling.

1.2 Systems Beyond our Scope

Some authors have used a broad definition of NoSQL, including any database system that is not relational. Specifically, they include:

Graph database systems: Neo4j and OrientDB provide efficient distributed storage and queries of a graph of nodes with references among them.

Object-oriented database systems: Object-oriented DBMSs (e.g., Versant) also provide efficient distributed storage of a graph of objects, and materialize these objects as programming language objects.

Distributed object-oriented stores: Very similar to object-oriented DBMSs, systems such as GemFire distribute object graphs in-memory on multiple servers.

These systems are a good choice for applications that must do fast and extensive reference-following, especially where data fits in memory. Programming language integration is also valuable. Unlike the NoSQL systems, these systems generally provide ACID transactions. Many of them provide horizontal scaling for reference-following and distributed query decomposition, as well. Due to space limitations, however, we have
omitted these systems from our comparisons. The applications and the necessary optimizations for scaling for these systems differ from the systems we cover here, where key lookups and simple operations predominate over reference-following and complex object behavior. It is possible these systems can scale on simple operations as well, but that is a topic for a future paper, and proof through benchmarks.

Data warehousing database systems provide horizontal scaling, but are also beyond the scope of this paper. Data warehousing applications are different in important ways:

They perform complex queries that collect and join information from many different tables.

The ratio of reads to writes is high: that is, the database is read-only or read-mostly.

There are existing systems for data warehousing that scale well horizontally. Because the data is infrequently updated, it is possible to organize or replicate the database in ways that make scaling possible.

1.3 Data Model Terminology

Unlike relational (SQL) DBMSs, the terminology used by NoSQL data stores is often inconsistent. For the purposes of this paper, we need a consistent way to compare the data models and functionality.

All of the systems described here provide a way to store scalar values, like numbers and strings, as well as BLOBs. Some of them also provide a way to store more complex nested or reference values. The systems all store sets of attribute-value pairs, but use different data structures, specifically:

A "tuple" is a row in a relational table, where attribute
names are predefined in a schema, and the values must be scalar. The values are referenced by attribute name, as opposed to an array or list, where they are referenced by ordinal position.

A "document" allows values to be nested documents or lists as well as scalar values, and the attribute names are dynamically defined for each document at runtime. A document differs from a tuple in that the attributes are not defined in a global schema, and a wider range of values is permitted.

An "extensible record" is a hybrid between a tuple and a document, where families of attributes are defined in a schema, but new attributes can be added (within an attribute family) on a per-record basis. Attributes may be list-valued. (Translator's note: this is the model used by column stores.)

An "object" is analogous to an object in programming languages, but without the procedural methods. Values may be references or nested objects.

1.4 Data Store Categories

In this paper, the data stores are grouped according to their data model:

Key-value Stores: These systems store values and an index to find them, based on a programmer-defined key.

Document Stores: These systems store documents, as just defined. The documents are indexed and a simple query mechanism is provided.

Extensible Record Stores: These systems store extensible records that can be partitioned vertically and horizontally across nodes. Some papers call these "wide column stores".

Relational Databases: These systems store (and index and query) tuples. The new RDBMSs that provide horizontal scaling are covered in this paper.

Data stores in these four
categories are covered in the next four sections, respectively. We will then summarize and compare the systems.
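Sections 1 and 1.1 describe several related ideas: memcached-style distributed hashing, sharding records by key, and "simple operations" (single-key reads and writes) on shared-nothing servers. The sketch below shows these ideas inside a single Python process. It assumes each "server" is just a dict and that keys are routed with a consistent-hash ring. The ring is a technique chosen here for illustration and is not prescribed by the paper. The names HashRing, ShardedKV and node_for are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping keys to node names (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        # Each node owns several virtual points on the ring to even out load.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


class ShardedKV:
    """Key-value store whose records are horizontally partitioned by key.

    Each node is a separate dict and nothing is shared between them,
    mimicking a shared-nothing cluster inside one process.
    """

    def __init__(self, node_names):
        self.ring = HashRing(node_names)
        self.nodes = {name: {} for name in node_names}

    def put(self, key: str, value) -> None:
        # A "simple operation": one write that touches exactly one shard.
        self.nodes[self.ring.node_for(key)][key] = value

    def get(self, key: str, default=None):
        # A "simple operation": one key lookup on one shard.
        return self.nodes[self.ring.node_for(key)].get(key, default)


if __name__ == "__main__":
    store = ShardedKV(["node-a", "node-b", "node-c"])
    for i in range(1000):
        store.put(f"user:{i}", {"id": i})
    print(store.get("user:42"))
    # How many records each shard ended up holding.
    print({name: len(data) for name, data in store.nodes.items()})
```

A plain `hash(key) % N` placement would also shard the data, but changing N remaps most keys. With a ring, adding or removing a node moves only the keys on the affected arcs. Dynamo, mentioned above, partitions its data with a variant of consistent hashing.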

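The eventual-consistency idea credited to Dynamo above (the "E" in BASE) can be illustrated with a toy simulation. Each replica accepts writes locally, so a read on another replica may return stale data. A later anti-entropy exchange then makes all replicas converge. This is a minimal sketch that assumes last-writer-wins conflict resolution driven by a single global counter. Replica, merge_from and anti_entropy are hypothetical names. Real systems such as Dynamo use more elaborate versioning (for example vector clocks) instead of a shared clock.

```python
import itertools
from dataclasses import dataclass, field

# Global logical clock used to order writes. This is a simplification:
# real distributed systems cannot rely on one shared counter.
_clock = itertools.count(1)


@dataclass
class Replica:
    name: str
    # key -> (version, value); the higher version wins (last-writer-wins).
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        # Accept the write locally and return at once, with no coordination
        # with other replicas; this is what keeps writes available.
        self.data[key] = (next(_clock), value)

    def read(self, key):
        # May return a stale value (or None) if updates have not arrived yet.
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other: "Replica"):
        # Pull every entry that is newer on the other replica.
        for key, (version, value) in other.data.items():
            mine = self.data.get(key)
            if mine is None or version > mine[0]:
                self.data[key] = (version, value)


def anti_entropy(replicas):
    # One round of pairwise pulls. The first replica collects the newest
    # version of every key, and every later replica pulls from it, so after
    # this round all replicas hold identical data (they have converged).
    for a in replicas:
        for b in replicas:
            if a is not b:
                a.merge_from(b)


if __name__ == "__main__":
    r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
    r1.write("cart:7", ["book"])
    print(r2.read("cart:7"))  # None: r2 has not seen the update (stale read)
    r2.write("cart:7", ["book", "pen"])  # later write on another replica
    anti_entropy([r1, r2, r3])
    print([r.read("cart:7") for r in (r1, r2, r3)])
    # -> [['book', 'pen'], ['book', 'pen'], ['book', 'pen']]
```

Last-writer-wins keeps the sketch short, but it silently discards the losing write (here r1's, which r2 never saw). The Dynamo design can instead return conflicting versions to the application for reconciliation.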