Data Storage Schemes

Introduction

The source is a paper by Rick Cattell that discusses scalable data stores for structured and non-structured data; the latter include key-value, document, and column-oriented stores. (Note: NoSQL is key to supporting big-data applications. Translating "NoSQL" as "unstructured" is not quite accurate, because the more common reading is "Not Only SQL". In other words, NoSQL does not stand in opposition to structured SQL; it can cover both structured and unstructured data.)

Paper information

Scalable SQL and NoSQL Data Stores
Rick Cattell
Originally published in 2010, last revised December 2011

ABSTRACT

In this paper, we examine a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers. Originally motivated by Web 2.0 applications, these systems are designed to scale to thousands or millions of users doing updates as well as reads, in contrast to traditional DBMSs and data warehouses. We contrast the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions. These systems typically sacrifice some of these dimensions, e.g. database-wide transaction consistency, in order to achieve others, e.g. higher availability and scalability.

Note: Bibliographic references for systems are not listed, but URLs for more information can be found in the System References table at the end of this paper.

Caveat: Statements in this paper are based on sources and documentation that may not be reliable, and the systems described are "moving targets," so some statements may be incorrect. Verify through other sources before depending on information here. Nevertheless, we hope this comprehensive survey is useful! Check for future corrections on the author's web site cattell.net/datastores.

Disclosure: The author is on the technical advisory board of Schooner Technologies and has a consulting business advising on scalable databases.

1. OVERVIEW

In recent years a number of new systems have been designed to provide good horizontal scalability for simple read/write database operations distributed over many servers. In contrast, traditional database products have comparatively little or no ability to scale horizontally on these applications. This paper examines and compares the various new systems.

(Yol's note: in effect, the paper compares the various NoSQL designs, for example by classifying and briefly introducing systems such as Mongo and Hbase.)

Many of the new systems are referred to as "NoSQL" data stores. The definition of NoSQL, which stands for "Not Only SQL" or "Not Relational", is not entirely agreed upon. (Translator's note: "Not Relational" here means only weakly relational in comparison with a system such as MySQL.) For the purposes of this paper, NoSQL systems generally have six key features:

1. the ability to horizontally scale "simple operation" throughput over many servers,
2. the ability to replicate and to distribute (partition) data over many servers,
3. a simple call level interface or protocol (in contrast to a SQL binding),
4. a weaker concurrency model than the ACID transactions of most relational (SQL) database systems,
5. efficient use of distributed indexes and RAM for data storage,
6. and the ability to dynamically add new attributes to data records.

The systems differ in other ways, and in this paper we contrast those differences. They range in functionality from the simplest distributed hashing, as supported by the popular memcached open source cache, to highly scalable partitioned tables, as supported by Google's BigTable [1]. In fact, BigTable, memcached, and Amazon's Dynamo [2] provided a "proof of concept" that inspired many of the data stores we describe here:

Memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
Dynamo pioneered the idea of eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.
BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.

A key feature of NoSQL systems is "shared nothing" horizontal scaling: replicating and partitioning data over many servers. This allows them to support a large number of simple read/write operations per second. This simple operation load is traditionally called OLTP (online transaction processing), but it is also common in modern web applications.

The NoSQL systems described here generally do not provide ACID transactional properties: updates are eventually propagated, but there are limited guarantees on the consistency of reads. Some authors suggest a "BASE" acronym in contrast to the "ACID" acronym:

BASE = Basically Available, Soft state, Eventually consistent
ACID = Atomicity, Consistency, Isolation, and Durability

The idea is that by giving up ACID constraints, one can achieve much higher performance and scalability.
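
The eventual consistency described above for Dynamo can be illustrated with a small sketch. This is not how Dynamo or any surveyed system is implemented; the class names, the single global version counter, and the last-writer-wins rule are all assumptions chosen to keep the example short. It shows only the observable behavior: a read on another replica may be stale until updates have been propagated.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data (hypothetical)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Keep only the newest version seen so far (last-writer-wins).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Writes land on one replica at once and reach the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # updates not yet delivered everywhere
        self.clock = 0          # assumed global version counter

    def write(self, key, value, replica=0):
        self.clock += 1
        self.replicas[replica].apply(key, self.clock, value)
        self.pending.append((replica, key, self.clock, value))

    def read(self, key, replica):
        # No coordination with other replicas: the answer may be stale.
        return self.replicas[replica].read(key)

    def propagate(self):
        # Deliver every queued update to the replicas that missed it.
        while self.pending:
            origin, key, version, value = self.pending.popleft()
            for index, node in enumerate(self.replicas):
                if index != origin:
                    node.apply(key, version, value)


store = EventuallyConsistentStore()
store.write("user:1", "alice", replica=0)
stale = store.read("user:1", replica=2)  # update has not reached replica 2
store.propagate()
fresh = store.read("user:1", replica=2)  # update is now visible everywhere
```

Here `stale` is `None` and `fresh` is `"alice"`: the write is accepted immediately, at the price of a window in which replicas disagree.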

However, the systems differ in how much they give up. For example, most of the systems call themselves "eventually consistent", meaning that updates are eventually propagated to all nodes, but many of them provide mechanisms for some degree of consistency, such as multi-version concurrency control (MVCC).

Proponents of NoSQL often cite Eric Brewer's CAP theorem [4], which states that a system can have only two out of three of the following properties: consistency, availability, and partition-tolerance. The NoSQL systems generally give up consistency. However, the tradeoffs are complex, as we will see.

New relational DBMSs have also been introduced to provide better horizontal scaling for OLTP, when compared to traditional RDBMSs. After examining the NoSQL systems, we will look at these SQL systems and compare the strengths of the approaches. The SQL systems strive to provide horizontal scalability without abandoning SQL and ACID transactions. We will discuss the tradeoffs here.
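
The multi-version concurrency control (MVCC) mentioned above can be sketched as follows. This is a deliberately simplified, single-process illustration and not the mechanism of any particular system in the survey; the class `MVCCStore`, its integer commit timestamps, and the absence of write-conflict detection are assumptions. The point is that writers add versions rather than overwrite, so a reader keeps a consistent view as of the moment it started.

```python
class MVCCStore:
    """Keeps every committed version of a key instead of overwriting."""

    def __init__(self):
        self.versions = {}    # key -> list of (commit_ts, value), oldest first
        self.last_commit = 0  # assumed monotonically increasing timestamp

    def snapshot(self):
        # A reader remembers the commit timestamp at which it started.
        return self.last_commit

    def write(self, key, value):
        self.last_commit += 1
        self.versions.setdefault(key, []).append((self.last_commit, value))
        return self.last_commit

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


db = MVCCStore()
db.write("balance", 100)
reader_ts = db.snapshot()                     # a reader starts here
db.write("balance", 80)                       # a writer commits afterwards
old_view = db.read("balance", reader_ts)      # the reader still sees 100
new_view = db.read("balance", db.snapshot())  # a new reader sees 80
```

A real implementation would also need garbage collection of old versions and a rule for conflicting writers, both omitted here.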

In this paper, we will refer to both the new SQL and NoSQL systems as data stores, since the term "database system" is widely used to refer to traditional DBMSs. However, we will still use the term "database" to refer to the stored data in these systems. All of the data stores have some administrative unit that you would call a database: data may be stored in one file, or in a directory, or via some other mechanism that defines the scope of data used by a group of applications. Each database is an island unto itself, even if the database is partitioned and distributed over multiple machines: there is no "federated database" concept in these systems (as with some relational and object-oriented databases), allowing multiple separately-administered databases to appear as one. (Yol's note: that is, these systems do not let multiple separate databases appear as one.) Most of the systems allow horizontal partitioning of data, storing records on different servers according to some key; this is called "sharding". Some of the systems also allow vertical partitioning, where parts of a single record are stored on different servers.
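
The difference between horizontal partitioning (sharding) and vertical partitioning can be made concrete with a short sketch. The server names, the choice of SHA-256 with a modulo, and the helper names `shard_for` and `split_vertically` are assumptions for illustration only; the paper does not prescribe a particular placement function.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical node names


def shard_for(key, servers=SERVERS):
    """Horizontal partitioning: pick one server from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def split_vertically(record, column_groups):
    """Vertical partitioning: split one record's attributes into groups.

    column_groups maps a server name to the attribute names it stores.
    """
    return {
        server: {name: record[name] for name in names if name in record}
        for server, names in column_groups.items()
    }


record = {"id": "u42", "name": "Ada", "email": "ada@example.com", "bio": "..."}

# Sharding: the whole record lives on the server chosen by its key.
home = shard_for(record["id"])

# Vertical partitioning: parts of the same record live on different servers.
parts = split_vertically(
    record,
    {"server-a": ["id", "name", "email"], "server-b": ["id", "bio"]},
)
```

With plain modulo placement, changing the number of servers moves most keys to a different server, which is one reason real systems often use more elaborate schemes such as consistent hashing.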

1.1 Scope of this Paper

Before proceeding, some clarification is needed in defining "horizontal scalability" and "simple operations". These define the focus of this paper.

By "simple operations", we refer to key lookups, reads and writes of one record or a small number of records. This is in contrast to complex queries or joins, read-mostly access, or other application loads. With the advent of the web, especially Web 2.0 sites where millions of users may both read and write data, scalability for simple database operations has become more important. For example, applications may search and update multi-server databases of electronic mail, personal profiles, web postings, wikis, customer records, online dating records, classified ads, and many other kinds of data. These all generally fit the definition of "simple operation" applications: reading or writing a small number of related records in each operation.

The term "horizontal scalability" means the ability to distribute both the data and the load of these simple operations over many servers, with no RAM or disk shared among the servers. Horizontal scaling differs from "vertical" scaling, where a database system utilizes many cores and/or CPUs that share RAM and disks. Some of the systems we describe provide both vertical and horizontal scalability, and the effective use of multiple cores is important, but our main focus is on horizontal scalability, because the number of cores that can share memory is limited, and horizontal scaling generally proves less expensive, using commodity servers. Note that horizontal and vertical partitioning are not related to horizontal and vertical scaling, except that they are both useful for horizontal scaling.

1.2 Systems Beyond our Scope

Some authors have used a broad definition of NoSQL, including any database system that is not relational. Specifically, they include:

Graph database systems: Neo4j and OrientDB provide efficient distributed storage and queries of a graph of nodes with references among them.
Object-oriented database systems: Object-oriented DBMSs (e.g., Versant) also provide efficient distributed storage of a graph of objects, and materialize these objects as programming language objects.
Distributed object-oriented stores: Very similar to object-oriented DBMSs, systems such as GemFire distribute object graphs in-memory on multiple servers.

These systems are a good choice for applications that must do fast and extensive reference-following, especially where data fits in memory. Programming language integration is also valuable. Unlike the NoSQL systems, these systems generally provide ACID transactions. Many of them provide horizontal scaling for reference-following and distributed query decomposition, as well. Due to space limitations, however, we have omitted these systems from our comparisons. The applications and the necessary optimizations for scaling for these systems differ from the systems we cover here, where key lookups and simple operations predominate over reference-following and complex object behavior. It is possible these systems can scale on simple operations as well, but that is a topic for a future paper, and proof through benchmarks.

Data warehousing database systems provide horizontal scaling, but are also beyond the scope of this paper. Data warehousing applications are different in important ways:

They perform complex queries that collect and join information from many different tables.
The ratio of reads to writes is high: that is, the database is read-only or read-mostly.

There are existing systems for data warehousing that scale well horizontally. Because the data is infrequently updated, it is possible to organize or replicate the database in ways that make scaling possible.

1.3 Data Model Terminology

Unlike relational (SQL) DBMSs, the terminology used by NoSQL data stores is often inconsistent. For the purposes of this paper, we need a consistent way to compare the data models and functionality.

All of the systems described here provide a way to store scalar values, like numbers and strings, as well as BLOBs. Some of them also provide a way to store more complex nested or reference values. The systems all store sets of attribute-value pairs, but use different data structures, specifically:

A "tuple" is a row in a relational table, where attribute names are pre-defined in a schema, and the values must be scalar. The values are referenced by attribute name, as opposed to an array or list, where they are referenced by ordinal position.
A "document" allows values to be nested documents or lists as well as scalar values, and the attribute names are dynamically defined for each document at runtime. A document differs from a tuple in that the attributes are not defined in a global schema, and this wider range of values is permitted.
An "extensible record" is a hybrid between a tuple and a document, where families of attributes are defined in a schema, but new attributes can be added (within an attribute family) on a per-record basis. Attributes may be list-valued.
An "object" is analogous to an object in programming languages, but without the procedural methods. Values may be references or nested objects.

1.4 Data Store Categories

In this paper, the data stores are grouped according to their data model:

Key-value Stores: These systems store values and an index to find them, based on a programmer-defined key.
Document Stores: These systems store documents, as just defined. The documents are indexed and a simple query mechanism is provided.
Extensible Record Stores: These systems store extensible records that can be partitioned vertically and horizontally across nodes. Some papers call these "wide column stores".
Relational Databases: These systems store (and index and query) tuples. The new RDBMSs that provide horizontal scaling are covered in this paper.

Data stores in these four categories are covered in the next four sections, respectively. We will then summarize and compare the systems.
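
The data-model terms from Section 1.3 can be made concrete with ordinary Python values. This is only an illustrative sketch: the schema, the attribute families, the sample values, and the helper names `is_tuple` and `is_extensible_record` are all assumptions, and no surveyed system exposes this interface.

```python
from dataclasses import dataclass, field

SCALARS = (int, float, str, bytes, bool, type(None))


def is_tuple(row, schema):
    """A tuple has exactly the schema's attributes, all with scalar values."""
    return set(row) == set(schema) and all(
        isinstance(value, SCALARS) for value in row.values()
    )


def is_extensible_record(record, families):
    """Families come from the schema; attributes inside them are per record."""
    return set(record) <= set(families) and all(
        isinstance(attributes, dict) for attributes in record.values()
    )


@dataclass
class User:
    """An "object": plain data whose values may reference other objects."""
    name: str
    friends: list = field(default_factory=list)


USER_SCHEMA = ("id", "name", "age")  # hypothetical relational schema
FAMILIES = ("profile", "activity")   # hypothetical attribute families

# Tuple: fixed attribute names, scalar values only.
user_tuple = {"id": 1, "name": "Ada", "age": 36}

# Document: attribute names chosen per document; values may be nested.
user_document = {"id": 1, "name": "Ada", "emails": ["ada@example.com"],
                 "address": {"city": "London"}}

# Extensible record: fixed families, free attributes inside each family.
user_record = {"profile": {"name": "Ada", "nickname": "AAL"},
               "activity": {"logins": [3, 5]}}

# Object: values may be references to other objects.
ada = User("Ada", friends=[User("Charles")])

# Key-value store: an opaque value behind a programmer-defined key.
kv_store = {"user:1": b"serialized bytes"}
```

Under these assumptions `is_tuple(user_tuple, USER_SCHEMA)` holds while `is_tuple(user_document, USER_SCHEMA)` does not, because the document carries attributes outside the schema as well as nested values.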
