Advanced Computer Architecture (PPT courseware)

Format: PPT; length: 167 pages; file size: 4.63 MB


Advanced Computer Architecture
The School of Information Science and Engineering

High-Performance Computer Architecture
Qiao Baiyou (83681250)
Institute of Computer Systems, School of Information Science and Engineering, Northeastern University

Reference textbooks
Computer Architecture: A Quantitative Approach, Hennessy and Patterson, China Machine Press
Advanced Computer Architecture: Parallelism, Scalability, Programmability, Tsinghua University Press
Parallel Computer Architecture: A Hardware/Software Approach, China Machine Press
Computer Architecture, Zhang Chenxi et al., Higher Education Press, 2008
Parallel Computer Architecture and Scalable Computing, Gu Zhimin and Sun Xianhe, Tsinghua University Press, 2009
Parallel Computer Architecture, Chen Guoliang et al., Higher Education Press, 2002

Main topics
1. High-performance computing and high-performance computers
2. Fundamentals of instruction pipelining (review)
3. Instruction-level parallelism
4. Dynamic instruction scheduling and branch prediction
5. Interconnection networks: topologies, routing techniques
6. Fundamentals of parallel processing: models, performance, automatic parallelization
7. Shared-memory multiprocessors: cache coherence, synchronization
8. Massively parallel processing: active messages, multithreading

High-performance computing and high-performance computers

National High Performance Computing Center (Hefei), 2021-4-21

1. The significance of high-performance computing

HPC (High Performance Computing)
High-performance computing and parallel computing go by several names:
Parallel computing
High-end parallel computing
High-performance computing
Supercomputing

Cloud Computing (Northeastern University, Sep. 6, 2012)
[Figure: cloud computing providers grouped under IaaS, PaaS, and SaaS: Mosso, Google App Engine, Rails One, Salesforce, Gmail, Gliffy, Joyent, Amazon Web Services, Nirvanix, Xcalibre, Akamai]
Cloud computing is the further development of parallel computing, distributed computing, and grid computing; put differently, it is the commercial realization of these computer science concepts.
Cloud computing is the result of the combined evolution of virtualization, utility computing, IaaS (Infrastructure as a Service), PaaS (Platform as a Service), SaaS (Software as a Service), and related concepts.
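Weather forecasting
In 1990, ten typhoons made landfall; Fujian and Zhejiang provinces suffered losses of 7.9 billion yuan and more than 950 deaths.
The forecasting model is a system of nonlinear partial differential equations. Forecasting a typhoon rainstorm takes 10^14 to 10^16 floating-point operations and calls for a supercomputer delivering 10 to 100 GFLOPS.
Use: local forecasts of severe weather.

Petroleum industry
Seismic exploration data processing
Reservoir numerical simulation
Well-logging data processing
Seismic exploration consists of three stages: data acquisition, data processing, and data interpretation. The three-dimensional seismic exploration now in use reflects subsurface conditions fairly accurately, but the data volume is large and the processing cycle is long.
For a three-dimensional survey covering 100 square kilometers, with 25 m trace spacing, 60-fold coverage, 6 s records, and 2 ms sampling, a total of 2.88×10^10 samples is collected, about 116 GB.
After stacking, 4.8×10^8 samples remain. Producing an accurate subsurface depth image with the two-dimensional stacked depth migration method requires 25×10^12 FLOP: 250 days on a 100 MFLOPS machine, 25 days on a 1 GFLOPS machine, and 35 minutes on a 10 GFLOPS machine. Because a machine's sustained speed is often only 10-30% of its peak speed, a 100 GFLOPS machine is needed. The Cray T932/32 delivers about 60 GFLOPS.

Aerospace
Studying the effect of three-dimensional airfoils on aircraft performance. The numerical simulation solves the Navier-Stokes equations with a time-dependent method on a grid of 120×40×50 points. It needs 160 MB of memory and takes 12 hours on a computer performing 600 million operations per second. Completing a design within a few minutes would require a computer performing 100 billion operations per second.

Nuclear weapons
Numerical simulation of nuclear explosions is used to infer the energy-release effects of nuclear devices with different structures under different conditions.
Pressure: several million atmospheres
Temperature: tens of millions of degrees Celsius
The energy is released on a time scale of seconds.
Designing one nuclear weapon model, from establishing the model's behavior and tuning the parameters to selecting the optimum, requires computing hundreds to thousands of nuclear tests.
Los Alamos National Laboratory requires that computing one model take no more than 8-10 hours.
On a machine performing ten million operations per second, the computational model of the ellipsoid program needs 40-60 CPU hours.
With 100 grid points in each direction, a two-dimensional computation costs 200 times as much as a one-dimensional one, and a three-dimensional computation 33,000 times as much. With 1,000 grid points per dimension, a three-dimensional computation costs several hundred thousand times as much as a one-dimensional one. Main memory must then hold tens to hundreds of billions of 64-bit words.
There are also requirements on I/O capability and on visualization and graphics output.
Computational aerodynamics: 100 billion operations per second (10^11)
Image processing: 10 billion operations per second (10^10)
AI: 1 trillion operations per second (10^12)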
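The seismic survey figures above can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, not part of the original slides; it assumes that the 25 m trace spacing applies in both horizontal directions (25 m × 25 m bins) and that each sample occupies 4 bytes. Both assumptions, and all variable names, are hypothetical.

```python
# Survey parameters as stated in the slide text.
AREA_KM2 = 100            # survey area in square kilometers
TRACE_SPACING_M = 25      # trace spacing in meters
FOLD = 60                 # 60-fold coverage
RECORD_MS = 6000          # 6 s record length, in milliseconds
SAMPLE_INTERVAL_MS = 2    # 2 ms sampling interval

# Assumptions that are NOT in the slides:
# square 25 m x 25 m bins and 4-byte samples.
BYTES_PER_SAMPLE = 4


def survey_volume():
    """Return (raw_samples, stacked_samples, raw_bytes) for the survey."""
    area_m2 = AREA_KM2 * 1000 * 1000
    bins = area_m2 // (TRACE_SPACING_M * TRACE_SPACING_M)   # 160,000 bins
    samples_per_trace = RECORD_MS // SAMPLE_INTERVAL_MS     # 3,000 samples
    raw = bins * FOLD * samples_per_trace       # every fold keeps its own trace
    stacked = bins * samples_per_trace          # stacking leaves one trace per bin
    return raw, stacked, raw * BYTES_PER_SAMPLE


raw_samples, stacked_samples, raw_bytes = survey_volume()
print(f"raw samples:     {raw_samples:.3e}")
print(f"stacked samples: {stacked_samples:.3e}")
print(f"raw data size:   {raw_bytes / 1e9:.1f} GB")
```

Under these assumptions the raw and stacked sample counts reproduce the 2.88×10^10 and 4.8×10^8 figures quoted above, and the raw data size comes out close to the quoted value of about 116 GB.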
The strategic position of high-performance computing (China)
The state is vigorously developing high-performance computing.
Military: Yinhe (Galaxy), Sunway (Shenwei), and others
Civilian: Dawning (Shuguang), Lenovo, and others
High-performance computing is already used in many sectors of the national economy: petroleum, meteorology, military, scientific research, and others.
Domestically built high-performance computers have ranked in the top 10 of the TOP500, and their total number has also risen sharply.
The number of installed systems is growing (28 systems).

Chronology of China's supercomputers

Model            Year   Peak speed (operations per second)
Yinhe-I          1983   100 million
Dawning-1        1992   640 million
Yinhe-II         1994   1 billion
Yinhe-III        1997   13 billion
Sunway           1999   384 billion
DeepComp 1800    2002   1 trillion
Dawning 4000A    2004   11 trillion
Sunway 3000A     2007   18 trillion
DeepComp 7000    2008   106.5 trillion
Dawning 5000A    2008   230 trillion
Tianhe-1         2009   1,206 trillion
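230 trillion floating-point operations per second
6,600 quad-core Barcelona processors (1.9 GHz clock)
30,720 compute cores, 122.88 TB of memory, 700 TB of disk
Low-latency 20 Gb network interconnect
IBM "Roadrunner": first in the TOP500

Peak and measured speeds of 1,206.19 trillion and 563.1 trillion operations per second, respectively
CPU/GPU hybrid heterogeneous system
6,144 general-purpose processors; 5,120 accelerator processors
Total memory capacity 98 TB
Communication bandwidth 40 Gbps
Total shared disk capacity 1 PB
Fifth in the TOP500
Cray's "Jaguar" (USA) is first at 1.76 petaflops; China's "Nebulae" is second.

High-performance computing and high-performance computers
The significance of high-performance computing
    What high-performance computing means
    Application demand for high-performance computing
    The strategic position of high-performance computing
Development and current state of high-performance computing
    Development of high-performance computers
    Current state of high-performance computing
Main problems facing high-performance computing
    Power consumption
    Memory (memory wall)
    Programming (programming wall)
The future of high-performance computing
    Outlook for petaflops supercomputers
    Some new technologies
    Opportunities and challenges for high-performance computing in China

High-performance computers
High-performance computer: a computer system composed of multiple computing units, with high computing speed, large storage capacity, and high reliability. Also called a giant computer or supercomputer.
Parallel computer: a computer system composed of multiple processing units that communicate and cooperate with one another to solve large, complex problems quickly and efficiently.

The development of high-performance computers can be divided roughly into two eras.
The proprietary era: this includes vector machines, MPP systems, SGI NUMA systems, and large SUN SMP systems, as well as China's Sunway, Yinhe, and Dawning 1000. "Proprietary" does not mean that these machines could run only certain applications. It means that their components were specially designed: their CPU boards, memory boards, I/O boards, operating systems, and even I/O systems could not be used in any other system. This followed from the large technical gap between desktop systems and high-end systems and from the narrow user base.
The popularization era: prices of high-performance computers fell, the barrier to adoption dropped, and applications began to spread. Two technology trends played an important role. Commoditization brought mass-produced commodity components close to the proprietary components of high-performance computers. Standardization allowed these components to be integrated into a single system; x86 processors, Ethernet, memory components, and Linux all played decisive roles.
A cluster system is one kind of high-performance computer; both its technical basis and its industrial basis are commoditization and standardization.

High-performance computer system architectures
Parallel vector processors
SMP
DSM (NUMA)
MPP: a node may be a single-processor node, an SMP, or a DSM
Cluster
Constellation

Types of parallel computer systems
Flynn classification: SISD, SIMD, MIMD, MISD
Structural models: PVP, SMP, MPP, DSM, COW
Memory-access models: UMA, NUMA, COMA, CC-NUMA, NORMA

Classification of parallel computers: the Flynn classification
Flynn (1972) introduced the concepts of instruction stream, data stream, and multiplicity, and divided computers into four classes:
SISD (Single Instruction, Single Data)
SIMD (Single Instruction, Multiple Data)
MISD (Multiple Instruction, Single Data)
MIMD (Multiple Instruction, Multiple Data)
Modern high-performance computers all belong to the MIMD class. MIMD machines can be further divided by structure and by memory-access mode:
Structural models: PVP, SMP, MPP, DSM, COW
Memory-access models: UMA, NUMA, COMA, CC-NUMA, NORMA

Structural models

Symmetric multiprocessor systems (SMP)
Symmetric shared memory: any processor can directly access any memory address, and access latency, bandwidth, and access probability are the same for all processors; the system is symmetric.
Microprocessors: generally fewer than 64.
The processor count cannot be large, because buses and crossbar switches are hard to extend once built.
Examples: IBM R50, SGI Power Challenge, SUN Enterprise, Dawning-1.

Distributed shared memory systems (DSM)
Distributed shared memory: the memory modules are physically local to the individual processors but logically (to the user) form a shared memory. This structure is also called cache-directory-based non-uniform memory access (CC-NUMA).
Latency and bandwidth differ between local and remote memory accesses by a factor of 3 to 10, which high-performance parallel programs must take into account.
Main difference from SMP: a DSM has local memories physically distributed across the nodes, which together form one shared memory.
Microprocessors: 16 to 128; tens of billions to 100 billion operations per second.
Examples: SGI Origin 2000, Cray T3D.

Massively parallel processing systems (MPP)
Memory is distributed both physically and logically.
Scales to hundreds or thousands of processors (microprocessors or vector processors).
Uses a specially designed, custom interconnection network with high communication bandwidth and low latency.
An asynchronous MIMD machine: a program consists of multiple processes, each with its own private address space, and the processes interact by message passing.
Examples: CRAY T3E (2048), ASCI Red (3072), IBM SP2, Dawning 1000.

Cluster systems (Cluster)
Each node is a complete computer.
The nodes are connected to one another by a high-performance network.
The network interface is loosely coupled to the I/O bus.
Each node runs a complete operating system.
Examples: Dawning 2000, 3000, and 4000; ASCI Blue Mountain.

Memory-access models
[Figure: block diagrams of the UMA, NUMA, and NORMA models]
Multiprocessors (single address space, shared memory):
UMA: Uniform Memory Access
NUMA: Nonuniform Memory Access
Multicomputers (multiple address spaces, non-shared memory):
NORMA: No-Remote Memory Access

Multiprocessors
64-byte line size; 10 clock cycles latency; write-back update policy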
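The relationship between the structural models and the memory-access models described in the classification slides above can be summarized as a small lookup table. This is a minimal sketch rather than anything from the original deck: the names `MemoryModel`, `STRUCTURE_TO_MEMORY`, and `is_multiprocessor` are hypothetical, and the PVP and COW entries rest on the usual textbook pairing (PVP with UMA, COW with NORMA), which the slides themselves do not state.

```python
from enum import Enum


class MemoryModel(Enum):
    UMA = "Uniform Memory Access"
    NUMA = "Nonuniform Memory Access"
    NORMA = "No-Remote Memory Access"


# Structural model -> typical memory-access model.
# SMP, DSM, and MPP follow the slide descriptions; PVP and COW are
# assumptions based on the common textbook pairing.
STRUCTURE_TO_MEMORY = {
    "PVP": MemoryModel.UMA,     # assumption: parallel vector processors share memory uniformly
    "SMP": MemoryModel.UMA,     # symmetric shared memory, equal latency for every processor
    "DSM": MemoryModel.NUMA,    # physically distributed, logically shared (CC-NUMA)
    "MPP": MemoryModel.NORMA,   # distributed memory, message passing
    "COW": MemoryModel.NORMA,   # assumption: cluster nodes are complete computers
}


def is_multiprocessor(structure: str) -> bool:
    """True for single-address-space shared-memory machines (UMA or NUMA);
    False for multicomputers with separate address spaces (NORMA)."""
    return STRUCTURE_TO_MEMORY[structure] is not MemoryModel.NORMA


for name, model in STRUCTURE_TO_MEMORY.items():
    kind = "multiprocessor" if is_multiprocessor(name) else "multicomputer"
    print(f"{name}: {model.name} ({kind})")
```

The table makes the programming consequence explicit: on the NORMA entries, processes must exchange data by message passing, while on the UMA and NUMA entries they can communicate through shared memory.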
Intel Multi-core Plan

Intel's tera-scale chip

Cell from IBM and Sony

Intel 80-core chip (2007)
80 processing cores
1 teraflop
10 billion operations per watt
Clock frequency 3.1 GHz
Die area about 300 mm² (13.75 mm × 22 mm)
Each CPU core is connected one-to-one to memory and has 256 MBps of memory bandwidth.
32 MB of on-chip static RAM
Total memory bandwidth of the single chip reaches 1 TB/s.

IBM POWER7 (2010)

Niagara from SUN

GPU Fundamentals: The Modern Graphics Pipeline
[Figure: the modern graphics pipeline. Stages: application (CPU), vertex processor (transform), geometry processor, primitive assembly, rasterization, fragment processor (shade), video memory (textures), render-to-texture. Data passed between stages: graphics state, 3D vertices, transformed and lit 2D vertices, screen-space 2D triangles, fragments (pre-pixels), final pixels (color, depth). Callouts: programmable vertex processor, programmable pixel processor.]

For a specific program compiled to run on a specific machine “A”, the following parameters are provided:
The total instruction count of the program.
The average number of cycles per instruction (average CPI).
The clock cycle of machine “A”.
How can one measure the performance of this machine running this program?
Intuitively, the machine is said to be faster, or to have better performance running this program, if the total execution time is shorter.
Thus the inverse of the total measured program execution time is a possible performance measure or metric:
\[ \text{Performance}_A = \frac{1}{\text{Execution Time}_A} \]
How to compare the performance of different machines? What factors affect performance? How to improve performance?

A program is comprised of a number of instructions, I. Measured in: instructions/program.
The average instruction takes a number of cycles per instruction (CPI) to be completed. Measured in: cycles/instruction. IPC (Instructions Per Cycle) = 1/CPI.
The CPU has a fixed clock cycle time C = 1/clock rate. Measured in: seconds/cycle.
CPU execution time is the product of the above three parameters:
\[ \text{CPU time} = I \times \text{CPI} \times C \]
\[ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]

Factors affecting each term of the CPU time equation:

                                     Instruction Count I   CPI (IPC)   Clock Cycle C
Program                                       X                X
Compiler                                      X                X
Instruction Set Architecture (ISA)            X                X
Organization (Micro-Architecture)                              X             X
Technology                                                                   X

[Figure: levels of a computer system, from top to bottom: Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors, Wires, Pins]
Performance metrics at the different levels:
Execution time: target workload, SPEC95, SPEC2000, etc.
(Millions) of instructions per second: MIPS
(Millions) of (F.P.) operations per second: MFLOP/s
Megabytes per second
Cycles per second (clock rate)
Each metric has a purpose, and each can be misused.

SPEC: the most popular and industry-standard set of CPU benchmarks.
SPECmarks, 1989: 10 programs yielding a single number (“SPECmarks”).
SPEC92, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs).
SPEC95, 1995:
SPECint95 (8 integer programs): go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
SPECfp95 (10 floating-point intensive programs): tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
Performance is relative to a Sun SuperSPARC I (50 MHz), which is given a score of SPECint95 = SPECfp95 = 1.
SPEC CPU2000, 1999:
CINT2000 (12 integer programs). CFP2000 (14 floating-point intensive programs).
Performance is relative to a Sun Ultra5_10 (300 MHz), which is given a score of SPECint2000 = SPECfp2000 = 100.

Top 20 SPEC CPU2000 Results (as of March 2002)

Top 20 SPECint2000
 #   MHz   Processor           int peak   int base
 1   1300  POWER4                 814        790
 2   2200  Pentium 4              811        790
 3   2200  Pentium 4 Xeon         810        788
 4   1667  Athlon XP              724        697
 5   1000  Alpha 21264C           679        621
 6   1400  Pentium III            664        648
 7   1050  UltraSPARC-III Cu      610        537
 8   1533  Athlon MP              609        587
 9    750  PA-RISC 8700           604        568
10    833  Alpha 21264B           571        497
11   1400  Athlon                 554        495
12    833  Alpha 21264A           533        511
13    600  MIPS R14000            500        483
14    675  SPARC64 GP             478        449
15    900  UltraSPARC-III         467        438
16    552  PA-RISC 8600           441        417
17    750  POWER RS64-IV          439        409
18    700  Pentium III Xeon       438        431
19    800  Itanium                365        358
20    400  MIPS R12000            353        328

Top 20 SPECfp2000
 #   MHz   Processor           fp peak    fp base
 1   1300  POWER4                1169       1098
 2   1000  Alpha 21264C           960        776
 3   1050  UltraSPARC-III Cu      827        701
 4   2200  Pentium 4 Xeon         802        779
 5   2200  Pentium 4              801        779
 6    833  Alpha 21264B           784        643
 7    800  Itanium                701        701
 8    833  Alpha 21264A           644        571
 9   1667  Athlon XP              642        596
10    750  PA-RISC 8700           581        526
11   1533  Athlon MP              547        504
12    600  MIPS R14000            529        499
13    675  SPARC64 GP             509        371
14    900  UltraSPARC-III         482        427
15   1400  Athlon                 458        426
16   1400  Pentium III            456        437
17    500  PA-RISC 8600           440        397
18    450  POWER3-II              433        426
19    500  Alpha 21264            422        383
20    400  MIPS R12000            407        382

Source: http:/

Amdahl's Law
The performance gain from improving some portion of a computer is calculated by:
\[ \text{Speedup} = \frac{\text{Performance for entire task using the enhancement}}{\text{Performance for entire task without using the enhancement}} \]
or
\[ \text{Speedup} = \frac{\text{Execution time for entire task without the enhancement}}{\text{Execution time for entire task using the enhancement}} \]

The performance enhancement possible due to a given design improvement is limited by the amount that the improved feature is used.
Amdahl's Law: performance improvement or speedup due to enhancement E:
\[ \text{Speedup}(E) = \frac{\text{Execution Time without } E}{\text{Execution Time with } E} = \frac{\text{Performance with } E}{\text{Performance without } E} \]
Suppose that enhancement E accelerates a fraction F of the execution time by a factor S and the remainder of the time is unaffected. Then:
\[ \text{Execution Time with } E = \left( (1 - F) + \frac{F}{S} \right) \times \text{Execution Time without } E \]
Hence the speedup is given by:
\[ \text{Speedup}(E) = \frac{\text{Execution Time without } E}{\left( (1 - F) + \frac{F}{S} \right) \times \text{Execution Time without } E} = \frac{1}{(1 - F) + \frac{F}{S}} \]

[Figure: pictorial depiction of Amdahl's Law. Before: execution time without enhancement E, split into an unaffected fraction (1 - F) and an affected fraction F. After: execution time with enhancement E, in which the unaffected fraction (1 - F) is unchanged and the affected fraction shrinks to F/S.]
\[ \text{Speedup}(E) = \frac{\text{Execution Time without enhancement } E}{\text{Execution Time with enhancement } E} = \frac{1}{(1 - F) + \frac{F}{S}} \]

For the RISC machine with the following instruction mix given earlier:

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%      1       0.5      23%
Load     20%      5       1.0      45%
Store    10%      3       0.3      14%
Branch   20%      2       0.4      18%

CPI = 2.2

If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance improvement from this enhancement?
Fraction enhanced = F = 45% or 0.45
Unaffected fraction = 100% - 45% = 55% or 0.55
Factor of enhancement = S = 5/2 = 2.5
Using Amdahl's Law:
\[ \text{Speedup}(E) = \frac{1}{(1 - F) + \frac{F}{S}} = \frac{1}{0.55 + \frac{0.45}{2.5}} = 1.37 \]
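The load-instruction example above can be cross-checked in two ways: by applying Amdahl's Law to the fraction of time spent in loads, and by recomputing the average CPI directly from the instruction mix. The following is a minimal sketch under the assumption that the instruction count and clock cycle stay fixed, so CPU time is proportional to CPI; the function and variable names are illustrative and do not come from the slides.

```python
def average_cpi(mix):
    """Average CPI of an instruction mix given as {op: (frequency, cycles)}."""
    return sum(freq * cycles for freq, cycles in mix.values())


def amdahl_speedup(fraction, factor):
    """Speedup when `fraction` of execution time is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Instruction mix from the slide: operation -> (frequency, cycles).
mix = {
    "ALU": (0.5, 1),
    "Load": (0.2, 5),
    "Store": (0.1, 3),
    "Branch": (0.2, 2),
}

old_cpi = average_cpi(mix)

# Fraction of execution time spent in loads (about 45%).
load_freq, load_cycles = mix["Load"]
load_fraction = load_freq * load_cycles / old_cpi

# Route 1: Amdahl's Law with F = load fraction, S = 5/2.
speedup_amdahl = amdahl_speedup(load_fraction, 5 / 2)

# Route 2: recompute CPI with loads taking 2 cycles; with instruction
# count and clock cycle unchanged, speedup = old CPI / new CPI.
new_mix = dict(mix, Load=(load_freq, 2))
new_cpi = average_cpi(new_mix)
speedup_cpi = old_cpi / new_cpi

print(f"old CPI = {old_cpi:.2f}, new CPI = {new_cpi:.2f}")
print(f"Amdahl speedup = {speedup_amdahl:.3f}, CPI-ratio speedup = {speedup_cpi:.3f}")
```

Both routes give the same speedup, 2.2/1.6 = 1.375. The value 1.37 in the worked example comes from rounding the load fraction to 0.45 before applying the formula.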
Suppose that enhancement Ei accelerates a fraction Fi of the execution time by a factor Si and the remainder of the time is unaf
