Solaris Operating Environment Performance Monitoring and Tuning
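SUN Fundamentals and Troubleshooting Training
Solution Architect, Greater China Client Services
a265265@sina.com

What we need to do first
• Ask the customer the right questions
  – Have patches been applied?
  – Where does the problem occur? Throughput, response time, network, I/O, application, ...
  – Understand how the customer arrived at the requirement and how it is stated
    • Are the tests and expectations reasonable?
    • 'It should run faster...', 'There is a lot of idle CPU...'
• Initial support and recommendations
  – Collect basic performance metrics
  – Analyze them with tools, show the customer the results graphically, and look at the so-called 'peaks'
  – Identify resource limits
  – Base your recommendations on your own findings

Goals of Solaris performance analysis
• Identify bottlenecks in the Solaris operating system
• Make appropriate recommendations for improving performance
• It is not a substitute for application debugging or kernel debugging
• Find useful information in basic statistics

Section 1: Performance Management

Definition of performance management
• Performance management is measuring, analyzing, and optimizing computing resources so that end users receive an acceptable level of service

General terms
• Throughput – Count of the number of transactions
• Latency – Time it takes to do something
• Utilization – Amount of resources consumed during an action

Layers of performance
• Application
• Operating system
• Hardware
• Network
• Business

[Diagram: layers with example metrics. Business: response time, transactions, uptime. Application: applications, database. OS: swap, mutexes, run queue. Network: latency, errors, utilization. Hardware: CPU, RAM, I/O (disks).]

Section 3: Solaris Performance Analysis

Basic principles
• A system is a collection of resources
  – CPU(s)
  – memory
  – busses
  – disks, disk controllers
  – networks
  – operating systems
  – DBMS systems (especially locks and internal latches)
• Performance degrades when one or more of these resources is exhausted.

Where to start
• Once a bottleneck has been found, tune in the following order:
  – Application
  – Database
  – Hardware
  – Solaris kernel parameters

Potential bottlenecks
• Disk
• Network
• Memory
• CPU

Section 3: Disk Performance Analysis

Disk bottlenecks
• Insufficient space
• Long response times
• Poor layout planning
• RAID configuration
• File system problems
• Database problems

Confirming a disk bottleneck
• Use sar and iostat
• Look at response time, disk utilization, queue length, and how requests are distributed
• Check whether the disk cache is working

Relative access times

  Device         Real time    Seconds      Relative time
  CPU register   2 nsec       2 x 10^-9    2 sec
  CPU cache      20 nsec      2 x 10^-8    20 sec
  Main memory    200 nsec     2 x 10^-7    about 3 min
  Disk           20 msec      2 x 10^-2    about 7 months

iostat -x 30

                    extended device statistics
  device    r/s  w/s  kr/s  kw/s  wait  actv  svc_t  %w  %b
  fd0       0.0  0.0   0.0   0.0   0.0   0.0  279.0   0   0
  sd0       0.0  0.0   0.0   0.0   0.0   0.0    0.0   0   0
  sd1       0.1  0.9   0.7   6.3   0.0   0.1   72.1   0   1
  sd4       0.0  0.0   0.0   0.0   0.0   0.0    0.0   0   0
  sd6       0.0  0.0   0.0   0.0   0.0   0.0    0.0   0   0
  nfs1      0.0  0.0   0.0   0.0   0.0   0.0    0.7   0   0
  nfs2      0.0  0.0   0.1   0.4   0.0   0.0  298.1   0   0
  nfs3      0.0  0.0   0.2   0.0   0.0   0.0   35.4   0   0

• wait – queue length: the number of requests waiting to be handled by the disk
• svc_t – average service time (milliseconds)
• %w – percentage of time that requests are waiting in the queue
• %b – percentage of time the disk is busy; below 30% is good, above 60% indicates a problem

iostat
• I/O size = (kr/s) / (r/s)
• svc_t should be close to the disk seek time
• wait: number of requests held in the OS waiting for the disk; a value above 0 is a warning
• actv: number of requests the disk is working on that have not yet completed

iostat
• r/s, w/s: Average reads/writes per second.
• kr/s, kw/s: Average KB read/written per second.
• wait: Average number of requests waiting in the OS queue for the device.
• actv: Number of active requests in the hardware queue.
• %w: Occupancy of the wait queue.
• %b: Occupancy of the active queue, that is, the share of time the device is busy.
• svc_t: Service time (ms). Includes everything: wait time, active queue time, seek, rotation, and transfer time.
• us/sy: User/system CPU time (%).
• wt: Wait for I/O (%).
• id: Idle time (%).

Finding problems with iostat

# iostat -xc 2

                    extended device statistics                        cpu
  device     r/s  w/s   kr/s  kw/s  wait  actv  svc_t  %w  %b   us sy wt id
  md0       41.0  0.0  382.7   0.0   0.0   0.6   14.9   0  41   41 59  0  0
  c0t3d0    43.9  2.8  405.4   4.1   0.0   0.7   14.7   1  45
  c1t0d0    21.9  2.8  203.8   4.1   0.0   0.3   13.9   0  31
  c2t0d0    21.9  2.8  201.6   4.1   0.0   0.3   13.7   0  31
  c7t2d0     0.0  0.0    0.0   0.0   0.0   0.0    0.0   0   0
  c7t3d0     0.0  0.1    0.0   2.0   0.0   0.0  159.0   0   1
  c7t4d0   115.9  2.8  228.4   4.1   0.0  19.7  311.7   1  99

• The %b (busy) column shows disk utilization. Disks utilized over 65% are a problem; disks over 30%-35% can be improved by RAID methods. Any disk utilized more than 95% is a SERIOUS problem.
• The service time shows how long an individual disk request waits. It should be close to the average seek time of the disk. Occasional spikes of high svc_t are not a problem.
• c7t4d0 is a typical problem disk. The actv column shows that there are about 20 outstanding I/O requests against this drive, in addition to it being fully busy and very slow.
• c7t3d0 is not a problem, despite its high service time, because it is not very busy.
• It is often useful to compute the average I/O size from r/s and kr/s. The typical read size on a disk is (kr/s) / (r/s); for c7t4d0 it is 228.4 / 115.9, about 2 KB.
• svc_t includes time spent waiting in the queue. When svc_t is high, the actual service time is approximately svc_t / actv, about 16 ms in this case (311.7 / 19.7).
• actv and wait are, respectively, the number of requests pending in the disk itself and the number held in the OS waiting to get to the disk. wait > 0 is a warning sign, especially if the disk is in a disk array.

iostat options
• Descriptive disk names (-n)
• I/O per partition (-p)
• Tape I/O can also be shown
• CPU usage (-c)
• Extended statistics for each disk (-x)

# iostat -xcpn 60

                      extended device statistics                          cpu
  device        r/s  w/s     kr/s  kw/s  wait  actv  svc_t  %w  %b   us sy wt id
  rmt0         48.3  0.0  14382.7   0.0   0.0   0.6    0.0   0  91   41 59  0  0
  c0t6d0        0.0  0.0      0.0   0.0   0.0   0.0    0.0   0   0
  c11t14d0      0.2  0.1      1.4   1.2   0.0   0.0   18.6   0   0
  c11t14d0s0    0.0  0.1      0.4   1.2   0.0   0.0   24.9   0   0
  c11t14d0s1    0.0  0.0      0.0   0.0   0.0   0.0    0.0   0   0
  c11t14d0s2    0.0  0.0      0.0   0.0   0.0   0.0    0.0   0   0
  c11t14d0s3    0.0  0.0      0.0   0.0   0.0   0.0    0.0   0   0
  c11t14d0s4    0.0  0.0      0.0   0.0   0.0   0.0    0.0   0   0
  c11t14d0s6    0.1  0.0      1.0   0.0   0.0   0.0   10.2   0   0
  c11t14d0s7    0.0  0.0      0.0   0.0   0.0   0.0    0.0   0   0

The c11t14d0 row (no slice suffix) shows the stats for the whole disk.

Fixing problems found with iostat
• Use a RAID layout so that accesses to a single disk are spread across several disks
• When a disk is too busy, use RAID-0
• When the application performs multi-stream reads, use RAID-1*
• When the application is read-mostly, use RAID-5*
• When many files are created and deleted, use PrestoServe

PrestoServe reports a write as complete to the user before the data has actually been written to disk.

How to deal with disk bottlenecks
• Balance the load (striping, partitioning)
• Add more disks
• Distribute the swap areas
• Store critical data on the low-numbered cylinders
• Keep related data in the same partition
• Do not fill disks completely
• Add memory (RAID controller memory / UFS / DB cache)

Databases and file systems
• The default newfs parameters are not suited to databases
  – UFS single writer lock vs. multiple writes
  – Use multiple database files or other ...on
• Use raw devices
• Mount the file system with the forcedirectio option
• Current UFS with logging, direct I/O, and concurrent I/O: performance comparable to VxFS + Quick I/O

Raw vs. UFS

  Raw                               UFS
  Fast                              Slower
  Eliminates single writer issue    Easier to manage
  Difficult to manage               Easier to back up
  Difficult to back up              Listed in vfstab, mount, df, ls
  No cache or buffering             Cached in RAM

Tuning recommendations for large databases

  set maxphys=8388608        # large SCSI transfers
  set ufs:ufs_LW=4194304     # increase write throttle for large systems
  set ufs:ufs_HW=67108864
  set maxpgio=65536          # speed up page scanner
  set fastscan=65536         # speed up page scanner

Section 3: Network Performance Analysis

netstat
• Ierrs means that the system was unable to process input packets before the buffer overflowed. Usually this is because the CPUs or the receiving processes were too busy to handle more data (see netstat -s for more information).
• Oerrs usually means that there is a problem with the physical network, for example noise on the wires or broken routers.
• This form watches a single network interface over time:

# netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis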
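
Looking back at the iostat rules of thumb in the disk section, the arithmetic (average read size, approximate per-request service time, and the %b thresholds) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original training material: the names DiskSample, parse_row, avg_read_size_kb, approx_service_ms, and classify are hypothetical, the row layout assumes the `iostat -x` column order shown in the slides, and dividing svc_t by actv only when actv exceeds 1 is an assumption made here to avoid inflating values on idle disks.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """One device row of `iostat -x` output (column order as in the slides)."""
    device: str
    reads_per_s: float
    writes_per_s: float
    kr_per_s: float
    kw_per_s: float
    wait: float
    actv: float
    svc_t: float
    pct_w: int
    pct_b: int


def parse_row(line: str) -> DiskSample:
    """Parse a whitespace-separated device row; trailing CPU columns are ignored."""
    f = line.split()
    return DiskSample(f[0], *map(float, f[1:8]), int(f[8]), int(f[9]))


def avg_read_size_kb(d: DiskSample) -> float:
    """Average read size = (kr/s) / (r/s); 0.0 when the disk did no reads."""
    return d.kr_per_s / d.reads_per_s if d.reads_per_s > 0 else 0.0


def approx_service_ms(d: DiskSample) -> float:
    """Approximate per-request service time as svc_t / actv.

    Assumption: only divide when more than one request is active,
    otherwise report svc_t unchanged.
    """
    return d.svc_t / d.actv if d.actv > 1 else d.svc_t


def classify(d: DiskSample) -> str:
    """Apply the %b thresholds quoted in the slides (65% and 95%; 30% for RAID)."""
    if d.pct_b > 95:
        return "serious"
    if d.pct_b > 65:
        return "problem"
    if d.pct_b > 30:
        return "could benefit from RAID"
    return "ok"


def wait_warning(d: DiskSample) -> bool:
    """wait > 0 means requests are queueing in the OS, a warning sign."""
    return d.wait > 0


if __name__ == "__main__":
    rows = [
        "c7t3d0 0.0 0.1 0.0 2.0 0.0 0.0 159.0 0 1",
        "c7t4d0 115.9 2.8 228.4 4.1 0.0 19.7 311.7 1 99",
    ]
    for row in rows:
        d = parse_row(row)
        print(d.device, classify(d),
              round(avg_read_size_kb(d), 2), round(approx_service_ms(d), 1))
```

The two sample rows are the c7t3d0 and c7t4d0 lines from the `iostat -xc 2` output; feeding live output would require skipping the header lines first.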