ASM Internals. By Riyaj Shamsudeen, OraInternals.

Me
23+ years as a DBA
OakTable member
Oracle ACE Director
Specializes in RAC, performance tuning, and internals. Slowly getting into Big Data.
Web: orainternals.wordpress.com

WARNING
Most of the topics in this presentation are from my own research. Writing about internals has issues:
a. I may have completely misunderstood the data and trace files.
b. A future version may change the feature, leaving this information outdated.
Tested in version 11g, , on Linux and Solaris 11 platforms.

AGENDA
ASM overview: instance, asmb, etc.
Tools: kfod, kfed, amdu
Disk group, redundancy, AU
ASM rebalance
asmcmd
Conclusion

Architecture
ASM is an Oracle instance with instance_type=ASM.
ASM manages disks and LUNs, and externalizes files to the RDBMS.
An ASM instance is never opened. It simply stays in the mount state.
Demo: v$instance

Architecture
ASM provides an extent map of files to the RDBMS.
The RDBMS accesses the disks directly to perform I/O. ASM is not involved in the I/O operation.
Extending files or adding data files involves a refresh of the extent map from ASM to the RDBMS.
Demo: v$instance

Architecture: With ASM
[Diagram]

RDBMS I/O
Truss of DBWR: ASM is not involved in RDBMS I/O to the devices.
Write calls to file pointer 262 (truss output):
    /1: kaio(aiowrite, 262, 0x6DD3C000, 8192, 0xFC17E6380F4E4000) = 0
    ...
    /1: kaio(aiowrite, 262, 0x7DF3F000, 49152, 0xFC17D7080BD8A000) = 0
File pointer 262 is a SCSI device (pfiles output):
    262: S_IFCHR mode:0755 dev:291,0 ino: uid:601 gid:503 rdev:30,129
         O_RDWR O_NONBLOCK O_DSYNC O_LARGEFILE FD_CLOEXEC

RDBMS is a client (aka umbilical process)
The asmb process running in the RDBMS instance makes a connection to the ASM instance, and appears there as a foreground process of the ASM instance.
The asmb process sleeps in a loop and is the primary mechanism to detect an ASM crash.
If the ASM instance crashes, the asmb connection dies, leading to an RDBMS instance crash.
Demo: asm_connections.sql, asm_clients.sql

RDBMS as a client
Truss of an RDBMS startup shows that a LOCAL connection was made to the ASM instance.
    1821: execve("/u02/app/11.2.0/grid/bin/oracle", 0x0E8E87F0, 0x0E9ED510)
An instance restart alone opens 6 different connections to the ASM instance. You need to set the processes parameter appropriately.
    grep execve truss_startup.lst | grep grid
    1821: execve("/u02/app/11.2.0/grid/bin/oracle", 0x0E8E87F0, 0x0E9ED510) argc =
    : execve("/u02/app/11.2.0/grid/bin/oracle", 0x0E8E8090, 0x0EA11970) argc =
    : execve("/u02/app/11.2.0/grid/bin/oracle", 0x0E8E8090, 0x0EA11970) argc =
    : execve("/u02/app/11.2.0/grid/bin/oracle", 0x0E8E7550, 0x0E99B220) argc =
    : execve("/u02/app/11.2.0/grid/bin/oracle", 0x0E8E7550, 0x0E99B220) argc =
    : execve("/u02/app/11.2.0/grid/bin/oracle", 0x0E8E7550, 0x0E99B220) argc = 2

Death of asmb process
The asmb process sleeps on 'asm background timer' with a 5s sleep cycle.
    *** :38:
    WAIT #0: nam='asm background timer' ela= p1=0 p2=0 p3=0 obj#=-1 tim=
I killed the connection from the ASM instance, resulting in the death of the asmb process, followed by an RDBMS instance crash.
    NOTE: ASMB terminating
    Errors in file /u01/app/oracle/diag/rdbms/solrac/solrac1/trace/solrac1_asmb_1492.trc:
    ORA-15064: communication failure with ASM instance
    ORA-03113: end-of-file on communication channel
    Process ID:
    Session ID: 30 Serial number: 3
    ASMB (ospid: 1492): terminating the instance due to error
Demo: Killing asmb connection

ASM extent pointer array
v$sgastat shows the extent pointer array in the RDBMS. This array is retrieved from the ASM instance.
    select * from gv$sgastat where name like '%ASM extent%';

    INST_ID POOL         NAME                      BYTES
            shared pool  ASM extent pointer array
            shared pool  ASM extent pointer array
For large databases, this area will be bigger.
To improve instance startup performance, only a minimal extent mapping is retrieved initially. More data is added to this array as needed.

Minimal ASM parameters
    instance_type = ASM
ASM instances are named +ASMx.
SGA components are:
    db_cache_size    = 64M    # to cache metadata blocks
    shared_pool_size = 128M   # for various ASM structures
    large_pool_size  = 64M    # for extent map operations
I usually set the processes parameter to *# of databases.
11g+ supports automatic memory management, so you can set memory_target=512M and let Oracle manage it.
Demo: Parameters, v$sgastat, show sga

ASM disks
During ASM startup, the ASM instance scans the disks to identify all ASM disks.
The asm_diskstring parameter identifies the disks to scan.
asm_diskstring accepts wildcards, and null is the default.
To improve ASM startup time, set this parameter properly.
For example, the following value for asm_diskstring searches for all devices that match the wildcard and have read-write permissions:
    asm_diskstring = /dev/rdsk/c2t*d0s1
Demo: show parameter asm_diskstring

kfod
kfod -h for help.
The kfod utility can be used to check all devices that qualify under asm_diskstring.
    $ kfod status=true asm_diskstring='/dev/mapper/' disks=all verbose=true
     Disk  Size  Header  Path                    User    Group
    ================================================================================
       1:    Mb  MEMBER  /dev/mapper/asmdisk1p1  oracle  oinstall
       2:    Mb  MEMBER  /dev/mapper/asmdisk2p1  oracle  oinstall
    ORACLE_SID ORACLE_HOME
    ================================================================================
         +ASM1 /u01/app/12.1.0/grid
    KFOD-00311: Error scanning device /dev/mapper/control
    ORA-27041: unable to open file
    Linux-x86_64 Error: 13: Permission denied
    Additional information: 42
    KFOD-00311: Error scanning device /dev/mapper/36000c29d5fb1e04764ebbedd94bb6acd
    ORA-27041: unable to open file
    Linux-x86_64 Error: 13: Permission denied

ASM disks - RAC
A LUN must be visible on all nodes of a cluster, with proper permissions, for ASM to consider it.
The LUN path need not be the same on every node, but the LUN must exist and be visible through the asm_diskstring parameter.
For example, the same device can have different names on two nodes:
    node1  /dev/rdsk/c2t9d0s1
    node2  /dev/rdsk/c2t11d0s1
ASM identifies the LUN even if the configuration changes later. Metadata is kept in every disk header.
Demo: show parameter asm_diskstring

kfed disk header
The kfed utility can be used to dump the metadata block(s) of a device.
Without any parameter, kfed reads the disk header.
    $ kfed read /dev/rdsk/c2t9d0s1
    kfbh.endian:              1 ; 0x000: 0x01
    kfbh.type:                1 ; 0x002: KFBTYP_DISKHEAD
    kfbh.block.blk:           0 ; 0x004: T=0 NUMB=0x
    kfbh.block.obj:             ; 0x008: TYPE=0x8 NUMB=0x
    kfdhdb.compat:              ; 0x020: 0x0b
    kfdhdb.dsknum:              ; 0x024: 0x
    kfdhdb.grptyp:              ; 0x026: KFDGTP_EXTERNAL
    kfdhdb.hdrsts:            3 ; 0x027: KFDHDR_MEMBER
    kfdhdb.dskname:   DATA_0007 ; 0x028: length=9
    kfdhdb.grpname:        DATA ; 0x048: length=4
    kfdhdb.fgname:    DATA_0007 ; 0x068: length=9
    kfdhdb.capname:             ; 0x088: length=0
Demo: kfed read

kfed..2
    kfdhdb.secsize:          512 ; 0x0b8: 0x
    kfdhdb.blksize:              ; 0x0ba: 0x
    kfdhdb.ausize:               ; 0x0bc: 0x
    kfdhdb.mfact:                ; 0x0c0: 0x0001bc
    kfdhdb.dsksize:              ; 0x0c4: 0x000007d0
    kfdhdb.pmcnt:              2 ; 0x0c8: 0x
    kfdhdb.fstlocn:              ; 0x0cc: 0x
    kfdhdb.altlocn:              ; 0x0d0: 0x
    kfdhdb.f1b1locn:             ; 0x0d4: 0x
    kfdhdb.redomirrors[0]:       ; 0x0d8: 0x
    kfdhdb.redomirrors[1]:       ; 0x0da: 0x
    kfdhdb.redomirrors[2]:       ; 0x0dc: 0x
    kfdhdb.redomirrors[3]:       ; 0x0de: 0x
    kfdhdb.dbcompat:             ; 0x0e0: 0x0a
Demo: kfed read

kfed other blocks
kfed can also be used to read other blocks in the LUN.
    $ kfed read /dev/rdsk/c2t9d0s1 aun=0 blkn=1 | grep kfbh.type
    kfbh.type: 2 ; 0x002: KFBTYP_FREESPC
    $ kfed read /dev/rdsk/c2t9d0s1 aun=0 blkn=2 | grep kfbh.type
    kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL
    # ASM also stores a backup disk header in the second allocation unit, last 2 blocks.
    $ kfed read /dev/rdsk/c2t9d0s1 aun=1 blkn=254 | more
    kfbh.type:    1 ; 0x002: KFBTYP_DISKHEAD
    kfbh.datfmt:  1 ; 0x003: 0x01
Demo: kfed read

Corrupting header
Minor header-related repair is possible.
    $ kfed read /dev/mapper/asmdisk4p1 | more
    kfbh.endian:    1 ; 0x000: 0x01
    kfbh.hard:    130 ; 0x001: 0x82
    kfbh.type:      1 ; 0x002: KFBTYP_DISKHEAD
    $ dd if=/dev/zero of=/dev/mapper/asmdisk4p1 bs=1m count=1
    1+0 records in
    1+0 records out
    $ kfed read /dev/mapper/asmdisk4p1 | more
    kfbh.endian:    0 ; 0x000: 0x00
    kfbh.hard:      0 ; 0x001: 0x00
    kfbh.type:      0 ; 0x002: KFBTYP_INVALID
Demo: kfed read

Kfed repair
    $ kfed repair /dev/mapper/asmdisk4p1
    $ kfed read /dev/mapper/asmdisk4p1 | more
    kfbh.endian:    1 ; 0x000: 0x01
    kfbh.hard:    130 ; 0x001: 0x82
    kfbh.type:      1 ; 0x002: KFBTYP_DISKHEAD
Demo: kfed read

amdu
amdu can be used to extract files, even when the disks are corrupt.
    $ amdu -diskstring=/dev/mapper/asmdisk3p1
    amdu_2017_01_14_07_36_15/
    $ ls -lt amdu_2017_01_14_07_36_15/
    total 4
    -rw-r--r--. 1 oracle oinstall 1834 Jan 14 07:36 report.txt
    $ more amdu_2017_01_14_07_36_15/report.txt
    -*-amdu-*-
    ******************************* AMDU Settings ********************************
    ORACLE_HOME = /u01/app/12.1.0/grid
    System name:    Linux
    Node name:      rac1.localdomain
    Release:        el6uek.x86_64
    Version:        #2 SMP Fri Aug 8 21:59:01 PDT 2014
    Machine:        x86_64
    amdu run:       14-JAN-17 07:36:15
    Endianess:      1

amdu
    DISK REPORT N
    Disk Path: /dev/mapper/asmdisk3p1
    Unique Disk ID:
    Disk Label:
    Physical Sector Size: 512 bytes
    Disk Size: 2047 megabytes
    Group Name: TEST
    Disk Name: TEST_0000
    Failure Group Name: TEST_0000
    Disk Number: 0
    Header Status: 3
    Disk Creation Time: 2017/01/11 23:49:
    Last Mount Time: 2017/01/14 07:32:
    Compatibility Version: 0x0a100000( )
    Disk Sector Size: 512 bytes
    Disk size in AUs: 2047 AUs
    Group Redundancy: 2
    Metadata Block Size: 4096 bytes
    AU Size: bytes
    Stride: AUs
    ...
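The kfed dumps shown in the preceding slides follow a regular "field: value ; offset: decoded" layout, so checking whether a header survived (KFBTYP_DISKHEAD) or was wiped (KFBTYP_INVALID) can be scripted. What follows is only a minimal sketch in Python: the names KfedField, parse_kfed_line, and header_type are hypothetical helpers made up for illustration, not part of any Oracle tool, and the line layout is assumed from the samples in this transcript.

```python
import re
from typing import NamedTuple, Optional


class KfedField(NamedTuple):
    """One line of kfed output (hypothetical helper type)."""
    name: str      # e.g. "kfbh.type"
    value: str     # raw value column, may be empty
    offset: int    # byte offset inside the block
    decoded: str   # kfed's decoded form, e.g. "KFBTYP_DISKHEAD"


# Assumed layout, taken from the samples above:
#   kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
_LINE = re.compile(r"^([\w.\[\]]+):\s*(.*?)\s*;\s*(0x[0-9a-fA-F]+):\s*(.*)$")


def parse_kfed_line(line: str) -> Optional[KfedField]:
    """Parse one kfed output line; return None if it does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    name, value, offset, decoded = match.groups()
    return KfedField(name, value, int(offset, 16), decoded.strip())


def header_type(dump: str) -> Optional[str]:
    """Return the decoded kfbh.type of a dump, or None if it is absent."""
    for line in dump.splitlines():
        field = parse_kfed_line(line)
        if field is not None and field.name == "kfbh.type":
            return field.decoded
    return None


# Sample text copied from the "Corrupting header" slide.
healthy = """kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD"""

zeroed = """kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID"""

print(header_type(healthy))
print(header_type(zeroed))
```

In practice the dump text would come from saving the output of kfed read to a file. The parser only looks at text and never touches a device.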
V$asm_disk
V$asm_disk shows all the disks that ASM can see and access.
Header_status shows the state of the disk.
    select header_status, name from v$asm_disk;

    HEADER_STATU NAME
    MEMBER       DATA_0001
    MEMBER       DATA_0002

    Header status   Meaning
    Member          Disk is part of the disk group
    Candidate       Available to add
    Former          Was part of another disk group
    Provisioned     Linux specific, ASMLIB configured
Demo: asm_disks.sql

Multipathing & ASM
ASM does not provide any multipathing solution, but leverages the implemented solution.
A multipathing solution should:
1. Provide a single block device interface to a LUN with multiple paths.
2. Handle failover and load balancing between the multiple paths.
3. Externalize just one path to ASM.
ASM does not handle it properly if a disk is seen twice while scanning the devices.

ASM disk group
As the name suggests, it is a group of ASM disks.
Essentially, ASM hides the underlying disks behind an abstraction layer and provides files to the RDBMS/ACFS clients.
Three types of redundancy are implemented: external, normal, and high.
With external redundancy, ASM assumes that the SAN takes care of redundancy.
With normal redundancy, ASM manages two copies.
With high redundancy, ASM manages three copies.
Demo: asm_disk_group.sql, asm_disks.sql

ASM disk group
Picture of a disk group with normal redundancy.
Two failure groups are allocated since this is a mirrored disk group.
ASM does not mirror disks. Rather, extents are kept in two separate failure groups.
[Diagram: disk group DATA1 with failure group D1 and failure group D2]

Example
Construct the failure groups in such a way that one component failure affects at most one failure group.
    create diskgroup DATA normal redundancy
      failure group fl1 disk
        '/dev/rdsk/c3t11d3s4', '/dev/rdsk/c3t11d4s4', '/dev/rdsk/c3t11d5s4', '/dev/rdsk/c3t11d6s4'
      failure group fl2 disk
        '/dev/rdsk/c4t12d3s4', '/dev/rdsk/c4t12d4s4', '/dev/rdsk/c4t12d5s4', '/dev/rdsk/c4t11ds4'
      failure group fl3 disk
        '/dev/rdsk/c5t13d3s4', '/dev/rdsk/c5t13d4s4', '/dev/rdsk/c5t13d5s4', '/dev/rdsk/c5t13ds4'
      failure group fl4 disk
        '/dev/rdsk/c6t14d3s4', '/dev/rdsk/c6t14d4s4', '/dev/rdsk/c6t14d5s4', '/dev/rdsk/c6t14ds4';

Redundancy & I/O
With normal redundancy, there are two write calls from the host side (by the database).
This could potentially be an issue if you go from external to normal redundancy.
ASM tries to keep nearly the same number of primary and secondary extents on each disk (LUN). This provides a uniform distribution of I/O activity across all LUNs.
But ASM does not know anything about striping and mirroring in the SAN.
Generally, a double SAME methodology is in play.

I/O Errors: Normal redundancy (DB)
[Flowchart. Node labels, in source order: Is read error? Read secondary extent. Success. Copy from secondary to primary extent. Failure. Signal ASM to offline disk. Failure. Signal ASM. Offline tablespace. Write alert and continue. Write alert and continue.]

I/O Errors: Normal redundancy (ASM)
[Flowchart. Node labels, in source order: Disk offline message or write errors in ASM. Copy the extent to new AU in the same disk. Failure. Check if sufficient partner disks alive. Yes. Disk offline, drop (later) and then rebalance. Success. Mark original AU invalid. Continue. No. Disk group offline. Write alert and continue.]

Fast Mirror Resync
A disk goes offline if ASM encounters errors.
But in 11g, ASM doesn't drop the disk for 3.6 hours.
After 3.6 hours, the disk is dropped if it is still not available.
You can change disk_repair_time from the 3.6-hour default:
    ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME' = '10H';
The idea here is that transient failures do not trigger massive resilvering activities.
Changes to the extents are tracked in a bitmap, and this bitmap is used to copy the extents once the disks are available again.
This is truly useful if, say, a controller fails, as the disks themselves are fine.

Failures and corruption
ASM normally reads only the primary extent.
This means that corruption in the secondary extent will not be noticed until the primary extent becomes inaccessible.
Writes, however, go to both extents and so can detect corruption.
Hardware failures will be detected immediately, though, since each disk holds an approximately equal number of primary and secondary extents.

Diskgroup check
If there are any disk errors, checking the disk group might be a first step to take.
Returns with no errors if the disk group is good.
Checks for ASM metadata consistency:
Verifies file extent maps and allocation tables.
Verifies that the directories, files, and aliases are correct.
Reads metadata and backup, and verifies them.

Use same size luns
If a LUN fails, ASM will induce a rebalance and will copy the extents from the primary or secondary.
The database will continue to read from the available mirror and will not see any errors.
For these reasons, it is important to have same-size LUNs in a disk group.
We will discuss the rebalance operation later.

How many disk groups?
2 or 3 (DATA, FRA, CRS)
One disk group for database files (say DATA) and another for the flash recovery area (say FRA) is the recommended approach.
ASM follows the SAME methodology.
For example, if there are 5 disks in a disk group (assuming external redundancy), a file will be spread across all the available LUNs.

Redundant copies
If there are two disk groups configured at DB creation time, a control file and a redo log file member will be placed automatically in both disk groups.
You could do this manually too, later.
Even if you have many database instances using that ASM, just 2 or 3 ASM disk groups is still the recommended approach.
There is an exception: if you have a tier 1 and tier 2 storage architecture, then it makes sense to have more disk groups.

ASM files: Normal redundancy
ASM files are allocated from mirrored extents between the failure groups.
[Diagram: ASM file 1 (v$asm_file, v$asm_alias, x$kffxp); disk group DATA1 (v$asm_diskgroup); failure group D1 and failure group D2 (v$asm_disk)]
Demo: asm_file_analysis.sql

ASM files: Normal redundancy - Exadata
ASM files are allocated from mirrored extents between the failure groups.
[Diagram: ASM file (v$asm_file, v$asm_alias, x$kffxp); disk group DATA1 (v$asm_diskgroup); cell01, cell02, cell03 (v$asm_disk)]
Demo: asm_file_analysis.sql

Extents vs Files
ASM files are allocated as a series of extents.
ASM extents are made up of one or more allocation units.
An ASM extent is always contained within a single ASM disk, though.
[Diagram: DB file, ASM file, ASM extents, ASM AU]

Extent vs AU
1 extent = 1 AU up to extents.
1 extent = 8 AUs after extents.
This is one ASM file, and so its extents are distributed between the devices (striping).

Allocation_unit (AU)
The allocation unit defines the smallest disk segment that can be allocated, at the disk group level.
Allocation_unit defaults to 1MB.
It can be increased in powers of 2 (2, 4, 8, 16MB, etc.) while creating a disk group (11g).
Once a disk group is created with an allocation unit, it cannot be altered.
In 10g, the underscore parameter _asm_ausize can be used to modify the allocation_unit.
An increased allocation_unit is useful in VLDB databases.

Striping
File extents are striped.
There are two types of striping: coarse and fine.
With coarse striping, the stripe size is one allocation unit. This is used for database files.
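The allocation-unit rules and the coarse-striping statement above reduce to simple arithmetic. The sketch below is an illustration under assumptions, not ASM's implementation: the function names are hypothetical, the upper limit on AU size is not modeled, each extent is taken to be exactly one AU (the case the slides describe for the first extents of a file), and nothing here decides which disk a stripe lands on.

```python
MB = 1024 * 1024


def is_valid_au_size(au_bytes: int) -> bool:
    """True if au_bytes is 1MB times a power of two (1, 2, 4, 8, 16MB, ...)."""
    if au_bytes < MB or au_bytes % MB != 0:
        return False
    multiple = au_bytes // MB
    # A power of two has exactly one bit set.
    return multiple & (multiple - 1) == 0


def locate_in_coarse_stripe(file_offset: int, au_bytes: int = MB) -> tuple:
    """Return (stripe index, offset inside that stripe) for a byte offset.

    Assumes coarse striping, where one stripe is one allocation unit.
    """
    if not is_valid_au_size(au_bytes):
        raise ValueError("AU size must be 1MB times a power of two")
    if file_offset < 0:
        raise ValueError("file offset must be non-negative")
    return divmod(file_offset, au_bytes)


def aus_needed(file_bytes: int, au_bytes: int = MB) -> int:
    """Number of allocation units needed to hold file_bytes (rounded up)."""
    if not is_valid_au_size(au_bytes):
        raise ValueError("AU size must be 1MB times a power of two")
    return -(-file_bytes // au_bytes)


# With a 4MB AU, byte offset 5MB+10 falls in the second stripe (index 1).
print(locate_in_coarse_stripe(5 * MB + 10, 4 * MB))
print(aus_needed(10 * MB + 1, 4 * MB))
```

A larger AU means fewer units, and therefore fewer extent-map entries, for the same file size, which is one way to read the slide's remark that a bigger allocation_unit helps very large databases.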
With fine striping, 128KB stripes are interleaved across 8 allocation units. This type of striping is used for online redo log files, control files, and spfiles.
Striping is controlled by templates. A template can be altered, but be careful of the implications.
Demo: asm_templates.sql

ASM files
You don't need to specify a complete file name while creating a file from the database.
ASM will generate a system-defined unique file name if you don't specify a complete path.
    create tablespace ts_small datafile '+DATA' size 10M;
    select file_name from dba_data_files where tablespace_name='TS_SMALL';

    FILE_NAME
    DATA/solrac/datafile/ts_small
Demo: cr_ts_small.sql

ASM Directory
You can create a directory structure in ASM and use that for file names (ASM instance).
    SQL> alter diskgroup data add directory '+DATA/app';
    SQL> alter diskgroup data add directory '+DATA/app/oracle';
A new file with a user-defined file name can be added to the database.
    SQL> alter tablespace ts_small add datafile '+DATA/app/oracle/ts_small_02.dbf' size 10M;
User-defined file names are simply aliases:
    $ asmcmd ls -lt '+DATA/app/oracle/ts_small_02.dbf'
    Type  Redund  Striped  Time  Sys  Name
                                 N    ts_small_02.dbf => +DATA/SOLRAC/DATAFILE/TS_SMALL
Demo: add_directory, al_ts_small, drop_directory, drop tablespace

Rebalance
Adding an ASM disk to a disk group, or deleting one from it, triggers a rebalance operation.
Extents are moved from existing disks to new disks, rebalancing the disk usage.

Processing details
RBAL is triggered when disks are added, deleted, or resized.
RBAL acts as a coordinator process and updates the metadata to record that an ASM rebalance is underway.
It determines the extent to move and the target disk.
It hands off the work to an ARBx process.
The ARBx process moves the extent and replies to RBAL after successful completion.
This goes on until RBAL completes the rebalance operation.
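The coordinator loop described above (pick an extent, pick a target disk, move it, repeat until done) can be illustrated with a toy model. This is a sketch only: the greedy "fullest disk to emptiest disk" rule is an assumption chosen for simplicity and is not ASM's actual placement algorithm, the disk names are made up, and failure groups, partner disks, and the RBAL/ARBx process split are all left out.

```python
def rebalance(disks: dict) -> list:
    """Even out extent counts across disks, one extent move at a time.

    disks maps a disk name to the list of extents it holds and is modified
    in place. Returns the list of moves as (extent, source, target) tuples.
    Greedy rule (an assumption, not ASM's algorithm): move from the fullest
    disk to the emptiest one until the counts differ by at most one.
    """
    moves = []
    while True:
        fullest = max(disks, key=lambda name: len(disks[name]))
        emptiest = min(disks, key=lambda name: len(disks[name]))
        if len(disks[fullest]) - len(disks[emptiest]) <= 1:
            return moves
        # "Coordinator" step: choose the extent and the target disk.
        extent = disks[fullest].pop()
        # "Worker" step: relocate the extent.
        disks[emptiest].append(extent)
        moves.append((extent, fullest, emptiest))


# Three existing disks with 4 extents each, then one empty disk is added.
group = {
    "DISK_A": [0, 1, 2, 3],
    "DISK_B": [4, 5, 6, 7],
    "DISK_C": [8, 9, 10, 11],
    "DISK_NEW": [],
}
moved = rebalance(group)
print(len(moved), {name: len(extents) for name, extents in group.items()})
```

Even in this toy model the amount of data moved is roughly the new disk's fair share, which gives some intuition for why adding a disk moves only part of the data rather than all of it.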

Asm_power_limit
Asm_power_limit controls the speed of the rebalance operation.
This parameter controls the number of ARBx processes performing the rebalance operation.
Each ARBx process locks just one extent at a time and moves that extent to another disk.
You can increase the asm_power_limit parameter to improve the speed of the rebalance operation.
It is not uncommon to disable the rebalance during busy hours and increase the limit to higher value during off