Archiving and Tiered Storage for A DMF ILMI Best Practices White Paper

Data Management Forum
Archiving and Tiered Storage for A DMF ILMI Best Practices White Paper
October 1, 2004
Copyright 2004 SNIA Data Management Forum
Author: Aloke Guha, Chief Technology Officer, COPAN Systems; Co-Chair, SNIA ILM Technical Work Group

Table of Contents

1. INTRODUCTION
PURPOSE AND SCOPE
DEFINING DATA ARCHIVE AND ARCHIVING
COMPARISON OF ARCHIVE MANAGEMENT AND HSM
THE ROLE OF ARCHIVING IN ILM
STORAGE ARCHITECTURES FOR ARCHIVING
TWO-TIER STORAGE ARCHITECTURE
THREE-TIER STORAGE ARCHITECTURE
ARCHIVAL PROCESS
ARCHIVE MANAGEMENT ATTRIBUTES
METHODS OF ARCHIVING
GENERAL ARCHITECTURE
FIT WITH ILM
MICROSOFT EXCHANGE
LOTUS NOTES DOMINO SERVER
SMTP (UNIX)
COST
HARDWARE
SOFTWARE
MANAGEMENT
AVAILABLE PRODUCTS FOR ARCHIVING SOLUTIONS
REFERENCES

Table of Figures

Figure 1. Comparison of Archive Management and HSM
Figure 2. Two-Tier Storage Architecture
Figure 3. Three-Tier Storage Architecture
Figure 4. Functional View of Archive Process

1. Introduction

The SNIA definition for ILM:

"The policies, processes, practices, services and tools used to align the business value of information with the most appropriate and cost-effective infrastructure from the time information is created through its final disposition. Information is aligned with business requirements through management policies and service levels associated with applications, metadata and data."

This definition includes how data is archived and maintained within the IT infrastructure. Specifically, this paper considers archiving for a horizontal application, email. This white paper is intended to provide examples of archiving application data on different tiers of storage, so as to ensure the most appropriate and cost-effective infrastructure at any point in the life of the data.
It generalizes how application data should be archived using best practices, so that data meant for long-term preservation is still accessible when the application requires it.

This paper is one in a series of white papers, covering Data Recovery [1], Security [2] and Archiving, produced by the SNIA Data Management Forum's ILM Initiative for IT administrators. It is intended to give IT administrators usable, application-specific guidelines on deploying ILM solutions using today's technologies. Each white paper defines best practices for a specific dimension of ILM solutions. In this paper, the topic is data archiving for general applications such as Microsoft Exchange, Lotus Notes, and Unix Sendmail.

Purpose and Scope

This paper attempts to formalize the general definition of an archive as well as its management. Because there have been many confusing definitions of what constitutes an archiving process, different approaches to creating archives and to accessing archived data are included. Historically, hierarchical storage management (HSM) techniques have been used to create archives on tiered storage. This paper therefore contrasts archive and hierarchically managed storage.

With increasing recent interest in maintaining corporate and business data for regulatory reasons, new requirements are being imposed on archived data. This paper therefore elaborates on the storage, data and information management needs of archives. In addition, emerging storage technologies and approaches to managing storage, and appropriate best practices using these technologies, are also presented.

Storage Environment

This paper outlines how an archive uses the storage infrastructure below the application level, and the capabilities that the storage infrastructure requires for purposes of archiving.
Many of today's archive solutions consider different storage subsystems, such as DAS, SAN, NAS and CAS. This paper focuses on solutions that leverage the benefits of networked storage. Unless otherwise noted, these apply equally to SAN and NAS-based solutions. Given the nature of archival storage and its access, specifically scale and cost, many archive storage solutions combine different storage media, both disk and tape. As will be explained, creating and managing the archive requires data management practices that extend beyond simple storage space management.

This paper provides archive best practices that may be applied to several different server implementations. While this paper addresses general approaches to archiving, approaches that are specific to certain applications are covered elsewhere in application notes. For purposes of illustration, examples are cited from Microsoft Exchange (including versions 5.5 and 2000), Lotus Domino Server versions 5, 6 and 6.5, and SMTP mail servers such as BSD Unix Sendmail.

Defining Data Archive and Archiving

Data Archive

An archive is a collection of data that is maintained as a long-term record of a business, an application, or an information state. Archives are typically kept for auditing, regulatory, analysis or reference purposes rather than for application or data recovery.

Characterization of Archives

Not all archives are created equal. The storage infrastructure and the management of an archive depend on the activity level of the archive, i.e., the frequency of access to it. For example, if the archived data is not expected to be read except in a contingency situation, such as tax records that are referenced only in the case of an audit, it can be maintained on a storage device such as removable media. However, data integrity on the media is still important, even for such infrequently accessed archived data.

Active Archives: Unlike deep vault storage, many data archives need frequent access to data, perhaps many times a day. We refer to these archives as active archives. Examples include reference library data, physical process simulation results, multimedia content and seismic data processing. Applications using active archives cannot tolerate long access times, and therefore require storage solutions with fast response times and data integrity that also satisfy cost constraints.

Deep Archives: Deep archives are long-term vaulted data that are accessed very infrequently. Unlike active archives, they are more tolerant of long access delays.

Retention: Another distinguishing characteristic of archives is the lifetime of the archive. This can vary greatly, from a few years to, theoretically, infinity.

Implications of Long-Term Archives: One of the most serious challenges of maintaining and managing a long-term archive, say a 100-year archive, is how the data is reliably maintained far beyond the useful life of the original IT infrastructure that created and supported it. This infrastructure includes the storage and the access components that encompass the network and processing hardware and software. In fact, the archive may span 20 or more turns of technology changes.

Archiving Process

The description that follows applies to archiving in general and not to email archiving alone. It is provided as essential background material for the discussion of archiving.

Historically, the archive process consists of creating a data archive through a copy or move operation of the data for purposes of retention. In the copy-based approach, the data is copied to the target, usually a lower-performance, lower-cost secondary storage system (footnote 5), and may then be deleted from the primary storage location.
In the move-based approach, the data is moved to the secondary storage target, and links or references to the data are either maintained or deleted. Both options are possible in practice but require different access mechanisms to the archived data, and are therefore worth further discussion and analysis.

Referential Links to Moved Data: Hierarchical Storage Management of Archive Data

When the referential links to the moved data are not deleted, the original application can access the data transparently, whether it is on the primary storage system or on the secondary storage system. In this case, a hierarchical storage management (HSM) system has been created for the original application. By itself, the HSM system does not provide an archive solution, because it does not guarantee retention of the data and because the application or its users can modify or delete the data independent of any external control. An archive is created if, and only if, there is a mechanism to control the retention of the data independent of the application. Thus, archive generation requires an archive management function that guarantees the existence and integrity of the archive independent of the application. Both the application and the archive management access the same data in their respective namespaces, but with different access control rights to the data.

No Referential Links to Moved Data: Application-Independent Access to Archive Data

When the referential links to the moved data are deleted, the application cannot access the data without the aid of the archive management function. In this case, all access to the archived data requires requests to the archive management. The application does not have visibility into the archive data.

Comparison of Archive Management and HSM

It is important to note that both archive approaches described earlier have been, and still are, in use.
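
The distinction drawn above, between HSM-style migration and an archive whose retention is controlled independently of the application, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `ArchiveManager` class, its method names and the in-memory store are assumptions made for illustration and do not correspond to any product or interface described in this paper. It shows only three properties named in the text: records cannot be modified once ingested, integrity is checked on read, and deletion is refused until the retention date has passed.

```python
import hashlib
from datetime import date


class RetentionError(Exception):
    """Raised when a delete is attempted before retention expires."""


class ArchiveManager:
    """Hypothetical archive management function (illustration only)."""

    def __init__(self):
        # record_id -> (payload bytes, SHA-256 digest, retain_until date)
        self._records = {}

    def ingest(self, record_id, payload, retain_until):
        # Archived records are immutable: an existing record is never overwritten.
        if record_id in self._records:
            raise ValueError("archived records are immutable")
        digest = hashlib.sha256(payload).hexdigest()
        self._records[record_id] = (payload, digest, retain_until)

    def read(self, record_id):
        # The application gets read access only; integrity is verified on each read.
        payload, digest, _ = self._records[record_id]
        if hashlib.sha256(payload).hexdigest() != digest:
            raise IOError("integrity check failed")
        return payload

    def delete(self, record_id, today):
        # Retention is enforced here, independent of the application.
        _, _, retain_until = self._records[record_id]
        if today < retain_until:
            raise RetentionError("retention period has not expired")
        del self._records[record_id]


# Example use with made-up data and dates.
manager = ArchiveManager()
manager.ingest("msg-1", b"quarterly report", retain_until=date(2011, 10, 1))
```

In a pure HSM arrangement, by contrast, the application itself could modify or delete `msg-1` at any time, which is why HSM alone does not constitute an archive.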
Since an archive can reside on a single storage platform, an archive need not use tiered storage. However, for economic reasons, it is common to move data, when archived, to a more cost-effective storage platform distinct from the primary storage on which it was first created. This is common to practical archiving and HSM. The goal of an archive is to retain data with integrity, independent of the application that created it. HSM is used to leverage the cost benefits of lower-cost storage platforms. The distinctions between archive management and HSM are summarized in the table in Figure 1. Note that while there are HSM products that may provide some archive management functions, this paper discusses and refers to HSM as a practice.

Footnote 5: A storage system that is used in conjunction with the primary storage systems, and is typically lower performance and lower cost than the primary storage, is referred to as the secondary storage system.

Attributes | Archive Management | HSM
Access Method | Application can access data directly or indirectly through Archive Management | Application can access data directly. HSM is transparent to Application
Access Control | Only read access by application; application cannot modify or delete records | No limitations
Data Immutability | Guaranteed during the retention period | Not guaranteed
State of Data | Data is not in operational state but used for reference | Data is in operational state
Data Copies | Archive Management may maintain a second copy of the application data. The data archived may exist under Application control | Only one instance of data is maintained under control of the Application
Use of Tiered Storage | Can use tiered storage for cost-effective retention but not necessary | Uses tiered storage
Management Function | Manages retention, access and integrity of data, usually on tiered storage, usually set by policy | Manages transparent migration of data between tiers, usually set by policy

Figure 1. Comparison of Archive Management and HSM

The archive management function can be provided as an independent software function, within and in conjunction with the archive storage system, or as extensions to the original application. In the remainder of this paper, we will focus on the archive management system, how archives are created and managed, recommendations on what attributes of archives need to be supported, and how archiving is accomplished. Further, since most archives are maintained on a tier of storage distinct from where the data was created, we also discuss the tiered storage architectures that are used for archiving.

The Role of Archiving in ILM

Archiving is an important aspect of an overall ILM strategy within the data center. Here we consider the ILM perspectives on data archiving for applications. Because ILM advocates the use of the most appropriate and cost-effective infrastructure, the area of archiving is a direct manifestation of ILM in practice. In this paper, for simplicity, records, messages and their associated attachments are used interchangeably.

Archiving specific records implies that a certain set of records is not expected to be accessed frequently, and therefore it is appropriate to locate those records, by whatever means is most effective, on lower-cost and lower-performance storage devices when different tiered storage is available. ILM is usually associated with the use of tiered storage, and therefore relies on migration techniques and tools used in HSM to move data between the storage tiers. By moving a portion of the records to lower-performance storage, better utilization of primary storage is possible across all active data applications.
Another concomitant benefit is that reducing the size of the records under the application's native database, in the case of Microsoft Exchange and Lotus Domino, also improves the performance of the server.

There are therefore a number of areas of consideration in the archiving process within the context of ILM:

1) The use of tiered storage that provides variable performance, scale of storage and cost options.
2) The archival process, which includes the selection of records, set by policy, that need to be moved from one storage tier to another, and the process that moves records between tiers of storage.
3) The management of the archived records, including retention, access, security and integrity.

We note that HSM and Archive Management share the first two properties listed above. However, as noted earlier, data that is moved for creation of an archive may actually be a copy of the data still under application control, unlike in the HSM case where there is only one copy of the data. Data migration for archiving can be driven by some of the same criteria as HSM. However, archiving may also be driven by additional criteria, such as those specified by compliance needs.

Archiving is not intended for protection of primary operational data. In many cases, the archive may be the only instance of the original data. Therefore, it is a requirement to protect the archive data and metadata. More discussion on protecting the archive is provided in Section 4 on Archive Management Attributes.

2. Storage Architectures for Archiving

A key aspect of archiving long-term data is the use of tiered storage. Tiering refers to the use of storage systems of different performance, scale and cost. Archive data need not use tiered storage.
However, since archive data is a growing data asset and is not accessed as frequently as operational data, use of tiers of storage is usually preferred to reduce the cost of storing and managing the archive. This aspect of using tiered storage is common with HSM (Section 1.3). Beyond HSM, archive management has many other issues to consider. The following characterize archive storage architectures:

- The first tier uses a high-performance disk storage system for operational data access.
- A secondary tier is used to create a lower overall cost per unit of data with different performance characteristics, as well as highly scalable storage capacity.
- A tertiary tier may also be used if a better trade-off of performance versus capacity is possible.
- Archive management that includes retention, compliance, and security must be maintained for the archive across all tiers.

Given the plethora of storage systems, there are many different storage tier combinations that can be created. These include primary disk (e.g., Fibre Channel), secondary disk that uses lower-cost, lower-performance (e.g., ATA) disk systems of different scale, size and cost, and tape systems such as automated tape libraries. Different combinations of tiered storage are required when considering different criteria, which include cost, scale, performance and compliance support. From a connectivity perspective, we broadly classify tiered storage into two categories: two-tier and three-tier architectures. As mentioned earlier, these tiered architectures apply to both archiving and HSM.

Two-Tier Storage Architecture

Figure 2. Two-Tier Storage Architecture

Figure 2 shows two instances of two-tiered storage architecture: disk to tape (D2T), and disk to disk (D2D), where the second disk can be implemented using different technologies such as disk arrays, NAS, CAS (content addressable storage) or MAID (massive array of idle disks).

Disk to Tape (D2T) is the traditional tiered storage architecture, as in historical HSM (Figure 2A). The bulk of the data is maintained on tape, which might comprise an automated tape library and vaulted tape media. The D2T architecture is considered most suitable where long access times to retrieve archive data are acceptable, i.e., very low activity archives. Tape also provides a high density of storage in footprint terms and is usually the lowest-cost media compared to traditional disk storage.

Disk to Disk (D2D) is an emerging storage architecture (Figure 2B). It allows much higher performance than D2T. Its cost is usually higher than tape but lower than enterprise disk. There are a number of possible disk-based appliances that can be used as the secondary disk tier. These include:

- Using a standard RAID array for fast access at the block level
- Using NAS for file-based access
- Using CAS to access data by content
- Using MAID storage [4,5], which can be accessed in any of the above presentations (disk, NAS or CAS) as well as virtual tape

Besides differences in presentation, each of the disk-based appliances provides different performance, cost, scale and accessibility features. Regardless of implementation, D2D is the preferred tiered storage architecture for active archives because of its inherent performance advantages over tape.

Best Practice Considerations

Two-tiered storage architectures provide a simpler data management architecture than the three-tiered model, requiring the fewest data movements. The choice of whether to use tape or disk is dictated by cost and performance needs.
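
The choice between D2T and D2D described above can be sketched as a small decision function in Python. This is an illustration under stated assumptions, not a rule from this paper: the `ArchiveProfile` type, the function name and both threshold values are hypothetical, and a real deployment would also weigh cost per unit of capacity, scale and compliance support.

```python
from dataclasses import dataclass


@dataclass
class ArchiveProfile:
    reads_per_day: float        # expected access frequency of the archive
    max_access_seconds: float   # longest retrieval delay users can tolerate


# Hypothetical thresholds, chosen only for illustration.
ACTIVE_READS_PER_DAY = 1.0      # at or above this, treat as an active archive
TAPE_ACCESS_SECONDS = 120.0     # assumed typical delay for a tape retrieval


def choose_secondary_tier(profile):
    """Return "D2D" for active archives, "D2T" for low-activity deep archives."""
    is_active = profile.reads_per_day >= ACTIVE_READS_PER_DAY
    tolerates_tape = profile.max_access_seconds >= TAPE_ACCESS_SECONDS
    # Disk is chosen when access is frequent or tape delays are unacceptable;
    # otherwise tape is chosen as the usually lowest-cost medium.
    if is_active or not tolerates_tape:
        return "D2D"
    return "D2T"
```

With these assumed thresholds, a reference library read many times a day maps to D2D, while tax records consulted only during an audit, where a long retrieval delay is acceptable, map to D2T.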
Archive management for retention and compliance is a feature orthogonal to the tiered storage architecture, and needs to be provided by an archive management function that can be either external to the storage or associated with the tiered storage system. More detail on the functionality of the Archive Manager is provided later in this paper.

Three-Tier Storage Architecture

Figure 3 shows a typical instance of the three-tiered storage architecture: disk to staging disk to tape (D2D2T). As before, the first-tier enterprise disk is for operational data. The motivation for using two secondary storage tiers is to scale storage capacity more cost-effectively.

Figure 3. Three-Tier Storage Architecture

The most common three-tier architecture uses secondary disk as an intermediate stage to tape. Data placed on the secondary disk can be accessed faster than from tape, which helps if retrievals are expected to come from the most recently archived data. This configuration is also typically used for backup and recovery storage where there is evidence of temporal locality in restores. Because the scale, capacity and cost of the staging disk are not competitive with those of tape, when cost is a consideration the bulk of the archive is kept on tape. Periodic migration