Hierarchical storage management
Hierarchical storage management (HSM) is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.

In a typical HSM scenario, data files which are frequently used are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user does reuse a file which is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely-used files are on tape, most users will usually not notice any slowdown.
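
The migrate-when-idle and recall-on-access behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `HsmCatalog`, `migrate_idle` and `access` are hypothetical and do not come from any real HSM product, the "tiers" are plain labels rather than real devices, and time is passed in explicitly as seconds.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    last_access: float  # seconds on an arbitrary clock
    tier: str = "disk"  # "disk" or "tape" (labels only, no real devices)


class HsmCatalog:
    """Hypothetical catalog tracking which tier each file lives on."""

    def __init__(self, max_idle_seconds):
        self.max_idle = max_idle_seconds
        self.files = {}

    def add(self, name, now):
        # New files always start on the fast tier.
        self.files[name] = FileEntry(name, now)

    def migrate_idle(self, now):
        """Move files not accessed within max_idle from disk to tape."""
        moved = []
        for entry in self.files.values():
            if entry.tier == "disk" and now - entry.last_access > self.max_idle:
                entry.tier = "tape"
                moved.append(entry.name)
        return moved

    def access(self, name, now):
        """Access a file; return True if it had to be recalled from tape."""
        entry = self.files[name]
        recalled = entry.tier == "tape"
        if recalled:
            entry.tier = "disk"  # transparent recall to the fast tier
        entry.last_access = now
        return recalled
```

With a 90-day idle limit, a file untouched for 100 days is migrated by `migrate_idle`, and the next call to `access` brings it back to disk and reports that a recall took place. The only thing the caller observes is that flag, which stands in for the extra latency a user would notice.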

HSM is sometimes referred to as tiered storage.

HSM (originally DFHSM, now DFSMShsm) was first implemented by IBM on their mainframe computers to reduce the cost of data storage, and to simplify the retrieval of data from slower media. The user would not need to know where the data was stored and how to get it back; the computer would retrieve the data automatically. The only difference to the user was the speed at which data was returned.

Later, IBM ported HSM to its AIX operating system, and then to other Unix-like operating systems such as Solaris, HP-UX and Linux.

HSM was also implemented on the DEC VAX/VMS systems and the Alpha/VMS systems. The first implementation date should be readily determined from the VMS System Implementation Manuals or the VMS Product Description Brochures.

Recently, the development of Serial ATA (SATA) disks has created a significant market for three-stage HSM: files are migrated from high-performance Fibre Channel Storage Area Network devices to somewhat slower but much cheaper SATA disk arrays totalling several terabytes or more, and then eventually from the SATA disks to tape.

The newest development in HSM is with hard disk drives and flash memory, with flash memory being over 30 times faster than disks, but disks being considerably cheaper.

Conceptually, HSM is analogous to the cache found in most computer CPUs, where small amounts of expensive SRAM memory running at very high speeds are used to store frequently used data, but the least recently used data is evicted to the slower but much larger main DRAM memory when new data has to be loaded.
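
The least-recently-used eviction in this analogy can be illustrated with a small two-level store. The sketch below is an assumption-laden toy, not a model of any real CPU cache or HSM product: the class name `TwoLevelStore` is hypothetical, the fast level is a Python `OrderedDict` with a fixed item capacity, and the slow level is an ordinary dictionary of unlimited size.

```python
from collections import OrderedDict


class TwoLevelStore:
    """Toy model: a small fast level backed by a large slow level."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # ordered from least to most recently used
        self.slow = {}

    def put(self, key, value):
        # New or updated data always lands in the fast level.
        self.slow.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        # Miss: bring the item up from the slow level (KeyError if unknown).
        value = self.slow.pop(key)
        self.fast[key] = value
        self._evict()
        return value

    def _evict(self):
        # Push least recently used items down until the fast level fits.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
```

With a capacity of two, storing `a` and `b`, reading `a`, and then storing `c` pushes `b` (the least recently used item) down to the slow level, while `a` and `c` stay in the fast level.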

In practice, HSM is typically performed by dedicated software, such as IBM Tivoli Storage Manager, CommVault (http://www.commvault.com/products-archive.html), VERITAS Enterprise Vault, Sun Microsystems SAMFS/QFS, Quantum StorNext, or EMC Legato OTG DiskXtender.

Use Cases

HSM is often used for deep archival storage of data to be held long term at low cost. Automated tape robots can silo large quantities of data efficiently with low power consumption.

Some HSM software products allow the user to place portions of data files on high-speed disk cache and the rest on tape. This is used in applications that stream video over the internet -- the initial portion of a video is delivered immediately from disk while a robot finds, mounts and streams the rest of the file to the end user. Such a system greatly reduces disk cost for large content provision systems.
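
The head-on-disk, remainder-on-tape arrangement can be sketched as a generator that serves the cached head first and fetches the tail only when it is needed. The function names `split_for_staging` and `stream` and the `fetch_tail` callback are hypothetical, and the sketch is deliberately simplified: a real system would presumably start the tape mount in parallel with disk delivery, whereas here the tail is fetched only after the head has been sent.

```python
def split_for_staging(data: bytes, head_size: int):
    """Split a file into a head (kept on disk) and a tail (kept on tape)."""
    return data[:head_size], data[head_size:]


def stream(head: bytes, fetch_tail, chunk_size: int = 4):
    """Yield the file in chunks: head from disk first, then the tail.

    fetch_tail is a callable standing in for the slow step of finding,
    mounting and reading the tape; it is not called until the head
    has been delivered.
    """
    for i in range(0, len(head), chunk_size):
        yield head[i:i + chunk_size]
    tail = fetch_tail()
    for i in range(0, len(tail), chunk_size):
        yield tail[i:i + chunk_size]
```

Because the generator is lazy, the first chunk is available to the consumer before `fetch_tail` has been called at all, which mirrors the viewer seeing the start of the video while the rest is still being located.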

Tiered storage

Tiered storage is a data storage environment consisting of two or more kinds of storage delineated by differences in at least one of these four attributes: Price, Performance, Capacity and Function.

Any significant difference in one or more of the four defining attributes can be sufficient to justify a separate storage tier.

Examples:
  • Disk and Tape: Two separate storage tiers identified by differences in all four defining attributes.
  • Old technology disk and new technology disk: Two separate storage tiers identified by differences in one or more of the attributes.
  • High performing disk storage and less expensive, slower disk of the same capacity and function: Two separate tiers.
  • Identical Enterprise class disk configured to utilize different functions such as RAID level or replication: A separate storage tier for each set of unique functions.


Note: Storage Tiers are NOT delineated by differences in vendor, architecture, or geometry except where those differences result in clear changes to Price, Performance, Capacity and Function.
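
The tier definition above can be expressed as a grouping rule: storage classes that agree on all four attributes share a tier, and vendor alone never separates them. The sketch below is illustrative only. The `StorageClass` type, the coarse string labels such as "high" and "fast", and the use of exact equality as the test for a "significant difference" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    vendor: str       # deliberately NOT part of the tier key
    price: str        # coarse labels, e.g. "high" / "low"
    performance: str  # e.g. "fast" / "slow"
    capacity: str     # e.g. "small" / "large"
    function: str     # e.g. "RAID 10", "RAID 5", "archive"

    def tier_key(self):
        # Only the four defining attributes decide the tier.
        return (self.price, self.performance, self.capacity, self.function)


def group_into_tiers(classes):
    """Group storage classes into tiers, in order of first appearance."""
    tiers = {}
    for storage in classes:
        tiers.setdefault(storage.tier_key(), []).append(storage.name)
    return list(tiers.values())
```

Under these assumptions, two arrays from different vendors with identical attributes fall into one tier, while an otherwise identical array configured with a different RAID level forms a tier of its own.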

See also

  • Archive
  • Automated Tiered Storage
  • Backup
  • Computer data storage
  • Data proliferation
  • Disk storage
  • Information Lifecycle Management
  • Information repository
  • Magnetic tape data storage
  • Repository (disambiguation)
  • Storage virtualization

Implementations

  • DFSMShsm for z/OS
  • Atempo Digital Archive (ADA) (HSM available on Windows, Mac, Linux, UNIX featuring end-user initiated Archive) by Atempo
  • CASTOR (http://cern.ch/castor) by CERN
  • IBM Tivoli Storage Manager for Space Management (HSM available on UNIX (AIX, HP UX, Solaris) & Linux)
  • IBM Tivoli Storage Manager HSM for Windows, formerly OpenStore for File Servers (OS4FS) (HSM available on Microsoft Windows Server)
  • HPSS by IBM
  • QuickSilver by IBM
  • OpenArchive OpenSource Archive Manager by GRAU DATA for Windows/Linux
  • GRAU Archive Manager (GAM) by GRAU DATA for Windows/Linux
  • Automated Tiered Storage (SmartMove) by Enigma Data Solutions for Windows/Linux
  • DataGlobal ERS
  • EMC DiskXtender, formerly Legato DiskXtender, formerly OTG DiskXtender
  • Caminosoft Managed Server by Caminosoft Corporation for NetWare/Windows/Linux
  • Moonwalk (Columbia and Eagle), for NetWare/Windows/Linux, http://www.moonwalkinc.com/
  • SAM-QFS
  • StorFirst EAS (Enterprise Archival Software) by Seven10 Storage Software
  • CommVault QiNetix DataMigrator
  • Remote Storage Services by Microsoft (available on Server 2000 and Server 2003 only)
  • SGI DMF (Data Migration Facility) http://www.sgi.com/products/storage/software/dmf.html
  • Symantec VERITAS Enterprise Vault (KVS acquisition and Veritas acquisition)
  • QStar Network Migrator (http://www.QStar.com/pro2.html) - Server and Client based Data Migrator for Windows, Linux, UNIX and Mac
  • Quantum StorNext Storage Manager http://www.quantum.com/stornext/
  • Compellent Data Progression Automated Tiered Storage http://www.compellent.com
  • Hitachi Data Systems HNAS Intelligent Tiering http://www.hds.com
  • Hitachi Data Systems HDT (Hitachi Dynamic Tiering)
  • Zarafa Archiver (component of ZCP)
  • PoINT Software & Systems GmbH
  • OpenVMS
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 