Tuesday, August 24, 2010

EMC Centera/CAS/OBS

What is EMC Centera?

The EMC Centera is the world's first magnetic hard disk-based WORM data storage device, providing Write Once Read Many functionality to applications that require data to be stored on a non-rewriteable, non-erasable storage medium.

By using traditional magnetic hard disks as its storage medium, the EMC Centera offers better performance than other archival media types such as optical and tape. EMC developed the Centera to address the storage of fixed content data, the fastest growing data type today.

What is Fixed Content Data?

Fixed content data is any digital asset that is created once, never modified, and must be retained for reference throughout its life cycle and retention period as required by regulatory agencies.

The recent explosion in the creation of fixed content has created a demand for a new category of storage devices designed to provide fast, secure on-line access to this data with long-term availability. The EMC Centera represents this new category of data storage devices known as Content Addressed Storage (CAS).

What is Content Addressed Storage?

Content Addressed Storage is a method of data storage that stores and retrieves a data object by its content address within the storage system, rather than by its actual file name at some physical location.

The benefit of a content addressable approach to storage is that an object is stored in such a way that it is authenticated and unalterable. In addition, an object cannot be deleted prior to the expiration of its defined retention period.

How Content Addressed Storage Works

When an application delivers a data object to the EMC Centera, the API calculates a 128-bit “claim check” that is uniquely derived from the object's binary representation. The metadata for the object, which includes filename, creation date, etc., is inserted into an XML file called a C-Clip Descriptor File (CDF), which in turn has its own content address calculated. The Centera repository then stores the object and a mirror copy.

Once two copies of the object and CDF are stored in the repository, the CDF's Content Address is returned to the application. Future access to the data object occurs when the application submits the CDF's Content Address to the Centera repository via the API. The data is then returned to the application. The Centera file system architecture eliminates directory structures, pathnames, and URL references to filenames and only uses the C-Clip Content Address as a reference.
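
The store-and-retrieve flow described above can be sketched in a few lines of Python. This is only a toy model, not the Centera API: the class `ToyCasStore`, its method names, and the XML element names are all made up for illustration, and MD5 is assumed as the hash simply because it produces a 128-bit value.

```python
import hashlib
import xml.etree.ElementTree as ET


def content_address(data: bytes) -> str:
    """Return a 128-bit digest as hex (MD5 is an assumption, picked for its size)."""
    return hashlib.md5(data).hexdigest()


class ToyCasStore:
    """Hypothetical in-memory store that keeps every item twice (object + mirror)."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def _put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves, not from a path.
        addr = content_address(data)
        self.primary[addr] = data
        self.mirror[addr] = data
        return addr

    def store(self, data: bytes, filename: str, created: str) -> str:
        # 1. Store the object under its own content address.
        blob_addr = self._put(data)
        # 2. Build a small XML descriptor (standing in for the CDF) that
        #    holds the metadata and points at the object.
        cdf = ET.Element("cdf")
        ET.SubElement(cdf, "blob", address=blob_addr)
        ET.SubElement(cdf, "filename").text = filename
        ET.SubElement(cdf, "created").text = created
        # 3. Store the descriptor too; its address is what the caller keeps.
        return self._put(ET.tostring(cdf))

    def retrieve(self, cdf_addr: str) -> bytes:
        # Look up the descriptor, then follow its pointer to the object.
        cdf = ET.fromstring(self.primary[cdf_addr])
        return self.primary[cdf.find("blob").get("address")]


store = ToyCasStore()
claim_check = store.store(b"scanned invoice", "invoice.pdf", "2010-08-24")
print(claim_check, store.retrieve(claim_check))
```

Note that the application never sees a directory or filename on the storage side; the only handle it holds is the descriptor's address.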

How EMC Centera Provides WORM Functionality

The C-Clip Content Address of a data object assures the authenticity of that object. If an object is retrieved and altered, the Centera API produces a new CDF with a new content address for the altered object. The original object remains in its original form at its original content address and is still accessible by its original address.

This feature of Centera provides a level of versioning integrity that standard file servers and operating systems cannot provide. Additionally, Centera features an operational mode where an object cannot be deleted prior to the expiration date of a defined retention period. These non-rewriteable and non-erasable properties give the EMC Centera the Write Once Read Many attributes required for compliance with SEC 17a-4, Sarbanes-Oxley, HIPAA, FDA regulations, and many others.
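
To make the two WORM properties concrete, here is a minimal sketch, again in Python and again with invented names (`ToyWormStore`, `RetentionError`); it is not how CentraStar is implemented. It assumes MD5 as the 128-bit hash and passes the current time in explicitly so the behaviour is easy to follow.

```python
import hashlib
from datetime import datetime, timedelta


class RetentionError(Exception):
    """Raised when a delete is attempted before the retention period ends."""


class ToyWormStore:
    """Hypothetical store: writes never overwrite, deletes honour retention."""

    def __init__(self):
        self._objects = {}  # address -> (data, retain_until)

    def write(self, data: bytes, retention: timedelta, now: datetime) -> str:
        addr = hashlib.md5(data).hexdigest()  # assumed 128-bit hash
        # setdefault never replaces an existing entry, so stored bytes
        # cannot be rewritten in place.
        self._objects.setdefault(addr, (data, now + retention))
        return addr

    def read(self, addr: str) -> bytes:
        return self._objects[addr][0]

    def delete(self, addr: str, now: datetime) -> None:
        _, retain_until = self._objects[addr]
        if now < retain_until:
            raise RetentionError(f"{addr} is retained until {retain_until.isoformat()}")
        del self._objects[addr]


today = datetime(2010, 8, 24)
store = ToyWormStore()
original = store.write(b"contract v1", timedelta(days=365), today)
altered = store.write(b"contract v1 (edited)", timedelta(days=365), today)
print(original != altered)   # the edit landed at a different address
print(store.read(original))  # the original is still intact
```

An edited object simply becomes a second object with its own address, which is the versioning behaviour described above, and `delete` refuses to act until the retention date has passed.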

Centera Hardware Architecture

The Centera is built as a Redundant Array of Independent Nodes (RAIN). Every node in a Centera contains a CPU, network interface, and 3TB of raw storage, and is interconnected with all other nodes in the cabinet via a private LAN. Each node executes an instance of CentraStar, the Centera operating software, in one of two operational modes to act as either a storage node or an access node.

The storage nodes provide the physical storage of data objects and C-Clip Descriptor Files, and the access nodes provide the means for interaction between the application server and the storage nodes. The throughput and storage requirements of the application determine how many access nodes vs. storage nodes must be configured when the EMC Centera is installed.

Fault Tolerance

The EMC Centera is based around a “no single-point-of-failure” platform and can be serviced in a non-disruptive manner. Every component of the Centera has built-in redundancy. This includes hard drives, power supplies, AC power connectors, cooling fans, and network adapters and associated cable interconnects.

When drives fail, CentraStar, the Centera operating software, will transparently remove them from the cluster. Objects are regenerated from the current mirror to a new mirror to ensure that a fully redundant mirror copy of the content is always available. Data integrity checking runs in the background and continuously recalculates the content addresses of all the objects and compares the calculations to the content addresses originally stored in the C-Clip Descriptor File.
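
The background check and mirror regeneration can be pictured with a short Python sketch. It is a simplification under stated assumptions: the function name `scrub` is invented, MD5 stands in for the 128-bit hash, and the expected address is taken from the dictionary key rather than read out of a C-Clip Descriptor File.

```python
import hashlib


def address(data: bytes) -> str:
    """Recompute the content address of a stored copy (MD5 assumed)."""
    return hashlib.md5(data).hexdigest()


def scrub(primary: dict, mirror: dict) -> list:
    """Verify both copies of every object and rebuild any bad or missing copy.

    Returns the addresses that needed repair.
    """
    repaired = []
    for addr in sorted(set(primary) | set(mirror)):
        copies = [primary.get(addr), mirror.get(addr)]
        # A copy is good if hashing it reproduces the recorded address.
        good = next((c for c in copies if c is not None and address(c) == addr), None)
        if good is None:
            raise RuntimeError(f"no valid copy left for {addr}")
        for replica in (primary, mirror):
            if replica.get(addr) != good:
                replica[addr] = good  # regenerate from the surviving copy
                repaired.append(addr)
    return repaired


blob = b"x-ray image"
key = address(blob)
primary = {key: blob}
mirror = {key: b"bit rot"}  # simulate a damaged mirror copy
print(scrub(primary, mirror) == [key], mirror[key] == blob)
```

Because the address is derived from the content, the check needs no separate checksum database: rehashing a copy is enough to tell whether it is still the object that was originally stored.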

Scalability

A single Centera 19” rack cabinet can hold 8, 16, 24, or 32 nodes to provide 5.4TB – 43.2TB in mirrored mode, or 18.3TB – 73.4TB in parity-protected mode. For scalability beyond 73.4TB, multiple Centera cabinets can be configured as a single cluster, offering hundreds of terabytes of total storage capacity in a single Centera storage pool.

When new storage units are added to the cluster and powered on, they are auto-discovered and join the cluster automatically. No reconfiguration or downtime is necessary to add capacity.
