The Role of CASL in HPE Nimble Storage Architecture
The Cache Accelerated Sequential Layout (CASL) architecture from Hewlett Packard Enterprise (HPE) is a log-structured file system (LFS). It’s the foundation of the HPE Nimble Storage technology because it combines the best properties of spinning media for sequential I/O with the best properties of flash media for random I/O. CASL provides Nimble Storage with benefits such as high performance, efficient capacity, integrated data protection and simplified management.
Each architectural layer of CASL provides specific benefits for Nimble Storage. Those layers are as follows:
- RAID: redundant array of inexpensive disks
- Segment
- LFS: log-structured file system
- BUS: block uniquifier store
- BI: block index
- VM: volume manager
- SCSI: Small Computer System Interface
This post describes each layer in the CASL architecture, along with its benefits and how it interacts with the other layers.
RAID Layer
A redundant array of inexpensive disks (RAID) is a method of storing data with significantly greater performance and fault tolerance than earlier storage mechanisms. Its design must account for the fact that different media types have different failure characteristics. For example, a drive that has a spinning disk can stop working suddenly when the motor fails. On the other hand, media like solid-state drives (SSDs) tend to experience gradually increasing failure rates over time due to unrecoverable read errors (UREs).
This behavior is more closely related to the size of the media than to its wear level, so the URE rate generally increases with media size and eventually necessitates a rebuild of the entire RAID group. This process can take several days to several weeks, depending on the size of the RAID’s drives.
Traditional RAID is poorly suited to very large drives because it can’t tolerate more UREs than its parity count. For example, RAID 5 can withstand a single URE, while RAID 6 can tolerate two UREs in parallel. Standard triple-parity RAID can tolerate three parallel UREs, but Nimble Storage’s Triple+ Parity RAID has much greater resiliency because it remains robust even when URE rates are very high.
Triple+ Parity RAID can tolerate one URE on every drive in the RAID group, even when the data no longer has parity protection. This means that a Triple+ Parity RAID group won’t suffer data corruption if it loses three drives and each of the remaining drives experiences a simultaneous URE. In comparison, a RAID 5 group suffers data corruption when it loses one drive and any of the remaining drives encounters a URE.
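As an illustration only, the sketch below models the fault-tolerance rules described above. The scheme names, drive counts and the helper function are hypothetical simplifications, not HPE code; they merely encode the parity-count rule plus Triple+ Parity’s extra per-drive URE allowance.

```python
# Hypothetical model of the fault-tolerance rules described above.
# Parity counts: RAID 5 (1), RAID 6 (2), triple parity (3); Triple+ additionally
# tolerates one URE on every surviving drive even after parity is exhausted.

def tolerates(scheme: str, failed_drives: int, ures_on_survivors: int,
              drives_in_group: int) -> bool:
    """Return True if the scheme can still reconstruct all data."""
    parity = {"raid5": 1, "raid6": 2, "triple": 3, "triple_plus": 3}[scheme]
    if scheme == "triple_plus":
        # Up to 3 lost drives, plus 1 URE on each remaining drive.
        return (failed_drives <= parity and
                ures_on_survivors <= drives_in_group - failed_drives)
    # Classic parity RAID: lost drives and UREs both consume parity.
    return failed_drives + ures_on_survivors <= parity

# RAID 5: one failed drive plus a URE on a survivor means data loss.
print(tolerates("raid5", failed_drives=1, ures_on_survivors=1, drives_in_group=12))        # False
# Triple+ Parity: three failed drives plus a URE on every survivor still survives.
print(tolerates("triple_plus", failed_drives=3, ures_on_survivors=9, drives_in_group=12))  # True
```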
Segment Layer
CASL implements a logical segment layer that sits on top of the RAID layer. The segment layer divides the disk into contiguous physical spaces called slots. These slots map to segments, and each segment maps to a full RAID stripe, so the relationship between a segment and a stripe is a 1:1 mapping.
The segment layer exposes slots to the layers above it as logical segments. It also implements segment identifiers (IDs) that allow it to track and manage slots. It assigns these IDs to allocated segments in sequential order, beginning with zero, which lets it map slots on the disk to their segment IDs. However, CASL doesn’t perform in-place overwrites, so it never reuses segment IDs, and the segment layer can’t reuse a segment until it’s reclaimed through garbage collection.
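A minimal sketch of this bookkeeping is shown below. The class and field names are hypothetical; the sketch only illustrates the two rules stated above, sequential segment IDs starting at zero and slots that become reusable only after garbage collection.

```python
# Hypothetical sketch of the segment layer's bookkeeping: slots are fixed
# physical regions; segment IDs are allocated sequentially and never reused.

class SegmentLayer:
    def __init__(self, num_slots: int):
        self.free_slots = list(range(num_slots))   # physical slots on disk
        self.slot_by_segment_id = {}               # segment ID -> slot
        self.next_segment_id = 0                   # IDs begin at zero

    def allocate_segment(self) -> int:
        """Bind the next sequential segment ID to a free slot."""
        slot = self.free_slots.pop(0)
        segment_id = self.next_segment_id
        self.next_segment_id += 1                  # IDs are never reused
        self.slot_by_segment_id[segment_id] = slot
        return segment_id

    def reclaim(self, segment_id: int) -> None:
        """Garbage collection frees the slot; the retired segment ID stays retired."""
        slot = self.slot_by_segment_id.pop(segment_id)
        self.free_slots.append(slot)
```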
LFS Layer
The LFS layer sits on top of the segment layer, where it organizes variable-length user data blocks. This organization allows the system to quickly identify a block’s location within the LFS and reliably return that block to the requester. Once the LFS layer fills a segment, it writes that segment to persistent storage.
This layer also performs data reduction and data integrity services. For example, it identifies errors like lost or misdirected writes, which occur when a drive incorrectly signals that it has written a block. The LFS layer can use checksums and block identifiers to determine whether the disk has actually read the requested block from the correct location. It can also determine whether that block contains valid data, which is a critical data integrity check that few other storage platforms implement.
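The sketch below illustrates the general idea of verifying both a checksum and a self-describing block identifier on read. The exact metadata CASL stores isn’t described here, so the field names, the CRC32 checksum and the error messages are assumptions used purely for illustration.

```python
import zlib

# Hypothetical self-describing block check: the stored checksum catches
# corruption or a lost write, and the stored identity catches a write that
# landed in (or was read from) the wrong location.

def verify_block(raw: bytes, stored_checksum: int, stored_block_id: int,
                 expected_block_id: int) -> bytes:
    if zlib.crc32(raw) != stored_checksum:
        raise IOError("checksum mismatch: block is corrupt or the write was lost")
    if stored_block_id != expected_block_id:
        raise IOError("identity mismatch: write was misdirected to the wrong location")
    return raw
```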
Standard RAID doesn’t provide the protection that the LFS layer offers against these insidious errors, even though they can be highly damaging to a storage system. For example, storage systems that perform de-duplication rely heavily on metadata tracking, making it vital for a storage system to prevent metadata corruption.
BUS Layer
The block uniquifier store (BUS) layer tracks the location of blocks as the LFS layer reads and writes them. To do so, the BUS layer assigns each block a unique identifier called an SBN, which is distinct from an offset or LUN ID. Block tracking also uses the disk index (DI), a highly optimized indexing mechanism that maps an SBN to an LFS location.
The DI also maintains additional metadata for the block, including the block checksum, fingerprint value and reference count. All-flash arrays (AFAs) cache the DI in memcache, which is an in-memory data structure that stores indexes. Hybrid systems cache the DI in flash cache, even though they also have memcache.
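A minimal sketch of what a single DI entry might hold, based on the description above, appears below. The field names and types are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical disk index (DI) entry: maps an SBN to its LFS location and
# carries the per-block metadata described above.

@dataclass
class DiskIndexEntry:
    sbn: int                 # unique block identifier assigned by the BUS layer
    segment_id: int          # which LFS segment holds the block
    offset_in_segment: int   # where the block starts inside that segment
    checksum: int            # integrity check for the stored data
    fingerprint: bytes       # content fingerprint used by de-duplication
    refcount: int            # how many logical blocks share this physical block
```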
In addition to the SBN, the BUS layer assigns each block a unique 256-bit fingerprint for the purposes of de-duplication. CASL uses a two-layer design that includes both short and long fingerprints: it uses a short fingerprint to quickly identify candidates for de-duplication, then verifies each candidate with a complete comparison using the long fingerprint. The BUS layer also updates the fingerprint index (FI) to reflect updates or deletions of blocks.
The two-layer fingerprint design makes fingerprint matching trivial because the BUS layer assigns SBNs sequentially. It also improves spatial locality for lookups because duplicate blocks are often adjacent to each other, typically as the result of data cloning. Furthermore, this design prevents data loss even when the FI is lost.
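The sketch below shows the two-layer matching idea in miniature. It assumes SHA-256 for the 256-bit long fingerprint and derives the short fingerprint by truncation; CASL’s actual fingerprint scheme and index structures aren’t documented here, so these choices are illustrative only.

```python
import hashlib

# Hypothetical two-layer fingerprint lookup: the short fingerprint is a cheap
# filter, and the full 256-bit fingerprint confirms a true duplicate.

fingerprint_index = {}   # short fingerprint -> (long fingerprint, sbn)

def long_fp(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()     # 256-bit fingerprint

def short_fp(data: bytes) -> bytes:
    return long_fp(data)[:4]                 # small, cheap-to-index form

def dedupe_lookup(data: bytes):
    """Return the SBN of an existing duplicate block, or None."""
    candidate = fingerprint_index.get(short_fp(data))
    if candidate and candidate[0] == long_fp(data):   # verify with the long fingerprint
        return candidate[1]
    return None

def record_block(data: bytes, sbn: int) -> None:
    fingerprint_index[short_fp(data)] = (long_fp(data), sbn)
```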
Additional benefits of the two-layer design include the elimination of the DRAM-to-capacity ratio constraints that limit competing solutions. The design also increases the capacity that each controller can support, so fewer controllers are needed to scale to full capacity. Furthermore, it reduces controller costs and DRAM requirements, passing those savings to the end user.
BI Layer
The block index (BI) layer helps ensure that data processes such as cloning, de-duplication and snapshots use available space efficiently. Read and write requests enter the system as SCSI commands that provide information such as logical block addresses (LBAs), LUN IDs and offsets; block IDs are assigned later. The BI layer maintains a data structure for each volume generation, which is a view of the volume at a specific point in time.
It accomplishes this by mapping a volume and offset to an SBN for every volume in the system. It also allows data processes to share blocks, since LBAs from different volumes can point to the same SBN. The BI layer also serves as a dirty block buffer (DBB) that tracks data blocks in the non-volatile dual in-line memory module (NVDIMM). The DBB flushes dirty blocks to disk based on the number of incoming blocks and system-defined watermarks.
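A minimal sketch of the (volume, LBA)-to-SBN mapping described above, including block sharing across volumes, is shown below. The dictionary layout and names are hypothetical.

```python
# Hypothetical block index (BI): a volume generation maps each (volume, LBA)
# pair to an SBN, and LBAs from different volumes may share the same SBN.

block_index = {
    ("vol1", 0): 1001,
    ("vol1", 1): 1002,
    ("vol2", 0): 1001,   # clone/de-dup: different volume, same physical block
}

def resolve(volume: str, lba: int) -> int:
    """Translate a host-visible (volume, LBA) address to an SBN."""
    return block_index[(volume, lba)]

print(resolve("vol1", 0) == resolve("vol2", 0))   # True: the block is shared
```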
VM Layer
The volume manager (VM) layer manages processes that enable garbage collection (GC), such as system checkpoints and recovery workflows. It also maintains several metadata objects and provides striped volume functionality, allowing the system to stripe a volume across multiple Nimble Storage pools.
GC is the process of freeing contiguous segments of storage space. CASL always writes in full segments, but the number of available blocks in an allocated segment decreases over time as blocks are updated or deleted. CASL makes intelligent decisions about which segments to reuse during GC based on utilization and other information from the BUS layer. The system copies the live blocks into a clean segment while maintaining their sequentiality, meaning blocks that were created together are written together. CASL then marks the old segment as available for new writes.
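The sketch below captures the copy-forward step just described: live blocks from a sparsely used segment are rewritten together into a clean segment, and the old segment is marked free. The data structures are hypothetical simplifications.

```python
# Hypothetical GC step: copy the surviving (live) blocks of a sparse segment
# into a clean segment in order, then mark the old segment as reusable.

def collect_segment(segments: dict, free_segments: list,
                    old_id: int, clean_id: int) -> None:
    live_blocks = [b for b in segments[old_id] if b["live"]]
    segments[clean_id] = live_blocks      # blocks created together stay together
    segments[old_id] = []                 # old segment no longer holds data
    free_segments.append(old_id)          # available for new full-segment writes

segments = {
    7: [{"sbn": 1, "live": True}, {"sbn": 2, "live": False}, {"sbn": 3, "live": True}],
    8: [],
}
free_segments = []
collect_segment(segments, free_segments, old_id=7, clean_id=8)
print(segments[8], free_segments)   # live blocks 1 and 3 moved; segment 7 freed
```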
CASL’s GC is one of the main differentiators from other LFS implementations. The technique is very lightweight in terms of resource usage and is unique in the industry. It ensures that free, contiguous space is always available to accommodate write requests, even at high capacity utilization rates. In comparison, other LFS implementations fill holes in the file system at random when utilization is high, resulting in a random I/O request for each read or write. That approach delivers lower performance than CASL, with slower response times and higher CPU load.
SCSI Layer
Small Computer System Interface (SCSI) is a set of standards for transferring data between computers and peripheral devices through physical connections. The SCSI layer of the CASL architecture sits on top of the VM layer and implements SCSI commands to satisfy read and write requests from the host. It also communicates with hosts over the iSCSI and Fibre Channel protocols.
All incoming write requests in an AFA system must land in non-volatile random-access memory (NVRAM) or NVDIMM regardless of their I/O size. NimbleOS protects these requests by copying them to the secondary controller’s NVRAM/NVDIMM before acknowledging the write to the host. This operating system (OS) implements NVRAM as byte-addressable NVDIMM-N, a persistent memory standard from the Joint Electron Device Engineering Council (JEDEC).
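A simplified sketch of that acknowledgment rule is shown below. The controller objects and method names are hypothetical; the only point illustrated is that the write is staged locally and mirrored to the partner controller before the acknowledgment is returned.

```python
# Hypothetical write path: a write must land in local NVDIMM and be mirrored
# to the partner controller's NVDIMM before it is acknowledged to the host.

class Controller:
    def __init__(self):
        self.nvdimm = {}    # stands in for byte-addressable persistent memory

    def stage(self, lba: int, data: bytes) -> None:
        self.nvdimm[lba] = data

def handle_write(primary: Controller, secondary: Controller,
                 lba: int, data: bytes) -> str:
    primary.stage(lba, data)       # land the write in local NVDIMM
    secondary.stage(lba, data)     # mirror to the partner controller
    return "ACK"                   # only now is the host's write acknowledged
```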
NVDIMM-N combines DDR4 DRAM with low-latency flash memory and a supercapacitor that provides persistent memory backup. If power is lost, the supercapacitor gives the system enough time to safely offload memory contents to the onboard flash module; the system can then restore DRAM contents from flash after power is restored. CASL also continuously checks the capacitance level of the supercapacitor to ensure a charge is always available, and it raises an alert when the charge is insufficient to preserve memory.
The SCSI layer of the CASL architecture performs writes similarly on hybrid platforms, although there are some differences. For example, hybrid systems can immediately cache blocks in the SSD flash cache layer, depending on the system’s caching policies, and segment sizes differ between AFA and hybrid platforms. The primary difference in the read paths between the two systems is that the hybrid path includes an additional cache index (CI), which allows the system to quickly identify cached blocks in the SSD flash layer and retrieve them.
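A sketch of that hybrid read-path decision is shown below. The lookup structures and the fallback function are hypothetical; the only behavior taken from the description above is that a CI hit is served from the SSD flash cache and a miss falls back to disk.

```python
# Hypothetical hybrid read path: the cache index (CI) tells the system whether
# a block already sits in the SSD flash cache; otherwise the read goes to disk.

def read_block(sbn: int, cache_index: dict, flash_cache: dict, read_from_disk):
    if sbn in cache_index:                    # CI hit: block is in the flash cache
        return flash_cache[cache_index[sbn]]
    return read_from_disk(sbn)                # CI miss: read from the disk layer
```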
Getting Started with HPE Nimble Storage
Comport is a cloud storage vendor with over 30 years of experience. We have certifications in the top technologies and have formed partnerships with major providers such as HPE. Get started on your storage refresh by requesting a Data Storage Assessment by ComportSecure.