A Terabyte on Your Keychain
SSDs generate new products, double-digit growth and corporate acquisitions
It is now possible to purchase a one-terabyte USB stick! The ability of well-heeled individuals to carry a terabyte of information on the same keychain that holds their car keys paints a dramatic picture of the penetration that solid-state drive (SSD) storage is going to make in the consumer space in 2013. In contrast to double-digit SSD product growth, the latest quarterly reports from the two major hard disk manufacturers show that demand for spinning disk technology remains flat. Consumers are buying trendy new SSD products while corporations are busily placing big bets on storage technology acquisitions. Overall, SSD market revenues are expected to more than double in 2013.
In addition to capacity, the USB 3.0 specifications of this somewhat bulky terabyte USB stick look great. With a 240 MB/s read speed and 160 MB/s write speed, this gadget has performance comparable to or exceeding many home and small business multi-disk RAID systems. Even better, the random IO performance lies far beyond that of even very high-end spinning disk devices. The convenience is extraordinary. Imagine plugging a terabyte of music into the USB port in your car. Going on a long car trip? No problem! Have a big HPC data set? Overnight mail will beat the Internet.
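The claim that overnight mail beats the Internet can be checked with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the 1 TB capacity and 240 MB/s read speed come from the text above, while the 10 Mbit/s uplink and the 24-hour courier time are assumptions chosen for the example.

```python
TB = 10**12  # bytes in a (decimal) terabyte

def transfer_hours(num_bytes, bytes_per_second):
    """Hours needed to move num_bytes at a sustained rate."""
    return num_bytes / bytes_per_second / 3600

# Assumption: a 10 Mbit/s last-mile uplink (hypothetical, not from the article).
uplink_bytes_per_s = 10e6 / 8
# From the article: the stick reads at 240 MB/s over USB 3.0.
usb_read_bytes_per_s = 240e6
# Assumption: an overnight courier takes 24 hours door to door.
courier_hours = 24

internet_hours = transfer_hours(TB, uplink_bytes_per_s)
# Shipping time plus the time to read the stick at the destination.
sneakernet_hours = courier_hours + transfer_hours(TB, usb_read_bytes_per_s)

print(f"Internet:   {internet_hours:.1f} h")
print(f"Sneakernet: {sneakernet_hours:.1f} h")
```

Under these assumptions, the upload takes a little over nine days, while the mailed stick arrives and is fully read in about a day. A faster uplink narrows the gap, but a faster link also makes larger data sets practical, which widens it again.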
Reliability is a concern with any large storage device. Most people now trust the SSD storage in their cellphones and tablets to hold valuable personal information and irreplaceable personal photos. With flash devices, reliability and performance are all about the on-device controller. Given the limited number of write cycles for NAND-based flash memory, the responsibility for robustness falls to the internal controller, which looks for errors and performs write leveling (more commonly called wear leveling). Write leveling prolongs storage life by ensuring that all the flash memory is used equally. How quickly the controller can select blocks has a big impact on write performance. Similarly, how well the controller can handle errors gates the safety of your data on the device. With modern controllers, SSD devices make perfect sense for portable devices due to their power efficiency and lack of any moving parts.
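The leveling policy described above (commonly known as wear leveling) can be illustrated with a toy allocator. This is only a sketch of the general idea, not how any particular controller works: real controllers also handle garbage collection, error correction and data that rarely changes. The class and method names are hypothetical. A min-heap keeps block selection fast, which matters because selection speed affects write performance.

```python
import heapq

class WearLevelingAllocator:
    """Toy allocator: always hands out the free block with the fewest erases."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        # Min-heap of (erase_count, block_id) for the free blocks.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        """Return the least-worn free block."""
        if not self.free:
            raise RuntimeError("no free blocks")
        _, block = heapq.heappop(self.free)
        return block

    def erase_and_release(self, block):
        """Erase a block (costing one cycle) and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


alloc = WearLevelingAllocator(num_blocks=4)
for _ in range(100):
    block = alloc.allocate()        # pick the least-worn block
    alloc.erase_and_release(block)  # rewrite it, consuming one erase cycle

print(alloc.erase_counts)  # [25, 25, 25, 25]
```

The 100 rewrites end up spread evenly across the four blocks, so no single block wears out early.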
Cost is now the major concern, given that most consumers are comfortable with the reliability of SSD storage. My August 2012 article, “Storage: The Computing Catalyst,” covered the implications of SSD storage crossing the dollar-per-gigabyte barrier for HPC, enterprise and individual consumers. These prices have continued to drop. For example, bargain-hunting computer enthusiasts frequently find promotions in their inboxes for SSD storage devices priced as low as $0.50 per gigabyte.
Storage manufacturers now speak of the personal cloud, or “the Connected Life,” as well as the enterprise cloud. What the personal cloud really means is connectivity and convenience at home, which is why acquisitions of network-attached storage (NAS) and backup storage companies are hot items in the technology news. NAS devices represent convenience and data accessibility. Anyone who has felt those heart-wrenching pangs of fear caused by a hard disk failure understands the need for a robust backup.
Ultimately, most users don’t want to manage their own storage hardware. Dropbox, Google and other Internet cloud storage providers understand this and offer products that guarantee the safety of a customer’s data while making it accessible 24/7 from anywhere. Customers don’t need to worry about pesky hardware failures and backup issues.
The market opportunity for direct connect and personal cloud devices rests in the inability of Internet storage providers to bridge that critical “last mile” bandwidth bottleneck to the home and business. (The last mile is the final link to the home or small business, and it typically limits the bandwidth at which data can be delivered to the customer.) The rapidly increasing capacity of commodity storage devices keeps widening the last-mile bandwidth gap for Internet cloud providers. As a result, more and more consumers are gravitating to direct connect and personal cloud storage devices. Many of these devices provide their owners with purportedly secure, 24/7 access to their data from anywhere on the Internet, as well as giving tablets and cellphones the same speed and access to terabytes of data as their PCs. Basically, the personal cloud promises the safety and convenience of a big-name Internet cloud storage provider with unbeatable bandwidth at home or the office. Last-mile bandwidth limitations only affect data accessibility when using a personal cloud device over the Internet, which is generally not a big deal, as most data consumption occurs locally within the home or office.
The advent of USB 3.0 and Thunderbolt interfaces that transfer hundreds of megabytes of data per second means that direct attached devices for tablet and laptop users will become commonplace. Terabyte flash devices mean that tablet (and even cell phone) users will be able to access more storage, at higher speed, than most commodity hard-disk based “enthusiast” PCs by simply plugging in a cable. Connecting the dots, new short-range gigabit WiFi standards, such as 802.11ac, mean that future devices won’t even require connecting a cable.
Last-mile bandwidth challenges will continue to plague cloud-based storage providers in the near future. Once the last-mile bottleneck is eliminated, it is likely that the convenience and reliability of these services will make Internet storage solutions the ultimate winner for the mass market. At this time, last-mile bandwidth and cost limitations have created a huge opportunity for convenient, “personal cloud” storage solutions.
The January 2013 buying frenzy in the storage technology space reflects this movement in the industry, which should accelerate as companies jockey for position to meet an expected growth in consumer demand. It is interesting to speculate whether Google’s filing with the FCC to build an experimental wireless network at its headquarters in Mountain View is part of an effort to eliminate last-mile bandwidth issues.
For consumers, the initial purchase price is paramount. For the enterprise, the profit and loss (P&L) is a function of electricity cost. Hard disk technologies currently compete against SSDs on capacity. In this context, the new helium-filled hard disks are interesting, as they offer a 23 percent decrease in power consumption and the ability to add two additional platters to a standard 3.5-inch drive, which means individual drive capacities can approach 6 TB. This translates to denser, more efficient storage arrays that require less space, power and cooling. Even so, SSDs are more efficient in terms of power and performance. Thus, enterprise and HPC customers need to continue playing a balancing game to find the right ratio of SSD to hard disk storage for their users to maximize performance per dollar invested.
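The helium-drive figures combine into a larger gain in power per terabyte than the 23 percent alone suggests. The sketch below works through the arithmetic. The 5-platter, 4 TB, 8 W baseline drive is an assumption made for illustration; only the 23 percent power reduction and the two additional platters come from the text above.

```python
# Assumption: a conventional 3.5-inch drive with 5 platters and 4 TB capacity,
# drawing a hypothetical 8 W. Only the 23 percent figure and the
# "two additional platters" come from the article.
base_platters = 5
base_capacity_tb = 4.0
base_power_w = 8.0

helium_platters = base_platters + 2
helium_capacity_tb = base_capacity_tb * helium_platters / base_platters
helium_power_w = base_power_w * (1 - 0.23)

base_w_per_tb = base_power_w / base_capacity_tb
helium_w_per_tb = helium_power_w / helium_capacity_tb
saving = 1 - helium_w_per_tb / base_w_per_tb

print(f"Helium capacity: {helium_capacity_tb:.1f} TB")
print(f"Watts per TB: {base_w_per_tb:.2f} -> {helium_w_per_tb:.2f} ({saving:.0%} lower)")
```

With this assumed baseline, capacity rises to 5.6 TB and power per terabyte drops by roughly 45 percent. The percentage does not depend on the assumed 8 W, only on the platter ratio and the 23 percent reduction.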
“Big Data” is a significant driver for high capacity storage. Many companies boast about the many terabytes of storage they have at their facilities and how much data their archives hold. The general usage pattern for big data is twofold:
1. stream large data sets to extract a reduced set of relevant information,
2. work in a non-streaming fashion on the reduced set.
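The two-step pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions: the synthetic records, the `score` field and the 0.99 threshold are all hypothetical stand-ins for a real archive and a real relevance test.

```python
import random

def stream_records(n):
    """Stand-in for a sequential scan of a large archive (hypothetical data)."""
    rng = random.Random(0)
    for i in range(n):
        yield {"id": i, "score": rng.random()}

def reduce_stream(records, keep):
    """Step 1: one streaming pass that keeps only the relevant records."""
    return {r["id"]: r for r in records if keep(r)}

# Step 1: stream the full data set once, keeping a small relevant subset.
reduced = reduce_stream(stream_records(100_000), keep=lambda r: r["score"] > 0.99)

# Step 2: non-streaming (random access) work on the reduced set.
sample_ids = random.Random(1).sample(sorted(reduced), k=5)
for rid in sample_ids:
    print(rid, round(reduced[rid]["score"], 4))

print(len(reduced), "of 100000 records kept")
```

The first step touches every record exactly once and in order, which suits hard disks. The second step jumps around a much smaller set, which suits memory or SSD.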
Streaming data is a solvable problem, because the bandwidth of multiple devices can be tied together to provide as much streaming bandwidth as desired. Various redundant array of independent disks (RAID) levels use this strategy to increase performance. Similarly, cluster file systems, such as Lustre and GPFS, utilize multiple file-system servers to increase storage bandwidth. Very simply, the idea is to eliminate storage as a bottleneck by creating a storage system wide enough that the aggregate bandwidth saturates either the data link or the system's processing capability. Streaming solutions work well. Companies such as Netezza capitalize on bandwidth aggregation with products for big data that wrap a database API around the ability to perform searches on streamed data.
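The "wide enough" idea reduces to simple arithmetic: keep adding devices until their combined bandwidth meets the link. The sketch below uses hypothetical figures (150 MB/s per disk, a 10 Gbit/s link) and ignores parity, protocol and file-system overhead, so treat the result as a lower bound.

```python
import math

def devices_to_saturate(link_mb_per_s, device_mb_per_s):
    """Smallest number of striped devices whose combined bandwidth meets the link."""
    return math.ceil(link_mb_per_s / device_mb_per_s)

# Assumptions (hypothetical figures, not from the article):
# each disk streams 150 MB/s; the data link carries 10 Gbit/s.
disk_mb_per_s = 150
link_mb_per_s = 10_000 / 8  # 10 Gbit/s expressed in MB/s

n = devices_to_saturate(link_mb_per_s, disk_mb_per_s)
print(n, "disks give", n * disk_mb_per_s, "MB/s against a", link_mb_per_s, "MB/s link")
```

Under these assumptions nine disks are enough. Beyond that point the link, not the storage, is the bottleneck.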
Hard disks excel at delivering streaming bandwidth. That, coupled with a cost per gigabyte of $0.05 versus $0.50 to $1.00 for SSDs, makes them the only viable choice for large data stores at this time. Hopefully, this low cost per gigabyte will continue as SSDs erode the market share of the disk manufacturers. Meanwhile, SSDs can perform an order of magnitude more random data accesses than hard disks, and they also deliver high streaming bandwidth, which makes them ideal for solving the second part of the big data usage pattern.
The question for systems designers is how much of the big data workload is streaming versus random access? As discussed in my March 2013 article, “Caching in on Solid-state Storage,” tiered storage solutions attempt to provide a unified solution. In addition to big data workloads, these tiered solutions also work for generic workloads.
With terabyte SSD devices entering the mainstream, it is now easier and more cost-effective to overprovision with SSD cache than at any time in the past. In the near future, many home and small business users may find that an SSD-based RAID system is all they need.
As these SSD capacity increases are incorporated into products, many enterprise and big data customers will eventually find that they rely more on solid-state cache and less on their hard disks. For example, I have recently been working on a social media analysis project using the new leadership-class TACC Stampede supercomputer. The data for this analysis came from a repository containing hundreds of terabytes of raw data. As it turned out, this repository actually contained around 200 gigabytes of relevant data. Thus, one streaming reduction operation created a data set that would easily fit in cache, or even on a somewhat bulky terabyte USB stick that can be carried on a keychain.
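The economics of this example can be made concrete with the per-gigabyte prices quoted earlier in the article ($0.05 for hard disk, $0.50 to $1.00 for SSD). The sketch below is rough: "hundreds of terabytes" is taken as 300 TB purely for illustration, and it counts media cost only.

```python
HDD_PER_GB = 0.05            # from the article
SSD_PER_GB = (0.50, 1.00)    # range from the article

raw_gb = 300_000      # assumption: "hundreds of terabytes" taken as 300 TB
reduced_gb = 200      # from the article: about 200 GB of relevant data

hdd_raw_cost = raw_gb * HDD_PER_GB
ssd_raw_cost = tuple(raw_gb * p for p in SSD_PER_GB)
ssd_reduced_cost = tuple(reduced_gb * p for p in SSD_PER_GB)

print(f"Raw data on hard disk:  ${hdd_raw_cost:,.0f}")
print(f"Raw data on SSD:        ${ssd_raw_cost[0]:,.0f} to ${ssd_raw_cost[1]:,.0f}")
print(f"Reduced set on SSD:     ${ssd_reduced_cost[0]:,.0f} to ${ssd_reduced_cost[1]:,.0f}")
print(f"Reduction ratio:        {raw_gb / reduced_gb:,.0f}:1")
```

Under these assumptions the raw archive costs about $15,000 on hard disk, and would cost ten to twenty times that on SSD. The reduced working set fits on $100 to $200 of SSD.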
To reduce electricity costs, enterprise centers might institute policies and redesign applications to localize data access to one or a few directories. Not only can such policies increase the effectiveness of the SSD cache in tiered storage, but they also facilitate strategies used by tape storage archives, such as bulk transfers of entire directories. The idea is simple: turn on the big hard disks to perform a high-bandwidth streaming operation or bulk transfer of the relevant directories to SSD, then turn off the hard disks. This strategy bets that the storage requirements for many application workflows can be satisfied by keeping a few directories in active storage. Building this into a data management policy can help profitability. The upside in the cost analysis is decreased electrical use. The downside lies in an increased cost resulting from hard disk failures and the addition of spin-up latency in application workflows.
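The stage-then-spin-down idea might look like the following minimal sketch. The function name, the directory layout and the `spin_down` callback are all hypothetical; how disks are actually powered down is hardware- and site-specific and is not covered here. Temporary directories stand in for the hard disk and SSD tiers.

```python
import shutil
import tempfile
from pathlib import Path

def stage_to_ssd(archive_root, ssd_root, directories, spin_down=None):
    """Bulk-copy whole directories from the disk archive to SSD, then idle the disks.

    spin_down is a hypothetical callback; the real mechanism for powering
    down disks depends on the hardware and is not specified in the article.
    """
    staged = []
    for name in directories:
        dest = Path(ssd_root) / name
        shutil.copytree(Path(archive_root) / name, dest, dirs_exist_ok=True)
        staged.append(dest)
    if spin_down is not None:
        spin_down()
    return staged

# Demo with temporary directories standing in for the two storage tiers.
with tempfile.TemporaryDirectory() as hdd, tempfile.TemporaryDirectory() as ssd:
    project = Path(hdd) / "project_a"
    project.mkdir()
    (project / "data.txt").write_text("relevant data\n")

    events = []
    staged = stage_to_ssd(hdd, ssd, ["project_a"],
                          spin_down=lambda: events.append("disks off"))
    staged_text = (staged[0] / "data.txt").read_text()
    print(staged_text.strip(), events)
```

Copying whole directories keeps the hard disks in their streaming sweet spot. A real policy would also need to write results back and to handle requests for directories that are not currently staged, which is where the spin-up latency noted above would show up.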
Flash storage devices are only the start of the transition to solid-state storage. The market should start seeing products based on three-bit NAND this year. Controllers are coming out that will support magneto-resistive RAM (MRAM), which is faster than flash, has longer endurance, is bit-addressable as opposed to NAND’s block addressability, and is non-volatile. Meanwhile, denser 20 nm NAND chips are becoming available that can double SSD capacity. These higher-density, higher-performance technologies will further accelerate changes in the storage market and sharpen the challenges of the last-mile bottleneck.
Rob Farber is an independent HPC expert to startups and Fortune 100 companies, as well as government and academic organizations. He may be reached at editor@ScientificComputing.