New Algorithm May Help Data Centers Better Control Power Costs
Tackling green data center development challenges
From “afterthought” to “dominant factor,” data center energy management has soared in importance over the past two decades, says PBS Works/Altair Engineering CTO Bill Nitzberg. “For cloud providers, where profits depend on consolidating large numbers of users, power use is an even more dominant factor,” Nitzberg explains.
To keep power costs down and paying clients happy, a three-person international research team has developed — and tested — a straightforward yet novel algorithm that optimizes server operations by balancing power with performance.
“The development of energy-efficient ‘green’ data centers is a challenging problem,” team member Marios Dikaiakos, Ph.D., chairman of computer science and director of the Laboratory for Internet Computing at the University of Cyprus, writes in a new paper on the topic. “A typical server farm may contain thousands of servers which require large amounts of power to operate and keep cool.”
Minimizing energy consumption while meeting customer demands, the four-step algorithm finds “the number of operating servers that maximizes profit,” Dikaiakos explains.
“This research offers first steps toward improving one of the top overhead drains on data centers,” says Bryan Ward, who established and directs the cloud computing practice at global technology services giant Serco. “Energy efficiency, cooling and facility costs are the top three most significant profit-eroding factors data center operators face.”
That assessment agrees with Nitzberg’s experience. “Energy management similar to that described in this research has delivered as much as a 20 percent reduction in costs,” he explains.
Figure 1: System model. Jobs whose average size is 1/μ enter the system at rate λ and abandon the system at rate θ while waiting. Courtesy of M. Mazzucco, D. Dyachuk and M. Dikaiakos
Most solutions to data center profit erosion optimize performance at the expense of power consumption or vice versa, Dikaiakos and his team discovered.
“For a hosting center, it is more important to fulfill service level agreements (the revenue makers), and then reduce energy consumption,” says team member Michele Mazzucco, Ph.D., a software engineering research fellow at the University of Tartu, Estonia. “But, most prior studies have tried to optimize energy first, even at a cost of performance degradation.”
Case in point: so-called “dynamic frequency scaling,” which raises or lowers CPU frequencies rather than turning servers on and off. Lower CPU frequencies consume less energy, but also degrade microprocessor speed.
Other approaches address performance, but fail to mitigate energy consumption. A well-known “queuing model,” for instance, lines up customers for server use based on service level agreements, but doesn’t account for energy waste, says Mazzucco.
“During system reconfigurations, servers consume energy without producing any revenue,” he explains. “Powering servers on and off is not instantaneous. Hence, there is some waste.”
That’s “astute” thinking, says Serco’s Ward, noting that server control is a careful balance between downtime and service.
Another way of looking at the issue, however, suggests that no server should ever be idle. Data centers should simply do a better job using their assets — a maxim that seems to guide Amazon.com, says Bassam Tabbara, CTO and co-founder of cloud storage provider Symform.
“Amazon’s spot instances enable customers to bid on the use of underutilized resources in their data center,” Tabbara explains. “In fact, it’s rumored Amazon has a policy against turning off servers.”
Data center providers measure server farm performance as average revenue earned per unit time, a function of several complex variables including power consumption, job income and one of the greatest uncertainties server farmers face: fluctuations in customer traffic.
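The idea of revenue earned per unit time can be made concrete with a deliberately crude sketch. It is not the researchers' model: it ignores queuing, abandonment and reconfiguration costs, and simply assumes each server completes jobs at a fixed rate. The function names, the charge per job and the per-server running cost are all hypothetical values chosen for illustration.

```python
def profit_rate(servers, arrival_rate, service_rate, charge_per_job, cost_per_server):
    """Profit per second under a crude 'fluid' assumption (no queuing effects)."""
    # Completed jobs per second are capped by demand and by total capacity.
    throughput = min(arrival_rate, servers * service_rate)
    return charge_per_job * throughput - cost_per_server * servers


def best_server_count(max_servers, arrival_rate, service_rate, charge_per_job, cost_per_server):
    """Scan every possible server count and keep the most profitable one."""
    return max(
        range(max_servers + 1),
        key=lambda n: profit_rate(n, arrival_rate, service_rate, charge_per_job, cost_per_server),
    )


# Hypothetical numbers: 1,000 jobs/s arriving, 10 jobs/s per server,
# $0.001 earned per job, $0.005 per second to run one server.
best = best_server_count(250, 1000.0, 10.0, 0.001, 0.005)
print(best)
```

With these made-up figures, profit climbs while added servers still find work to do and falls once extra servers only add running cost, which is the trade-off a server allocation policy has to resolve.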
At online encyclopedia Wikipedia, for instance, incoming traffic changes as much as 70 percent per day, according to a 2009 study published in the journal Computer Networks. Idle servers waiting for customers to return consume just 35 percent less power than their active cohorts. If service providers could shut them down in the interim, they could save energy.
However, predicting when traffic will return is an imprecise science with a bottom-line wallop. A one-second downloading delay for an online retailer grossing $37 million annually can cost $2.5 million in lost sales, according to a study by Web performance analyst Gomez.com. Google reports that an extra half-second delay in search page generation can set off a 20 percent traffic drop.
In short, clients are in a hurry, putting the onus on providers to “ensure the time users wait does not exceed their patience,” explains team member Dmytro Dyachuk, a doctoral student at the University of Saskatchewan, Canada, who specializes in “energy aware” cloud computing.
Because satisfying impatient customers is critical to performance, Dyachuk and colleagues incorporate a measure of customer patience, the so-called “probability of abandonment,” into their algorithm.
The probability of abandonment is governed by two cases. Either every job in the system is served without waiting in line, making the probability of abandonment zero; or all servers are busy and some jobs wait in a line, or queue, where customers may give up and abandon ship.
Algorithm-optimized servers that considered the probability of abandonment performed well in the team’s tests. As the number of jobs and queue length steadily increased, revenues increased. More significantly, the percentage of customers abandoning their jobs actually decreased.
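One textbook way to put a number on this is the Erlang-A (M/M/n+M) queue, in which arrivals, service times and customer patience are all assumed to be exponentially distributed. The sketch below uses that standard model as a stand-in; the article does not confirm that the team's algorithm computes abandonment exactly this way, and the function and parameter names are hypothetical.

```python
import math


def abandonment_probability(arrival_rate, service_rate, patience_rate, servers, tail=1e-12):
    """Fraction of arriving jobs that abandon in an Erlang-A (M/M/n+M) queue.

    Assumes patience_rate > 0, so the queue cannot grow without bound.
    """
    log_weights = [0.0]  # log of the unnormalized probability of an empty system
    peak = 0.0
    jobs = 0
    while True:
        jobs += 1
        if jobs <= servers:
            # Every job is in service, so nobody is waiting and nobody abandons.
            departure_rate = jobs * service_rate
        else:
            # All servers are busy; each waiting job may also abandon.
            departure_rate = servers * service_rate + (jobs - servers) * patience_rate
        log_weights.append(log_weights[-1] + math.log(arrival_rate / departure_rate))
        peak = max(peak, log_weights[-1])
        # Stop once states with a queue have become negligibly unlikely.
        if jobs > servers and log_weights[-1] < peak + math.log(tail):
            break
    weights = [math.exp(lw - peak) for lw in log_weights]
    total = sum(weights)
    mean_queue = sum((j - servers) * w for j, w in enumerate(weights) if j > servers) / total
    # Abandonments per second divided by arrivals per second.
    return patience_rate * mean_queue / arrival_rate


# Hypothetical load: 1,000 jobs/s, 10 jobs/s per server, average patience of 1 s.
for n in (90, 100, 110):
    print(n, abandonment_probability(1000.0, 10.0, 1.0, n))
```

Only states in which more jobs are present than servers contribute to the result, mirroring the two cases described above. Working with logarithms keeps the calculation stable at the job rates of thousands per second that a server farm sees.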
That’s no surprise to Serco’s Ward, who says he expects the algorithm to work well under scenarios the researchers considered. Simulated scenarios, however, don’t fully account for evolving consumer demands and growing businesses.
“A more cognitive control approach, such as a neural net capable of learning complex and evolving patterns, could yield better results,” Ward explains. “And, while the paper’s simplified problem scope enables a valuable proof of concept, a scalable solution will be the key to real-world success.”
Lighting the LAMP
In energy consumption simulations of a data center equipped with 250 servers, each with 2 GB of RAM and dual-core Xeon CPUs, the algorithm outperformed other power conservation approaches.
With Tsung, an open source load measurement tool, and WordPress software running on a LAMP (Linux, Apache, MySQL and PHP) stack, the researchers analyzed blog reader behavior, from navigation to keyword search, and load fluctuation as the number of blog readers increased and decreased.
They calculated the average power consumed per unit time — a function of idle servers, active servers and how many users occupied the system — and developed metrics to measure wasted energy and the cost of wear and tear during on/off cycles.
“Hardware components tend to degrade faster with frequent power on/off cycles than with continuous operation,” Mazzucco explains.
Along with fixed parameters such as an electricity cost of $0.10 per kilowatt-hour and an average job size of 0.1 seconds, the team assumed their server farm had a power usage effectiveness (PUE) of 1.7. PUE, the ratio of total facility power to the power delivered to IT equipment, is the primary metric used to evaluate server farm efficiency and “1.7 is the average for state of the art data centers,” Mazzucco explains.
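To see how these parameters turn into an electricity bill, here is a minimal sketch. The electricity price and PUE come from the article, as does the earlier figure that an idle server draws about 35 percent less power than an active one. The 250-watt draw per active server and the function name are hypothetical, since the article does not give per-server wattage.

```python
def hourly_electricity_cost(active, idle, active_watts,
                            idle_fraction=0.65, pue=1.7, price_per_kwh=0.10):
    """Dollars per hour for a farm with the given active and idle server counts."""
    # Idle servers draw a fraction of what active servers draw (35 percent less).
    it_watts = active * active_watts + idle * active_watts * idle_fraction
    # PUE scales IT power up to total facility power (cooling, distribution, etc.).
    facility_kw = it_watts * pue / 1000.0
    return facility_kw * price_per_kwh


ACTIVE_WATTS = 250.0  # hypothetical draw of one busy server

all_busy = hourly_electricity_cost(250, 0, ACTIVE_WATTS)
mostly_idle = hourly_electricity_cost(100, 150, ACTIVE_WATTS)
idle_switched_off = hourly_electricity_cost(100, 0, ACTIVE_WATTS)
print(all_busy, mostly_idle, idle_switched_off)
```

Under these assumptions, leaving 150 servers idle trims the bill only modestly, while switching them off roughly halves it again. That gap is what the allocation algorithm tries to capture, minus the energy wasted and hardware wear incurred while servers power on and off.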
Varying the job rate from 1,000 to 11,000 jobs/second and the total job load per 16.5-hour run from 119 million to 653 million jobs, the allocation algorithm “produced revenues that grew with the load” while “reducing the provider’s electricity bill,” the team reported.
Those positive results represent a “solid start at managing data center costs,” Serco’s Ward explains. But, until “server hardware design matures to allow for finer power management control,” he expects data center operators to invest in tighter controls over server cooling and “green” improvements to data center buildings.
Mike Martin is a science and technology writer based in Columbia, MO. He may be reached at editor@ScientificComputing.com.