Future Tech

Rising costs biggest issue for datacenter operators as demand grows

Tan KW
Publish date: Wed, 31 Jul 2024, 05:54 AM

Datacenter operators face multiple challenges such as power and cooling requirements, while staffing issues persist and many are not tracking the right sustainability metrics. On the plus side, they can count on strong and growing demand for digital services.

Owning and operating datacenters has become increasingly expensive, with prices historically high for energy, equipment, labor, construction, and infrastructure upgrades, according to Uptime Institute.

In its 14th annual Global Data Center Survey [PDF], Uptime finds that cost is the primary concern for bit barn operators, driven by high prices (which in turn were driven by high inflation) and the need to continue to invest in infrastructure to meet demand.

Cost came ahead of other issues such as forecasting future capacity and improving energy performance, with 44 percent of respondents indicating they were very concerned and another 36 percent somewhat concerned.

However, Uptime also reports findings that may surprise some. It says that while AI and high-performance compute workloads attract significant media attention, their broader impact on the datacenter industry will take time to materialize.

It also says that while server rack power densities are increasing to support more powerful hardware running more demanding workloads such as AI training, the average remains below 8 kW. Most facilities do not have racks above 30 kW, and those that do have only a few, it maintains.

In other words, while operators like Digital Realty have been offering 70 kW racks since last year, and Nvidia's rack-scale DGX GB200 NVL72 AI monster needs 120 kW, these are still largely the exception. This is expected to change in coming years, Uptime adds.

Meanwhile, Uptime found that the number of operators that have experimented with AI themselves to automate tasks and improve efficiency has increased, and includes some of the world's largest colocation companies.

But while 91 percent of survey respondents believe that AI will be widely used internally in the datacenter within the next five years, trust in AI for operational decision-making has actually gone down for the third year in a row. Uptime found that 42 percent of respondents indicated they would not trust AI to make operational decisions.

Some reasons for this growing negative view of AI include the lack of transparency and accountability, potential cybersecurity risks introduced by additional network connections, and the risk that AI-based control mechanisms introduce additional points of failure.

Uptime notes that the more operators learn about AI, the less they trust the technology. It says that part of the problem is likely due to a media focus on some highly publicized failures of GenAI systems. Surely not.

When it comes to industry benchmarks and metrics, the shortcomings of PUE (power usage effectiveness) have long been recognized, but it remains the standard for gauging facility energy efficiency.

In the 2024 survey results, the industry average PUE of 1.56 points to "a continuing trend of inertia," Uptime says, though the figure conceals changes happening under the surface, such as newer facility and equipment designs that have demonstrated greater efficiency gains. One reason the average has barely moved is the large number of ageing, legacy bit barn facilities.

Following rapid improvements in the average PUE between 2007 and 2014, progress stalled as the ratio approached 1.5, even though data dormitory designs have not approached physical limits of efficiency, Uptime bemoans. Many recent builds have achieved a PUE of 1.3 or lower, and it is hoped the average will nudge down over time.
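
For readers unfamiliar with the ratio, PUE is conventionally defined as total facility energy divided by the energy delivered to IT equipment, so a value of 1.0 would mean no overhead at all. The Python sketch below is purely illustrative and is not taken from Uptime's report. The function names and the 1,000 kWh IT load are assumptions, chosen to show what the survey's 1.56 average and a 1.3 new-build figure would imply for overhead.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_kwh(pue_value: float, it_equipment_kwh: float) -> float:
    """Energy spent on cooling, power distribution, etc. for a given IT load."""
    return (pue_value - 1.0) * it_equipment_kwh


# Hypothetical IT load of 1,000 kWh (an assumption for illustration only).
IT_LOAD_KWH = 1000.0

survey_average = overhead_kwh(1.56, IT_LOAD_KWH)  # industry average PUE in the survey
recent_build = overhead_kwh(1.3, IT_LOAD_KWH)     # PUE cited for many recent builds

print(f"Overhead at PUE 1.56: {survey_average:.0f} kWh")
print(f"Overhead at PUE 1.30: {recent_build:.0f} kWh")
```

Under these assumptions, the same IT load carries roughly 560 kWh of overhead at the survey average and about 300 kWh at a PUE of 1.3.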

The report notes that the majority of operators are able to report on just two well-established metrics - power consumption and PUE - but says these are not adequate to track progress towards sustainability.

This is concerning because metrics such as water usage, renewable energy consumption, and Scope 1, 2, and 3 carbon dioxide emissions will be required by regulations that are pending or already passed.

As an example, the EU Energy Efficiency Directive (EED) will soon require operators to report renewable energy consumption and water usage, but these are currently collected by only about 40 percent of respondents to the survey.

Only a minority of organizations report on carbon emissions, despite this being required under climate reporting laws in the EU, UK, many Asian countries, and parts of the US. Uptime says that the majority of operators do not appear to have the data to make such submissions or substantiate their corporate net-zero goals.

As an aside, Uptime notes that some operators that do report Scope 3 emissions have made a striking observation: Scope 3 emissions - from the supply chain and other indirect sources - can be as large as, or even larger than, emissions from Scopes 1 and 2 combined. This is an unpalatable fact that Microsoft has discovered to its cost.

Meanwhile, Uptime reports that while outages are becoming more widespread and have a greater impact in absolute terms, they are rising more slowly than IT capacity itself is growing, and therefore reliability is actually increasing.
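
The reasoning comes down to normalizing outages by capacity. A minimal Python sketch, using made-up numbers rather than anything from the survey, shows how an absolute rise in outages can coincide with a falling outage rate. The figures, the function name, and the choice of megawatts as the unit of capacity are all hypothetical.

```python
def outages_per_mw(outage_count: int, it_capacity_mw: float) -> float:
    """Outages normalized by installed IT capacity."""
    if it_capacity_mw <= 0:
        raise ValueError("capacity must be positive")
    return outage_count / it_capacity_mw


# Hypothetical figures, NOT from Uptime's survey: outages grow 10 percent
# while IT capacity grows 30 percent.
earlier = outages_per_mw(100, 1000.0)
later = outages_per_mw(110, 1300.0)

print(f"Earlier: {earlier:.3f} outages per MW")
print(f"Later:   {later:.3f} outages per MW")
print("Rate fell" if later < earlier else "Rate rose")
```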

In other words, while large outages grab the headlines and have major consequences for the organizations involved, Uptime's surveys indicate that most outages have a limited impact. Of those operators that experienced an outage, only 9 percent ranked it as either serious or severe, the lowest share seen so far, Uptime says.

But when major outages do occur, they can be very costly (think CrowdStrike). According to Uptime, 54 percent of respondents say their most recent significant outage cost more than $100,000, while there has also been an increase in those reporting an outage costing more than $1 million.

As in previous years, staffing issues remain a concern for bit barn operators. Uptime says that targeted recruitment initiatives have multiplied because of the recent growth in datacenter demand and competition for skilled workers, but these efforts have yet to meaningfully shrink a high and longstanding vacancy rate.

Survey respondents indicate that the most significant skills gaps affect electrical (33 percent) and mechanical (30 percent) roles, junior and mid-level operations (39 percent), and operations management (32 percent).

Staff shortages vary by region. Operators in North America and Europe said they had difficulty finding skilled junior-level operators, while for those in China and the Middle East there is a lack of experienced operations managers.

The Uptime survey was conducted in the first half of the year, and the report is based on responses from 879 end users representing the owners and operators of datacenters, including those responsible for managing infrastructure at the world's largest IT organizations. ®

https://www.theregister.com//2024/07/30/rising_datacenter_costs/
