Future Tech

Time for a fresh approach to compute architecture?

Tan KW
Publish date: Thu, 23 May 2024, 05:27 PM

Sponsored Feature: Read this Register interview to find out what Professor Onur Mutlu of ETH Zürich thinks about the compute architecture we should be deploying to meet next-generation requirements.

The Register: If we look at the way computing architecture works today, what are the challenges and shortcomings of this model?

Professor Onur Mutlu: There are major issues with the way we design compute systems today. One of the biggest is that although datasets are growing, and we're trying to do more sophisticated things with them, the components that do the actual computation are a very small fraction of the system. If you take a typical node, more than 98 percent of that node is dedicated to storing, controlling, and moving data, whereas the processors that operate on that data are a very small part of it. The way we design our systems is very processor-centric. The processor is king, and everything must move to it so you can carry out computation. The storage system, memory system, and interconnects are not active components working productively on that computation. When you are continually moving data between the processor and the memory or storage subsystems, that represents a major bottleneck.

Reg: How does this square with the age of data-heavy applications?

OM: It's increasingly the case that we have many terabytes of data to store from applications like machine learning and genomics. We did a study with Google where we looked at large machine learning models, ones that use machine learning accelerators, and we found that more than 90 percent of total system energy is actually spent on accessing memory. This causes both energy problems and performance problems. Most of the potential of your hardware is getting wasted, and that's causing sustainability issues too. All this hardware that's only used for storing and moving data adds up to a lot of carbon that's getting wasted.

Reg: Is there scope for doing the job in better ways?

OM: I believe that in the future, memory and storage should be, and will be, more tightly integrated. They will also be a lot more active, so that, for example, when a processor needs to execute the data-intensive part of a workload - such as a large language model with data-heavy inferencing - it offloads that function to the memory. The result is then returned to the processor, allowing it to get on with other things. With this model, everything works more cooperatively to solve problems in a much more efficient and performant way.
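
To make that offload pattern concrete, here is a minimal Python sketch of a host handing the data-heavy part of a workload to a near-memory compute unit and collecting the result later. The NearMemoryUnit class and its interface are illustrative assumptions for this article, not a real device API or Mutlu's actual design.

```python
from concurrent.futures import ThreadPoolExecutor

class NearMemoryUnit:
    """Illustrative stand-in for compute logic placed next to DRAM or flash.

    In a real processing-in-memory system the reduction would run inside the
    memory device itself; here a background thread simply plays that role.
    """

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)

    def offload_dot_product(self, a, b):
        # The host ships only the command; the operands are assumed to live
        # in (near-)memory already, so no bulk data crosses the memory bus.
        return self._executor.submit(lambda: sum(x * y for x, y in zip(a, b)))

# Host-side view: start the data-heavy step, keep working, then collect.
unit = NearMemoryUnit()
weights = [0.5] * 1_000_000       # imagine these resident in the memory device
activations = [1.0] * 1_000_000
pending = unit.offload_dot_product(weights, activations)

# ... the processor is free to do other work here ...

result = pending.result()         # only the scalar result moves back
print(result)
```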

Reg: What sort of applications benefit most from this approach?

OM: We're talking about data-centric applications like genomics, or the training and inferencing of machine learning models. You can allocate storage nodes to each, changing dynamically according to the needs of the application. By keeping the data in place and doing the processing in the storage system, you create huge energy efficiencies. Today's compute is not energy efficient. So much energy is wasted moving data from memory to processor, just for very simple computations. I call it the hidden cost of data movement. We're trying to change that, for example by putting acceleration capability on the memory side.
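
As a rough illustration of that hidden cost, the back-of-envelope figures below use commonly cited ballpark energies, where an off-chip DRAM access costs orders of magnitude more than a simple arithmetic operation. The exact picojoule values are assumptions chosen for illustration, not measurements from the interview.

```python
# Assumed, order-of-magnitude energy figures (illustrative only):
ENERGY_ADD_PJ = 0.1          # one simple integer add on-chip
ENERGY_DRAM_ACCESS_PJ = 100  # fetching one operand from off-chip DRAM

# Summing a million values fetched one by one from DRAM:
n = 1_000_000
compute_energy_pj = n * ENERGY_ADD_PJ
movement_energy_pj = n * ENERGY_DRAM_ACCESS_PJ

print(f"compute:       {compute_energy_pj / 1e6:.1f} uJ")
print(f"data movement: {movement_energy_pj / 1e6:.1f} uJ")
print(f"movement dominates by ~{movement_energy_pj / compute_energy_pj:.0f}x")
```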

Reg: Tell us more about some of the work you've been doing with genomics.

OM: Genomic data has been exploding across the world thanks to the extremely powerful and low-cost genome sequencing technologies we now have. There are times when you need to analyse that data speedily, for example in the treatment of a critically ill infant where you want to determine the best personalised medical treatment. Today that data would be stored in the cloud and needs to get moved to the processors. We want to eliminate that, so that decisions can be made much faster. We've been able to reduce this analysis time lag by around 30x. Energy efficiency can be improved too, depending on the type of analysis you are doing and the type of data you have. And that's just by moving acceleration capability on to the storage side.
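
As a hedged sketch of how near-storage acceleration can help here, the Python below pre-filters sequencing reads with a simple shared-k-mer test where the reads are stored, so only plausible candidates are shipped to the host aligner. The filter, the k value, and the toy data are illustrative assumptions, not the group's actual accelerator design.

```python
def kmers(seq, k=11):
    """Return the set of overlapping k-mers in a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def near_storage_prefilter(reads, reference_kmers, k=11, min_hits=2):
    """Runs next to the flash holding the reads: keep only reads sharing at
    least `min_hits` k-mers with the reference, so the host never sees the rest."""
    for read in reads:
        if len(kmers(read, k) & reference_kmers) >= min_hits:
            yield read

# Host side: build the reference index once, then stream only the candidates.
reference = "ACGTACGTTAGC" * 100
reference_index = kmers(reference)

reads_on_flash = ["ACGTACGTTAGCACG", "TTTTTTTTTTTTTTT", "GTACGTTAGCACGTA"]
candidates = list(near_storage_prefilter(reads_on_flash, reference_index))
print(candidates)   # the all-T read is discarded without crossing the bus
```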

Reg: Do you have other examples you can talk about?

OM: There's also our work with machine learning inferencing and large language models. We're seeing similarly good results here. You need to operate on huge data sets for inferencing, or training. The data needs to be structured and stored, and we've been creating a database for that. We're building accelerators next to each flash chip, just like we did with genomics, and the results are similar. We're talking at least 20x improvement in terms of performance and efficiency.

There are further applications where gains have been noted by others. For example with large graphs where you are looking to find structure, such as those used by social networks. Moving data occupies most of the time and energy. When you offload that to the memory and storage system you get huge improvements. You get around 14x performance improvement and around 10x energy efficiency improvement. If you compound all these gains, you get around 100x improvement.
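
One way to see why graph workloads benefit is sketched below, under assumptions of my own (a toy adjacency list and an assumed eight bytes per edge entry): gathering a vertex's neighbours near memory returns one small result, instead of copying every edge list across the bus to the processor.

```python
# Toy adjacency list; in a real system this would live in memory or storage.
graph = {0: [1, 2, 3], 1: [2], 2: [0, 3], 3: []}

def near_memory_degree_sum(vertices):
    """Runs inside the memory system: walks the edge lists locally and
    returns a single integer, so the edge lists themselves never move."""
    return sum(len(graph[v]) for v in vertices)

def processor_centric_degree_sum(vertices):
    """Processor-centric version: every edge list is (conceptually) copied
    across the memory bus before the CPU can count it."""
    total, bytes_moved = 0, 0
    for v in vertices:
        neighbours = list(graph[v])          # the copy models the bus transfer
        bytes_moved += 8 * len(neighbours)   # assumed 8 bytes per edge entry
        total += len(neighbours)
    return total, bytes_moved

frontier = [0, 2]
print(near_memory_degree_sum(frontier))        # one integer comes back
print(processor_centric_degree_sum(frontier))  # same answer, plus bus traffic
```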

Reg: Are AI and ML changing the rules and requirements around fast data transmission?

OM: Yes, and we need to adapt our systems to handle that. Whenever we talk to people in industry who are building these machine learning accelerators, they are getting bombarded with data. That's really changing how they have to do things. We have to move to a more data-centric paradigm to deal with these realities. I'm not saying this will be easy. But we have to move on from the very processor-centric systems we are using today. It's about looking at ways in which this can be done relatively easily, without too much effort required from the programmer.

There will inevitably be some pain where change is involved. We'll all need to work a little bit harder. Programming models and system software support will not be perfect from day one. But over time, as more and more examples of this data-centric approach appear, we'll see the energy and performance benefits revealed more clearly. The software stack will adapt. It's not an overnight transition, but there's certainly a very pressing need for it to happen as soon as possible. I think we'll get there with the move to memory-centric and storage-centric compute, but we might have to find ways to get there much faster.

Reg: What plans does your team have for pushing this forward?

OM: There's more to it than the work we're doing with storage- and memory-centric computing. I also see a future not just in using storage-centric models to enable better machine learning, but, on the flip side, in using machine learning to design better systems. We're quite excited about this. If you look at the way we design systems today, there are a lot of human-driven decisions. If you want to design a storage controller, for example, the policies will usually be designed by people. But with ML we can design much better controllers that actually learn from their decisions over time. That way the system just keeps getting better in terms of performance and efficiency. A more intelligent controller makes better decisions. Humans still need to be involved, even where we have better automation. But they no longer need to dictate the policies. I think we have a lot of exciting developments lying just ahead of us.
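
To illustrate the idea of a controller that learns from its own decisions, here is a minimal sketch assuming a storage controller that picks a readahead size with a simple epsilon-greedy bandit and updates its estimates from observed hit rates. The action set, the reward signal, and the simulated workload are illustrative assumptions, not a description of any actual product or of the group's designs.

```python
import random

class LearnedReadaheadController:
    """Toy self-tuning storage controller: it tries different readahead sizes
    and keeps a running estimate of how well each one has worked so far."""

    def __init__(self, sizes=(4, 16, 64, 256), epsilon=0.1):
        self.sizes = sizes
        self.epsilon = epsilon
        self.value = {s: 0.0 for s in sizes}   # estimated reward per size
        self.count = {s: 0 for s in sizes}

    def choose(self):
        # Mostly exploit the best-known setting, occasionally explore others.
        if random.random() < self.epsilon:
            return random.choice(self.sizes)
        return max(self.sizes, key=lambda s: self.value[s])

    def feedback(self, size, hit_rate):
        # Incremental mean update: the controller learns from its decision.
        self.count[size] += 1
        self.value[size] += (hit_rate - self.value[size]) / self.count[size]

# Simulated workload that secretly favours a readahead of 64 blocks.
controller = LearnedReadaheadController()
for _ in range(1000):
    size = controller.choose()
    hit_rate = 0.9 if size == 64 else 0.5 + random.uniform(-0.1, 0.1)
    controller.feedback(size, hit_rate)

print(max(controller.value, key=controller.value.get))  # usually prints 64
```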

https://www.theregister.com//2024/05/23/time_for_a_fresh_approach/
