
SambaNova makes Llama gallop in inference cloud debut

Tan KW
Publish date: Wed, 11 Sep 2024, 06:29 AM

Not to be outdone by rival AI systems upstarts, SambaNova has launched an inference cloud of its own that it says is ready to serve up Meta’s largest models faster than the rest.

The cloud offering is one of several which have cropped up amid the AI boom, offering API access to popular open-weight models. Most of these are GPU-based, but for the more boutique vendors dealing in specialized hardware, like Cerebras, Groq, and now SambaNova, it seems whoever can get the largest model to spit out tokens the fastest has a leg up.

If you're not familiar, tokens here refer to how large language models encode words, word fragments, punctuation, and figures. So, the faster your infrastructure can generate tokens, the less time you're left waiting for a response.
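
For a hands-on sense of that mapping, here is a minimal sketch using OpenAI's open source tiktoken tokenizer purely as an illustration; Llama 3.1 ships its own tokenizer, so the exact splits will differ, but the idea is the same: text becomes a list of integer token IDs, and throughput is counted in how many of those IDs the system emits per second.

# Illustration only: Llama 3.1 uses its own tokenizer, so the exact splits
# will differ, but any BPE tokenizer shows the same idea -- text becomes a
# list of integer token IDs, and throughput is counted in IDs per second.
import tiktoken  # OpenAI's open source tokenizer: pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "SambaNova makes Llama gallop in inference cloud debut"

ids = enc.encode(text)                   # integer token IDs
pieces = [enc.decode([i]) for i in ids]  # the text fragment behind each ID

print(f"{len(text.split())} words -> {len(ids)} tokens")
print(pieces)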

According to CEO Rodrigo Liang, SambaNova has managed to get Meta’s 405 billion parameter Llama 3.1 model (more than twice the size of OpenAI's GPT-3.5 model) to churn out tokens at a rate of 132 per second, and at the full 16-bit precision it was trained at, no less.

To put that in perspective, it's estimated the average person can read about five words per second. At 132 tokens a second, SambaNova's system is nearly twice as fast as the next-fastest GPU systems, at least according to Artificial Analysis data cited in SambaNova's announcement.
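
As a back-of-the-envelope check on that comparison, assuming roughly 0.75 English words per token, a common rule of thumb rather than a figure from the announcement:

# Back-of-the-envelope check, assuming roughly 0.75 English words per token
# (a common rule of thumb, not a figure from SambaNova's announcement).
TOKENS_PER_SECOND = 132    # SambaNova's claimed rate for Llama 3.1 405B
WORDS_PER_TOKEN = 0.75     # assumption
READING_SPEED_WPS = 5      # words per second, the article's estimate

words_per_second = TOKENS_PER_SECOND * WORDS_PER_TOKEN
print(f"~{words_per_second:.0f} words/s, roughly "
      f"{words_per_second / READING_SPEED_WPS:.0f}x a human reading speed")
# -> ~99 words/s, roughly 20x a human reading speed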

Pedal to the metal

Introduced earlier this summer, Llama 3.1 405B is Meta's first frontier-class model capable of going toe-to-toe with much larger models from the likes of OpenAI, Anthropic, and Google.

And while far smaller than competing models, running 405B at 16-bit precision isn't an easy feat, as simply fitting it into memory requires 810 GB of capacity. That's not even counting the space required by the key-value cache.

To run the model, SambaNova used 16 of its SN40L accelerators, each with 64 GB of speedy HBM3 memory and 520 MB of on-die SRAM. You can find a full breakdown of the chip, codenamed Cerulean 1, on our sibling site The Next Platform.
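
The arithmetic behind those figures is straightforward. Here is a rough sketch, taking the 810 GB weight footprint and the 16 x 64 GB HBM3 configuration above as given; the headroom left over for the key-value cache is a derived estimate, not a number SambaNova has published.

# Rough memory budget for serving Llama 3.1 405B at 16-bit precision.
PARAMS = 405e9           # parameters
BYTES_PER_PARAM = 2      # bf16/fp16

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights alone: {weights_gb:.0f} GB")   # ~810 GB, as noted above

chips, hbm_per_chip_gb = 16, 64                # SambaNova's configuration
total_hbm_gb = chips * hbm_per_chip_gb
print(f"aggregate HBM3: {total_hbm_gb} GB")    # 1,024 GB
print(f"headroom for KV cache and activations: {total_hbm_gb - weights_gb:.0f} GB")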

Using this configuration, SambaNova boasts it's achieved a throughput of 132 tokens per second on 405B and 461 tokens a second when running the smaller 70 billion parameter variant. By comparison, data from Artificial Analysis shows that even the best GPU-based systems can only manage to serve Meta's 405B model at 72 tokens per second, with most much slower than that.

What's more, the startup claims it's able to maintain performance in excess of 100 tokens per second up to a batch size of four. Or, in other words, for up to four simultaneous requests. According to Anton McGonnell, head of SambaNova's software products division, there may be some additional headroom to scale that even further.

This level of performance is possible in part thanks to the SN40L's larger caches, McGonnell told The Register. This, he added, allows it to avoid the performance overheads commonly seen in multi-GPU systems.

"If GPUs could truly utilize their memory bandwidth, they will be much faster, but they can't," he explained.

But, while SambaNova was able to get Llama 3.1 405B running at 16-bit precision, it wasn't without compromise. One of the biggest concessions is that the model isn't running at its full 128k-token context window, which was instead cut back to 8k.

"For the purposes of launch, we're just making the 8k version available, if only because of traffic," McGonnell said. "If people start using 128k, then it slows everything down for everybody else."

While this is unlikely to negatively impact performance in something like a customer service chatbot, it will limit the service's practicality for longer-context applications like document summarization.
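
To see why long contexts are so costly, here is a rough estimate of the per-request key-value cache, using approximate Llama 3.1 405B attention dimensions from Meta's Llama 3 paper (126 layers, 8 grouped-query KV heads, head size 128); the figures come from that paper rather than from SambaNova, and the math is a simplification.

# Rough per-request KV-cache size, using approximate Llama 3.1 405B attention
# dimensions from Meta's Llama 3 paper (126 layers, 8 grouped-query KV heads,
# head size 128) -- a simplification, not SambaNova's published numbers.
LAYERS, KV_HEADS, HEAD_DIM, BYTES = 126, 8, 128, 2  # 16-bit keys and values

def kv_cache_gb(context_tokens: int) -> float:
    # 2x because both keys and values are cached at every layer
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES * context_tokens / 1e9

for ctx in (8_000, 128_000):
    print(f"{ctx:>7}-token context -> ~{kv_cache_gb(ctx):.0f} GB of KV cache per request")
# roughly 4 GB at 8k versus ~66 GB at 128k, which is why a handful of
# long-context requests can crowd out everyone else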

The competition heats up

SambaNova Cloud's free and paid enterprise tiers are available starting today. The infrastructure provider also plans to roll out a developer tier later this year which, in addition to higher rate limits, will let devs build models based on Llama 3.1.
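
For readers wondering what using such a service looks like, here is a minimal sketch assuming an OpenAI-compatible chat completions interface; the base URL, model identifier, and environment variable below are placeholders rather than confirmed values, so check SambaNova's own documentation for the real ones.

# Minimal sketch, assuming an OpenAI-compatible chat completions interface.
# The base URL, model name, and environment variable are placeholders --
# consult SambaNova Cloud's documentation for the real values.
import os
import time
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.example-sambanova.invalid/v1",  # placeholder
    api_key=os.environ["SAMBANOVA_API_KEY"],              # hypothetical env var
)

start = time.time()
resp = client.chat.completions.create(
    model="llama-3.1-405b",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarise this article in one sentence."}],
)
elapsed = time.time() - start

completion_tokens = resp.usage.completion_tokens
print(resp.choices[0].message.content)
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"-> {completion_tokens / elapsed:.0f} tokens/s")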

However, as we mentioned earlier, SambaNova is far from the only infrastructure vendor leaning on speed to differentiate itself from a sea of GPU-based offerings. Cerebras, which announced its own inference cloud at the Hot Chips conference late last month, already boasts performance of up to 450 tokens per second in Llama 3.1 70B and anticipates it will be able to achieve 350 tokens per second when running the 405B variant. If Cerebras can actually pull that off, it'll put the company well ahead of SambaNova, even if doing so will require 12 of its wafer-scale chips.

There's also Groq, which has previously managed to achieve throughputs of 300 tokens a second in Llama 2 70B using some 576 of its language processing units. The firm recently nabbed $640 million in a series-D funding round, which among other things will help it ramp up the development of its next-gen accelerators. ®

 

https://www.theregister.com//2024/09/10/sambanovas_inference_cloud/
