
GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell

50 points | 3 hours | wafer.ai
minraws 1 hour ago

Can you folks add performance per watt as a metric to these comparisons? I honestly want to understand where AMD fits in the stack in terms of actual performance per dollar. I have had talks with companies wanting to build data centers outside the US who find it hard to source anything Nvidia in sufficient capacity and scale.

If AMD is competitive on performance per watt and roughly reliable in terms of software support, that covers what most folks outside the US prioritize above all else, since outside China and the US electricity tends to be at a relative premium.

Maybe if they make smaller data centers viable at the right price, AMD could be part of the stack outside the US wherever Nvidia is more limited in supply. Though I have genuinely no idea what sourcing an AMD GPU looks like.

I have never seen a company use AMD outside of wafer and a couple of others, mostly in the US.

Genuinely intriguing or maybe not really (could be this stuff is common knowledge) and I am just stuck in my Nvidia bubble here.

Twirrim 49 minutes ago

> I have never seen a company use AMD outside of wafer and a couple of others, mostly in the US.

There are a few using them, and even more starting to experiment with them. AMD has long been a source of disappointment on this side of things, so I'm hesitant to feel optimistic that we'll finally get some competition. The market really needs viable competition to Nvidia, especially on performance/watt.

craftkiller 57 minutes ago
technoabsurdist 20 minutes ago

AMD MI355X uses 1,400W per GPU and NVIDIA B200 uses 1,200W. So AMD uses about 17% more power per GPU.
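
For anyone who wants to plug in their own numbers, here is a minimal Python sketch of that arithmetic. It uses the wattages quoted in this comment (not verified specs) and the 2626 tok/s/node figure from the title. `GPUS_PER_NODE = 8` is an assumption for illustration only, since the thread does not state the node size, and the result counts GPU power only (no CPUs, networking, or cooling).

```python
# Back-of-envelope power math. Wattages are the ones quoted in this comment,
# not verified specs; GPUS_PER_NODE is an assumption for illustration.
MI355X_WATTS = 1400
B200_WATTS = 1200
NODE_TOK_PER_S = 2626   # from the post title
GPUS_PER_NODE = 8       # assumption, not stated in the thread


def extra_power_fraction(a_watts: float, b_watts: float) -> float:
    """Fraction by which A's power draw exceeds B's."""
    return a_watts / b_watts - 1.0


def tokens_per_joule(tok_per_s: float, watts: float) -> float:
    """Throughput per watt; tok/s divided by W is tokens per joule."""
    return tok_per_s / watts


extra = extra_power_fraction(MI355X_WATTS, B200_WATTS)
node_gpu_watts = GPUS_PER_NODE * MI355X_WATTS  # GPUs only, no CPU or cooling
efficiency = tokens_per_joule(NODE_TOK_PER_S, node_gpu_watts)

print(f"MI355X draws {extra:.1%} more power per GPU than B200")
print(f"{efficiency:.3f} tokens per joule of GPU power")
```

A real performance-per-watt comparison would also need a matching Blackwell throughput number measured on the same model and workload, which this thread does not give.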

GZGavinZhao 49 minutes ago

> roughly reliable in terms of software support

*chuckles*

I have no knowledge of the support for enterprise-grade hardware, but their consumer-grade hardware support is still quite atrocious. I believe in the AMD team, and since 2023 I've been watching them catch up with NVIDIA at an unprecedented speed thanks to AI, but no, I still wouldn't consider AMD's software support good, at least for consumer-level hardware running AI. The fact that the Vulkan backend of llama.cpp consistently outperforms the ROCm backend by a 5-10% margin on any model I run is just laughable (source: I run local LLMs and I always benchmark, but you can also find similar issues in llama.cpp).

oDot 1 hour ago

Do these providers have 80+% gross margins or is something eating into them? Maybe utilization?

technoabsurdist 49 minutes ago

hi, i work at wafer. no, the margins are lower, averaging about 40%. utilization is one of the highest-order bits in determining margins here, yes.

AussieWog93 51 minutes ago

The 2600 tok/s is an "aggregate", not the actual throughput.

technoabsurdist 48 minutes ago

yes it is 213 tok/s single stream (so per user)
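
To make the aggregate vs. single-stream gap concrete, here is a rough Python sketch using only the two numbers in this thread. It assumes every stream sustains the single-stream rate while the node is loaded, which the thread does not confirm, so treat the result as an estimate of the implied concurrency rather than a measured batch size.

```python
# Relating aggregate node throughput to per-user speed.
AGGREGATE_TOK_PER_S = 2626      # per node, from the post title
SINGLE_STREAM_TOK_PER_S = 213   # per user, from this comment


def implied_concurrency(aggregate: float, per_stream: float) -> float:
    """Number of streams needed to reach `aggregate` at `per_stream` each."""
    return aggregate / per_stream


streams = implied_concurrency(AGGREGATE_TOK_PER_S, SINGLE_STREAM_TOK_PER_S)
print(f"~{streams:.1f} concurrent streams per node")
```

If per-stream speed drops as more streams are batched together, the real number of concurrent streams behind the aggregate figure would be higher than this estimate.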

3836293648 27 minutes ago

So per subagent*.

yieldcrv 1 hour ago

Agentically coding drivers for different architectures is a massive unlock for the world

So much compute is underutilized, waiting for a savant or company to prioritize an architecture, and now all the other engineers can tackle this at any time if they get inspired with the right prompts

technoabsurdist 19 minutes ago

this is exactly our thesis at wafer :) thank you for the support

yogthos 30 minutes ago

Personally, I can't wait till something like this starts getting to consumer level. https://www.anuragk.com/blog/posts/Taalas.html

yieldcrv 7 minutes ago

That's pretty fascinating. Apple has some innocuous LLMs and transformers baked into its devices, leveraging their neural chipset

So I could see something like this, where the neural chipset has an LLM baked into it that can't be so easily updated until you get a new device