CES 2026: Taking the Lids Off AMD's Venice and MI400 SoCs

erulabs • 1 month ago

> this would be the first time that a high core count CCD will have the ability to support a V-Cache die. If AMD sticks to the same ratio of base die cache to V-Cache die cache, then each 32 core CCD would have up to 384MB of L3 cache which equates to 3 Gigabytes of L3 cache across the chip.

Good lord!
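
For anyone checking the arithmetic in that quote, here is a quick sketch. It assumes the 256-core package mentioned elsewhere in this thread is built from 32-core CCDs, and it takes the quoted 384MB per CCD at face value. The variable names are mine, not AMD's.

```python
# Back-of-the-envelope check of the quoted L3 figures.
# Assumption: 256 cores per package, built from 32-core CCDs (per the thread).
CORES_PER_PACKAGE = 256
CORES_PER_CCD = 32
L3_PER_CCD_MB = 384  # quoted upper bound with a V-Cache die

ccd_count = CORES_PER_PACKAGE // CORES_PER_CCD
total_l3_mb = ccd_count * L3_PER_CCD_MB
total_l3_gb = total_l3_mb / 1024
l3_per_core_mb = L3_PER_CCD_MB / CORES_PER_CCD

print(f"{ccd_count} CCDs, {total_l3_mb} MB total ({total_l3_gb:g} GB), "
      f"{l3_per_core_mb:g} MB per core")
```

So the "3 Gigabytes" figure is simply 8 CCDs times 384MB, or 12MB of L3 per core.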

andrepd • 1 month ago

I'd just like to take a moment to appreciate chipsandcheese and how they fill the Anandtech-shaped void in my heart <3

robocat • 1 month ago

> CCD

Core Complex Die - an AMD term for a chiplet that contains the CPU cores and cache. It connects to an IOD (I/O die) that does memory, PCIe etc (≈southbridge?).

Aside: CCX is Core Complex - see Figure 1 of https://www.amd.com/content/dam/amd/en/documents/products/ep...

For any other older fogeys for whom CCD means something different.

kvemkon • 1 month ago

> memory, PCIe etc (≈southbridge?)

northbridge

DiabloD3 • 1 month ago

To further expand on this, "southbridge" is what we now call a chipset expander (or 50 other company or product line specific names).

It's a switch that has a bunch of unified PHYs that can do many different tasks (non-primary PCI-E lanes, SATA ports, USB ports, etc), leveraging shared hardware to reduce silicon footprint while increasing utility, and it connects to PCI-E lanes on the CPU.

LtdJorge • 1 month ago

Don’t EPYC CPUs avoid using a chipset altogether? I think in that case, it would be NB+SB.

DiabloD3 • 1 month ago

Yes.

The "northbridge" in modern Zen systems is the IO die, and in Zen 1/+, it's the tiny fractional IO die that was colocated on each chip (which means a Zen 1/+ Epyc had the equivalent of 4 tiny northbridges).

However, they just embed the equivalent design of the chipsets into the IO Die SoC on Epycs.

Fun fact: For desktop, since Zen 1 (and AM4-compatible non-Zen CPUs) they included a micro-southbridge into the IO die. It gave you 2 SATA ports and 4 USB ports, usually the only "good" ones on the board. On Epyc, they just put the full sized one here instead of pairing it with an external one.

This also means, for example, if you have 4 USB3 10 Gbit ports, and they're not handled by a third-party add-on chip? Those are wired directly into the CPU, and aren't competing for the x4 that feeds the southbridge.

Also fun fact: The X, B, and A chips are all sibling designs, under the name of Promontory, made jointly with ASMedia. They're essentially all identical, only updated for PCI-E and USB versions as time went on, as well as adding more ports and shrinking die size.

The exception is the X570: it's an AMD-produced variant of the Promontory that also contains the Zen 2/3 IO Die, as they're actually the same chip in this case. The chips that failed to become IO Dies had all their Promontory features enabled instead, and became chipset chips. The Zen 2/3 Epycs shipped their IO die, at least partly, as two X570s welded together, with even more DDR PHY thrown in, as some sort of cost saving.

I don't think that panned out, because the X/B/A 600 and 800 variants (Zen 4 and 5) went back to being straight Promontory again.

Wikipedia has some good charts for this: https://en.wikipedia.org/wiki/List_of_AMD_chipsets

mrlonglong • 1 month ago

256 cores on a die. Stunning.

jauntywundrkind • 1 month ago

Intel's Clearwater Forest could be shipping even sooner, 288 cores. https://chipsandcheese.com/p/intels-clearwater-forest-e-core...

It's a smaller denser core but still incredibly incredibly promising and so so neat.

jsheard • 1 month ago

Someone needs to try running Crysis on that bad boy using the D3D WARP software rasterizer. No GPU, just an army of CPU cores trying their best. For science.

zeusk • 1 month ago

This has already been tried :)

iirc, in 2016 a quad-core Intel CPU ran the original Crysis at ~15fps

bee_rider • 1 month ago

I wonder what Ampere (mentioned in that article) is going to do. At this rate they’ll need to release a 1000 cpu chip just to be noticeably “different.”

fc417fc802 • 1 month ago

At some point won't the bandwidth requirements exceed the number of pins you can fit within the available package area? Presumably you'll end up back at a low maximum memory high bandwidth GPU design.

I wonder how many of these you could cram into 1U? And what the maximum next gen kW/U figure looks like.

wmf • 1 month ago

Unfortunately Ampere has fallen pretty far behind AMD. I don't see much point to their recent CPUs.

CyberDildonics • 1 month ago

"E-cores" are not the same

bri3d • 1 month ago

The 32 core / die AMD products are almost certainly Zen 6c, which is the same "idea" as Intel E-Cores albeit way less crappy.

https://www.techpowerup.com/forums/threads/amd-zen-6-epyc-ve...

EDIT: actually, now that I think about it some more, my characterization of Zen-C cores as the same "idea" as Intel E-cores was pretty unfair too; they do serve the same market idea but the implementation is so much less silly that it's a bit daft to compare them. Intel E-Cores have different IPC, different tuning characteristics, and different feature support (ie, they are usually a different uarch) which makes them really annoying to deal with. Zen C cores are usually the same cores with less cache and sometimes fewer or narrower ports depending on the specific configuration.

eigenspace • 1 month ago

I was about to reply with an "well, actually..." comment and then I saw that you beat me to it with your edit.

Fully agreed, they may be targeting a similar goal, but the execution is so different, and Intel screwed up the idea so badly, that it can really mislead people into assuming that dense Zen cores are the same junk as Intel E-cores.

eigenform • 1 month ago

ie. marketed as "dense" instead of "efficient"

tester756 • 1 month ago

By what logic?

mrlonglong • 1 month ago

Ah, I omitted to mention that with 256 cores, you get 512 threads.

Neywiny • 1 month ago

32 cores on a die, 256 on a package. Still stunning though

bee_rider • 1 month ago

How do people use these things? Map MPI ranks to dies, instead of compute nodes?

wmf • 1 month ago

Yeah, there's an option to configure one NUMA node per CCD that can speed up some apps.

janwas • 28 days ago

Gemma.cpp has nested thread pools, one per chiplet, and one across all chiplets. With such core counts it is quite important to minimize any kind of sharing, even RMW atomics.
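
To make the "one pool per chiplet" idea concrete, here is a minimal sketch of splitting logical CPU ids into per-CCD groups and pinning a worker to one group. It is not how Gemma.cpp does it. The function names are hypothetical, and it assumes logical CPUs are numbered contiguously per CCD, which is not guaranteed (SMT siblings are often numbered separately). On a real Linux box you would read the topology from `lscpu` or from `/sys/devices/system/cpu/cpu*/cache/index3/shared_cpu_list` instead.

```python
import os

def ccd_core_groups(total_cores, cores_per_ccd):
    """Split logical CPU ids 0..total_cores-1 into contiguous per-CCD groups.

    Assumption: ids are contiguous within a CCD; check the real topology
    before relying on this.
    """
    if cores_per_ccd <= 0 or total_cores % cores_per_ccd != 0:
        raise ValueError("total_cores must be a multiple of cores_per_ccd")
    return [
        list(range(start, start + cores_per_ccd))
        for start in range(0, total_cores, cores_per_ccd)
    ]

def pin_current_process(cpu_ids):
    """Pin the calling process to cpu_ids. Returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):  # Linux-only API
        return False
    # Only keep ids this process is actually allowed to run on.
    target = set(cpu_ids) & os.sched_getaffinity(0)
    if not target:
        return False
    os.sched_setaffinity(0, target)
    return True

# 256 cores in 32-core CCDs, per the numbers in this thread.
groups = ccd_core_groups(256, 32)
print(len(groups), groups[1][:4])
```

Each worker process (or MPI rank) would call `pin_current_process(groups[i])` once at startup, so its threads share one L3 and stay off the other chiplets.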

markhahn • 1 month ago

MPI is fine, but have you heard of threads?

m4rtink • 1 month ago

640 cores should be enough for anyone

jsheard • 1 month ago

Tell that to Nvidia, Blackwell is already up to 752 cores (each with 32-lane SIMD).

phkahler • 1 month ago

640K cores should be enough for everyone.

fooblaster • 1 month ago

b200 is 148 sms, so no

epistasis • 1 month ago

That's going to run Cities Skylines 2 ~~really really well~~ as well as it can be run.

mort96 • 1 month ago

Does it actually scale well to that many cores? If so, that's quite impressive; most video game simulations of that kind benefit more from a few fast cores, since parallelizing simulations well is difficult

Neywiny • 1 month ago

No, see https://m.youtube.com/watch?v=44KP0vp2Wvg . You're right, it didn't scale that well

epistasis • 1 month ago

Looks like it may be capped at 32 cores in that video, if they are hitting 25%-30% of a 96 core CPU?

Here's analysis of a prior LTT video showing 1/3 of cores at 100%, 1/3 of cores at 50%, and 1/3 idle cores:

https://www.youtube.com/watch?v=XqSCRZJl7S0

In any case, CS2 can take advantage of far more cores than most games.

markhahn • 1 month ago

these big high-core systems do scale, really well, on the workloads they're intended for. not games, desktops, web/db servers, lightweight stuff like that. but scientific, engineering - simulations and the like, they fly! enough that the HPC world still tends to use dual-socket servers. maybe less so for AI, where at least in the past, you'd only need a few cores per hefty GPU - possibly K/V stuff is giving CPUs more to do...

Neywiny • 1 month ago

Nope, see https://m.youtube.com/watch?v=44KP0vp2Wvg . Just didn't scale enough

lifetimerubyist • 1 month ago

I’m gonna get one of these and I’m just gonna play DOOM on it.

znpy • 1 month ago

256c/512t off a single package… likely 1024 threads in a 2cpu system.

Basically we are about to reach the scale where a single rack of these is a whole datacenter from the nineties or something like that

unnah • 1 month ago

Perhaps the most comparable 1990s system would be the SGI Origin 2800 (https://en.wikipedia.org/wiki/SGI_Origin_2000) with 128 processors in a single shared-memory multiprocessing system. The full system took up nine racks. The successor SGI Origin 3800 was available with up to 512 processors in 2002.

O5vYtytb • 1 month ago

Each core is multiples faster than a 90's CPU for various reasons as well. I think if you look at an entire rack it's easily a multiple of a 90's datacenter.

ksec • 1 month ago

256 Zen 6c cores. I can't wait for cloud vendors to get their hands on it. In a dual-socket config that is 512 cores and 1024 vCPUs per server node. We could get two nodes in a server; that is 1024 cores with 2048 threads.

Even with the slowest of all programming languages or frameworks, at 1 request per second per vCPU, that is 2K requests per second.

Pure brute force hardware scaling.
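
Spelling out that multiplication as a rough sketch: the 2 threads per core comes from the 256c/512t figure elsewhere in this thread, and two nodes per chassis is the commenter's hypothetical, not an announced product. Constant names are mine.

```python
# Core/thread/request arithmetic for the hypothetical dense server above.
CORES_PER_SOCKET = 256
THREADS_PER_CORE = 2           # SMT, per the 256c/512t figure in this thread
SOCKETS_PER_NODE = 2
NODES_PER_CHASSIS = 2          # hypothetical 2-node chassis
REQUESTS_PER_VCPU_PER_SEC = 1  # deliberately pessimistic

cores_per_node = CORES_PER_SOCKET * SOCKETS_PER_NODE
vcpus_per_node = cores_per_node * THREADS_PER_CORE
cores_per_chassis = cores_per_node * NODES_PER_CHASSIS
vcpus_per_chassis = vcpus_per_node * NODES_PER_CHASSIS
requests_per_sec = vcpus_per_chassis * REQUESTS_PER_VCPU_PER_SEC

print(cores_per_node, vcpus_per_node, cores_per_chassis,
      vcpus_per_chassis, requests_per_sec)
```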

andrekandre • 1 month ago

random internet feedback:

i really wish the article would have spent 2 sec to write in parentheses what 'ccd' is (it's 'Core Complex Die' fyi)

aurareturn • 1 month ago

This is a hardcore chip website. All their readers know what it is.

If their goal was to appeal to more casual readers, then I agree.

hypercube33 • 1 month ago

Well, it could also mean CCD (Charge Coupled Device) which is also used in this field (or was?)

aurareturn • 1 month ago

Any article mentioning CCD in the context of AMD server chips would mean the compute chiplet of the CPU.

andrekandre • 1 month ago

  > it could also mean CCD (Charge Coupled Device)

in fact this is exactly what i initially thought!

cogman10 • 1 month ago

How is this sort of package cooled? Seems like you'd pretty much need to do some sort of water cooling right?

icegreentea2 • 1 month ago

While the power draw might be high in absolute terms, the surface area is also quite large. For example, the article's estimates add up to just 2000mm2 for the Epyc chip. For reference, a Ryzen 9950X (AMD's hottest desktop CPU) has a surface area of about 262mm2, and a PPT (maximum power draw) of ~230W. This means that the max heat flux at the chip interface will almost certainly be lower on the Epyc chip than on the Ryzen - I don't think we're going to be getting 1000W+ PPT/TDP chips.

From that you can infer that there shouldn't be the need for liquid cooling in terms of getting the heat off the chip.
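
A rough worked version of that comparison, using only the numbers above (about 262mm2 and ~230W for the 9950X, about 2000mm2 for the Epyc estimate). This is average flux over the whole silicon area, so it ignores hotspots and the fact that not all of the area is compute. The 1000W case is the hypothetical upper bound from the comment, not a known TDP.

```python
# Average heat flux comparison, in W per mm^2 of silicon.
RYZEN_AREA_MM2 = 262
RYZEN_PPT_W = 230
EPYC_AREA_MM2 = 2000         # sum of the article's estimates
EPYC_HYPOTHETICAL_W = 1000   # hypothetical worst case, not a known TDP

ryzen_flux = RYZEN_PPT_W / RYZEN_AREA_MM2
epyc_flux = EPYC_HYPOTHETICAL_W / EPYC_AREA_MM2
# Package power at which the Epyc would match the Ryzen's average flux.
break_even_w = ryzen_flux * EPYC_AREA_MM2

print(f"Ryzen: {ryzen_flux:.3f} W/mm2, Epyc at 1000W: {epyc_flux:.3f} W/mm2, "
      f"break-even: {break_even_w:.0f} W")
```

Even at 1000W the Epyc's average flux would be well under the Ryzen's; it would take roughly 1750W to match it.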

There still are overall system power dissipation problems, which might lead you to want to use liquid cooling, but not necessarily.

For example, Super Micro will sell you air-cooled 1U servers with CPU options up to 400W (https://www.supermicro.com/en/products/system/hyper/1u/as%20...)

zozbot234 • 1 month ago

You can move a lot of air with good efficiency even just by using bigger fans that don't need to spin as fast most of the time. Water cooling is a good default for power-dense workloads, but far from an absolute necessity in every case.

wmf • 1 month ago

You can cool it however you want but the better the cooling the better the performance. We'll probably see heat pipes at a minimum.

aurareturn • 1 month ago

Air almost certainly. They always develop these chips within a thermal envelope. The envelope should be within what air cooling can do.

PS. Having many cores doesn’t mean a lot more power. Multi core performance can be made very efficient by having many cores running at a lower clock rate.
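
A toy model of why that works, as a sketch under textbook assumptions rather than anything specific to these chips: dynamic power scales roughly as C·V²·f, and if voltage is assumed to scale linearly with frequency, power per core goes as f³. It ignores leakage, voltage floors, and imperfect parallel scaling, so treat the numbers as an upper bound on the benefit.

```python
def relative_dynamic_power(cores, clock_ratio):
    """Power relative to one core at full clock, assuming P ~ f^3 per core."""
    return cores * clock_ratio ** 3

def relative_throughput(cores, clock_ratio):
    """Throughput relative to one core at full clock, assuming perfect scaling."""
    return cores * clock_ratio

# Two cores at half clock versus one core at full clock.
power = relative_dynamic_power(2, 0.5)
throughput = relative_throughput(2, 0.5)
print(f"throughput x{throughput:g} at power x{power:g}")
```

Under these idealized assumptions, two half-speed cores match one full-speed core's throughput at a quarter of the dynamic power.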

unethical_ban • 1 month ago

AMD Venice? 2005 is calling!

ironbound • 1 month ago

The new double wide rack looks good

rballpug • 1 month ago

x86_64 server architecture 256 cores on a die.

Blackwell 100+200 compression spin lock documentation.

zwaps • 1 month ago

Have not checked for a while, but does AMD at this point have any software that runs stably and efficiently?

Or are they still building chips no one wants to use because cuda is the only thing that doesn’t suck balls

wmf • 1 month ago

ROCm is pretty stable now.