
Arcee Trinity Mini: US-Trained Moe Model

70 points · 2 months ago · arcee.ai
halJordan · 2 months ago

Looks like a less good version of Qwen3 30B-A3B, which makes sense because it is slightly smaller. If they can keep that efficiency going into the large one, it'll be sick.

Trinity Large [will be] a 420B parameter model with 13B active parameters. Just perfect for a large RAM pool @ q4.
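
A rough back-of-envelope of what those numbers imply for memory, sketched below in Python; the 0.5 bytes/parameter figure for q4 and the overhead factor are assumptions, not published specs, and the KV cache is not counted.

    # Rough q4 memory estimate for a 420B-parameter model. The bytes-per-weight
    # and overhead numbers are assumptions; the KV cache is not included.
    TOTAL_PARAMS = 420e9           # Trinity Large total parameters (from the comment above)
    BYTES_PER_PARAM_Q4 = 0.5       # 4 bits per weight, ignoring quantization metadata
    OVERHEAD = 1.1                 # assumed ~10% for scales, higher-precision embeddings, etc.

    weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_Q4 / 1e9
    print(f"raw q4 weights: ~{weights_gb:.0f} GB")             # ~210 GB
    print(f"with overhead:  ~{weights_gb * OVERHEAD:.0f} GB")  # ~231 GB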

davidsainez · 2 months ago

Excited to put this through its paces. It seems most directly comparable to GPT-OSS-20B. Comparing their numbers on the Together API: Trinity Mini is slightly less expensive ($0.045/$0.15 vs. $0.05/$0.20) and seems to have better latency and throughput numbers.
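
For a concrete sense of the price gap, a small sketch below computes per-request cost from those figures; it assumes they are USD per million input/output tokens, which is the usual API convention but is not stated in the comment, and the workload sizes are hypothetical.

    # Per-request cost under the two quoted price points.
    # Assumption: figures are USD per million tokens, listed as input/output.
    def request_cost(in_tokens, out_tokens, in_price, out_price):
        """Cost in USD for one request at the given per-million-token prices."""
        return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

    # Hypothetical workload: 4k input tokens, 1k output tokens per request.
    trinity_mini = request_cost(4_000, 1_000, 0.045, 0.15)
    gpt_oss_20b = request_cost(4_000, 1_000, 0.05, 0.20)
    print(f"Trinity Mini: ${trinity_mini:.5f}   GPT-OSS-20B: ${gpt_oss_20b:.5f}")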

htrp · 2 months ago

Trinity Nano Preview: 6B parameter MoE (1B active, ~800M non-embedding), 56 layers, 128 experts with 8 active per token

Trinity Mini: 26B parameter MoE (3B active), fully post-trained reasoning model

They did the pretraining themselves and are still training the large version on 2048 B300 GPUs.
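
To make the "128 experts with 8 active per token" spec above concrete, here is a minimal sketch of top-k expert routing in NumPy; the hidden size, the softmax gate, and the random linear "experts" are illustrative placeholders, not Arcee's actual architecture.

    import numpy as np

    # Minimal top-k MoE routing sketch matching the numbers quoted above
    # (128 experts, 8 active per token). The hidden size, softmax gate, and
    # random linear "experts" are placeholders, not the real architecture.
    NUM_EXPERTS, TOP_K, HIDDEN = 128, 8, 64
    rng = np.random.default_rng(0)

    router_w = rng.normal(size=(HIDDEN, NUM_EXPERTS)) * 0.02
    experts = rng.normal(size=(NUM_EXPERTS, HIDDEN, HIDDEN)) * 0.02

    def moe_layer(x):
        """Route each token vector to its top-k experts and mix their outputs."""
        logits = x @ router_w                          # (tokens, NUM_EXPERTS)
        top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # k highest-scoring experts per token
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            scores = logits[t, top[t]]
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()                   # softmax over the selected experts only
            for w, e in zip(weights, top[t]):
                out[t] += w * (x[t] @ experts[e])      # weighted sum of expert outputs
        return out

    tokens = rng.normal(size=(4, HIDDEN))              # a tiny batch of 4 token vectors
    print(moe_layer(tokens).shape)                     # -> (4, 64)

Only the 8 selected experts per token do any work, which is why a 26B-parameter model can run with roughly the compute cost of a 3B dense model.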

Balinares · 2 months ago

Interesting. Always glad to see more open weight models.

I do appreciate that they openly acknowledge the areas where they followed DeepSeek's research. I wouldn't consider that a given for a US company.

Anyone tried these as a coding model yet?

bitwize · 2 months ago

A moe model you say? How kawaii is it? uwu

ghc · 2 months ago

Capitalization makes a surprising amount of difference here...

donw · 2 months ago

Meccha at present, but it may reach sugoi levels with fine-tuning.

noxa · 2 months ago

I hate that I laughed at this. Thanks ;)

ksynwa · 2 months ago

> Trinity Large is currently training on 2048 B300 GPUs and will arrive in January 2026.

How long does the training take?

arthurcolle · 2 months ago

A couple of days or weeks, usually. No one is doing 9-month training runs.
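
A rough back-of-envelope consistent with that claim, using the 2048-GPU figure from the article; the token budget, the sustained per-GPU throughput, and applying the 6ND compute rule to active parameters only are all assumptions for illustration.

    # Back-of-envelope training time on 2048 GPUs. Everything except the GPU
    # count is an assumed placeholder for illustration.
    ACTIVE_PARAMS = 13e9              # Trinity Large active parameters (from the thread)
    TOKENS = 15e12                    # assumed training token budget
    NUM_GPUS = 2048                   # from the article
    SUSTAINED_FLOPS_PER_GPU = 1.5e15  # assumed effective FLOP/s after utilization losses

    total_flops = 6 * ACTIVE_PARAMS * TOKENS              # standard 6*N*D compute estimate
    seconds = total_flops / (NUM_GPUS * SUSTAINED_FLOPS_PER_GPU)
    print(f"~{seconds / 86_400:.1f} days of training")    # falls in the days-to-weeks range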

trvz · 2 months ago

Moe ≠ MoE

cachius · 2 months ago

?

azinman2 · 2 months ago

The HN title uses incorrect capitalization.

rbanffy · 2 months ago

I was eagerly waiting for the Larry and Curly models.

m4rtink · 2 months ago

^_-