🇰🇷 Korean sister page: "Why ROCm lost on Strix Halo despite being faster: the 35% power-efficiency inversion caused by APU runtime polling" (Korean)

<aside> 🎯

TL;DR: On AMD Strix Halo (Ryzen AI MAX+ 395, Radeon 8060S iGPU) running the same Qwen3-30B-A3B Q4_K_M model, HIP/ROCm dominates raw throughput (pp512 354 vs 78 t/s, tg 48 vs 36 t/s), but simply keeping the model loaded burns 2.5 CPU cores continuously because of HSA runtime polling. For 24/7 workloads where idle time dominates, Windows Ollama (Vulkan) consumes about 35% less energy per token. A clean case study in why dGPU intuitions don't transfer to APUs.

</aside>

<aside> 📊

[Image #1: Hero] ROCm and Vulkan facing off on top of a Strix Halo APU

file: 2026-05-31-strixhalo-hero.png

</aside>


📌 Intro: Strix Halo, a new "middle-ground" platform

Who this is for: Infra/ML engineers running local LLM workloads on Strix Halo / Ryzen AI MAX+ systems, and any backend decision-maker who has to validate the "AMD GPU means ROCm" intuition. If you ever picked an inference backend based on a single throughput table, this case study is for you.

AMD Strix Halo (Ryzen AI MAX+ 395 + Radeon 8060S iGPU, gfx1151) is neither a desktop dGPU nor a laptop iGPU; it is a new middle ground. With up to 65 GB of unified memory (UMA/GTT) you can fit a 30B-class LLM entirely on the GPU, and "run an MoE 30B on a single desktop" stops being marketing and becomes a measurable claim.

The trap springs the moment you carry the old dGPU intuition ("AMD GPU = ROCm is the answer") onto this new variant: your operational metrics invert. This article logs four measurement rounds comparing ROCm 7.2.4 + a self-built HIP llama.cpp against Windows-native Ollama (Vulkan), on the same model, on the same machine.
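To make the inversion concrete, here is a minimal sketch of how energy per token depends on how much of the time the machine is actually generating. The token-generation rates (48 and 36 t/s) are the tg figures quoted in the TL;DR; the wattages are placeholder values chosen only for illustration and are not measurements from this article, and the function name `energy_per_token_j` is hypothetical.

```python
def energy_per_token_j(tokens_per_s, active_w, idle_w, duty_cycle):
    """Average joules per generated token.

    duty_cycle is the fraction of wall-clock time spent generating;
    the rest of the time the model sits loaded but idle.
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    # Average power and average token rate over one second of wall-clock time.
    avg_power_w = duty_cycle * active_w + (1.0 - duty_cycle) * idle_w
    avg_tokens_per_s = duty_cycle * tokens_per_s
    return avg_power_w / avg_tokens_per_s


# tokens_per_s: tg rates from the TL;DR.
# active_w / idle_w: ASSUMED placeholder wattages, not measured values.
ROCM = dict(tokens_per_s=48.0, active_w=90.0, idle_w=40.0)
VULKAN = dict(tokens_per_s=36.0, active_w=80.0, idle_w=10.0)

for duty in (1.0, 0.5, 0.05):
    rocm = energy_per_token_j(duty_cycle=duty, **ROCM)
    vulkan = energy_per_token_j(duty_cycle=duty, **VULKAN)
    print(f"duty={duty:.2f}  ROCm={rocm:.2f} J/tok  Vulkan={vulkan:.2f} J/tok")
```

With these placeholder numbers the faster backend wins when the machine generates continuously and loses once idle time dominates, which is the shape of the result this article reports. Substituting measured wattages is what turns the sketch into an actual comparison.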

Quick glossary:


🔬 Round 1: VRAM recognition ("ROCm only sees 4 GB", solved)

The first wall was the predictable one. With default BIOS allocation, ROCm only sees the dedicated VRAM (4 GB), so the 30B model can't be GPU-offloaded. Increasing BIOS UMA/GTT to 65 GB brings rocminfo Pool 1 GLOBAL to 95.83 GiB, and llama-bench correctly enumerates the device.

<aside> 🧩

[Image #2] BIOS UMA dial: from 4 GB to 65 GB

file: 2026-05-31-strixhalo-vram-dial.png

</aside>

rocminfo (gfx1151 agent, Pool 1):
  Segment: GLOBAL; FLAGS: COARSE GRAINED
  Size:    100,478,882 KB  ≈ 95.83 GiB

llama-bench device discovery:
  Device 0: AMD Radeon(TM) 8060S Graphics, gfx1151 (0x1151)
  VRAM: 98123 MiB,  free: 97044 MiB
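As a quick cross-check, the two tools report the same pool in different units. The sketch below converts both figures to GiB, assuming that the "KB" printed by rocminfo means 1024 bytes (an assumption, though it is the reading under which the two numbers agree). The helper names are hypothetical.

```python
KIB_PER_GIB = 1024 ** 2
MIB_PER_GIB = 1024


def kib_to_gib(size_kib):
    """Convert a size in KiB (assumed unit of rocminfo's 'KB') to GiB."""
    return size_kib / KIB_PER_GIB


def mib_to_gib(size_mib):
    """Convert a size in MiB (as printed by llama-bench) to GiB."""
    return size_mib / MIB_PER_GIB


pool_gib = kib_to_gib(100_478_882)  # rocminfo Pool 1 size
vram_gib = mib_to_gib(98_123)       # llama-bench reported VRAM

print(f"rocminfo pool:    {pool_gib:.2f} GiB")
print(f"llama-bench VRAM: {vram_gib:.2f} GiB")
print(f"difference:       {abs(pool_gib - vram_gib) * 1024:.1f} MiB")
```

Both values land at about 95.8 GiB and differ by less than 1 MiB, so llama-bench is seeing the full pool that rocminfo exposes.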

This part is uncontroversial. "We can use it now" is the conclusion, and most articles would stop here.