As someone from the rendering side of GPU stuff, what exactly is the point of ROCm/CUDA? We already have Vulkan and SPIR-V with vendor extensions as a mostly-portable GPU API, what do these APIs do differently?
Furthermore, don't people use PyTorch (or other libraries? I'm not really clear on what ML tooling is like; it feels like there are hundreds of frameworks, and I haven't seen any simplified list explaining the differences. I would love a TL;DR for this) rather than ROCm/CUDA directly anyway? So the main draw can't be ergonomics, at least.
For context, the submitter of the issue is Anush Elangovan from AMD, who's recently been a lot more active on social media after the SemiAnalysis article and is taking the reins / responsibility of moving AMD's software efforts forward.
However you want to dissect this specific issue, I'd generally consider this a positive step and nice to see it hit the front page.
https://www.reddit.com/r/ROCm/comments/1i5aatx/rocm_feedback...
https://www.reddit.com/user/powderluv/
I think AMD's offer was fair (full remote access to several test machines); then again, just giving tinycorp the boxes on their terms with no strings attached, as a kind of research grant, would have earned them some goodwill with that corner of the community.
Either way both parties will continue making controversial decisions.
Another neocloud, that is funded directly by AMD, also offered to buy him boxes. He refused. It had to come from AMD. That's absurd and extortionist.
Long thread here: https://x.com/HotAisle/status/1880467322848137295
It's like asking a tire manufacturer to give you a car for free.
Just uploaded some pictures of how complex these machines really are...
https://imgur.com/gallery/dell-xe9860-amd-mi300x-bGKyQKr
> He refused. It had to come from AMD. That's absurd and extortionist.
I'm on the wrong side of the Twitter wall to read the source, but that doesn't sound absurd. Extortionist, maybe. Hotz's major complaint (last time I checked, anyway) is pretty close to one I have: AMD appears to have between little and no strategic interest in consumer-grade graphics cards having strong GPGPU support, leading to random crashes from the kernel drivers and a certain "meh, whatever" attitude from AMD corporate when dealing with that.
I doubt any specific boxes or testing regime are his complaint; he'd be much more worried about whether AMD management have any interest in companies like his succeeding. Third parties providing some support doesn't sound like it'd cut it. Being burned by AMD leaves one a little leery of any alleged support without some serious guarantee that major changes are afoot in management's thinking.
This reads as incredibly entitled. AMD owes him nothing, especially if he's opposed to the leadership's vision and being belligerent about it.
There is maybe 1 or 2 companies with enough cachet to demand management changes at a supplier like AMD - and they have market caps in the trillions.
"I estimate having software on par with NVDA would raise their market cap by 100B. Then you estimate what the chance it that
@__tinygrad__
can close that gap, say it's 0.1%, probably a very low estimate when you see what we have done so far, but still...
That's worth 100M. And they won't even send us 2 ~100k boxes. In what world does that make sense, except in a world where decisions are made based on pride instead of ROI. Culture issue."
https://x.com/__tinygrad__/status/1879620242315317304
Take the free offer, prove everyone wrong and then start to tell us how great you are. https://x.com/HotAisle/status/1880507210217750550
This is his opinion, nothing more, nothing less. He currently has a partially implemented piece of software that hasn't seen a release since November and isn't performant at all.
To be fair, having seen his software evolve, and having seen ROCm evolve, I'm more optimistic about where his software will be in a year than about yours.
He picked his problem better. The whole reason that tinygrad is, well, tiny, is that it limits the overhead of onboarding people and performing maintenance and rewrites. My strong impression is that the ROCm codebase is simply much too large for AMD's dev resources. You're trying to race NVidia on their turf with fewer resources. It's brave, but foolish.
I can see how Tinygrad could succeed. The story makes sense. AMD's doesn't, neither logically nor empirically. NVidia would have to seriously fumble.
Worked for AMD in the CPU market.
That said, I'm deeply worried about anyone who's based their company on AMD GPUs. The only reason they do well in HPC is that there's an army of dreadfully underpaid and overperforming grad students to pick up the slack from AMD. Trying to do that in a corporate environment is company suicide.
However it would also raise future revenue, which should be what's reflected by the market.
So it would still be something that's good for the company, but not nearly 100B good.
You don't think AMD being competitive with Nvidia (3.37 trillion USD market cap) would be "nearly 100B good"? Believe it or not, the only reason that's not the case is good bug-free software. That's what tinygrad is doing.
AMD already has major ongoing projects with OpenXLA/IREE. Lots of established engineers/researchers, and it’s in collaboration with Google/AWS. Hotz is delusional if he thinks that he can do better by ripping off Karpathy’s toy autograd implementation.
Yeah, AMD is already pouring a lot of support into OpenXLA/IREE, which has a lot of well-respected compiler engineers and researchers working on it, and companies like AWS are also investing into it.
I don’t really think TinyCorp has anything to offer AMD.
Really telling that they have to ask us which cards we want, as opposed to supporting all cards by default from day 1 like Nvidia.
All because they went with a boneheaded decision to require per-device code compilation (gfx1030, gfx1031, ...) instead of compiling to an intermediate representation like CUDA's PTX, which lets old binaries be JIT-compiled for new GPUs. Doubly boneheaded considering the graphics API they helped develop, Vulkan, literally does that via SPIR-V!
It is clear that AMD's approach isn't working and they need to change their balance.
I can understand wanting to prioritize support for the cards people want to use most, but they should still plan to write software support for all the cards that have hardware support.
Hardware first, but then their hardware isn't any better than NVidia's, so I don't see how that's a valid excuse here.
(Okay, maybe their super high end unobtanium-level GPUs are better hardware-wise. Don't know, don't care about enterprise-only hardware that is unbuyable by mere mortals.)
It's just not. People like to defend AMD out of hatred for Nvidia, but the thousands of fumbles over the past 15 years that led AMD to their current position and Nvidia to their current dominance are not deserving of coddling and excuses.
The fact is, support still isn't there. They've had 2 years since Stable Diffusion to get a serious team up and shipping, and they still don't have enough resources pointed at this to avoid having to ask what should be prioritized.
The only way to fix their culture/priorities is to stop buying their cards.
But that's why my business exists... https://news.ycombinator.com/item?id=42759191
Imagine nvidia supported only the 4090, 4080, and 4070 for CUDA at the consumer level, with the 3090 unsupported since the 40xx series came out. This is what amd is defending here.
Windows support is also bad, but supports significantly more than one GPU.
I honestly can't figure out which Radeon GPUs are supposed to be supported.
The GitHub discussion page in the title lists RX 6800 (and a bunch of RX 7xxx GPUs) as supported, and some lower-end RX 6xxx ones as supported for runtime. The same comment also links to a page on the AMD website for a "compatibility matrix" [1].
That page only shows RX 7900 variants as supported on the consumer Radeon tab. On the workstation side, Radeon Pro W6800 and some W7xxx cards are listed as supported. It also suggests to see the "Use ROCm on Radeon GPU documentation" page [2] if using ROCm on Radeon or Radeon Pro cards.
That link leads to a page for "compatibility matrices" -- again. If you click the link for Linux compatibility, you get a page on "Linux support matrices by ROCm version" [3].
That "by ROCm version" page literally only has a subsection for ROCm 6.2.3. It only lists RX 7900 and Pro W7xxx cards as supported. No mention of W6800.
(The page does have an unintuitively placed "Version List" link through which you can find docs for ROCm 5.7 [4]. Those older docs are no more useful than the 6.2.3 ones.)
Is RX 6800 supported? Or W6800? Even the amd.com pages seem to contradict each other on the latter.
Maybe the pages on the AMD site only list official production support or something. In any case it's confusing as hell.
Nothing against the GitHub page author who at least seems to try and be clear but the official documentation leaves a lot to be desired.
[1] https://rocm.docs.amd.com/projects/install-on-linux/en/lates...
[2] https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
[3] https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
[4] https://rocm.docs.amd.com/projects/radeon/en/docs-5.7.0/docs...
rocm is kind of a joke. Recently I wanted to write some golang code that talks to rocm devices using amd smi. You have to build and install the go amd smi bindings from source, the repo has dead links, and there is basically no documentation anywhere on how to get this working.
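For what it's worth, you can skip go_amd_smi entirely and bind the underlying C library (rocm_smi_lib, i.e. librocm_smi64) with a few lines of cgo. A minimal sketch, assuming a stock /opt/rocm install; the rsmi_* calls are real entry points from rocm_smi/rocm_smi.h, but treat the paths and flags as illustrative rather than gospel:

    package main

    // Talk to AMD GPUs through rocm_smi_lib directly via cgo.
    // Assumes ROCm is installed under /opt/rocm (adjust if not).

    /*
    #cgo CFLAGS: -I/opt/rocm/include
    #cgo LDFLAGS: -L/opt/rocm/lib -lrocm_smi64
    #include <rocm_smi/rocm_smi.h>
    */
    import "C"

    import "fmt"

    func main() {
        // rsmi_init must be called before any other rocm_smi function.
        if ret := C.rsmi_init(0); ret != C.RSMI_STATUS_SUCCESS {
            panic(fmt.Sprintf("rsmi_init failed: %d", ret))
        }
        defer C.rsmi_shut_down()

        var count C.uint32_t
        if ret := C.rsmi_num_monitor_devices(&count); ret != C.RSMI_STATUS_SUCCESS {
            panic(fmt.Sprintf("rsmi_num_monitor_devices failed: %d", ret))
        }
        fmt.Printf("found %d AMD GPU(s)\n", uint32(count))
    }

Workable, but now you own the build flags, the bindings, and the error handling that the nvidia side just ships for you.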
Compare this to nvidia where I just imported the go nvml library and it built the cgo code and automatically links to nvidia-ml.so at runtime.
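For comparison, here's roughly what the same device survey looks like with go-nvml (a sketch based on the github.com/NVIDIA/go-nvml/pkg/nvml API; it loads libnvidia-ml.so at runtime, so no vendor SDK is needed at build time):

    package main

    // Enumerate NVIDIA GPUs with the official Go bindings.
    // go-nvml dlopens the NVML shared library at Init time.

    import (
        "fmt"
        "log"

        "github.com/NVIDIA/go-nvml/pkg/nvml"
    )

    func main() {
        if ret := nvml.Init(); ret != nvml.SUCCESS {
            log.Fatalf("nvml.Init failed: %v", nvml.ErrorString(ret))
        }
        defer nvml.Shutdown()

        count, ret := nvml.DeviceGetCount()
        if ret != nvml.SUCCESS {
            log.Fatalf("DeviceGetCount failed: %v", nvml.ErrorString(ret))
        }
        for i := 0; i < count; i++ {
            device, _ := nvml.DeviceGetHandleByIndex(i)
            name, _ := device.GetName()
            fmt.Printf("GPU %d: %s\n", i, name)
        }
    }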
I figure that list is only what's officially supported, meaning things not on that list may or may not work. For example, my 6800 XT runs Stable Diffusion just fine on Linux with PyTorch ROCm.
My wishlist for ROCm support is actually supporting the cards they already released. But that's not going to happen.
By the time a (consumer) AMD device is supported by ROCm, it'll only have a few years of ROCm support left before support is removed. The lifespan of ROCm support for AMD cards is very short. You end up having to use Vulkan, which is not optimized, of course, and a bit slower. I once bought an AMD GPU 2 years after release, and 1 year after I bought it, ROCm support was dropped.
FWIW, every ROCm library currently in the Debian 13 'main' and Ubuntu 24.04 'universe' repository has been built for and tested on every discrete consumer GPU architecture since Vega. Not every package is available that way, but the ones that are have been tested on and work on Vega 10, Vega 20, RDNA 1, 2 and 3.
Note that these are not the packages distributed by AMD. They are the packages in the OS repositories. Not all the ROCm packages are there, but most of them are. The biggest downside is that some of them are a little old and don't have all the latest performance optimizations for RDNA 3.
Those operating systems will be around for the next decade, so that should at least provide one option for users of older hardware.
Packages existing and the software actually working are very different things. You can run rocm on unsupported GPUs like a 780m, but as soon as you hit an issue you are out of luck. And you’ll hit an issue.
For example, my 780m gets 1-2 inferences from llama.cpp before dropping off the bus due to a segfault in the driver. It's a bad enough lockup that Linux can't cleanly shut down and hangs until hard rebooted.
The 780m is an integrated GPU. I specified discrete GPUs because that's what I have tested and can confirm will work.
I have dozens of different AMD GPUs and I personally host most of the Debian ROCm Team's continuous integration servers. Over the past year, I have worked together with other members of the Debian project to ensure that every potentially affected ROCm library is tested on every discrete consumer AMD GPU architecture since Vega whenever a new version of a package is uploaded to Debian.
FWIW, Framework Computers donated a few laptops to Debian last year, which I plan to use to enable the 780m too. I just haven't had the time yet. Fedora has some patches that add support for that architecture.
"Support is coming in three months!" to "This card is ancient and will be no longer developed for. Buy our brand new card released in three months!" Every damned time.
5 years is not very long tbh.
AMD are merging the architectures (UDNA) like nVidia, but it's not going to be before 2026. (https://wccftech.com/amd-ryzen-zen-6-cpus-radeon-udna-gpus-u...)
As the underdog AMD can't afford to have their efforts perceived as half-assed or a hobby or whatever. They should be moving heaven and earth to maximize their value proposition, promising and delivering on longer support horizons to demonstrate the long term value of their ecosystem.
Honestly at this point half-assed support would be a significant step up from their historical position. The one thing they have pioneered is new tiers of fractional assedness asymptotically approaching zero.
I mean at this point my next card is going to be an nvidia. It has been a total waste of time trying to use rocm for anything machine-learning based. No one uses it. No one can use it. The card I have is somehow always not quite supported.
You can use ROCm on consumer Radeon as long as you pay more than 400 dollars for one of their GPUs. Meanwhile, you can run Stable Diffusion with the --lowvram flag on a 3050 6GB that goes for 180 dollars.
I have an MI50 with 16GB of HBM that's collecting dust (it's Vega-based, so it can play games, I guess) because I don't want to bother setting up a system with Ubuntu 20.04, the last version of Ubuntu that the last MI50-supporting version of ROCm works on.
With situations like this, it's not hard to see why Nvidia totally dominates the compute/AI market.
AMD did over $5 billion in GPU compute (Instinct line) last year. Not nVidia numbers, but also not bad. Customers love that they can actually get Instinct systems rather than trying to compete with the hyperscalers for limited supplies of nVidia systems. Meta and Microsoft are the two biggest buyers of AMD Instincts, though...
AMD Instinct is also more power efficient and has comparable (if not better) performance for the same (or less) price.
The MI50 may be considered deprecated in newer releases, but it seems to work fine in my experience. I have a Radeon VII in my workstation (which shares the same architecture) and I host the MI60 test machine for Debian AI Team. I haven't had any trouble with them.
https://salsa.debian.org/rocm-team/rocm-hipamd/-/raw/d6d2014... (one patch of many)
I wrote that patch. It's not actually used for MI50/MI60 in any of the Debian system packages, since Debian builds for gfx906 rather than using the gfx900 fallback path that patch provides. Debian is not relying on any special patches to enhance gfx906 support. That architecture is the same as upstream.
Now, for some other GPU architectures, you're absolutely right. There are indeed important patches in Debian that enable its extra-wide hardware compatibility.
I don't think the MI60 has reached deprecated status yet (the last time I looked at prices for the MI50 and MI60, the MI60 was something like 3x as expensive, and I think that's because it's still officially supported), but I'll check this all out. Thanks.
The MI60 is basically just a faster MI50 with more memory. They were deprecated together. It's plausible there could be small firmware or driver differences that cause issues in one but not the other, but I think that's unlikely.
I'm constantly baffled and amused that AMD keeps majorly failing at this.
Either the management at AMD is not smart enough to understand that without the computing software side they will always be a distant number 2 to NVIDIA, or the management at AMD considers it hopeless to ever be able to create something as good as CUDA because they don’t have and can’t hire smart enough people to write the software.
Really, it’s just baffling why they continue on this path to irrelevance. Give it a few years and even Intel will get ahead of them on the GPU side.
If I were Jensen, I would snap up all the GPU software experts I possibly could, and put them to work improving the CUDA ecosystem. I'd also spin up a big research group to further fuel the CUDA pipeline for hardware, software, and application areas.
Which is exactly what NVIDIA seems to be doing.
AMD's ROCm software group seems far behind, is probably understaffed, and probably is paid a fraction of what NVIDIA pays its CUDA software groups.
AMD also has to catch up with NVLink and Spectrum-X (and/or InfiniBand).
AMD's main leverage point is its CPUs, and its raw GPU hardware isn't bad, but there is a long way to go in terms of GPU software ecosystem and interconnect.
> I'm constantly baffled and amused that AMD keeps majorly failing at this.
i wonder if you've considered the possibility that there's some component/dimension of this that you're simply unaware of? that it's not as straightforward as whatever reductive mental model you have? is that even like within the universe of possibilities?