That would only matter (to me, at least) if those Apple chips were propping up an open platform that suits my needs. As things stand today, procuring an M chip represents a commitment to the Apple software ecosystem, which Apple has made abundantly clear isn't optimized for user needs. Those marginally faster CPU cycles can't offset the time wasted fighting macOS and rebuilding decades of muscle memory, so thanks but no thanks.
Sure. Insofar as Apple Silicon beats these things, "I'll take less powerful hardware if it means I'm not stuck with the Apple ecosystem" is a perfectly reasonable tradeoff to make. Two things, though.
First, I don't like making blind tradeoffs. If what I need (for whatever reason) is a really beefy ARM CPU, I'd like to know what the "Apple-less tax" costs me (if anything!)
Second, the status quo is that Apple Silicon is the undisputed king of ARM CPU performance, so it's the obvious benchmark to compare this thing against. Providing that context is just basic journalistic practice, even if just to say "but it's irrelevant because we can't use the hardware without the software".
Why do you need ARM? There is nothing magic about it; most CPUs are an internal instruction set with a decoder on top. Bad as x86 is, decoding is not the issue. They can make lower-power x86 if they want. They can also make MIPS or RISC-V chips that are good.
There's nothing special about ARM, sure. Hence "for whatever reason". Still, ARM is a known quantity, and the leading alternative to x86 for desktop CPUs. The article is titled "reaching desktop performance".
We know how Apple's hardware performs on native workloads. We know how it performs emulating x86 workloads (and why). Surely "... and this is how this hardware measures up against the other guys trying to achieve the exact same thing" is a relevant comparison? I can't be the only person who reads "reaching desktop performance" and wonders "you mean comparable to the M1, or to the M3 Ultra?"
This CPU will end up in products that are competing against Apple's in the market. People will look at and choose between two products with X925 or M4/5. It's a very obvious parallel and a big oversight for the article.
For better or worse if you make a (high end) consumer CPU it will be judged against the M-series, just like if you make a high end phone it will be judged against the iPhone.
The X925 core is used in chips like the GB10 in the Nvidia DGX Spark, so it is relevant to compare it to Apple Silicon performance, IMO. The Mac Studio is pretty much a competitor to it.
All he is saying: We currently have products in a similar product category (arm based desktop computers) that are widely used and have known benchmark scores (and general reviews) and it would make sense if I publish a new cpu for the same product category ("Reaching Desktop Performance" implies that) that I'd compare it to the known alternatives.
In the end you can just run Asahi on your macbook, the OS is not that relevant here. A comparison to macbooks running Asahi Linux would be fine.
> But why would an article address _their_ specific usecase?
amelius, if anyone had specific requirements, it was you with your "systems for in-flight entertainment".
OP asked a very reasonable question for a very generic comparison to the 800-pound gorilla in the consumer CPU world in general, and ARM CPU world in particular.
If the article can reference AMD's Zen 5 cores and Intel's Lion/Sunny Cove, they could have made at least a brief reference to M-series CPUs. As a reader and potential buyer of any of them, I find it would have been a very useful comparison.
When purchasing any ARM-based computer, a key question for me is how many of those I can purchase for the cost of a Mac mini, how many Mac minis I can purchase for the cost of that, and whether it has working drivers...
And the answer there may absolutely be "none", which equates to doing away with ARM, which is totally fine. I don't have a horse in the x86 vs ARM race, especially since it's pretty clear that performance per watt stands within a narrow margin across arches on recent nodes.
FWIW, Apple's Virtualization framework is fantastic, and Rosetta 2 is unmatched on other Arm desktops, where QEMU is required. For example, you can trivially get Vivado working in a Debian guest on a macOS host that way.
Pretty simply because I don't want to use macOS, its terrible window management, quirks and idiosyncrasies. In your comparison, my gripe wouldn't be about the hassle of finding 3rd-party compatible batteries, but about the daily handling of the Makita while knowing the DeWalt to be more ergonomic and better suited to my needs.
As someone who uses Linux, macOS and Windows interchangeably, I'm curious to know what you're using.
I learned to live with macOS, but I also like and use Gnome, which many Linux-only people hate. I tried most WMs on Linux, like Hyprland, Sway, i3, but none ever felt worth the config hassle when compared to the sane defaults of Gnome.
Those are of almost zero use for people wishing to run Linux etc.
Yes, Asahi exists, and props to the developers, but I don't think I'm alone in being unwilling to buy hardware from a manufacturer who obviously is not interested in supporting open operating systems
Same, I wish Chips and Cheese would compare some of these cores to Apple Silicon, especially in this case where they're talking about another ARM core.
A few years ago they were writing articles about Apple Silicon.
The core they're talking about was released about two years ago. Nvidia put it in their Grace Blackwell parts (e.g. DGX Spark), basically as a coordinator for the system.
M5 has about a 32% per core advantage, though the DGX obviously has a much richer power budget so they tossed in 10 high performance cores and 10 efficiency cores (versus the 4 performance and 6 efficiency in the latter). Given the 10/10 vs 4/6 core layouts I would expect the former to massively trounce the latter on multicore, while it only marginally does.
Samsung used the same X925 core in their Exynos 2500, which they use in a flip phone. MediaTek put it in a couple of their chips as well.
"Reaching desktop" is always such a weird criterion though. It's kind of a meaningless bar.
You make a valid point; Apple has indeed set a high standard for ARM cores in performance. A comparison with their M4 and M5 cores would provide valuable context for these new developments.
I'm not a CPU geek, so a lot of the branch prediction details go over my head, but it's generally a good review. I liked the detail of performance on more complex workloads where IPC can get muddy when you need more instructions.
These days, however, I feel that any comparison of performance needs to include the power envelope (I realise this is dependent on the final chip).
ARM Cortex-X925 does indeed achieve a very good IPC, but it has competitive performance only in general-purpose applications that cannot benefit from using array operations (i.e. the vector instructions and registers). The results shown in the parent article for the integer tests of SPEC CPU2017 are probably representative of Cortex-X925 when running this kind of application.
While the parent article shows AMD Zen 5 having significantly better results in floating-point SPEC CPU2017, these benchmark results are still misleading, because in applications properly optimized for AVX-512 the difference between Zen 5 and Cortex-X925 would be much greater. I have no idea how SPEC has been compiled by the author of the article, but the floating-point results are not consistent with programs optimized for Zen 5.
One disadvantage of Cortex-X925 is its narrower vector instructions and registers, which require more instructions for the same task. This is only partially compensated by the fact that Cortex-X925 can execute up to 6 128-bit instructions per clock cycle (vs. up to 4 vector instructions per clock cycle for Intel/AMD, which are wider: 256-bit for Intel and up to 512-bit for Zen 5). This has been shown in the parent article.
The second disadvantage of Cortex-X925 is that it has an unbalanced microarchitecture for vector operations. For decades most CPUs with good vector performance had an equal throughput for fused multiply-add operations and for loads from the L1 cache memory. This is required to ensure that the execution units are fed all the time with operands in many applications.
However, Cortex-X925 can do at most 4 loads per clock cycle, while it can do 6 FMAs. Because of this lower load throughput, Cortex-X925 can reach its maximum FMA throughput much less frequently than the AMD or Intel CPUs. This is compounded by the fact that achieving better FMA-to-load ratios requires more storage space in the architectural vector registers, and Cortex-X925 is also at a disadvantage here, having vector registers 4 times smaller than Zen 5's.
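To put rough numbers on the two points above, here is a small sketch that only does the arithmetic on the figures quoted in this comment (6 x 128-bit for Cortex-X925, 4 x 256-bit for Intel, 4 x 512-bit for Zen 5, and 6 FMAs vs. 4 loads per cycle). The struct and function names are made up for illustration, and the inputs are the commenter's upper bounds, not measurements.

```cpp
#include <cassert>
#include <cstdio>

// Per-cycle vector issue figures as quoted in the comment above.
// These are the commenter's upper bounds, not measured values.
struct VectorCore {
    const char* name;
    int ops_per_cycle;  // vector instructions issued per clock cycle
    int width_bits;     // width of each vector instruction in bits
};

// Peak vector data touched per clock cycle, in bits.
constexpr int peak_bits_per_cycle(const VectorCore& core) {
    return core.ops_per_cycle * core.width_bits;
}

// FMA-to-load ratio. A value above 1.0 means the FMA units can
// consume operands faster than the L1 load ports can supply them.
inline double fma_to_load_ratio(int fmas_per_cycle, int loads_per_cycle) {
    assert(loads_per_cycle > 0);
    return static_cast<double>(fmas_per_cycle) / loads_per_cycle;
}

constexpr VectorCore kCortexX925{"Cortex-X925", 6, 128};
constexpr VectorCore kIntel{"Intel", 4, 256};
constexpr VectorCore kZen5{"Zen 5", 4, 512};

// Prints one line per core plus the X925 FMA:load ratio.
inline void print_summary() {
    const VectorCore cores[] = {kCortexX925, kIntel, kZen5};
    for (const VectorCore& c : cores) {
        std::printf("%-12s %4d bits/cycle\n", c.name, peak_bits_per_cycle(c));
    }
    std::printf("X925 FMA:load ratio %.2f\n", fma_to_load_ratio(6, 4));
}
```

With those inputs the peak works out to 768 bits per cycle for Cortex-X925 against 1024 and 2048 for the wider designs, and the X925 FMA-to-load ratio is 1.5 rather than the balanced 1.0 described above.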
> While the parent article shows AMD Zen 5 having significantly better results in floating-point SPEC CPU2017, these benchmark results are still misleading, because in applications properly optimized for AVX-512 the difference between Zen 5 and Cortex-X925 would be much greater. I have no idea how SPEC has been compiled by the author of the article, but the floating-point results are not consistent with programs optimized for Zen 5.
The arithmetic intensity of most SPECfp subtests is quite low. You see this wall because it ends up reaching bandwidth limitations long before running out of compute on cores with beefy SIMD.
Auto-vectorizing optimizers have gotten quite good. If you are using integers, it often just happens whether you think about it or not. With floats, unless you specify fast math, you will need to use wide types to let the compiler know you don't care about floating-point addition order.
If ARM starts dominating in desktop and laptop spaces with a quite different set of applications, might we start seeing more software bugs around race conditions, caused by developers writing software with x86 in mind, with its differing constraints on memory ordering?
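A minimal sketch of the float case, assuming a plain sum over a `std::vector<float>`. This is not the "wide types" approach mentioned above but a related manual one: the programmer splits the sum into four independent accumulators, which changes the addition order in the source so the compiler no longer needs fast-math permission to do it. The function names are hypothetical, and whether a given compiler actually vectorizes the second loop depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict left-to-right sum. Every add depends on the previous one, and
// without fast-math style flags the compiler has to preserve this order
// because float addition is not associative.
float sum_sequential(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) {
        s += x;
    }
    return s;
}

// Four independent partial sums. The reordering is now written into the
// source, so the four adds per iteration are free to map onto SIMD lanes.
float sum_four_lanes(const std::vector<float>& v) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc[0] += v[i];
        acc[1] += v[i + 1];
        acc[2] += v[i + 2];
        acc[3] += v[i + 3];
    }
    float s = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        s += v[i];
    }
    return s;
}
```

The two functions can return slightly different results on inputs where rounding matters, which is exactly the tradeoff the compiler refuses to make on its own.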
That's a possibility. Some code still assumes (without realizing!) x86-style ordered loads and stores. This is called a strong memory model, specifically TSO, Total Store Order. If you tell x86 to execute "a=1; b=2;", it will always store the value to 'a' first. Of course compilers might reorder stores and loads, but that's another matter.
ARM is free to reorder stores and loads. This is called a weak memory model. So unless the required ordering is spelled out explicitly in the code, for example with C++ memory_order::acquire and memory_order::release, you might get invalid behavior. Heisenbugs in the worst case.
This is actually one reason I feel like developing my systems level stuff on ARM64 instead of x86 (I have a DGX Spark box) is not a bad idea. Building lower level concurrent data structures, etc. it just seems wiser to have to deal with this more directly.
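For anyone who hasn't seen it written out, here is a minimal sketch of the release/acquire pairing in C++. The `Mailbox` type and the function names are hypothetical; the code uses the `std::memory_order_release` / `std::memory_order_acquire` spellings, which C++20 also exposes as `std::memory_order::release` / `std::memory_order::acquire`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical one-slot mailbox: plain data plus an atomic "ready" flag.
struct Mailbox {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
};

// Writer: fill in the data, then publish it with a release store.
// The release store keeps the payload write from moving after it.
void produce(Mailbox& m, int value) {
    m.payload = value;
    m.ready.store(true, std::memory_order_release);
}

// Reader: spin until the flag is set, using an acquire load.
// The acquire load keeps the payload read from moving before it.
int consume(Mailbox& m) {
    while (!m.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return m.payload;
}

// Runs one producer and one consumer and returns what was received.
int send_one(int value) {
    Mailbox m;
    int received = 0;
    std::thread consumer([&] { received = consume(m); });
    std::thread producer([&] { produce(m, value); });
    producer.join();
    consumer.join();
    return received;
}
```

If `ready` were a plain `bool`, or both operations used relaxed ordering, this would likely still appear to work on x86 while being allowed to return a stale payload on ARM.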
That said, I've never actually run into one of these issues.
If you go around your OS, yes, that could be the case, but you can already have issues running the application from machine to machine with the same OS but different amounts of RAM and different CPUs. But I am not an expert in these matters.
The major issue is that these days most software is Electron-based or a web app. I miss the days of 98/XP, when you'd find tons of desktop software. A PC actually felt like something that had a purpose. Even if you spin up an XP/98 VM (especially 98/2000) now, the entire OS feels like something you can spend some time on. Nowadays most PCs feel like a random terminal where I open the browser and do some basic work (except for gaming, of course).
I really hate the UX of Win 11; even 10 isn't much better compared to XP.
I really hope we go back to that old era.
The issue is that the C memory model allows more behaviours than the memory model of x86-64 processors. You can thus write code which is incorrect according to the C language specification but will happen to work on x86-64 processors. Moving to arm64 (with its weaker memory model than x86-64) will then reveal the latent bug in your program.
And “happen to work on x86-64 processors” also will depend on the compiler. If you write
*a = 1;
*b = 'p';
both the compiler and the CPU can freely pick the order in which those two happen (or even execute them in parallel, or do half of one first, then the other, then the other half of the first, but I think those are hypothetical cases).
x86-64 will never do such a swap, but x86-64 compilers might.
If you write
*a = 1;
*b = 2;
then things might be different for the C compiler because a and b can alias. The hardware is still free to change that order, though.
OpenBSD famously keeps a lot of esoteric platforms around, because running the same code on multiple architectures reveals a lot of bugs. At least that was one of the arguments previously.
You don't need to be writing assembly. Anything sharing memory between multiple threads could have bugs with ARM's memory model, even if written in C, C++, etc.
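For rustaceans missing that /s: if you just use Relaxed ordering everywhere and you aren't sure why, but hey, tests pass on x86, then yeah, on ARM it may have a problem. On x86, plain loads and stores effectively behave like Acquire/Release even if you specify Relaxed (store-to-load reordering is still possible, so it is not quite SeqCst), which hides the bug.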
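For contrast, here is a sketch of a case where relaxed ordering is enough on any architecture, written in C++ since Rust's `Relaxed` corresponds to `std::memory_order_relaxed`. It is a pure event counter: nothing else is published through the atomic, so only the atomicity of each increment matters. The function name is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts events from several threads using relaxed increments only.
// This is fine on x86 and ARM alike because no other memory is being
// published through `counter`; thread join provides the synchronization
// needed before the final read.
long count_events(int num_threads, int per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

The trouble starts when a relaxed flag is used to signal that some other, non-atomic data is ready; that is the pattern that needs release/acquire and that x86 tends to forgive.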
It has some interesting conclusions, such as that it covers certain AVX512 gaps:
"AVX512 plugs many of the holes that SSE had, whilst SVE2 adds more complex operations (such as histogramming and bit permutation), and even introduces new ‘gaps’ (such as 32/64-bit element only COMPACT, no general vector byte left-shift, non-universal predication etc)."
And also that rusty x86 developers might face skill issues:
"Depending on your application, writing code for SVE2 can bring about new challenges. In particular, tailoring fixed-width problems and swizzling data around vectors may become much more difficult when the length is unknown."
Better to favor RISC-V implementations as much as possible.
But I don't know if there are already good modern-desktop-grade RISC-V implementations (in the US, SiFive is moving fast as far as I know)... and the hard part: accessing the latest and greatest silicon process of TSMC, aka ~5GHz.
Those markets are completely saturated, so at best it will be very slow unless something big happens: for instance, AMD adapting its best microarchitecture to RISC-V (mostly ISA decoding), etc.
And if Valve starts to distribute a client with a strong RISC-V game compilation framework...
This is kind of a solution in search of a problem. RISC-V will grow only if people find some value in it, if it solves their actual problems in ways that other architectures can't.
Yeah, the primary reason RISC-V exists is political (the desire to have an "open source" CPU architecture). As noble as that may be, it's not enough to get people or companies to use (or even manufacture!) it. It'll be economics (costs) and/or performance (including efficiency) that drives people.
It took ARM decades to get to where it is, and that involved a long stint in low-margin niche applications like embedded or appliances where x86 was poorly suited due to heat and power consumption.
I don't think that's the primary reason there's momentum there. The reason is to avoid ARM licensing fees and IP usage restrictions.
I think you'll see ever more accelerating RISC-V adoption in China if the United States continues on its "cold war" style mentality about relations with them.
That said we're a long long way from Actually Existing RISC-V being at performance parity with ARM64, let alone x86.
I am looking for a CPU.
I don't want to confront my users with "Please enter your Apple ID" or any other unexpected messages that I have no control over.
Is Apple M series an option for me?
This is not possible with Apple parts.
That's what my example was about. It was only specific because I wanted to have a concrete example.
I don't see how that's holding you back from using these tools for your work any more than using a Makita power tool with an LXT battery pack.
So they don’t actively help (or even make it easy by providing clear docs), but they do still do enough to enable really motivated people.
This is an industry blog, not a consumer oriented blog.
Anyway, here it is in GB10 form-
https://browser.geekbench.com/v6/cpu/14078585
And here is a comparable M5 in a laptop-
https://browser.geekbench.com/macs/macbook-pro-14-inch-2025
And Qualcomm.
Only found this, which talks about performance-per-area (PPA) and performance-per-clock (I assume cycle) (PPC): https://www.reddit.com/r/hardware/comments/1gvo28c/latest_ar...