Apple is actually interesting. They are one of the few companies with a chip / PC play with real power AND basically no play in the hyperscaler market.
That means they're actually incentivized, at least in the short term, to benefit from PCs becoming strong enough to run local LLMs, which makes this play make even more sense. Though, I've been saying for a while that the local AI inflection point is the death knell for these frontier labs.
> Though, I've been saying for a while that the local AI inflection point is the death knell for these frontier labs.
"Death knell" is a touch hyperbolic. Hardware that can only run quantized models that take up GBs in VRAM falls short of even an A100 (by almost an order of magnitude[0]), which in turn falls short of what an 8xH100 cluster can do (also by another order of magnitude[0]).
I'm an avid believer in local LLMs, but I cannot deceive myself - data center accelerators will win on power dissipation numbers alone[1], even when giving generous allowances for higher efficiency on Apple chips - and assuming the Apple-efficiency advantage persists on the same TSMC process node.
0. Based on my unscientific fine-tuning training experiments across local and rented GPUs. YMMV for inference.
1. Unless Apple surprises everyone and brings back the Xserve with M7, laptop and desktop form factors simply can't dump heat fast enough to compete head-to-head, and they will be designed for lower input wattage.
Doesn’t need to be a winner head to head. If it can do 90% of the tasks the big boys do, at 50% speed, for virtually no extra overhead cost save for the power consumed by a prompt - that’s gonna work for a lot of people. And that’s also basically where we’re at today. Qwen3.6 35b running quantized on 10-year-old hardware solves basically all of my use cases for agents except for coding.
The frontier models are faster, and better at coding, but not so much that I’ll pay $200/month for them.
Consider this. One of the smallest Qwen models (4B parameters) powers my home automation voice assistant, and runs on CPU alone at >20 tok/s. It is enough for that use case, and could be made even better/faster with a modest GPU. It isn't as smart as some cloud-connected thingamajig, but I would never allow a literal Google or Amazon bug in my home. Huge SOTA models aren't relevant everywhere. Most people use LLMs for rather trivial tasks such as finding typos or drafting text.
But with Apple's AFM 3 architecture, we might end up with huge, SOTA-adjacent models on devices with limited RAM.
They use a technique where you only load between 1B and 4B parameters of a 20B dense model for an entire prompt run, rather than selecting experts token by token like an MoE, and they mostly use the low-power ANE instead of the GPU cores.
Now, imagine if/when they scale up to 100B or more? On a chip using 2W?
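A toy sketch of the per-prompt vs. per-token distinction described above. Everything here is a hypothetical illustration, not Apple's actual AFM design: the model is pretended to be 20 equal blocks, each with a random "routing key", and the per-prompt rule (score blocks against the mean prompt embedding, keep the top k) is just one simple way to pick a subset once.

```python
import random


def score_blocks(embedding, block_keys):
    # Dot-product affinity between one embedding and each block's hypothetical routing key
    return [sum(e * k for e, k in zip(embedding, key)) for key in block_keys]


def top_k(scores, k):
    # Indices of the k highest-scoring blocks (always k distinct indices)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def per_token_routing(token_embeddings, block_keys, k):
    # MoE-style: a fresh selection for every token, so the set of blocks touched can keep growing
    return [top_k(score_blocks(t, block_keys), k) for t in token_embeddings]


def per_prompt_routing(token_embeddings, block_keys, k):
    # Select once from the mean prompt embedding; the same k blocks serve the whole run
    dim = len(token_embeddings[0])
    mean = [sum(t[d] for t in token_embeddings) / len(token_embeddings) for d in range(dim)]
    return top_k(score_blocks(mean, block_keys), k)


random.seed(0)
DIM, N_BLOCKS, K = 8, 20, 3  # hypothetical: 20 blocks standing in for a 20B model split into 1B chunks
block_keys = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BLOCKS)]
prompt = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(50)]

fixed = per_prompt_routing(prompt, block_keys, K)
per_token = per_token_routing(prompt, block_keys, K)
touched = {b for choice in per_token for b in choice}
print(f"per-prompt: {len(fixed)} of {N_BLOCKS} blocks resident for the whole run")
print(f"per-token:  {len(touched)} of {N_BLOCKS} blocks touched across {len(prompt)} tokens")
```

The point of the per-prompt variant is that only the chosen blocks ever need to be in memory, whereas per-token routing needs fast access to every block it might pick.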
> If it can do 90% of the tasks the big boys do, at 50% speed
I want to live in this world too, but these numbers, as of today, are very aspirational and far removed from reality.
I'm no tokenmaxxer. I find my modest local setup useful, but I also know its limitations: it's slow, and it's (relatively) bad at high-level and/or long-context planning compared to frontier models. Only a minority of my prompts are max-effort, and that's not all I do, but it also means the frontier labs aren't dying any time soon.
Consider also that right now LLMs run slowly enough you can watch them think. I've seen a demo of an LLM running at an absurdly high speed and it reminds me of when I moved from a 2400 baud modem to a 14.4 - BBS screens that I could watch draw were all of a sudden nigh-interactive. Faster-than-realtime video generation is also coming, and will also continue to require huge hardware for a long while yet.
I love local models - I have a machine at home that runs a few for me and it's a lot of fun - but for the time being they are not super trustworthy on tool calls and staying on script. Another year or so might change all that!
I’m sure you’re right, for the things you are asking of an llm, just as I am right about the things I am asking of an llm.
The real question is what 90% of people are going to ask LLMs to do. I’d argue it’s mostly going to be stuff that works now or almost works on local models, but that’s just an opinion. It also depends on the frontier models hitting a wall of steeply diminishing returns, since they set the expectations for all of this stuff - my gut says that’s happened already and they just won’t admit it for a while - but we’ll see.
This is what makes sense for me as well. All I need a local model for is simple graphics: no gradients, at most ten colours, which I can push through VTracer to get an SVG. Draw Things does the job, usually in 120 seconds or less.
Sometimes I need a quick throwaway bit of Python. That can take 30 minutes of my time.
The established AI players have no financial interest in making LLMs available locally. They aren't hardware companies, and if running an LLM requires paying them to host the model as well, they naturally capture more of the value chain, which means more revenue.
Apple is the only player here for whom this plays into their natural hardware incentive of getting you to pay more for better hardware. It would make sense for them to find a way to run LLMs locally (e.g., the newer architectures others here have pointed out).
Is it hyperbolic though? One of the best things about the compute and memory shortage is that people are going to insane lengths to optimize things to run on lower memory / lower compute devices. If we keep this up for a while and then ramp up memory and local compute production, that AI inflection point may actually come.
The big question for local LLMs is whether there is a 100 tok/s model which requires less than 16 GB of memory and is competitive on most tasks with the cloud models.
There is some signal that this is possible through both hardware innovation and training/data improvements.
Cloud models have their own constraints - I can’t have opus4.8 spend 4 hours on a deep research question I had in the shower without spending money. I can’t do real-time video game upscaling and graphics work in the cloud, period.
A laptop is about an order of magnitude cheaper than a cloud server thanks to economies of scale, uptime requirements, and other factors.
If you do the electricity math, you'll see that you pay more for local models while getting less (local is more heavily quantized) compared with OpenRouter.
I'm not talking about local Gemma/Qwen vs. cloud Opus, but vs. the same Gemma/Qwen on OpenRouter.
There are reasons to run local (privacy, availability), but cost is not one of them.
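A back-of-the-envelope sketch of the electricity side of that comparison. The wattage, throughput, and power price are hypothetical placeholders, not measurements from the thread; plug in your own numbers and compare the result with a provider's per-million-token price for the same model.

```python
def local_cost_per_million_tokens(watts: float, tokens_per_second: float,
                                  usd_per_kwh: float) -> float:
    """Electricity cost (USD) to generate one million tokens locally; ignores hardware cost."""
    hours = 1_000_000 / tokens_per_second / 3600  # wall-clock time to emit 1M tokens
    kwh = watts / 1000 * hours                    # energy drawn over that time
    return kwh * usd_per_kwh


# Hypothetical machine: 300 W under load, 20 tok/s, $0.30/kWh
cost = local_cost_per_million_tokens(300, 20, 0.30)
print(f"${cost:.2f} per million tokens (electricity only)")
```

With those placeholder numbers it works out to about $1.25 per million tokens, before amortizing the hardware, which is the number to hold up against a hosted price for the same model.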
The thing is, with the level of hard investment AI vendors have, even a small reduction of their addressable market is significant. They aren’t profitable, and inference is getting commoditized fast, so even if they eventually become profitable (not via financial engineering) they won’t be able to have good margin. The pressure of both open models AND local models is pretty bad imho
We'll likely see a transformation in how frontier models are trained as a result of a push towards local inference. While it seems unlikely now, given current pricing for RAM, in 10-15 years it's not unthinkable to assume we could see individual machines with 10-12TB (and well beyond that) of RAM which are accessible to the GPU. Min/max system RAM increased a LOT from 2010-2025 and largely because it was cheap. Once the hyperscalers aren't generating revenue for the RAM manufacturers, I wouldn't be surprised to see a massive push towards consumers in order to maintain gross profit. Not to mention new players who enter the market because the margins are measurably absurd right now.
At some point there will be diminishing returns towards the "just throw more RAM at it" approach the current frontier models are taking. Commoditization is just as inevitable as it ever was... and in doing so will enable actual leaps of what AI/ML is capable of. That's not to say there won't be a place for 99.999999% accurate vs 99.99999% but those cases will be limited and likely prime to disruption based on real innovation vs access to capital.
Indeed. Local models becoming available and halfway decent don't obviate the laws of scale. And because there's no ceiling to what scaling more will buy you in terms of capability, there's no reason not to scale more, there's no incentive for billionaires not to grab all the fab capacity they can.
Enjoy paying $1000 or more for a little 4 GiB cloud terminal that connects you to all your online accounts where all your actual work gets done. This is the future.
There's a limit that won't be breached without a fundamental breakthrough in the physics of computation, but we're not there yet by a long shot. You can train bigger models, faster, and infer with them faster and more precisely, by throwing more compute at the problem for the foreseeable future.
I worked at a hyperscaler when the M1 came out. A MacBook Air M1, running a Linux VM was faster and more energy efficient than anything we had in the data center.
It's plausible, but is the Apple Tax for a 1TB memory machine on top of current memory prices really worth it? I paid around $4000 for a 4090m laptop with 16GB VRAM back in 2023; it's great but DoA for even quantized LLMs. I can run SLMs and fine-tune them, but that's it.
We need one of those specialized inference chip startups to succeed and a PC manufacturer willing to bet on them against Nvidia for the local AI to find mass market appeal.
I recently bought a Mac mini M4 16 GB - mostly to run Immich. I assumed I needed a Linux box. After a lot of research I was quite surprised that the Mac was the cheapest option. So not always an Apple tax.
>"After a lot of research I was quite surprised that the Mac was the cheapest option. So not always an Apple tax."
Apple has always been the most cost-effective choice for the value you get, going all the way back to the Apple II; it's just that the floor of that cost has always been high. Anyone who thinks otherwise is just a fanboy one way or the other.
That's true only for the entry-level Macs. My M4 Mac Mini has the best performance per dollar. But my workstation laptop with 32 cores, 96GB DDR5, and an Nvidia GPU costs less than Macs with lower performance, not to mention I upgraded the RAM post-purchase.
That's how much many developers currently spend on tokens - every day. Whatever "Apple Tax" applies to a device that can run a capable model offline will amortise itself in a blink.
>Whatever "Apple Tax" applies to a device that can run a capable model offline will amortise itself in a blink.
Current high-end Mac Studio with 32-core M3 Ultra chip and 96 GB of memory is $6800, 96GB is not enough to run GLM 5.2 without extreme quantization or stacking HW; but for the sake of discussion let's run quantized version on a single high end Mac Studio.
The GLM 5.2 Max plan costs $112/month, so it would take ~60 months to recover the cost, assuming the machine was bought just for AI. By then the current AI landscape would have changed drastically.
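The payback figure above is just the hardware price divided by the monthly subscription it replaces. A small sketch that can also subtract a monthly electricity cost (the power figure is a hypothetical input, not from the comment), which only lengthens the payback period:

```python
def payback_months(hardware_usd: float, subscription_usd_per_month: float,
                   power_usd_per_month: float = 0.0) -> float:
    """Months until buying hardware beats paying a subscription (ignores resale value)."""
    monthly_savings = subscription_usd_per_month - power_usd_per_month
    if monthly_savings <= 0:
        raise ValueError("never pays back: running costs meet or exceed the subscription")
    return hardware_usd / monthly_savings


print(f"{payback_months(6800, 112):.1f} months")                          # hardware vs. subscription only
print(f"{payback_months(6800, 112, power_usd_per_month=20):.1f} months")  # with a hypothetical $20/month power bill
```

The first line gives about 60.7 months, matching the ~60 months above; any electricity cost pushes it further out.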
I use local AI on both Linux and Mac every single day, there's freedom, privacy and peace of mind in running the model locally. But I feel cost/value of local AI is overblown.
I didn't have a single Apple device in my house until a month ago when I bought a Neo. The last Apple devices I had before that were an iPod Nano and a PowerMac G5 many many years ago.
Apple has pretty good competition in every segment with the exception of maybe the iPad, but I'm not a tablet user.
Some folks like to have a computing environment free of proprietary influences and extremely strong vendor lock-in. I cannot claim to possess any Apple devices.
I wasn't thinking of Asahi. Just pointing out that you can run all the standard unix/open source tools and apps on Mac OS (vi, git, qgis, blender, vsc, python, node, etc). With the advantage of higher quality hardware and generally less fiddling.
But if you don't like it, switch. I don't see vendor lock-in.
There are many people who don't own Apple devices. Why are you so surprised? I certainly don't and never will. What's it got that I can't get on a standard PC + Linux?
Tangential: About 8 years ago ex-Apple chip engineers left to design server-grade chips, this was Nuvia, and they got sued by Apple to the point that they had to get acquired by Qualcomm.
The article says base M7 memory bandwidth is targeted at 240GB/s.
M1 had 70 GB/s, M1 Pro 200, M1 Max 400, M1 Ultra 800.
Modern RTX 6000: ~1,600 or so.
If we get a 1,200-1,500 GB/s bandwidth M7 variant in late 2027 with 512GB of RAM, that will be a very interesting chip. Tracking LLM size and performance improvements, I can imagine that being a sort of inflection point for local inference. I wonder what the power budget would be in desktop format.
A hypothetical M7 Ultra with LPDDR6 14.4Gbps memory would be 1.85 TB/s.
You're looking at about 100 tokens/s for a 1T-parameter MoE with 37B active parameters at 4-bit.
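Both numbers can be reproduced with the usual rule of thumb that decode speed is capped by memory bandwidth divided by the bytes of weights read per token. The 1024-bit bus width is an assumption, chosen because it lands near the quoted figure (1024 × 14.4 Gbps / 8 ≈ 1.84 TB/s). The result is an upper bound: it ignores KV-cache reads, compute limits, and the gap between peak and sustained bandwidth.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak DRAM bandwidth in GB/s: pin count x per-pin data rate, converted from bits to bytes."""
    return bus_width_bits * gbps_per_pin / 8


def max_decode_tokens_per_s(bandwidth_gb_s: float, active_params_billion: float,
                            bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling: each generated token streams all active weights once."""
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


bw = peak_bandwidth_gb_s(1024, 14.4)  # hypothetical 1024-bit LPDDR6 interface
print(f"{bw:.0f} GB/s")
print(f"{max_decode_tokens_per_s(bw, active_params_billion=37, bits_per_weight=4):.0f} tok/s")
```

In an MoE only the active parameters count toward per-token traffic; the 1T total mostly sets capacity (about 500 GB of weights at 4-bit), which is why the RAM size and the bandwidth both matter.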
It'd probably cost $30k or more I'm guessing if memory prices do not come down. Even at $30k, it could still be a relative bargain since an RTX Pro 6000 Blackwell 96GB card costs $12k today. The M3 Ultra with 512GB was around $8k before Apple discontinued it. I expect an M7 Ultra to have 768GB or 1024GB.
Apple Silicon Macs were on their way to becoming cheap local LLM machines relative to professional GPUs before this memory crisis. It may still emerge as such in a few years.
Here's some interesting math: the 512GB of RAM in one Ultra chip could instead go into 42 Pro iPhones. Assuming a 55% profit margin and a $1200 ASP, you're looking at $28,160 in profit from making iPhones instead. No wonder Apple discontinued the M3 Ultra 512GB. If they only have a limited supply of RAM for all their products, it makes no sense to produce an $8000 M3 Ultra 512GB when you can produce 42 Pro iPhones. As of June 2026, you can only configure an M3 Ultra with up to 96GB.
Apple would have to raise the price of a 512GB Ultra Mac to around $50k to match iPhone profits.
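The arithmetic from the last two paragraphs, as a sketch. The 12 GB per Pro iPhone is inferred from "42 iPhones per 512GB" rather than stated, and the ~$50k break-even assumes the Mac carries the same 55% margin, which the comment doesn't specify.

```python
def iphone_profit_from_ram(ram_gb: float, ram_per_iphone_gb: float,
                           asp_usd: float, margin: float) -> float:
    """Profit if the same DRAM went into iPhones instead (fractional phone count, as in the comment)."""
    phones = ram_gb / ram_per_iphone_gb
    return phones * asp_usd * margin


def breakeven_price(target_profit_usd: float, margin: float) -> float:
    """Sticker price at which a product earns target_profit_usd at the given margin."""
    return target_profit_usd / margin


# Assumed inputs: 12 GB per Pro iPhone, $1200 ASP, 55% margin on both products
profit = iphone_profit_from_ram(512, 12, asp_usd=1200, margin=0.55)
print(f"iPhone profit forgone: ${profit:,.0f}")
print(f"Mac price to match at 55% margin: ${breakeven_price(profit, 0.55):,.0f}")
```

Under those assumptions the forgone profit is about $28,160 and the matching Mac price is about $51,200, consistent with the "around $50k" estimate.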
> Assume a 55% profit margins, and $1200 ASP, you're looking at $28,160 in profit from making iPhones instead. No wonder Apple discontinued the M3 Ultra 512GB.
How would that work? They purchase 512GB from Samsung and then it doesn't matter if that's like 128x 4GB or 4x 128GB?
Note that this reserved capacity now has competition from OpenAI, Anthropic, xAI, Meta, Microsoft, Chinese data centers and so on, all willing to pay premium.
If companies keep spending half a MacBook Neo's worth of subscriptions on AI plans per person every month, Apple is going to have a hard time competing.
In British English the "an" is correct, even though most English dialects don't actually render the H as silent. It's a French-derived word that had a silent H originally, ergo we use "an".
I’d assume by next year the open weights models will be outlawed the way things are going nowadays :/
Edit: for those of you downvoting I don’t celebrate this prospect. I’m merely realistic about where things are going given the rapid vibe shift from the administration on AI since the start of June.
192GB or 256GB of RAM would be enough! We could run large MoE models in real time, REAPed for our usage (e.g. English agentic coding), with dynamic quants at 2-4 bits.
Apple is finally going to realize Jobs's vision where sand comes into the factory, is turned into RAM and CPU chips, then installed in a Mac or iPhone and shipped to a customer.
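A quick way to sanity-check which models fit: weight memory is roughly parameters × bits per weight / 8. The sketch below treats a dynamic quant as its average bits per weight and reserves a hypothetical 32 GB for the OS, KV cache, and activations. Real quant files add overhead for scales and unquantized layers, so treat the results as optimistic.

```python
def weights_gb(params_billion: float, avg_bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1e9 bytes)."""
    return params_billion * avg_bits_per_weight / 8


def max_params_billion(ram_gb: float, avg_bits_per_weight: float,
                       reserve_gb: float = 32) -> float:
    """Largest parameter count (billions) whose weights fit after reserving memory for everything else."""
    return (ram_gb - reserve_gb) * 8 / avg_bits_per_weight


for ram in (192, 256):
    for bits in (2, 3, 4):
        print(f"{ram} GB @ {bits}-bit: up to ~{max_params_billion(ram, bits):.0f}B params")
```

For example, a 1T-parameter model at an average of 2 bits needs about 250 GB for the weights alone, which is why pruning the model down (REAPing it, in the comment's terms) comes up alongside aggressive quantization at these memory sizes.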
Well yes. But similar to the Apple TSMC relationship, could Apple step in with large orders to established RAM makers such that the RAM makers can invest with stability?
No it isn't, DRAM is made with a different process and those are chiplets, perfectly possible to outsource, and the only possibility really as TSMC does not make DRAM.
Well yeah, but Nvidia just released a contender to their silicon, and the M6 is probably already set in stone. Better to shift resources to a great M7 than to have a mediocre M6 and M7.
(This is assuming Apple will deliver, but this area is one of the biggest ones they have in AI, and they need the developer ecosystem to exist and survive)
Come to think of it, modern cars have a lot of electronics such as touchscreens, cameras, and sensors. It wouldn’t surprise me if new car prices are not immune to what’s happening with RAM and storage prices.
What's their backup plan if the AI world doesn't pan out? What if it turns out people want base compute capability and lots of RAM for filestore cache and programs?
Maybe this strategy works, even in that world.
Remember when we all thought (were told we thought) the world was heading to 3D views of our 2D lived experience like a solid Cube of GUI we could rotate around and live inside? Well Apple took the simple 2D square pane of virtual desktops and .. made it a SONY strip. One variable: sideways.
So here we are being told AI is the future. Apple seems to be saying "yes but it will run local" which might be a safe bet if AI comes true but I wonder how many of us want the AI outcome, which is morally speaking the 3D immersive GUI cube here: what if we don't want that?
I can't imagine any world where we put this AI stuff back in the box. It is simply too useful and too powerful. And as we start seeing all this upheaval where models are getting banned, etc., I can even see the appeal of on-device AI increasing for a lot of use cases.
So I think Apple has the right instinct. In fact, I've had the thought multiple times that I really want a lot of workflows just running on my device: workflows like fast vector search (already fast on the M4, but I want it more commonplace), or realtime transcription and summarization that's even faster, on device, etc.
To me AI is on par with the internet and what made it so powerful was piracy and porn and just the wide spectrum of things that are possible when you connect machines together. We are going to need the same thing again. Freedom to use any model that does any thing we want.
Anything AI focused in silicon is also valuable for a ton of other use cases. If LLMs and GenAI don’t pan out, that silicon just gets used for other processing. Then they scale back on the dedicated die space in subsequent generations.
The worst case scenario is that we're at a plateau and LLMs max out around here. And it'd stand to reason that if that happens we'd see local models catch up at least to some extent. Compared to 5 years ago, that's a pretty good world.
Without AI everyone’s computing needs were pretty well satisfied with current phones and laptops. LLMs are the one thing that could drive new demand if they can run locally.
This is the backup strategy. The "AI doesn't pan out" scenario is basically: if Claude and OpenAI go bankrupt, we keep running local models on our hardware.
There isn't a future where we all just decide that nah, we don't want AI anymore. Useful things don't disappear.
AI was the only reason I bought a new computer (a refurb M3 max with 64GB). Without AI, no idea what we should bother with, it depends on what application comes out to drive local computing power (maybe better games? Yawn).
AI isn't going anywhere, this is akin to the .com bubble. It burst, but the internet didn't go anywhere. While companies can fail, this technology is with us for the long run now, short of societal collapse.
> What's their backup plan if the AI world doesn't pan out? What if it turns out people want base compute capability and lots of RAM for filestore cache and programs?
I think reducing the die area dedicated to AI stuff is not going to be a problem.
And in fairness, Apple already has essentially AI-less hardware in the form of the MacBook Neo, and it’s been an astonishing success.
I have one and it’s a very good laptop, particularly for the price I paid.
Do we have a choice? It's being forced upon us by folks who have the power to distort any market they want. Energy prices are rising, and the PC industry is about to be destroyed by component prices. It will be dumb clients that run the software our feudal overlords of the data centers will have the grace to grant us. And the government lets it happen because it furthers their interests.
Seems like a made-up distinction that shouldn't be necessary since M6 has not even been released. I suspect this is a marketing ploy meant to drive up interest while also increasing prices for the next generation of Mac hardware.
What it's saying is that the M6 will be released, but not the M6 Pro or M6 Max. Instead, Apple will wait to release new Max/Pro chips for a future generation.
It's not simply marketing since the Pro/Max chips of a generation use the same cores as the regular version, just more of them or different combinations of performance and efficiency cores.
> Seems like a made-up distinction that shouldn't be necessary since M6 has not even released.
The claim is that M6 will be released, but the only variants will be lower end.
When they get to the M7 generation, they will make high end variants.
It's a real distinction because each generation of parts shares an architecture.
The article has an entire section speculating what the M6 parts will be, but says they'll top out around 200GB/s memory bandwidth and 12 graphics cores.
> Seems like a made-up distinction that shouldn't be necessary since M6 has not even released.
Why would it? Each generation of the M series has an architectural improvement on their chipsets. The difference between an M1 and an M1 Pro is the allocation and arrangement not the architecture. M6 to M7 presumably will have architectural changes.
This is no different than them skipping the “Ultra” chips on some generations. The only real difference is it going all the way down to skipping the “Pro” line. So, only the MacBook Air, low end MBP, and maybe the iPad Pro and Mac Mini get the M6.
Made up how? They'll do a refresh of lower end devices, but not the high core count versions.
It's the same thing as how the Mac Studio got an M4 Max refresh, but they didn't make an M4 Ultra so if you want the 28+ core CPU or 60+ core GPU, that's still using an M3 Ultra.
This time it'll be across all the Pro, Max, and Ultra versions, if you want those they'll stay at the previous generation for the M6 cycle.
Not that weird - Apple has a huge set of chips and hardware and software products. Putting every single thing on a fixed identical update cycle together won't always make sense.
Except that is not what's happening. The article clarifies something that is misleading if you interpret the headline in isolation: "high-end M6" means "the high-end variants of the M6 line", not "the entire M6 line".
Whether it matters for the consumer (who only sees released and announced end results) or not is irrelevant.
It can still be a very real, not made-up distinction, if the actual facts on the ground are that Apple designed an M6 line, but then scrapped that design and asked the team to create a new design with emphasis on AI-focused specs.
It's not the name that's important (the M7 could still come out as M6); it's them skipping a design, or a CPU "Tick-Tock model" step.
Well, I guess this is the silver lining to the price increases. I'd been thinking about an M5 128GB for local inference (eg DS4), probably off the table now given that it jumped $2k overnight. But I was on the fence about it for a long time given that even the M5 is not that good compared to even a 4090. It would have been good, but not "omg" good.
If they are pulling out all the stops to make the M7 more competitive.. guess I can wait for that?
The M7 Pro and M7 Max are scheduled for as early as the end of 2027, while the M7 Ultra is on track for 2028.
This means there won't be a redesigned MBP this year since there won't be M6 Pro/Max chips. People were expecting a redesigned slimmer MBP with OLED display later this year, myself included.
I was holding out for one until I decided to switch from an M1 Pro 16" MBP to an M5 Air 15" due to the expected price increase. I think many M1 Pro/Max generation people were waiting to upgrade this year.
Current MBPs are such a delight, I really don't want to think about a thinner MBP again, I just get shivers remembering the Ive butterfly keyboard models
Isn't that switch basically a downgrade? You get some more single core performance and some weight savings, but also a worse (and smaller) screen, less multicore performance, less GPU performance, less video encoding performance and a smaller battery? I'm on an M2 Max myself, and glad they introduced a larger form factor Air, but it seems like a long way from an upgrade.
The optics and marketing is already fucked, the MBP goes to M5 Max, the Mini has the M4, the Studio has M2 or M3, the iMac apparently has two different kinds of M4s, it's all fucked.
In the long run I truly believe local AI will win and Apple will be the world's most important AI company because of these chips. Imagine something like today's Opus running for free and in complete privacy on your local machine with a beautiful Apple UX on top. For most tasks for most people, that's a much better proposition than a frontier model in the cloud you have to pay for and send all your data to and that only works when you're online.
>In the long run I truly believe local AI will win
What do you mean by 'win'?
For a normal coder/person's use cases, yes. But AI companies are becoming more specialised in different fields and these tailored models will be leagues ahead in those niches.
I would say local AI is very real. I use it, and so many people here and on other forums do nowadays as well. This is the reason I just cannot fathom the valuations of the AI firms out there.
I was waiting for a MacBook Pro M6 Max and now I don’t know what to do, especially with the price increase I feel like I really screwed up not just getting an MBP M5 Max a month ago
The Mac mini Pro line is doomed; they never made enough of them. They skipped M5 Pro and are now skipping M6 Pro, it's like 2014-2018 again. Ordering a custom M4 Pro build now takes 3+ months to ship, at an increased price.
Apple isn't just transitioning to TSMC's 2nm node, they are also transitioning to a chiplet based design using TSMC's advanced packaging.
> What sets the A20 apart isn’t just the node shrink—it’s the revolution in packaging. Apple is transitioning to Wafer-Level Multi-Chip Module (WLCM) integration, meaning that RAM will no longer be situated beside the chip, but rather on the chip wafer itself, integrated alongside the CPU, GPU, and Neural Engine.
This shift eliminates the need for silicon interposers and substrates, thereby enhancing signal integrity, improving thermal dissipation, and facilitating faster memory access with lower latency. The benefits? Better multitasking, smoother AI processing (hello, Apple Intelligence), improved battery life, and potentially a smaller chip footprint—freeing up space for other components.
A kind request - please try to write HN replies without AI, but if you're going to, please at least edit out any "it's not X, it's Y" or "isn't just X, but also Y" AI tics. A lot of us come here to get away from talking to AIs all day.
Do we have any explanations of what WLCM means that are more industry focused? I couldn't find anything that didn't look like blogspam. And that explanation of the DRAM being on the same wafer doesn't really make sense. For one, at that point there's no "multi chip" part if you're integrating more onto the same die rather than less.
And their explanation isn't really passing the smell test for me for other reasons, for instance the fact that DRAM processes are pretty radically different than bulk logic processes, which wouldn't really let you put it all on the same wafer, much less the same die. Even back in the day when you had eDRAM blocks (like the Xbox 360's eDRAM die), that was really a DRAM process with a bit of logic cells that wouldn't be competitive if they weren't sitting right next to the DRAM blocks.
I could be wrong here though, my examples are more than a bit long in the tooth.
You can start by reading up on TSMC's name for the tech (although there are many versions at TSMC and TSMC isn't the only company packaging chiplets and memory on top of a silicon interposer).
The terms to search for are fan-out wafer level packaging (FOWLP) and TSMC InFO. The chiplets come from different wafers and are reconstituted into a molded plastic wafer, allowing multiple die side-by-side. Then multiple layers of wires are built on top, terminating in a BGA.
Ok, part of my confusion was that it was being presented in contrast to InFO-oS and InFO-PoP, but it appears to mostly be a modified version of InFO-PoP called InFO-M? Because Apple has been using InFO-PoP for almost a decade at this point, starting with the A10.
So far the only thing I've seen useful out of apple intelligence is running parakeet natively and effectively... which should have been their very first feature... given it's been on phones for 10+ years.
As someone who wants to run effective LLMs locally for many things, their other big benefit, for a little while, has been the unified-memory Studios.
Given that M6 will be on TSMC's smaller 2nm node, the first node shrink in 3 years, it seems like the oddest of all years for the high-end Macs to skip.
Well this kind of sucks. I've been waiting for the M6 MBPs because they're rumored (strong rumors, though) to finally remove the notch that has been a historic self-own. But it sounds like I might as well wait longer for the M7 lineup. Or maybe get a Framework Pro instead.
There’s so many annoying bugs in Mac OS (like the screwed up window management and alt-tab not working properly), that the notch seems like an odd complaint at this point. The OS is fighting the user constantly, and there’s not much we can do…
I agree. It was very annoying to me to spend the money (and on the nano matte one too) and still have that stupid notch. But it never makes any difference at all which is good news.
It’s a complete embarrassment. They added it for aesthetic alignment with the iPhone 13. And then the 14 removed the notch soon after. They’ve kept it for years since then. It has no functional purpose. It’s not there for face ID or because they couldn’t figure out how to do a hole punch camera.
Same, have a very old MBP. Not sure what to do because I don’t want to wait a year and a half. That coupled with today’s price increases make it a tougher decision.
I am waiting till Apple copies the "allocation" concept from high-end car manufacturers: "Sure, buy 25 iPhones and we will gladly put you on the waitlist."
Because America can't compete. Build a fab in the US, and labor unions, labor costs, regulations, land, energy, taxes, government, water, etc. all make it uneconomical. Everything would cost twice as much, you'd buy the cheaper product instead, and the fab would go bankrupt. There were reasons why all the manufacturing went overseas to Asia. You're right, the demand right now is HUGE, but it won't always be huge. At this point, we don't have the talent or the knowledge to do it well anyway, which is why we needed TSMC and Samsung to bring employees over to train people. https://www.cppionline.org/wp-content/uploads/2017/07/The-De...
Apple is very late to the AI party. By the time M7 is shipped, Nvidia will announce 6090 and people will be buying used (3|4|5)090 GPUs to run local models at much better performance than heat throttled M7.
I would prefer a Studio if it does a decent enough job, even if it throttles a bit under load: way less power usage and noise than those GPUs plus the PC you need to put them in.
RAM is a commodity and nvidia will be paying the same prices. The used market will reflect the cost of RAM. nvidia owns the top of the market but many of us don't need that.
What people? Are you seriously thinking the hundreds of millions of customers Apple have is going to be buying run-to-the-ground GPUs second hand and build local workstations for AI? Might as well ask them to self host email while you’re at it.
The difference between these two is that one of them is an unsolved research problem that we’ve all spent far too much time on, and the other is just running an LLM.
Do you really think the average Apple user will use it when there’s already better AI provided by OpenAI and Anthropic which don’t require advanced local hardware?
I was just countering your argument with an equally compelling counter-argument.
As for how it helps: we're not talking about this year's AI ecosystem, or even next year's. This rumor, assuming it's true, is talking about two chip generations into the future — and probably at least three or four chip generations before it's a mature AI platform. What will AI be doing for us in five years from now? How does Apple plan for that future? Will concerns of privacy increase or decrease in that time?
They use a technique where you only load between 1B and 4B of a 20B dense model for an entire prompt run, not token by token like a MoE, and use mostly the low power ANE instead of GPU cores.
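To put the "1B to 4B of a 20B dense model" figure in perspective, here is a rough weight-memory calculation. It assumes 4-bit weights (0.5 bytes per parameter), which the comment does not specify, and it ignores the KV cache and activations; it says nothing about how the technique decides which parameters to load.

```python
# Back-of-envelope weight-memory footprint for loading only part of a dense model.
# Assumption (not stated in the comment): weights are stored at 4 bits = 0.5 bytes per parameter.
BYTES_PER_PARAM = 0.5


def weight_gb(params_billions: float, bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Gigabytes (1e9 bytes) needed to hold the given number of parameters (in billions)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


full = weight_gb(20)                     # the whole 20B dense model
low, high = weight_gb(1), weight_gb(4)   # the 1B-4B slice loaded for a prompt

print(f"full model: {full:.1f} GB")
print(f"per-prompt slice: {low:.1f}-{high:.1f} GB ({low / full:.0%}-{high / full:.0%} of the weights)")
```

Under this assumption the working set drops from about 10 GB to 0.5-2 GB, which is the kind of reduction that makes a low-power accelerator plausible for the job.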
Now, imagine if/when they scale up to 100B or more? On a chip using 2W?
I want to live in this world too, but these numbers, as of today, are very aspirational and far removed from reality.
I'm no tokenmaxxer. I find my modest local setup useful, but I also know its limitations: it's slow, and it sucks (relatively) at high-level and/or long-context planning compared to frontier models. Only a minority of my prompts are max-effort; it's not all I do, but it also means frontier labs aren't dying any time soon.
I love local models - I have a machine at home that runs a few for me and it's a lot of fun - but for the time being they are not super trustworthy on tool calls and staying on script. Another year or so might change all that!
The real question is, what are 90% of people going to ask llms to do. I’d argue mostly it’s going to be stuff that works-now or almost-works on local models, but that’s just an opinion. It also depends on the frontier models hitting a wall of steeply diminishing returns, since they set the expectations for all of this stuff - my gut says that’s happened already they just won’t admit it for a while - but we’ll see.
Sometimes, I need a quick throwaway bit of python. That can take 30 minutes of my time.
Apple is the only player here for whom this would play into their natural hardware incentive to get you to pay more for better hardware. It would make sense for them to find a way to run LLMs locally (e.g., newer architectures that others here have pointed out).
Interesting times.
Of course, these are a lot of ifs.
There is some signal that this is possible through both hardware innovation and training/data improvements.
Cloud models have their own constraints - I can’t have opus4.8 spend 4 hours on a deep research question I had in the shower without spending money. I can’t do real-time video game upscaling and graphics work in the cloud, period.
A laptop is about an order of magnitude cheaper than a cloud server thanks to economies of scale, uptime requirements, and other factors.
I'm not comparing local Gemma/Qwen against cloud Opus, but against the same Gemma/Qwen on OpenRouter.
There are reasons to run local (privacy, availability), but cost is not one of them.
So yeah, commercially it might be a death knell. Yes, there's still a market for supercomputers, but would you rather own Apple or Cray?
At some point there will be diminishing returns from the "just throw more RAM at it" approach the current frontier models are taking. Commoditization is just as inevitable as it ever was... and in doing so it will enable actual leaps in what AI/ML is capable of. That's not to say there won't be a place for 99.999999% accurate vs 99.99999%, but those cases will be limited and likely ripe for disruption based on real innovation rather than access to capital.
SOCs with unified memory have shifted this a bit forward, but they're also expensive as shit.
10 TB of RAM in a consumer device is simply not happening in the next 10 years.
Enjoy paying $1000 or more for a little 4 GiB cloud terminal that connects you to all your online accounts where all your actual work gets done. This is the future.
This is highly doubtful.
Rule of thumb: everything people think is exponential is actually an S curve.
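The rule of thumb can be made concrete: a logistic (S-shaped) curve with the same starting value and growth rate as an exponential is nearly indistinguishable from it early on, then flattens toward its ceiling. The constants below are arbitrary and chosen only for illustration; they are not fitted to any real AI metric.

```python
import math

# Illustration only: arbitrary parameters, not fitted to any real-world data.
RATE = 1.0         # initial growth rate shared by both curves
CAPACITY = 1000.0  # ceiling that the S curve (logistic) saturates at
START = 1.0        # value at t = 0 for both curves


def exponential(t: float) -> float:
    return START * math.exp(RATE * t)


def logistic(t: float) -> float:
    # Closed-form solution of dy/dt = r * y * (1 - y / K) with y(0) = START.
    return CAPACITY / (1 + (CAPACITY / START - 1) * math.exp(-RATE * t))


for t in range(0, 13, 2):
    e, s = exponential(t), logistic(t)
    print(f"t={t:2d}  exp={e:10.1f}  logistic={s:7.1f}  ratio={s / e:.3f}")
```

Early on the ratio stays close to 1, so data from the first part of the curve cannot tell you which of the two you are on; the difference only shows up as the logistic approaches its ceiling.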
We need one of those specialized inference chip startups to succeed and a PC manufacturer willing to bet on them against Nvidia for the local AI to find mass market appeal.
Apple has always been the most cost-effective choice for the value you get, going all the way back to the Apple II; it's just that the floor of that cost has always been high. Anyone who thinks otherwise is just a fanboy one way or the other.
That's how much many developers currently spend on tokens - every day. Whatever "Apple Tax" applies to a device that can run a capable model offline will amortise itself in a blink.
The current high-end Mac Studio with a 32-core M3 Ultra chip and 96 GB of memory is $6,800. 96 GB is not enough to run GLM 5.2 without extreme quantization or stacking hardware, but for the sake of discussion let's run a quantized version on a single high-end Mac Studio.
The GLM 5.2 Max plan costs $112/month, so it would take ~60 months to recover the cost, assuming the machine was bought just for AI. By then the AI landscape will have changed drastically.
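The payback arithmetic above, spelled out. The two prices come from the comment; electricity, resale value and financing are deliberately left out.

```python
# Break-even for buying local hardware instead of paying a monthly subscription.
# Prices are the ones quoted in the comment; electricity, resale value and financing are ignored.
HARDWARE_COST_USD = 6800          # Mac Studio, 32-core M3 Ultra, 96 GB
SUBSCRIPTION_USD_PER_MONTH = 112  # GLM 5.2 Max plan


def payback_months(hardware_cost: float, monthly_fee: float) -> float:
    """Number of months of subscription fees that add up to the hardware's purchase price."""
    return hardware_cost / monthly_fee


months = payback_months(HARDWARE_COST_USD, SUBSCRIPTION_USD_PER_MONTH)
print(f"{months:.1f} months (~{months / 12:.1f} years)")
```

That works out to roughly 61 months, about five years. Accounting for the machine's electricity would mean dividing by (monthly fee - monthly power cost) instead, which only lengthens the payback period.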
I use local AI on both Linux and Mac every single day, there's freedom, privacy and peace of mind in running the model locally. But I feel cost/value of local AI is overblown.
Apple has pretty good competition in every segment with the exception of maybe the iPad, but I'm not a tablet user.
Sure, you can use the App Store and use all the stuff that integrates with iPhone, iCloud, etc
But you can also just treat it as Linux for Laptops (that actually works), and roll with all the standard open source tools.
While they don't _prevent_ Asahi from doing what they're doing, they certainly don't go out of their way to make it easy for them.
But if you don't like it, switch. I don't see vendor lock-in.
Notes sync, Copy/Paste would be hard to give up and took zero effort
And in the rare occasions in which I have to use someone's MacBook, I'm completely lost - like some elderly person.
why not just say "I think that"
do you see yourself as some kind of visionary about this particular topic? literally EVERYONE is saying that, it's the most obvious fact about AI
M1 had 70 GB/s, M1 Pro 200 GB/s, M1 Max 400 GB/s, M1 Ultra 800 GB/s.
Modern RTX 6000: ~1,600 GB/s or so.
If we get a 1,200-1,500 GB/s bandwidth M7 variant in late 2027 with 512GB of RAM, that will be a very interesting chip. Tracking LLM size and performance improvements, I can imagine that being a sort of inflection point for local inference. I wonder what the power budget would be in desktop format.
You're looking at about 100 tokens/s for a 1T MoE model with 37B active parameters at 4-bit.
It'd probably cost $30k or more, I'm guessing, if memory prices do not come down. Even at $30k, it could still be a relative bargain, since an RTX Pro 6000 Blackwell 96GB card costs $12k today. The M3 Ultra with 512GB was around $8k before Apple discontinued it. I expect an M7 Ultra to have 768GB or 1024GB.
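A common back-of-envelope check for tokens/s figures like this: single-stream decoding on a large model is usually limited by memory bandwidth, so a first-order upper bound is the bandwidth divided by the bytes of active weights read per generated token. This sketch uses that simple model only; it ignores KV-cache reads, compute limits and batching, and techniques such as speculative decoding can push effective throughput above it.

```python
# First-order decode-speed estimate for a memory-bandwidth-bound model.
# Model: every active weight is read once per generated token, so
#   tokens/s <= bandwidth / bytes_read_per_token
# Ignores KV-cache traffic, compute limits, batching and speculative decoding.


def max_tokens_per_second(bandwidth_gb_s: float, active_params_b: float, bits_per_weight: float) -> float:
    bytes_per_token_gb = active_params_b * bits_per_weight / 8  # GB of weights read per token
    return bandwidth_gb_s / bytes_per_token_gb


# 800: M1 Ultra; 1200 and 1500: the hypothetical M7 range; 1600: the RTX 6000 figure from the thread.
for bw in (800, 1200, 1500, 1600):
    print(f"{bw:5d} GB/s -> {max_tokens_per_second(bw, 37, 4):5.1f} tok/s (37B active, 4-bit)")
```

With 37B active parameters at 4 bits (18.5 GB per token), 1,200-1,500 GB/s gives roughly 65-81 tokens/s under this model, so treat any single number as a sanity check rather than a prediction.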
Apple Silicon Macs were on their way to becoming cheap local LLM machines relative to professional GPUs before this memory crisis. It may still emerge as such in a few years.
Here's some interesting math: the 512 GB in an Ultra chip could go into roughly 42 Pro iPhones. Assuming a 55% profit margin and a $1,200 ASP, you're looking at $28,160 in profit from making iPhones instead. No wonder Apple discontinued the M3 Ultra 512GB. If they only have a limited supply of RAM for all their products, it makes no sense to produce an $8,000 M3 Ultra 512GB when you can produce 42 Pro iPhones. As of June 2026, you can only configure an M3 Ultra with up to 96GB.
Apple would have to raise the price of a 512GB Ultra Mac to around $50k to match iPhone profits.
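The RAM opportunity-cost arithmetic above, reproduced. It assumes 12 GB of RAM per Pro iPhone, which is implied by 512 GB / 42 phones but not stated explicitly. The $28,160 figure matches the fractional 42.7 phones (512 / 12); with exactly 42 phones it would be $27,720. The ~$50k Mac price follows if the Mac is assumed to carry the same 55% margin.

```python
# Reproduces the iPhone-vs-Ultra RAM arithmetic from the thread.
# Assumption: 12 GB of RAM per Pro iPhone (implied by 512 GB / 42 phones, not stated explicitly).
ULTRA_RAM_GB = 512
IPHONE_RAM_GB = 12
IPHONE_ASP_USD = 1200
MARGIN = 0.55  # same margin assumed for iPhones and for the Mac

phones = ULTRA_RAM_GB / IPHONE_RAM_GB               # ~42.7 phones' worth of RAM
iphone_profit = phones * IPHONE_ASP_USD * MARGIN    # profit if that RAM went into iPhones
mac_price_to_match = iphone_profit / MARGIN         # Mac price yielding the same profit at the same margin

print(f"phones: {phones:.1f}")
print(f"iPhone profit: ${iphone_profit:,.0f}")
print(f"Mac price to match: ${mac_price_to_match:,.0f}")
```

The matching price comes out at about $51,200, consistent with the "around $50k" estimate.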
How would that work? They purchase 512GB from Samsung and then it doesn't matter if that's like 128x 4GB or 4x 128GB?
If companies keep spending half a MacBook Neo's worth of subscriptions on AI plans per person every month, Apple is going to have a hard time competing.
That’s indeed very hypothetical, considering that Apple silicon uses on-package LPDDR memory.
Edit: for those of you downvoting I don’t celebrate this prospect. I’m merely realistic about where things are going given the rapid vibe shift from the administration on AI since the start of June.
The article didn't state the M5 Ultra won't be released. It will probably provide 1,228 GB/s of memory bandwidth this year.
(This is assuming Apple will deliver, but this area is one of the biggest ones they have in AI, and they need the developer ecosystem to exist and survive)
Maybe this strategy works, even in that world.
Remember when we all thought (were told we thought) the world was heading to 3D views of our 2D lived experience like a solid Cube of GUI we could rotate around and live inside? Well Apple took the simple 2D square pane of virtual desktops and .. made it a SONY strip. One variable: sideways.
So here we are being told AI is the future. Apple seems to be saying "yes but it will run local" which might be a safe bet if AI comes true but I wonder how many of us want the AI outcome, which is morally speaking the 3D immersive GUI cube here: what if we don't want that?
So I think Apple has the right instinct. In fact, I've had the thought multiple times that I really want a lot of workflows just running on my device. Workflows like fast vector search (already fast on the M4, but I want it more commonplace), or real-time transcription and summarization running even faster on device, etc.
Can't it do both? The M1 Pro with 16 GB+ is still more than nearly everyone needs.
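For readers unfamiliar with the term, the core of on-device vector search can be as simple as ranking stored embeddings by cosine similarity to a query embedding. This is a minimal brute-force sketch with made-up 3-dimensional vectors (real embeddings come from a model and are much larger); it is not how any particular Apple framework implements it.

```python
import math

# Brute-force cosine-similarity search over a small in-memory collection.
# The document names and 3-dimensional vectors are made up for illustration.


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    # Score every stored vector, then keep the k most similar.
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


docs = {
    "meeting notes": [0.9, 0.1, 0.0],
    "grocery list": [0.0, 0.2, 0.9],
    "project plan": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], docs))
```

Brute force is linear in the number of stored vectors, which is usually fine for a personal collection; larger indexes typically switch to approximate nearest-neighbor structures.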
It’s all fairly easy bets to make and correct.
There isn't a future where we all just decide that nah, we don't want AI anymore. Useful things don't disappear.
I think reducing the die area dedicated to ai stuff is not going to be a problem.
And in fairness, Apple already has essentially AI-less hardware in the form of the MacBook Neo, and it’s been an astonishing success.
I have one and it’s a very good laptop, particularly for the price I paid for it.
Do we have a choice? It's being forced upon us by folks who have the power to distort any market they want. Energy prices are rising, and the PC industry is about to be destroyed by component prices. It will be dumb clients that run the software our feudal overlords of the data centers will have the grace to grant us. And the government lets it happen because it furthers their interests.
https://bontechlabs.com/news/apple-is-reportedly-using-intel...
Given the risks involved in establishing Apple Silicon designs with a new fab, I would expect early M7 parts to be in test production right now.
The fundamental M7 design is already set in stone.
Mark Gurman's Bloomberg article does not mention fabrication partners or processes.
If they have Apple's designs months prior to launch, rather than after launch.
It's not simply marketing since the Pro/Max chips of a generation use the same cores as the regular version, just more of them or different combinations of performance and efficiency cores.
The claim is that M6 will be released, but the only variants will be lower end.
When they get to the M7 generation, they will make high end variants.
It's a real distinction because each generation of parts shares an architecture.
The article has an entire section speculating what the M6 parts will be, but says they'll top out around 200GB/s memory bandwidth and 12 graphics cores.
Why would it? Each generation of the M series has an architectural improvement on their chipsets. The difference between an M1 and an M1 Pro is the allocation and arrangement not the architecture. M6 to M7 presumably will have architectural changes.
Or did this announcement also add an M6 chip, and they're just skipping pro?
It's the same thing as how the Mac Studio got an M4 Max refresh, but they didn't make an M4 Ultra so if you want the 28+ core CPU or 60+ core GPU, that's still using an M3 Ultra.
This time it'll be across all the Pro, Max, and Ultra versions, if you want those they'll stay at the previous generation for the M6 cycle.
Not that weird - Apple has a huge set of chips and hardware and software products. Putting every single thing on a fixed identical update cycle together won't always make sense.
It can still be a very real, not made-up distinction, if the actual facts on the ground are that Apple designed an M6 line, but then scrapped that design and asked the team to create a new design with emphasis on AI-focused specs.
It's not the name that's important (the M7 could still come out as the M6); it's them skipping a design, or a CPU "tick-tock" step.
Are you thinking Apple is leaking that there will be a long wait for much more expensive chips in order to… what?
If they are pulling out all the stops to make the M7 more competitive... guess I can wait for that?
I was holding out for one until I decided to switch from an M1 Pro 16" MBP to an M5 Air 15" due to the expected price increase. I think many M1 Pro/Max generation people were waiting to upgrade this year.
They can release a redesigned MBP with the base M6 chip.
They don't want to tell the world how the new redesigned MBP is the best laptop in the world but it's slower than the older MBPs.
What do you mean by 'win'?
For a normal coder/person's use cases, yes. But AI companies are becoming more specialised in different fields and these tailored models will be leagues ahead in those niches.
Are you upgrading from a perfectly good machine? Then wait.
> What sets the A20 apart isn’t just the node shrink—it’s the revolution in packaging. Apple is transitioning to Wafer-Level Multi-Chip Module (WLCM) integration, meaning that RAM will no longer be situated beside the chip, but rather on the chip wafer itself, integrated alongside the CPU, GPU, and Neural Engine.
This shift eliminates the need for silicon interposers and substrates, thereby enhancing signal integrity, improving thermal dissipation, and facilitating faster memory access with lower latency. The benefits? Better multitasking, smoother AI processing (hello, Apple Intelligence), improved battery life, and potentially a smaller chip footprint—freeing up space for other components.
https://hwbusters.com/news/apples-a20-chip-ushers-in-a-new-e...
It's entirely possible that TSMC is ramping up more slowly than expected.
And their explanation isn't really passing the smell test for me for other reasons, for instance the fact that DRAM processes are pretty radically different than bulk logic processes, which wouldn't really let you put it all on the same wafer, much less the same die. Even back in the day when you had eDRAM blocks (like the Xbox 360's eDRAM die), that was really a DRAM process with a bit of logic cells that wouldn't be competitive if they weren't sitting right next to the DRAM blocks.
I could be wrong here though, my examples are more than a bit long in the tooth.
> CoWoS (Chip-on-Wafer-on-Substrate)
https://semiwiki.com/wikis/industry-wikis/cowos-chip-on-wafe...
It's a more advanced update from their older InFO tech.
As someone who wants to run effective LLMs locally for many things, their other big benefit for a while has been the unified-memory Studios.
But in terms of “noticing it” you are correct. You won’t pay attention after a day or two.
EDIT: this menu-managing app will need permissions to make screen captures. So much for privacy. Forgot to mention.
hyperscalers better all IPO in the next 8 quarters
I wonder how much the rumored 768GB RAM version will cost.
A top of range Mac is a depreciating asset and looks exactly the same as the other models physically.
some kind of private-public partnership
sorry if thats already happening in some capacity, like i said - "stupid question"
but can the gov not just fast track this as a "national security" or something?
i think the usa should be the one who makes 1nm or smaller chips on demand, even if it takes 5-10 years to do.
and yes i realize i might sound dumb here but i'm the one suffering from high hardware prices!!
I guess it should be https://www.bloomberg.com/news/articles/2026-06-25/apple-to-...
EDIT: gift link if paywalled (archive.is capture is truncated): https://www.bloomberg.com/news/articles/2026-06-25/apple-to-...
They need to pull out of this half assed bandwagon approach.
They don't need to pull out of this approach.