PSI: Pressure stall information for CPU, memory, and IO

(lwn.net)

123 points | by henridf 2110 days ago

7 comments

Scaevolus 2110 days ago
Handling OOM livelocks is exciting, and they have a good explanation for why the current OOM-killer fails:
> One usecase is avoiding OOM hangs/livelocks. The reason these happen is because the OOM killer is triggered by reclaim not being able to free pages, but with fast flash devices there is always some clean and uptodate cache to reclaim; the OOM killer never kicks in, even as tasks spend 90% of the time thrashing the cache pages of their own executables. There is no situation where this ever makes sense in practice.
SomeHacker44 2110 days ago
Interesting!
After reading it, I realized I was actually hoping for information at a lower level than VM for memory pressure. Actually finding live, actionable information about DRAM bandwidth usage, delays caused by the hardware system of caches including TLBs, L1/2/3 caches and main memory contention, etc. I have not found that existing tools are insufficient in monitoring/dealing with VM swapping - OTOH I usually seek to keep that at zero and leave a little swap just to allow for some chance of alerting and recovery before OOM killer kicks in.
[-]
- grandmczeb 2110 days ago
  Can you give a little more detail about the kind of information you’d like from the hardware?
  [-]
  - Filligree 2110 days ago
    How much time does each CPU core spend running/waiting for L1/L2/L3/DRAM? How often are those stalls due to cross-core contention for the same bit of memory? Which execution units are in use? What's the limiting factor on throughout?
    That's what I can think of in the first two minutes, anyhow. It all comes down to the last one.
    [-]
    - zlynx 2110 days ago
      Intel CPUs provide a crazy amount of performance counters. I bet that the numbers you want are in there somewhere. Look into the perf tool.
      [-]
      - Filligree 2110 days ago
        Alas, I'm using AMD.
  - jakeogh 2109 days ago
    Prob too easy to ask here, but can I get a list of pending syscalls sorted by age?
zokier 2110 days ago
Digging the LKML thread, this appears to be the corresponding userland component for the OOM use-case:
https://github.com/facebookincubator/oomd
There was also more minimal proof-of-concept example posted by Endless OS guys:
https://gist.github.com/dsd/a8988bf0b81a6163475988120fe8d9cd
teddyh 2110 days ago
Sounds good. The next step would be to start using this instead of load average in all the appropriate places, like batch(1), etc.
everybodyknows 2109 days ago
Curious that there is no mention of the existing "memory" cgroup. On some desktop Linux, you'll find it here:
```
  ls -l /sys/fs/cgroup/memory/
```
The 000-permission 'pressure_level' file controls asynchronous notifications to apps, advising prompt shedding of load. This is apparently the mechanism alluded to in a Googler's recent blog post, writing from the point of view of Go server coding: https://news.ycombinator.com/item?id=17551012
politician 2110 days ago
I'm happy to see a new take on trying to produce a meaningful load metric.
[-]
- lolc 2110 days ago
  Especially as it promises to disentangle IO and CPU load.
cjhanks 2110 days ago
I have long looked for an efficient metric for measuring VM pressure. Hope to see this, or something like this merged.