We found a bug in the hyper HTTP library

(blog.cloudflare.com)

109 points | by Pop_- 4 days ago

13 comments

  • Twey 1 hour ago
    This would have been flagged by Clippy lints `let_underscore_untyped` or `let_underscore_must_use`, which sadly are not enabled by default.
    • microgpt 10 minutes ago
      Or just by not writing let _ =
    • pwdisswordfishq 1 hour ago
      Ehh, easy fix

          #[allow(clippy::let_underscore_untyped,clippy::let_underscore_must_use)]
          let _ = self.poll_flush(cx)?;
      • Twey 44 minutes ago
        I said ‘flagged’, not ‘fixed’ :)

        You can always write the wrong code if you want it enough. But hopefully a warning would have prompted someone to think harder about this flow.

        • pwdisswordfishq 19 minutes ago
          But "let _ =" is already an explicit suppression of a must-use warning. Where does this arms race of "no, I really know what I am doing, compiler" versus "no, this really looks like a mistake, programmer" end?
      • lunar_mycroft 1 hour ago
        You can set the lints to `forbid` instead of `deny`, which means they can't be `allowed` like that.
      • nesarkvechnep 1 hour ago
        Yeah, but you must know about them and the possible bug first in order to allow them...
        • Twey 42 minutes ago
          Hence ‘sadly’. IMNSHO both of these (or at least _untyped) should be enabled by default. Untyped `let _` is too big a footgun during refactorings.
        • Joker_vD 56 minutes ago
          At which point you wouldn't have written this bug in the first place; or the warnings would trigger immediately, you'd change _ to an actual variable and then remove the warning pragmas because now you don't assign to _.
          • Twey 40 minutes ago
            `Poll` is marked `#[must_use]` so if you were assigning to something other than `_` you'd get a warning that you're ignoring the `Pending` path. The Clippy lint is only for `_` which Rust considers a use by default.
  • edelbitter 5 hours ago
    Cloudflare does not notice (until a customer complains) that they are sending broken responses at scale? I would have thought they would notice this from sampling and linting a few replies.. just in case they did something like Cloudbleed again.
    • ramon156 2 hours ago
      Can you get reasonable results without exposing sensitive info? I'm asking because I genuinely have no idea what it's like at their scale
  • nopurpose 1 hour ago
    Nice writeup, but I don't understand how `curl` didn't trigger bug for them (or any other hyper HTTP server out there), given the explanation in the article.

    `curl --http1.1` sends `Connection: Close` so sender (hyper) must attempt to shutdown connection after sending whole body. Surely any network is slower than memory copy into socket kernel buffers, so it must reliably trigger condition "buffer flush can't be done in one go" and thus trigger early TCP shutdown.

  • worldsavior 2 hours ago
    > We spent six weeks chasing a nearly invisible bug — a race condition that occurred only under specific conditions — in the hyper library that impacted how the Images binding returned processed image data back to the client. In the end, it took four lines of code to fix it.

    That's a long time, must be frustrating.

  • microgpt 2 hours ago
    Would using Rust have prevented this?
    • re-thc 1 hour ago
      Isn't this already Rust?
      • pjmlp 1 hour ago
        That was obviously a joke question, pointing that Rust isn't the solution for everything.
      • lelanthran 53 minutes ago
        Woosh :-)
    • Cthulhu_ 1 hour ago
      The Hyper library in question is a Rust library.

      Did you read the article, or are you a "use rust" parrot / bot based on titles?

      • waysa 1 hour ago
        Sarcasm. (I guess)
        • frankharv 44 minutes ago
          Obviously written by a C freak using a BSD
  • pseudony 1 hour ago
    So “fearless concurrency” still only happens when one just decides to not be afraid… :)
    • c0balt 1 hour ago
      This does not appear to be a concurrency bug though?
      • microgpt 9 minutes ago
        Of course it's a concurrency bug. It races sending data to the kernel against the kernel sending data to the network. If the wrong one wins the bug occurs.
      • pseudony 32 minutes ago
        “ a race condition that occurred only under specific conditions — in the hyper library”
  • Thaxll 1 hour ago
    So much for Rust forcing you to handle errors.
    • Matl 2 minutes ago
      Go does force you too, but it also supports _ as a bypass - because sometimes you do know better. Just not in this case.

      Rust never promised it'll let programmers turn off their brain, that's what LLMs are for.

    • wongarsu 1 hour ago
      You could argue the bug happened exactly because hyper's poll_flush treats flushing some but not all data as a successful return, not an error case.
    • atoav 1 hour ago
      You could say the exact same thing about safety belts and airbags in cars after someone has died in a crash.

      Why even bother with measures that prevent many problems if they won't prevent all of them, right?

      • chlorion 3 minutes ago
        This is the argument I like too.

        It's the same argument anti-vaxers love to make. "Well you can still get covid after getting the shot", which is something I read and heard quite a lot. That doesn't make the thing useless.

        Humans are really dumb.

  • 100ms 4 hours ago
    > The failure was caused by a timing-dependent race condition in hyper’s HTTP/1 connection handling. When the reader was slower and the socket buffer filled, poll_flush returned Poll::Pending, but the dispatch loop discarded that result. Hyper then treated the response as complete and shut down the socket while data remained buffered internally, causing the client to receive an EOF before the full body arrived.

    https://github.com/hyperium/hyper/issues/4022

    Saved you 3000 words

  • nopurpose 5 hours ago
    [dead]
  • logicchains 5 hours ago
    [flagged]
    • fwlr 4 hours ago

          let _ = …?
      This is the Rust idiom for “I am intentionally ignoring this return value”. The linter would have caught

          self.poll_read()?;
      
      and in fact one of the options the linter itself suggests in this case is exactly this “let underscore equals” idiom. (Arguably, this code exists because of the linter, not due to its absence!)

      In any case, the return value is being “handled” - the question mark examines the result and breaks the loop if the result is not `Ok(…)`, ie if the call is not successful.

      Intentionally ignoring the successful return value isn’t necessarily terrible, either - you could be calling the function for its side effect, and you don’t care what the specific result of that effect is, just as long as there is some effect. E.g. maybe you have a state machine, and this is the code that repeatedly drives it.

      (Not coincidentally, polling is what you do to Futures, and Futures are state machines that you need to repeatedly drive…)

      In conclusion, I do not think this is prima facie terrible code, nor is it an obvious bug. Async rust is subtle and complicated, and not always fully understood by those who nevertheless have to use it.

      • logicchains 54 minutes ago
        >This is the Rust idiom for “I am intentionally ignoring this return value”.

        That doesn't make the code any less awful, it just makes idiomatic Rust sound awful. Discarding a return value without even a comment to explain why shouldn't be allowed in any critical project, and the linter should be perfectly capable of ensuring that a comment accompanies the discard and complaining loudly when it doesn't.

    • lifthrasiir 5 hours ago
      It is an explicit way to discard return values; `self.poll_read(cx)?` etc. alone would warn. Or in this case, `Poll<Result<(), Error>>` is unwrapped once and `Result<(), Error>` is being discarded. The decision to discard `Result<(), Error>` should have been intentional, albeit turned out to be not always the case.
      • watt 4 hours ago
        If they're not going to handle the return values, they should change the function signature to reflect this aspirational contract, that that function "never fails".

        I see in the article they did change the poll_flush to run just-in-time at poll_shutdown. So they definitely can make a "best effort" poll_flush version that just does not return any errors for use in that loop.

        But all in all? Amateur hour.

        • a_cul 3 hours ago
          You're missing how rust works. The function is explicitly allowed to fail, which is why it returns a Result<(), Error>. They're using the function calls within for their side effects. The ? at the end of each line signals that the function will short-circuit return with an error if the function call fails, and only if it is successful it returns the actual value: they just don't care about this value, hence the let _ =. Basically, they are doing the equivalent of:

            let _, err = function_call();
            if err {
              return err
            }
            ...
          • watt 2 hours ago
            What I am saying, is make another version of the function, which is explicitly not allowed to fail, if you want to use it in the loop.
    • QuantumNomad_ 5 hours ago
      Assigning to _ in Rust specifically means that you intentionally want to discard the value, and the clippy linter and the Rust compiler both know that.
  • giammbo 5 hours ago
    [flagged]
  • algoth1 2 hours ago
    I wonder if this bug was found via project glasswing
    • re-thc 1 hour ago
      > I wonder if this bug was found via project glasswing

      Did you read how they said it took weeks? Would run out of tokens at that rate...