Why We’re Switching to gRPC

(eng.fromatob.com)

244 points | by protophason 2504 days ago

25 comments

  • time4tea 2503 days ago
    Yeah. Easy things are easy with most technologies... It's only after a while that you start to see the 'problems'.

    With grpc... It's designed by Google for Google's use case. How they do things and the design trade-offs they made are quite specific, and may not make sense for you.

    There are no generated language interfaces, so you cannot mock the methods. (Except by mocking abstract classes, and nobody sane does that, right?)

    That's because grpc allows you to implement whatever methods you like of a service interface, and require any fields you like - all are optional, but not really, right?

    Things that you might expect to be invalid, are valid. A zero byte array deserialised as a protobuf message is a perfectly valid message. All the strings are "" (not null), the bools false, and the ints 0.
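
    A quick way to see why: the wire format is just a sequence of (tag, value) pairs, so an empty buffer simply contains no pairs. A minimal Go sketch of the decoding loop (hand-rolled, single-byte tags only; not the real protobuf library):

```go
package main

import "fmt"

// decodeFieldNumbers walks a protobuf wire-format buffer and returns the
// field numbers it contains. It is a toy: single-byte tags only, varint and
// length-delimited wire types only, no bounds checking. The point is that an
// empty buffer is a valid message -- the loop never runs, zero fields are
// present, and every field keeps its default ("" / false / 0).
func decodeFieldNumbers(buf []byte) []int {
	var fields []int
	for i := 0; i < len(buf); {
		tag := int(buf[i])
		i++
		fields = append(fields, tag>>3) // high bits: field number
		switch tag & 7 {                // low bits: wire type
		case 0: // varint: skip continuation bytes, then the final byte
			for buf[i]&0x80 != 0 {
				i++
			}
			i++
		case 2: // length-delimited: one-byte length prefix, then payload
			i += 1 + int(buf[i])
		}
	}
	return fields
}

func main() {
	fmt.Println(decodeFieldNumbers(nil))                // no fields: still a valid message
	fmt.Println(decodeFieldNumbers([]byte{0x08, 0x2A})) // field 1 set to varint 42
}
```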

    Load balancing is done by maintaining multiple connections to all upstreams.

    The messages don't work very well with ALB/ELB.

    The tooling for web clients was terrible (I understand this may have changed).

    The grpc generated classes are a load of slowly compiling, not-very-nice code.

    Like I say, if your tech and business are like Google's (they probably aren't), then it's a shoo-in; otherwise it's definitely worth asking whether there is a match for your needs.

    • kjeetgill 2503 days ago
      > With grpc... It's designed by Google for Google's use case. How they do things and the design trade-offs they made are quite specific, and may not make sense for you.

      Agreed. It's always important to try to pick technologies that 'align' with your use-cases as well as possible. This is easier said than done and gets easier the more often you fail to do it well! I do think people will read "for Google's use case" and hear "only for Google's scale". I actually think the gRPC Java stack is pretty efficient so it "scales down" pretty well.

      I want to skip over some of what you're saying to address this:

      > Things that you might expect to be invalid, are valid. A zero byte array deserialised as a protobuf message is a perfectly valid message. All the strings are "" (not null), the bools false, and the ints 0.

      Using a protobuf schema layer is wayyyy nicer than JSON blobs, but I agree that it is misconstrued as type safety and validation. It's fantastic for efficient data marshaling and decent for code generation, but it doesn't solve the "semantic correctness" side of things. You should still be writing validation. It's a solid step up from JSON, not a panacea.

      • alasdair_ 2503 days ago
        JSON has a bunch of schema systems, including Open API which is a repackaging of Swagger with some extra stuff and is also endorsed by Google.

        Do you consider protobuf superior to those alternatives for web-based (rather than server to server) projects?

        • kjeetgill 2503 days ago
          I spend all my time server to server so I don't feel qualified to give real advice.

          My impression is that if you're going to talk to a browser, that edge stands to gain much more from conforming to HTTP standards. If your edge is more "applicationy" and less "webpagy" then maybe a browser facing gRPC (or GraphQL?) might be more appealing again.

          As to the other JSON schema systems, I kinda wish one of them won? It feels like a lot of competing standards still. Not really my area of expertise.

          • anderspitman 2502 days ago
            There are a couple of gRPC implementations for the browser [0] (officially supported), but it seems to require quite a bit of adaptation, and looked pretty complicated to set up.

            [0] https://grpc.io/blog/state-of-grpc-web/

          • luhn 2503 days ago
            I think OpenAPI/Swagger has won. Haven’t heard of any others recently.
    • buckhx 2503 days ago
      I am by no means arguing with your general point, but some of these may be language-specific.

      For example in Go, the service definitions are generated as interfaces and come with an "Unimplemented" concrete client. We have a codegen package that builds mock concrete implementations of services for use in tests.

      Zero values are also the standard in Go and fit most use cases. We have "optional" types defined as messages that wrap primitives such as floats for times when a true null is needed (has been mostly used for update type methods).
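
      As a sketch of the wrapper pattern described above (names hypothetical; google/protobuf/wrappers.proto ships the same idea as well-known types):

```proto
syntax = "proto3";

// Message presence is tri-state in proto3 even though scalar fields are not,
// so wrapping a scalar in a message gives you a real "null".
message OptionalFloat {
  float value = 1;
}

message UpdateFareRequest {
  string booking_id = 1;
  OptionalFloat price = 2; // unset => leave unchanged; value 0 => set to zero
}
```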

      The web clients work, but generate a LOT of code. We're using the improbable package so we can provide different transports for browser vs server JS clients, btw.

      The big win we've seen from grpc is being able to reason about the entire system end to end and have a central language for conversation and contracts across teams. Sure there are other ways to accomplish that, but grpc has served that purpose for us.

      • ptoomey3 2503 days ago
        Our main hindrance with gRPC was that several disparate teams had strange issues with the fairly opaque runtime. The “batteries included” approach made attempts to debug the root causes quite difficult.

        As a result of the above, we have been exploring twirp. You get the benefits of using protobufs for defining the RPC interface, but without quite as much runtime baggage that complicates debugging issues that arise.

        • kjeetgill 2503 days ago
          That's always the problem with "batteries included". If they don't work it's often not worth the effort to fix them; you gotta toss em.

          I'm curious what languages you were using gRPC with. The batteries includedness across tons of languages is a big part of gRPC's appeal. I'd assume Java and C++ get enough use to be solid but maybe that's wishful thinking?

          • ptoomey3 2503 days ago
            We were mostly using ruby (which uses their C bindings) and golang (where the bindings are native Go).
            • cachvico 2503 days ago
              What kind of problems did you run into, if you don't mind sharing?
              • ptoomey3 2503 days ago
                One that we encountered in several services were gRPC ruby clients that semi-regularly blocked on responses for an indeterminate amount of time. We added lots of tracing data on the client and server to instrument where “slowness” was occurring. We would see every trace span look just like you would hope until the message went into the runtime and failed to get passed up to the caller for some random long period of time. Debugging what was happening between the network response (fast) and the actual parsed response being handed to the caller (slow) was quite frustrating, as it requires trying to dig into C bindings/runtime from the ruby client.
                • mgsouth 2502 days ago
                  It was a couple of years ago, but the Go gRPC library had pretty broken flow control. gRPC depends upon both ends having an accurate picture of in-flight data volumes, both per-stream and per-transport (muxed connection). It's a rather complex protocol, and isn't rigorously specified for error cases. The main problem we encountered was that errors, especially timed-out transactions, would cause the gRPC library to lose track of buffer ownership (in the sense of host-to-host), and result in a permanent decrease of a transport's available in-flight capacity. Eventually it would hit zero and the two hosts would stop talking. Our solution was to patch-out the flow control (we already had app-level mechanisms).

                  [edit: The flow control is actually done at the HTTP/2 level. However, the Go gRPC library has its own implementation of HTTP/2.]

                  • ptoomey3 2502 days ago
                    Yet another batteries included downside...a blackbox http implementation that is hard to debug.
                • ptoomey3 2503 days ago
                  This isn’t to say it happened on every response..it was a relatively small fraction. But, it was enough to tell _something_ was going on. Who knows, it could be something quirky on the network and not even a gRPC issue. But, because the runtime is so opaque, it made debugging quite difficult.
          • alasdair_ 2503 days ago
            Even Java codegen has issues - like a single classfile so big it crashes any IDE not explicitly set up for it, or a whole bunch of useless methods that make autocomplete terrible.
    • dub 2503 days ago
      You can add whatever custom validation you want on top of proto3 (using annotations if you like). Required fields aren't very useful at a serialization level: adding a new required field would always be a backwards incompatible change. You should never do it. They're only useful if you have total certainty that you can define all your APIs perfectly on the first try. But again, if you really want them you can always build something like https://github.com/envoyproxy/protoc-gen-validate on top. That's the benefit of a simple and extensible system vs one that tries to bake-in unnecessary or problematic default behaviors.

      Also: why wouldn't grpc work well with load balancers? It's based on HTTP/2. It's well supported by envoy, which is fast-becoming the de facto standard proxy for service meshes.

      • alasdair_ 2503 days ago
        >Also: why wouldn't grpc work well with load balancers? It's based on HTTP/2

        You answered your own question.

        There is always some bit of older infrastructure, like a caching proxy or “enterprise” load balancer, that doesn’t quite understand http/2 yet - it is the same reason so much Internet traffic is still on ipv4 when ipv6 exists - the lowest common denominator ends up winning for some subsection of traffic.

        • tedk-42 2503 days ago
          Not wrong on this.

          We use gRPC in our tech stack but some pods handle far more connections than others due to the multiplexing/reuse of existing connections.

          Sadly Istio/Envoy solutions are in our backlog for now.

      We can't fault gRPC otherwise. It's way faster than if we were to encode/decode JSON after each microservice hop. It integrates into golang nicely (another Google Kool-Aid solution!) so it's a win-win there.

    • dilyevsky 2503 days ago
      > The messages dont work very well with ALB/ELB.

      Doesn't work on ALB because its HTTP/2 support is trash, not gRPC's fault here. Works fine with NLB btw.

      > Load balancing is done by maintaining multiple connections to all upstreams.

      Again, this is a "feature" of HTTP/2. Use linkerd or envoy that support subsetting among other useful things.

      Don't blame your misunderstanding of how technology is meant to be used on said technology.

      • adamson 2503 days ago
        This feels like a circular argument. Not everyone needs HTTP/2 support (especially for internal services for enterprise applications).
        • ptoomey3 2503 days ago
          Need or not need...a bigger issue is simply the technical reality of what you have now. If your non-trivial infrastructure doesn’t have great http2 support, it might be a pretty big lift to make that change first.
          • dilyevsky 2503 days ago
            Yes, it is such a foundational thing that it has to be woven into how you run your infrastructure. Can't just drop it in in most cases. If my memory serves me well, Google basically re-started their entire codebase/infra from scratch - google3 (version 2 was skipped, apparently) to accommodate this shift.
    • denormalfloat 2503 days ago
      > There are no generated language interfaces, so you cannot mock the methods.

      For Java that isn't true. gRPC Java ships with a lightweight InProcess server to stub out the responses to your client.

      > Load balancing is done by maintaining multiple connections to all upstreams.

      Load balancing is fully pluggable. The default balancer only picks the first connection.

      > The tooling for web clients was terrible

      Agreed. This is almost entirely the fault of Chrome and Firefox, for not implementing the HTTP/2 spec properly. (missing trailers).

      • time4tea 2503 days ago
        There is a big difference in my book between starting up an in-process server (requiring all sorts of grpc naming magic), running your extensions to an abstract class inside the grpc magic, and a language-level interface.

        On the one level you can say new X(new MyService()) or new X(mock(Service.class)) if you have to, and on the other it's just loads of jibber-jabber.

        • denormalfloat 2503 days ago
          There is a reason mocking is not supported: it's incredibly error-prone. The previous version of gRPC (Stubby) did actually support mocking of the stub/service interfaces. The issue is that people mock out responses that don't map to reality, causing the tests to pass but the system-under-test to blow up. This happened often enough that the ability to mock was ripped out, and the InProcess server-client added.

          The extra "jibber-jabber" is what makes people confident that their Stub usage is correct.

          Some sample bugs NOT caught by a mock:

          * Calling the stub with a NULL message

          * Not calling close()

          * Sending invalid headers

          * Ignoring deadlines

          * Ignoring cancellation

          There's more, but these are real bugs that are trivially caught by using a real (and cheap) server.

    • mleonhard 2503 days ago
      One more thing that's a show-stopper for any public service: No flow control. This means anybody who can connect to your gRPC server can OOM it.
      • dikei 2503 days ago
        I suppose you can use a proxy to perform rate-limiting.
    • damnyou 2503 days ago
      You should never use protobuf types directly in your code. Always convert to native types at the edges — that will let you do validation.
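
      As a sketch of that edge conversion in Go (the types and validation rules are hypothetical, and the proto struct is a stand-in for a generated one):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// UserProto stands in for a generated protobuf struct (fields hypothetical).
type UserProto struct {
	Email string
	Age   int32
}

// User is the native domain type the rest of the codebase works with.
type User struct {
	Email string
	Age   int
}

// userFromProto converts at the edge. The zero-value message is perfectly
// decodable on the wire, but it fails here, which is exactly the kind of
// semantic check proto3 itself cannot express.
func userFromProto(p *UserProto) (User, error) {
	if !strings.Contains(p.Email, "@") {
		return User{}, errors.New("invalid email")
	}
	if p.Age <= 0 || p.Age > 150 {
		return User{}, errors.New("age out of range")
	}
	return User{Email: p.Email, Age: int(p.Age)}, nil
}

func main() {
	if _, err := userFromProto(&UserProto{}); err != nil {
		fmt.Println("zero-value message rejected:", err)
	}
	if u, err := userFromProto(&UserProto{Email: "a@b.example", Age: 34}); err == nil {
		fmt.Printf("accepted: %+v\n", u)
	}
}
```
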
      • kbwt 2503 days ago
        At that point, what does Protobuf really buy you?
        • damnyou 2502 days ago
          A rock solid serialization and RPC system.
    • adamson 2503 days ago
      > Things that you might expect to be invalid, are valid. A zero byte array deserialised as a protobuf message is a perfectly valid message. All the strings are "" (not null), the bools false, and the ints 0.

      How does this work? How do you make, say, all fields but the second null? Do you just send a message that's (after encoding) as long as the first two fields, where the first field is 0x00 and the second contains whatever data you want?

      • violinist 2503 days ago
        Two things: 1) protocol buffers intentionally don't allow null values; values that aren't set will return a default value. 2) gRPC uses proto3, which does not distinguish between a field being unset and a field set to the default value.
  • kjeetgill 2503 days ago
    This article comes at a good time because I've been exploring OpenAPI vs. gRPC for a codebase that presently uses neither. Evaluating technology feels like a lot of navel gazing, so it's nice to hear others' experiences even if their uses don't line all the way up with ours.

    Disclaimer: Java fanboy bias. For services internal to a company, I think gRPC is an all-around win. If you need to talk to browser integrations, I don't have as many opinions.

    Personally, I really prefer working at the RPC layer rather than at the HTTP layer. It's OOP! It's SOA! Pick your favorite acronym! HTTP's use as a server protocol (as opposed to a browser protocol) is mostly incidental. It works great, but most of the HTTP spec is entirely inapplicable to services. I like named exceptions vs. 200/5xx/4xx error codes. Do I really care about GET, PATCH, PUT, HEAD, POST for most of my services when all of my KV/NewSQL/API-over-DB services have narrower semantics anyway?

    Out of band headers are nice though.

    Between protobufs, HTTP/2, and a fresh, active server implementation we see pretty solid latency and throughput improvements. It's hard to generalize, but I suspect many users will see the same. Performance isn't the only driving factor, but it's nice to start from a solid base.

    I'm sure missing all the tools like curl and friends is an annoyance, but I like debugging from within my language, and in JVM land at least it's been easy enough.

    • atombender 2503 days ago
      Have you considered GraphQL? Lots of overlap with gRPC, but much more web-friendly. Much better support for optional vs. required data, too. And it comes with server push, replacing the need for WebSockets/SSE.

      Only downside I can think of is that there's no analogous mechanism to gRPC streams; you have to implement your own pagination.

      • kjeetgill 2503 days ago
        I haven't looked into GraphQL much at all, so correct me if I'm mistaken.

        From what I understand of it, the big idea is that instead of passing parameters from the client to the server and fully implementing the query logic, stitching, and reformatting etc. on the server side, you now have a way to pass some of that flexibility out to the client. Instead of updating both the server and the client as uses change, more can be done from the client alone.

        I spend most of my time on the infra side of things and rarely if ever make my way out to the browser so I can't speak to WebSockets/SSE or web friendliness. Being the "backend-for-backend" I just prefer being more tight-fisted about what my clients can and can't do. I mostly deal with internal customers with tighter SLAs so I like to capacity plan new uses.

        Maybe I'm just old fashioned.

  • rubenbe 2503 days ago
    I recently chose gRPC as a communication protocol between two devices (sort of IoT).

    Until now it has worked perfectly, as expected. The C++ code generator provides a clean abstraction, plus it saved a lot of time (both in programming and debugging). The gRPC proto file syntax also nudges you in the right direction wrt protocol design.

    When trying to "sell" gRPC it helps that there are generators for plenty of languages and it's backed by a major company.

    • gravypod 2503 days ago
      I wish that the tooling around compiling protos into stubs and client libraries was simpler. I wish there was a single command I could run to turn a large collection of proto files into libraries for "all" languages (Python, Java, C++, Node package, etc). Unfortunately there's no universal approach to this.
      • q3k 2503 days ago
        This seems like an odd requirement. Are you trying to generate stubs for your API users ahead of time? This will likely not work as generated stubs evolve in lockstep with protoc and runtime support libraries, and thus are not guaranteed to work across discrepant versions. Thus, stub code should be generated alongside the consumer/client. It also likely shouldn't be committed into a VCS.
        • gravypod 2503 days ago
          It would be done in CI. Generate stubs -> package/compile -> push to internal package repo.

          This way your protocol for your infrastructure is just another library.

          • q3k 2503 days ago
            Having an explicit 'create client library by generating/compiling proto stubs' is generally also bad mojo from my experience, unless you're also abstracting API stability and service discovery. If not, it will be unnecessarily painful to make a change to either the service discovery method or a non-backwards-compatible proto change, as you will have to lockstep both the service rollout, the library build and the client bump.
      • jsty 2503 days ago
        What makes that impossible now? I'm probably overlooking something in your use case, but couldn't you just have a simple build script / makefile that generated the libraries with a different protoc call for each library?
        • gravypod 2503 days ago
          Nothing makes it impossible but comparing it to things like thrift the complexity becomes apparent:

              thrift --gen <language> <Thrift filename>
          
          This handles the following languages: C (depends on GLib), Cocoa, C++, C#, D, delphi, Erlang, Go, Haskell, Java, Javascript, OCaml, Perl, PHP, Python, Ruby, Smalltalk. From this one command I can instantly integrate this into almost every build system I know of.

          On the other hand gRPC has the same features but it's a slightly different workflow. For each language you go to the language's code generation page, find the command line option for generating your language's code, read up on some decisions that were made for you, etc. All of that is fine; the part that annoys me a bit is that each language needs a language module for the compiler (if it's not one of the core few languages). For example, in the documentation for generating Go [1] they have you download and install protoc-gen-go from http://github.com/golang/protobuf, assuming that you already have golang installed.

          gRPC seems much more focused on the idea that I want to define an API for my code, I want that specification to live inside the project that I am writing, and that you can figure out how to generate stubs on a language by language basis.

          What I want is something where I can write a set of system specification files, type one command, and get modules built for all languages. From there I can import those modules using my native language's favorite package manager (npm, composer, Hunter for CMake, etc). Ideally the Protocol Specification, the Library Generation, and the Library Usage are three components that are separate.

          [1] - https://developers.google.com/protocol-buffers/docs/referenc...
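
          For contrast with the thrift one-liner above, the gRPC/protobuf flow looks roughly like this per language; plugin names and flags are from memory for that era, so treat this as a sketch rather than a recipe:

```shell
# Go: needs protoc plus the separately installed protoc-gen-go on $PATH
protoc --go_out=plugins=grpc:gen/go api.proto

# Python: the plugin ships in the grpcio-tools pip package
python -m grpc_tools.protoc -I. --python_out=gen/py --grpc_python_out=gen/py api.proto
```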

          • jsty 2503 days ago
            With the significant caveat that I haven't used it much myself (I mainly work with Bazel which obviates the need), I think Uber's prototool [0] can manage most if not all of that. Might be worth giving it a look.

            [0] https://github.com/uber/prototool

      • shereadsthenews 2503 days ago
        bazel build :my_proto_library ?
        • gravypod 2503 days ago
          In many conversations I've had with Google (or Google-adjacent) engineers, I've revealed a truth to them that was quite shocking: Bazel isn't the only build system in wide-scale deployment currently. It's also far from the most used build system currently. While Bazel is a monolithic tool that solves this problem, there is no external tool that solves the problem of configuring the different semantics and configuration for protos in different languages.
          • shereadsthenews 2503 days ago
            You said you wished for a command and I gave you one.
  • justicezyx 2503 days ago
    The truth is that gRPC, like Kubernetes, was built with decades of lessons from RPC frameworks inside a container-oriented distributed environment; and more importantly, gRPC is the blessed framework inside Google as well, meaning it's qualified to power the largest and most complex distributed systems in the world (I think it'd be safe to omit 'one of' here), which in comparison is not the case for Kubernetes.

    Addition: Borg and Kubernetes are designed with similar goals but different emphases. They are like complementary twins with different personalities. For this I recommend Min Cai's KubeCon '18 presentation about Peloton [1]; the slide is titled "comparison of cluster manager architecture".

    [1] https://kccna18.sched.com/event/GrTx/peloton-a-unified-sched...

    • shereadsthenews 2503 days ago
      Wait, I don’t get it. Kubernetes:Borg::gRPC:Stubby. Google uses gRPC internally to the same extent that they use Kubernetes internally, i.e. hardly at all.
      • mehrdada 2503 days ago
        This analogy is very misleading. Kubernetes is probably never going to run any real workload internally at Google, but gRPC powers all external APIs of Google Cloud and, increasingly, other Google properties (e.g. ads, assistant); it is used by mobile apps like Duo and Allo, and has some big internal service use cases. The reason Stubby still dominates internally is simply that migrating to gRPC takes lots of time, which might be hard to justify, but I do see gRPC being used very widely internally at Google; it's simply a matter of time. I don't see that happening to Kubernetes; it's a joke when compared to Borg.

        Google aside, many other companies like Dropbox rely on gRPC extensively to successfully run infrastructure: https://static.sched.com/hosted_files/grpconf19/f7/Courier%2...

        • CydeWeys 2503 days ago
          I work at Google and my team has real workloads running on Kubernetes.

          There's plenty of internal teams that use GCP. Increasingly this might be the direction things are heading.

          • mehrdada 2503 days ago
            GCP itself is a job on Borg. ;)
            • justicezyx 2503 days ago
              That's not true. GCE uses Borg very differently than normal Google internal systems do, which you can imagine is quite natural as they are serving different customers. GCS and other systems, in turn, also differ wildly from GCE. When you talk about GCP as a whole, it becomes impossible to summarize in a few statements, and I doubt there is anyone on earth capable of describing it coherently, even without time constraints.
              • mehrdada 2503 days ago
                What I said (GCP runs on Borg) is absolutely and technically correct, affirmed by your own comment, which highlights the power and flexibility of Borg. The point being no one[1] at Google relies on Kubernetes for raw cluster management capabilities at scale. They might use it for other things that can make deployment more friendly in some scenarios. (This doesn’t make Kubernetes a bad system by any means, just quite different and not a substitute for Borg whereas gRPC is a direct substitute for Stubby). This debate is better argued in your own eng-misc@ and not on a public forum.

                [1]: no one that we care about. At Google this is obviously always incorrect. There’s always that someone who uses weird things like MongoDB and AWS.

          • shereadsthenews 2503 days ago
            And there's no reason why a small project should not. But nobody is going to move, say, indexing to GCP. And when it comes to power laws the big things are big and the small things are not.
            • CydeWeys 2503 days ago
              This sounds like a No True Scotsman argument to me, that if something runs on GCP instead of Borg, it isn't "real". Also throw in shades of moving goalposts.

              Indexing doesn't run on GCP primarily because it's legacy (as in, the first product Google ever did) and thus long predates GCP itself.

              • shereadsthenews 2503 days ago
                It’s neither of those fallacies. The fallacy is to suppose that if you know several people using technology X then it must be quite popular. We see this all the time on HN, where people suppose that, say, Erlang is quite popular because there are dozens of companies, each with five engineers, using it. But then we ignore that there are five companies with a hundred thousand engineers each that do everything in C++. It’s the same with these other things. It’s quite likely that K8s satisfies the requirements of 80% of the projects at Google, and it’s also quite likely that all of them put together consume 1% of the production resources, so it leads to the question of whether it’s even capable of solving a really large problem, as mehrdada argues elsewhere in this thread.
                • CydeWeys 2503 days ago
                  It's not a Google product obviously, but Snapchat runs on GCP. That's quite big. Is that not a "real" product? Admittedly they're on App Engine, a much older product than Kubernetes, but I suspect they'd be able to run on Kubernetes, and perhaps that's what they would choose if they were to build from scratch right now.
              • the-rc 2503 days ago
                Indexing has been rewritten many times over. Even if you removed all the dependencies on Bigtable and co., I think indexing would be the last to move, for quite practical reasons, due to its sheer size and design. The parent poster picked probably the worst workload to migrate to public GCP. Gmail, YouTube and search serving are easier in comparison.
        • shereadsthenews 2503 days ago
          I guess you’d find that on an rpc-weighted scale Stubby handles several orders of magnitude more traffic than those gRPC endpoints you mentioned.
          • mehrdada 2503 days ago
            This may be true today (although even on this metric I’d estimate Kubernetes to be off by some orders of magnitude, bordering zero). My point is there’s a path and plan forward for gRPC adoption and it’s a matter of transition to a new system (which can, admittedly, be very long). For Kubernetes, I don’t think there is a credible path for replacing Borg.
        • enitihas 2503 days ago
          Curiously, what do you find lacking in Kubernetes compared to Borg?
          • mehrdada 2503 days ago
            Scale, for one thing.

            Kubernetes is not a bad system but it’s not designed to run Google.

            • Thaxll 2503 days ago
              What google system runs more than 5k tasks on a single cluster?
              • the-rc 2503 days ago
                https://github.com/google/cluster-data/blob/master/README.md...

                "ClusterData2011_2 provides data from an 12.5k-machine cell over about a month-long period in May 2011."

              • shereadsthenews 2503 days ago
                10k replicas in a single cell is the default charge-free quota of every individual engineer at that company. It’s basically zero.
              • dilyevsky 2503 days ago
                Heh, you have no idea. Lots and lots of systems. And 5k wouldn't even register there...
                • Thaxll 2500 days ago
                  Can't edit my comment anymore but I meant 5k nodes / 150k pods / tasks.
            • int0x80 2503 days ago
              Can you expand on that, please?
              • dilyevsky 2503 days ago
                I used to work on one of Borg teams at Google and now run Kubernetes platform. Nearly every Kubernetes component (node, master, networking) will melt down at fraction of Borg scale. It's not even close.
                • int0x80 2503 days ago
                  thanks for the answer.
              • q3k 2503 days ago
                https://kubernetes.io/docs/setup/cluster-large/

                More specifically, "No more than 5000 nodes" and "No more than 150000 total pods" is fairly limiting to large (Google-large) clusters.

                • int0x80 2503 days ago
                  thanks for the answer.
    • techslave 2503 days ago
      grpc was built way before containers were a thing. jails and zones were barely out of the gate at the time.
      • justicezyx 2503 days ago
        You are talking about Stubby, I would guess.

        Borg is circa 2003; pb/Stubby came before that. GFS was probably similar in timing to Stubby, as were many other cluster-level foundations. In the end, Borg is the true cornerstone that ties everything together and completes the Google infrastructure puzzle (or modern global-scale cluster computing).

  • sytelus 2503 days ago
    I've needed an RPC framework for a few of my projects, but every time I've considered gRPC I've ended up walking away from it. The big issue is that gRPC has a huge number of dependencies and tries to do a lot of things, many of which might be irrelevant for you but will cause extra headaches anyway. When all you need is to serialize your stuff and send it over the wire, there are much better lightweight frameworks. For C++, I think RpcLib is one of the best. It doesn't even require maintaining a .proto file, "compiling" the schema every time you change something, etc. The moral of the story is to always look around instead of just going for the most popular solution first.
  • shereadsthenews 2503 days ago
    A couple of subtly wrong points in the article. Firstly, the gRPC payload can be anything; it need not be an encoded protocol buffer. Secondly, there's not a whole lot of "validation" going on in the protobuf codec. Basically any fundamentally correct buffer encoded as message A will decode successfully as message B, for any B. If there are unknown fields, they are silently consumed. If there are missing fields, they are given the default values, and there is no "required" in proto3. So there is significantly less safety, and significantly more flexibility, in gRPC than people generally realize.
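    To make that concrete, here's a sketch in Node.js that decodes the wire format by hand. The message names and fields are hypothetical, and the toy decoder handles varint fields only:

```javascript
// Two hypothetical messages that share no fields:
//   message A { int32 user_id = 1; }
//   message B { int32 retry_count = 2; }
// On the wire, a message is just a sequence of (tag, value) pairs.

function encodeVarint(n) {
  const out = [];
  do {
    const b = n & 0x7f;
    n >>>= 7;
    out.push(n ? b | 0x80 : b);
  } while (n);
  return out;
}

// Returns a Map of fieldNumber -> value for varint-encoded fields.
function decodeFields(buf) {
  const fields = new Map();
  let i = 0;
  while (i < buf.length) {
    const tag = buf[i++];
    const fieldNo = tag >> 3;
    const wireType = tag & 0x07;
    if (wireType !== 0) throw new Error('varint only, in this sketch');
    let value = 0, shift = 0, b;
    do {
      b = buf[i++];
      value |= (b & 0x7f) << shift;
      shift += 7;
    } while (b & 0x80);
    fields.set(fieldNo, value);
  }
  return fields;
}

// An "A" message with user_id = 300: tag byte 0x08 (field 1, wire type 0).
const aBytes = [0x08, ...encodeVarint(300)];

// Decoding those bytes "as B" succeeds: field 1 is simply unknown to B,
// and B's retry_count (field 2) falls back to its default value, 0.
const parsed = decodeFields(aBytes);
console.log(parsed.get(1));         // 300 -- retained as an unknown field
console.log(parsed.get(2) ?? 0);    // 0 -- missing field, default value

// The empty buffer is a valid encoding of any message type:
console.log(decodeFields([]).size); // 0
```

    Unknown fields are retained, missing fields fall back to defaults, and even the empty buffer parses - exactly the flexibility (and absence of validation) described above.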
    • ntenenz 2503 days ago
      `required` was removed due to the challenges it introduces in designing backwards-compatible API changes[1].

      [1] https://github.com/protocolbuffers/protobuf/issues/2497#issu...

      • docker_up 2503 days ago
        "Required" fields that are no longer required, and "optional" fields that are no longer optional are basically 6 of one and half a dozen of another.

        I'm personally strongly in the "required" camp because at least the interface makes an attempt at giving clues to a user as to what fields are important. If everything is optional, there's no information being passed as to what is important anymore.

    • duality 2503 days ago
      "Basically any fundamentally correct buffer encoded as message A will decode successfully as message B for any B."

      This is incorrect. I suspect you're overextending proto3's treatment of unknown fields to include discarding incorrectly typed fields too. If A has field 1 typed as an int, and B has field 1 typed as a string, an A message with field 1 set will not parse as a B message. However, if the A message has no fields set, or sets a field number unknown to B, it could parse successfully with "leftover" unknown fields.

      • kentonv 2503 days ago
        > If A has field 1 typed as an int, and B has field 1 typed as a string, an A message with field 1 set will not parse as a B message.

        In the C++ reference implementation, which I wrote, this is not true. The field 1 with the wrong wire type would be treated as an unknown field, not an error.

        It's possible that implementations in other languages have different behavior, but that would be a bug. The C++ implementation is considered the reference implementation that all others should follow.

        However, shereadsthenews' assertion is not quite right either. Specifically, a string field and a sub-message field both use the same wire type; essentially, the message is encoded into a byte string. So if message A has field 1 type string, containing some bytes that aren't a protobuf, and message B has field 1 type sub-message, then you'll get a parse error.

        But it is indeed quite common that one message type parses successfully as another unrelated type.

        • therein 2503 days ago
          > In the C++ reference implementation, which I wrote, this is not true. The field 1 with the wrong wire type would be treated as an unknown field, not an error.

          Yeah I was about to say, protobuf C++ implementation will definitely treat it as an unknown field. I just had it do that a few days ago. :)

      • shereadsthenews 2503 days ago
        Ok but these messages are isomorphic on the wire:

          message enc {
            int foo = 1;
            SomeMessage bar = 2;
          }
        
          message dec {
            bool should_explode = 1;
            string why = 2;
          }
        
        You can successfully decode the latter from an encoding of the former.
        • dweis 2503 days ago
          Minor nit, but not necessarily. For basically all values of SomeMessage, dec should fail to parse due to improperly encoded UTF8 data for field 2 (modulo some proto2 vs. proto3 and language binding implementation differences).

          Change field 2 to a bytes field instead of a string field and then yes.

          • shereadsthenews 2503 days ago
            I should mention that I consider this a feature not a bug. The isomorphism permits an endpoint to use ‘bytes submessage_i_dont_need_to_decode’ to cheaply handle nested message structures that need to be preserved but not inspected, such as in a proxy application.
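            The pattern relies on a string/bytes field and a sub-message field sharing the same wire type, so a proxy can declare the nested message as bytes and forward it without decoding it. A sketch with hypothetical schemas:

```proto
// Full schema, used by endpoints that actually inspect the payload:
message Envelope {
  string route = 1;
  Payload body = 2;   // a rich nested message, defined elsewhere
}

// Proxy-side view of the same wire format: field 2 keeps the same number
// and wire type, so its bytes pass through untouched and unparsed.
message EnvelopeView {
  string route = 1;
  bytes body = 2;
}
```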
          • shereadsthenews 2503 days ago
            True, but UTF-8 enforcement was largely absent in all implementations until proto3, and the empty string would be a special case.
        • keymone 2503 days ago
          Bool will decode from int’s encoding??
      • dweis 2503 days ago
        I don't think this is the case, or at least, I'd expect it to be a bug.

        Protocol Buffers should generally be non-destructive of the underlying data. That means even if it encounters the wrong wire type for a field, it should simply retain that value in the unknown field set rather than discard it.

  • dunk010 2503 days ago
    Rich Hickey, the creator of Clojure, gave a talk with some very relevant points in this space (The Language of the System): https://www.youtube.com/watch?v=ROor6_NGIWU
    • dustingetz 2502 days ago
      "gRPC is great because most systems just glue one side effect to another side effect, so what's the point of packaging that into REST" – I think a HN comment from a googler

      The great thing about Clojure is that you can make holistic systems that are end-to-end immutable and value-centric, which means the side effects can go away, which means gRPC stops making sense and we can start building abstractions instead of procedures!

    • cgdub 2503 days ago
      His point about protocol buffers (i.e. schema-out-of-band protocols) is unfortunately brief in this talk.

      Depending on your use case, you may have to do a lot to work around protocol buffers not being self-describing. I haven't seen a good description of the problem online, but if you find yourself embedding JSON data in protobufs to avoid patching middleware services constantly, you should look at something like Avro or MessagePack or Amazon Ion.

  • nevi-me 2503 days ago
    I'm yet to see enough services expose a gRPC endpoint, at least other than Google. That'll keep the perception of adoption s/low.

    I'm writing this as I take a break from working on a polyglot project made up of Kotlin, Rust and Node, where we use gRPC and gRPC-web. We're slowly moving endpoints from the other services/languages into Rust.

    Without focusing on the war of languages, the codegen benefits of protobufs have made what used to be a lot of JSON serde much easier.

    • mleonhard 2503 days ago
      Can you point to a single public Google-run gRPC service? I was under the impression that all connections into Google are proxied by GFE (Google Front-End) servers to internal servers running the Stubby RPC server code. GFE is definitely not running gRPC server code. I don't believe a gRPC endpoint could pass Google's own Production Readiness Review process.
      • terinjokes 2503 days ago
        I've seen Google endpoints available over gRPC for the last few years. Many, if not most, of the Cloud endpoints are directly documented as being available over gRPC[0]. For others, like Google Ads, a peek at the client libraries shows they use gRPC[1] as well.

        [0]: https://cloud.google.com/pubsub/docs/reference/service_apis_...

        [1]: https://github.com/googleads/google-ads-java/blob/master/goo...

        • nevi-me 2503 days ago
          Yes, this. A lot of Google's SDKs (based on my last interaction, 2 yrs ago) are convenience wrappers that hide gRPC behind them. If you use a language that doesn't have an SDK, you can mostly connect directly to their RPC endpoints.
      • nevi-me 2503 days ago
        The googleapis [1] repo has the publicly accessible gRPC definitions, which you can access directly. I've done this before, though it was a bit tedious as I had to learn how to manually pass Google credentials (documentation wasn't good enough).

        [1] https://github.com/googleapis/googleapis

  • _ZeD_ 2503 days ago
    hey, hey! have you known about this new "webservices" stuff? with SOAP you can call remote code as it is here! and with WSDL you can create automatically the client!
    • adrianmonk 2503 days ago
      Short summary of web services:

      Phase 1: Ad hoc, free-for-all chaos.

      Phase 2: SOAP tries to bring order. It fails mainly because the "S" ("Simple") is a lie.

      Phase 3: Pendulum swings hard toward simplicity with HTTP plus JSON plus nothing else, thanks.

      Phase 4: Things shift possibly more toward the middle (a little structure), but none of the competing systems have become obvious winners.

      • crehn 2503 days ago
        Most things in life seem to change like a pendulum learning from previous swings, slowly converging to a healthy middle. Extremes are useful since they give perspective, and attractive since they're easy to grasp.
    • Sevii 2503 days ago
      Except for the fun time when the WSDL doesn't match the actual implementation because 'they don't support WSDL'. (despite serving one from their SOAP service)
      • tracker1 2502 days ago
        Or worse, when the WSDL has a response type of "Object" ... OMG was this ever painful to generate clients for. Usually cheated and used Node as a bridge service.
    • nullwasamistake 2503 days ago
      Man I miss WSDL. We're building it all over again with gRPC.
      • docker_up 2503 days ago
        gRPC is exactly like XDR and ONC-RPCs from the Unix days of the 90s, ex. NFS.
  • stephenr 2503 days ago
    Apart from “well google (created|uses) it” I don’t really get the benefit of gRPC compared to any other rpc, eg jsonrpc or even xmlrpc, both of which are fairly static, open specifications for a way to communicate, rather than actual releases of a library that apparently has a new release every 10 days or so.
    • wvenable 2503 days ago
      Binary. Streaming. Strongly typed (with caveats). There's a whole article about the advantages/differences linked from the top of this page.
      • jwalton 2503 days ago
        > My API just returned a single JSON array, so the server couldn’t send anything until it had collected all results.

        Why can't you stream a JSON array?

        Edit: Here's a (hastily created and untested) node.js example, even:

            const { Transform } = require('stream');

            class JSONArrayStream extends Transform {
                constructor() {
                    super({readableObjectMode: false, writableObjectMode: true});
                    this.dataWritten = false;
                }

                _transform(data, encoding, callback) {
                    if (!this.dataWritten) {
                        this.dataWritten = true;
                        this.push('[\n');
                        this.push(JSON.stringify(data) + '\n');
                    } else {
                        this.push(',' + JSON.stringify(data) + '\n');
                    }
                    callback();
                }

                _flush(callback) {
                    if (!this.dataWritten) {
                        this.push('['); // emit a valid (empty) array even with no items
                    }
                    this.push('\n]');
                    callback(); // required to end the stream
                }
            }
        • spenczar5 2503 days ago
          There can be a bit more to it:

          - how can a client send an error message if it runs into problems mid-stream? You can invent a system, but you’re walking into an ad-hoc protocol pretty fast; why not use something others wrote?

          - what if the remote end wants to interrupt the stream sender to say “stop sending me this” for any reason? For example, an erroneous item in the stream, or a server closing down during a restart.

          - grpc supports fully bidirectional streams, interleaving request and response in a chatty session; how do you do this?

          Not that the original article mentioned these. I bristle though when I hear the engineer’s impulse to “why don’t you just”-away at something.

        • stephenr 2503 days ago
          you’d probably need to do some tricks to get it to parse in a browser.

          Editing, because I can’t reply: I was specifically going to mention SSE/EventSource but expected an immediate “not everything is a browser” response.

          Editing the 2nd: yep, that’s kinda what I meant by “tricks” - essentially splitting out chunks to pass to the json parser.

          • jwalton 2503 days ago
            See my edited example above; you can stream data to this, and it'll stream nicely to the browser. It uses "\n" at the end of every line, which means you can write a very simple streaming parser client-side - just split the input at "\n," to get nice JSON bits - but there are certainly JSON streaming libraries on npm that will parse this more "properly". And it parses with a normal JSON parser too.
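            A minimal client-side counterpart might look like this, assuming the newline-delimited framing produced above (a real app would likely reach for a streaming-JSON library instead):

```javascript
// Incrementally parse a streamed JSON array of the form:
//   '[\n' item '\n' ',' item '\n' ... '\n]'
// Every complete line between the brackets is valid JSON on its own.
function makeLineParser(onItem) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    let idx;
    while ((idx = buffer.indexOf('\n')) !== -1) {
      let line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1);
      if (line.startsWith(',')) line = line.slice(1); // strip separator
      if (line === '' || line === '[' || line === ']') continue;
      onItem(JSON.parse(line));
    }
  };
}

const items = [];
const feed = makeLineParser(item => items.push(item));

// Simulate network chunks arriving in arbitrary pieces:
feed('[\n{"id":1}\n,{"i');
feed('d":2}\n');
feed('\n]');
console.log(items); // [ { id: 1 }, { id: 2 } ]
```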
          • pbedat 2503 days ago
            Modern browsers understand text/event-stream so no need for dirty tricks ;)
        • pbedat 2503 days ago
          Also: Server sent events
      • stephenr 2503 days ago
        https://news.ycombinator.com/item?id=20023784 Specifically claims that validation isn’t performed as the article mentions

        “binary” is not necessarily a benefit, particularly for developer tooling/debugging

        Streaming is one area it may have a benefit but honestly the other issues outweigh that possible benefit (and it’s not like there aren’t other ways to stream data to a browser without resorting to polling)

        • ska 2503 days ago
          It's not just streaming. Any heavy natively binary objects (think scientific computing) are a pain to marshal without a good binary interface, and it can easily become a real performance issue.
    • cblum 2503 days ago
      Of course you're being downvoted. But I'm totally with you on this one.

      As someone currently responsible for migrating services to gRPC at my company, I always say that the main reason people are switching is "because Google."

      While there are merits to gRPC and protobuf, I don't think there are enough advantages to throw away REST and JSON and all the tooling around it. The moment you start switching, you start feeling the pain.

      "Because Google" is also the main reason everyone wants to run their crap on Kubernetes. It's pure hype.

      Always makes me think of "You Are Not Google": https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb

  • pilif 2503 days ago
    The article says that one of the advantages of gRPC is streaming and that JSON wouldn’t support streaming.

    That’s however just an implementation detail. JSON can easily be written and read as a stream.

    Switching your whole architecture, dealing with a binary protocol and the accompanying tooling issues just because of your choice of JSON parser feels like total overkill.

    JSON over HTTP is ubiquitous, has amazing tooling and is highly debuggable. Parsers have become so fast that I feel they might even be faster than a protobuf-based solution.

    Finally I don’t buy the argument about validation. You have to validate input and output on the boundaries no matter what.

    Even when your interface says “this is a double”, it says nothing about ranges (as seen in the article where valid ranges were specified in the comment) for example.

    • maltalex 2503 days ago
      > Parsers have become so fast that I feel they might even have the opportunity to be faster than a protobuf based solution.

      Not even close. Even new JSON serializers/deserializers aren't magic. Protobuf is a LOT easier to parse, so it's naturally a LOT faster.

      First two duck results for "json vs protobuf benchmark":

      https://auth0.com/blog/beating-json-performance-with-protobu...

      https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers...

      • ricardobeat 2503 days ago
        The first link shows a mere 4% margin when talking to a JavaScript VM.

        Even at a 5x improvement, most projects will never reach a point where the transport encoding is a bottleneck. Protobuf has a lot going for it (currently using in a project) but can’t be sold on speed alone.

        • denormalfloat 2503 days ago
          Is the JSON parser implemented natively, or in JS? It may not be apples-to-apples.
          • scottlamb 2503 days ago
            > Is the JSON parser implemented natively, or in JS? It may not be apples-to-apples.

            True, but if you're wanting an implementation you can use in Javascript running in the browser, it may accurately reflect reality. You have a high-quality browser-supplied (presumably native) implementation of JSON available. For a protobuf parser, you've just got Javascript. (You can call into webassembly, but given that afaik it can't produce Javascript objects on its own, it's not clear to me there's any advantage in doing so unless you're moving the calling code into webassembly also.)

            I don't think browser-based parsing speed is important though. It's probably not a major contributor to display/interaction latency, energy use, or any other metric you care about. If it is, maybe you're wasting bandwidth by sending a bunch of data that's discarded immediately after parsing.

          • fanf2 2503 days ago
            My guess would be that most of the cost is creating the JS objects and the parsing is a relatively small part of the cost, so optimizing it would not help much.
          • connor4312 2503 days ago
            Yea, the V8 JSON parser is implemented natively and optimized alongside the engine in a way that other serialization methods in JavaScript, and JSON in other languages, generally are not.
    • mehrdada 2503 days ago
      As mentioned in other comments, gRPC transport is orthogonal to Protobuf serialization. The gRPC runtime library takes no dependency on that. You can use gRPC with JSON. It just happens the default code generators use protobuf IDL and serialization. You can use gRPC library with your own JSON based stub generator.
      • kjeetgill 2503 days ago
        While that's true I think protobufs are (correctly) seen as the standard preferred way to use gRPC. The first point from the main page:

        > Simple service definition

        > Define your service using Protocol Buffers, a powerful binary serialization toolset and language

        It's a little unfair to call it that orthogonal.

    • Thaxll 2503 days ago
      You can't do good streaming using REST/JSON; it's either broken, slow, or badly implemented. And that's just one direction - bidirectional streaming is not even possible.
    • EugeneOZ 2503 days ago
      Not all of your API endpoints have to respond with JSON, that's all. Create a separate endpoint for streaming - it's a simple solution.
  • dewey 2503 days ago
    There's also https://github.com/uw-labs/bloomrpc/blob/master/README.md which is kinda like Postman but for gRPC. I didn't see it mentioned in the Caveats section of the post so maybe useful to someone else too.
  • superfreek 2503 days ago
    gRPC is great, but my issues with it are debugging and supporting the browser as a first class citizen.

    We've been working hard on OpenRPC [0], an interface description for JSON-RPC akin to Swagger. It's a good middle ground between the two.

    [0] https://open-rpc.org

    • anderspitman 2502 days ago
      No streaming? I poked through the docs and spec but didn't see it mentioned. Assuming it's just JSON-RPC under the hood that answers my question, but maybe ya'll have added support on top.
    • denormalfloat 2503 days ago
      Have you looked at gRPC-Web?
      • huehehue 2503 days ago
        I have mixed feelings about gRPC-Web, and welcome alternatives. Setting up a proxy with any sort of non-standard config can be a pain, gRPC-Web doesn't translate outbound request data for you which can get ugly[0], and your service bindings may or may not try to cast strings to JS number types which silently fail if over MAX_SAFE_INTEGER.

        [0] Instead of passing in a plain object, you build it as such:

          const userLookup = new UserLookupRequest();
          const idField = new UserID();
          idField.setValue(29);
          userLookup.setId(idField);
          UserService.findUser(userLookup);
        
        The metadata field doesn't seem to mind though...
  • SamReidHughes 2503 days ago
    I'd just like to say I appreciate the writing at the beginning of the article.

    "While more speed is always welcome, there are two aspects that were more important for us: clear interface specifications and support for streaming."

    This offers a quick exit for anybody who already knows about these advantages.

  • signa11 2503 days ago
    one fundamental issue with grpc seems to be that every request for a given service ends up either creating a thread or using an existing one from a pool of threads. of course, you cannot limit the number of threads, because that will lead to deadlocks.

    i _suspect_ at google scale it is all fine, where available cpus are essentially limitless, and consistency of data (e.g. under multiple updates) gets handled at a different layer.

    writing safe, performant, multi-threaded code in the presence of signals/exceptions etc. is non-trivial regardless of what your 'frontend' looks like. async grpc is quite unwieldy imho.

    i have heard of folks trying grpc out on cpu-horsepower-starved devices, e.g. wireless base stations, and running into the aforementioned issues.

  • ahuang 2503 days ago
    gRPC isn't a requirement for response streaming (which the article cites as one of the main reasons for the migration). That can all be achieved with HTTP/JSON using chunked encoding. In fact, that's what the grpc-gateway (an HTTP/JSON gateway to a gRPC service) does: https://github.com/grpc-ecosystem/grpc-gateway.

    gRPC adds bi-directional streaming which is not possible in http, but the use cases for that are more specialized.
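    For reference, the four call shapes gRPC's IDL distinguishes look roughly like this (service and message names are hypothetical):

```proto
service ItemService {
  rpc GetItem (ItemRequest) returns (Item);              // unary
  rpc ListItems (ItemRequest) returns (stream Item);     // server streaming
  rpc UploadItems (stream Item) returns (UploadSummary); // client streaming
  rpc SyncItems (stream Item) returns (stream Item);     // bidirectional
}
```

    Only the last two require the transport to carry a client-side stream, which plain HTTP request/response semantics don't give you; that's where the HTTP/2 framing under gRPC comes in.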

    • anderspitman 2502 days ago
      Sure, the actual transfer will be streamed, but most JSON clients wait for the entire response before firing your callback. As far as I know there isn't even a commonly used spec for partially reading a JSON document.
  • ishaanbahal 2503 days ago
    Great to see people using GRPC, but this article doesn't state anything that the actual grpc.io website doesn't, except for the OpenAPI comparison.
  • j16sdiz 2503 days ago
    I don't see the article answering the "why" question. It was just "We didn't like what we were using, so we tried gRPC".
  • ww520 2503 days ago
    Another alternative is Thrift. It has lots of language bindings, and the servers are superb.
  • qwerty456127 2503 days ago
    IMHO the only reason to use HTTP for services today is caching.
  • ec109685 2503 days ago
    Parsing speed shouldn’t be a factor in deciding. Parsing protobufs on the browser is going to be way slower than using a native json parser and even on the server, there are java libraries that are much faster than protobufs, e.g. https://jsoniter.com/

    That’s why formats like FlatBuffers were written; however, parsing is likely not going to dominate your application, so other factors should influence your decision instead.

    • kentonv 2503 days ago
      > on the server, there are java libraries that are much faster than protobufs

      Be careful not to back broad arguments with outlier benchmarks.

      In general, it is plainly true that JSON is much more computationally difficult to encode and decode than Protobuf. Sure, if you compare a carefully micro-optimized JSON implementation against a less-optimized Protobuf implementation, it might win in some cases. That doesn't mean that Protobuf and JSON perform equivalently in general.

      • ec109685 2503 days ago
        What is the reason not to use the micro optimized JSON implementation if parsing becomes your bottleneck?
        • kentonv 2503 days ago
          I don't think I said that?
          • ec109685 2503 days ago
            My point is that json is always “fast enough”. Either you don’t care about parsing speed and can use what is most ergonomic or you do care and you’ll use an optimized library.

            You’ll never need to move to protobufs due to parsing speed.

    • rurban 2503 days ago
      > however parsing is likely not going to dominate your application, so other factors should influence your decision instead.

      Exactly. If you need performance, you'll use Cap'n Proto or FlatBuffers, which use a native binary interface, so you don't need to create and copy objects; you just map them in from IO.

      • anderspitman 2502 days ago
        Sadly, capnproto's awesome RPC system doesn't appear to support streaming, though I do believe gRPC supports FlatBuffers.
        • kentonv 2501 days ago
          > Sadly, capnproto's awesome RPC system doesn't appear to support streaming

          Sure it does. You can implement "streaming" in Cap'n Proto by introducing a callback object, and making one RPC call for each item / chunk in the stream. In Cap'n Proto, "streaming" is just a design pattern, not something that needs to be explicitly built in, because Cap'n Proto is inherently far more expressive than gRPC.

          That is, you can define a type like:

              interface Stream(T) {
                write @0 (item :T);
                end @1 ();  # signal successful end of stream
              }
          
          Then you can define streaming methods like:

              streamUp @0 (...params...) -> (stream :Stream(T), ...results...)
              # Method with client->server stream.
          
              streamDown @0 (...params..., stream :Stream(T)) -> (...results...)
              # Method with server->client stream.
          
          
          Admittedly, this technique has the problem that the application has to do its own flow control -- it has to keep multiple calls in-flight to saturate the connection, but needs to place a cap on the number of calls in order to avoid excess buffering. This is doable, but somewhat inconvenient.

          So I am actually in the process of extending the implementation to make this logic built-in:

          https://github.com/capnproto/capnproto/pull/825

          Note that PR doesn't add anything new to the RPC protocol; it just provides helpers to tell the app how many concurrent calls to make.

          • anderspitman 2498 days ago
            Interesting. I do tend to favor protocols that are a bit lower-level and more flexible for composing higher-level functionality, though this does sound pretty complicated to implement using only what capnproto offers right now. Would there be a way to jury-rig "request(n)" backpressure as described in Reactive Streams[0] (also implemented by RSocket[1]) on top of capnproto? That's what I'm using for omnistreams[2], and it's proven very simple to implement and reason about.

            [0] https://github.com/reactive-streams/reactive-streams-jvm

            [1] http://rsocket.io/

            [2] https://github.com/omnistreams/omnistreams-spec

            • anderspitman 2498 days ago
              Actually the more I think about it I don't think it would work, since request(n) assumes the producer can send multiple messages for each request.
  • kiliancs 2503 days ago
    > When you use a microservice-style architecture, one pretty fundamental decision you need to make is: how do your services talk to each other?

    Problems I don't have when using Erlang/Elixir umbrella apps + OTP.

  • techslave 2503 days ago
    answer: for trivial reasons. too bad he didn’t dig deeper.
  • 781 2503 days ago
    Does anybody remember reading an article along the lines of "we use X because it's cool and trending and we are cool people"? Not saying it's the case here, but has anybody honestly admitted in an article that they used a technology because it makes them look cool?

    I do remember reading quite a lot of articles about the inverse of this: "we don't hire people using Windows/IDEs because it says a lot about them, a craftsman should choose his tools wisely, ..." - but never the positive.

    • stephenr 2503 days ago
      I got a very strong “kool aid” vibe from this.

      I don’t remember any “we don’t hire people on Windows/using an IDE” (the last part would be particularly weird IMO), but I wouldn’t be surprised if somewhere said “if you want to use Windows you’re on your own (support wise) and if it becomes a time sink you switch or find work elsewhere”.

      I’ve supported (in terms of dev environment/tooling) people on Macs, Windows and Linux. Windows by far had the weirdest issues to solve/avoid.