Go Data Structures (2009)

(research.swtch.com)

118 points | by rednafi 10 hours ago

4 comments

  • zellyn 5 hours ago
    The most important thing to know is that Go slices are exactly how you would implement a dynamically growable slice in C if I asked you to. No less, but importantly, no more: a pointer to a piece of memory, and a current length and capacity.

    The classic slice mistake is to forget that append doesn't necessarily return a completely new slice backed by a disjoint piece of memory, especially easy if you've been using a language where variables are immutable by default.

    The problem is that it will often accidentally work; if you over-run the backing store, you will get a new piece of memory:

        a := make([]int, 1, 1) // a is backed by a one-element-long piece of memory
        a[0] = 42
        b := append(a, 17) // this must allocate a new piece of memory
        c := append(a, 37) // this must also allocate a new piece of memory
    
        // At this point a == [42], b == [42, 17], c == [42, 37]
    
    The problem is when the existing capacity is sufficient:

        a := make([]int, 1, 10) // a is backed by a ten-element-long piece of memory
        a[0] = 42
        b := append(a, 17) // no need to allocate: just use the existing memory
        c := append(a, 37) // no need to allocate: just (re)use the existing memory
    
        // At this point a == [42], b == [42, 37], c == [42, 37]
        //                          ^^^^^^^^^^^^^
    • ustad 20 minutes ago
      Go's slice behavior reveals a concerning trade-off in language design where performance was prioritized over predictability. Consider this trap:

          a := make([]int, 1, 10)
          a[0] = 42
          b := append(a, 17)
          c := append(a, 37)
          // Surprise! b and c both equal [42, 37]
      
      The fact that append()'s behavior changes based on a hidden property (capacity) violates what I consider a core principle of language design: fundamental operations should be predictable and easy to reason about without having to think about implementation details.

      While I understand the performance benefits of reusing memory, language fundamentals shouldn't require developers to constantly think about backing array capacity to avoid bugs. This is especially problematic because it "usually works" until it doesn't - the worst kind of bug.

      Modern languages have shown us better ways. Rust makes ownership explicit. Immutable-by-default languages prevent such surprises entirely. Even if Go wanted to maintain performance, they could have made this behavior more explicit - perhaps with separate "append_safe" and "append_reuse" functions.

      Programming language fundamentals should be like good UI design - they should do what you expect them to do. When core operations have surprising edge cases, it creates cognitive overhead that ripples through all code written in that language.

    • kccqzy 3 hours ago
      I'd argue that the Go example is worse than the equivalent in C, because Go has a garbage collector and C does not. In C, precisely because of the lack of a garbage collector, people need to explicitly and carefully document who is responsible for deallocating the memory. This makes things clear who is owning that memory. And this by extension leads to an intuition on whether the append is acceptable on the original slice or not. If a C function returns such a slice and is documented that the caller should free it, you can call realloc and add to it; if the C function documents that the caller should not free it, you naturally wouldn't even try to append to it: you would copy it first.

      In Go, this garbage collector frees people from thinking about freeing memory. But it doesn't free people from thinking about owned vs borrowed memory. And that's where bugs can happen.

      • zellyn 3 hours ago
        I happen to like the way Go did it, but now that generics and iterators have landed, it should be possible to create an always-make-a-copy vector type (although it'll need to use `get` and `set` like Java Lists, not [42] like slices/arrays).

        It's not like you could have appended twice to the same List in Java and expected to get two disjoint arrays, unless you copy first, and `slices.Clone()` makes that easy in Go now too.

      • kbolino 3 hours ago
        This isn't really a practical problem.

        If you created the slice, you control it and you can modify its elements, append to it, etc.

        If you didn't create the slice, you don't control it. If you want to modify it or append to it, you should copy it first.

        This has reflected how I've seen Go used professionally by people experienced in the language. The language could offer more help (but probably never will), but it's not hardly a regression from C. The real regression from C is the lack of const pointers.

        • pjmlp 3 hours ago
          And enums,

             type Colours (Red, Blue, Green)
          
             //....
             c := Colours.Blue
             a := pred(c)
             b := succ(c)
             
             for x := range Colours { 
                //....
             }
          
          
          1976's Pascal style is unthinkable to ever land in Go, but we have iota and const, lets not complain.
          • kbolino 2 hours ago
            Go's "enums" are worse than a lot of other languages, but they're not worse than C.
            • pjmlp 2 hours ago
              Indeed, then again maybe we should not design languages with features worse than C in the 21st century, having 60 years of field experience.
              • kbolino 43 minutes ago
                It's not worse. It's just only slightly better. The most damning thing you can say about Go is not that it doesn't improve upon C, it's that it improves only on C (and only in the ways cared about). The authors of Go really didn't examine other languages very closely. So starting with the authors' goals (light on the page, bounds checked, concurrency with message passing, easy to pick up, fast to compile) and a fairly deep knowledge of C (but little else) you pretty much get Go (especially early Go).
            • 9rx 1 hour ago
              Go enums are the same as in every single other language. After all, all an enum does is count. There isn't anything more you can do with it.

              You can introduce data structures and types that utilize enums, with some languages taking that idea further than others, but that's well beyond enums themselves.

              • masklinn 1 hour ago
                > Go enums are the same as in every single other language. After all, all an enum does is count. There isn't anything more you can do with it.

                You may want to meet a modern language one day. Or hell not even a modern language, even a shit one like Java.

                As it turns out there are languages where you can define an

                   enum foo { bar, baz }
                
                and when you type a parameter as a `foo` you're not at risk of getting 69 or 4328.
                • 9rx 1 hour ago
                  > You may want to meet a modern language one day.

                  Like what? CrabLang? Its enums are identical to Go, unsurprisingly. Above the enum rests a discriminated union that ends up hiding the details of the enum. That is where things begin to differ greatly. You have to do some real trickery to actually get at the underlying enum result in that language. But, when you do, you see that it is exactly the same.

                  > and when you type a parameter as a `foo` you're not at risk of getting 69 or 4328.

                  That's thanks to the type system, though, not enums. Enums are not a type. Enums are a number generator. Hence the name.

                  • tialaramex 54 minutes ago
                    > That's thanks to the type system, though, not enums. Enums are not a type. Enums are a number generator. Hence the name.

                    What's happened here is that you've mistaken "Things I believe" for "What everybody else believes" but you're talking to other people, not yourself, so, this makes you seem like a blithering idiot.

                    The particular version of this trap you've fallen into is a variation of the "It's all just the machine integers" mental model from C. It's just a model and the problem arises when you mistake that model for reality.

                    Now, technically this model isn't even correct for C's abstract machine, but it's close enough for many programmers and it tends to match how they think the hardware works, which is even more confusing for the hardware people who know how it actually works but that's another conversation.

                    This model is completely useless for languages which don't have the same type system, and so it's no surprise that it immediately led you astray.

                    • 9rx 37 minutes ago
                      > but you're talking to other people

                      No, I am clearly talking to a computer program. It is possible that the program is forwarding the discussion on to people. Perhaps that is what you are trying to allude to? The details of how the software works behind the scenes is beyond my concern. There is no intention to talk to other people, even if the software has somehow created that situation incidentally. If I wanted to talk to people, I would go out and talk to people, not type away at my computer into a box given to me by the software.

                      > The particular version of this trap you've fallen into is a variation of the "It's all just the machine integers" mental model from C.

                      As much as I enjoy your pet definition that you've arbitrarily made up on the spot here, the particular trap I have fallen into is the dictionary. It literally states what an enumeration is according to the prevailing usage. It does not describe it as a type, it describes it as the action of mentioning a number of things one by one. Which is exactly what an enum does.

                      The previous comment is talking about type constraints. You can definitely constrain a type such that it is invalid to use it outside of the numbers generated by the enum, just as you can constrain a type to only accept certain strings. e.g. from Typescript: `type Email = "{string}@{string}"` This idea is not limited to enums.

                      That's definitely a thing, and definitely a thing that could be used in conjunction with an enum, but is not an enum itself. Not as enum is normally used. Of course you can arbitrarily define it however you please. But, if you are accepting of each holding their own pet definition, your comment doesn't work. You can't have it both ways.

                • noapologies 1 hour ago
                  > ... even a shit one like Java.

                  I agree with your point about Go enums.

                  But in defense of Java, modern Java is actually pretty pleasant.

                  Virtual threads, records and sealed classes, pattern matching, state-of-the-art garbage collectors, a great standard library etc. (and obviously well-behaved enums).

                  Not to mention the other languages you get for free with the JVM ecosystem.

                  It might not be as expressive as Rust, but certainly Java/JVM > Go.

        • kccqzy 2 hours ago
          The practical problem is about transferring the control. Without it, you end up doing way too much defensive copying.
    • wavemode 2 hours ago
      It's not clear to me why I would need a "dynamically growable slice". That's not a slice at that point, it's an vector aka ArrayList.

      If all I need is a slice of memory, I'd rather only have to pass around the two pieces of data (pointer and length) than all three.

      • tialaramex 1 hour ago
        I think of this as a failed experiment. Like coal powered cars or when we thought maybe transatlantic travel by airship was the Right Thing. It may be hard to know what'll happen without trying.

        The growable array type (what you call "vector") is a venerable data type, although it does still have parameters you might reasonable tweak e.g. what's the correct growth factor? Doubling is popular but e.g. Zig chooses new_size = (old_size * 1.5) + 8 and you'll see disagreement about what API should be offered and why - e.g. C++ std::vector has the wrong reservation API

        But this thing clearly isn't a mistake, it's intentionally a hybrid, and if it had been a huge success maybe everybody else would scramble to copy it.

      • zellyn 1 hour ago
        Yes. "dynamically growable slice" : "vector" :: "tomato" : "to-mah-to"
        • tialaramex 1 hour ago
          The growable array type is an owning container, whereas a slice (in C++ and C# span) is a reference type, it doesn't actually own anything.

          This type isn't either of those things, it's a hybrid, and Go tries to get away with that because it's a garbage collected language so people don't need to care as much about ownership.

    • BoingBoomTschak 4 hours ago
      That's not the "classic slice mistake", that's slices being one of the stupidest things Go ever decided upon. In C++, that'd be the equivalent of merging std::vector<T> and std::span<T, std::dynamic_extent> into a single structure for "reasons" ("simplicity"?).

      https://build-your-own.org/blog/20241125_go_slice_surprise/ goes a bit further, but this is really the point where I decided I wouldn't be able to stand Go.

      • zellyn 3 hours ago
        In the period when Go was ready for production use and Rust wasn't, I couldn't wait for Rust to be ready, so that the response to everyone hating Go for not being a C++ equivalent could just be, "I think you want Rust."

        I think you want Rust.

  • eu 9 hours ago
    it’s a good read, but i think it should focus more on some of the common mistakes that people make when slicing a slice.
    • rednafi 8 hours ago
      Slicing a slice is full of gotchas. I tend to forget all the rules and avoid it whenever I can.
      • adonovan 4 hours ago
        A slice operation s[i:] seems like it should be little more than an ADD instruction for a registerized slice where i is known to be in bounds, but a surprising little detail is that when i==cap(s) we really don't want to create an empty slice whose data pointer points one past the end of the original array, as this could keep some arbitrary object live. So the compiler generates a special branch-free sequence (NEG;SUB;ASR;AND) to compute the correct pointer increment, ((i - cap) >> 63) & (8 * i).

        https://go.dev/play/p/J2U4djvMVoY

      • rendaw 7 hours ago
        Appending a slice is also full of gotchas. Sometimes it modifies the slice in place, sometimes it reallocates and copies.
        • pstuart 2 hours ago
          Only really a gotcha if you pass a slice into a function and expect to see modifications in that slice after the function completes. It's helpful to remember that Go passes by value, not reference.

          That can be addressed by passing the slice as a pointer: https://go.dev/play/p/h9Cg8qL9kNL

          • TheDong 1 hour ago
            > Only really a gotcha if you pass a slice into a function and expect to see modifications in that slice after the function completes. It's helpful to remember that Go passes by value, not reference.

            Slices are passed partly by value (the length), partly by reference (the data).

                func takeSlice(s []int) {
                  slices.Sort(s)
                }
            
            From your explanation, you would expect that to not mutate the slice passed in, but it does.

            This can have other quite confusing gotchas, like:

                func f(s []int) {
                  _ = append(s, 1)
                }
            
                func main() {
                    s := []int{1, 2, 3}
                    f(s[0:2])
                    fmt.Printf("%v\n", s)
                }
            
            I'm sure the output makes perfect intuitive sense https://go.dev/play/p/79gOzSStTp4
      • bborud 8 hours ago
        It is a suprisingly hard thing to implement well. I have no idea how many times I have implemented slice-like things in C (back in the 1990-2000s when I mostly wrote C) and it was one of those things I never managed to be happy with.
        • pjmlp 6 hours ago
          Something that even Mesa and Modula-2 already supported by 1980's, maybe one day C will eventually get proper slices.
        • neonsunset 8 hours ago
          It can be done but that requires a better, more expressive type system.
          • rednafi 7 hours ago
            An expressive type system also often means slower build times. I dislike working with Rust for this exact reason.

            While most people highlight the difficulty of picking up the syntax, I find Rust to be an incredibly tedious language overall. Zig has a less expressive type system, but it compiles much faster (though not as fast as Go). I like what Zig and Odin folks are doing over there.

            I like the balance Go strikes between developer productivity and power, though I dearly miss union types in Go.

            • thegeekpirate 6 hours ago
              You'd most likely be happy with Odin (https://odin-lang.org), which I find to be essentially a fixed Go with no GC.
            • dist1ll 6 hours ago
              It's possible to build languages that compile faster than Go, with a much more expressive type system.

              It's just that compile times and DevEx haven't been a priority for most projects.

            • pjmlp 6 hours ago
              As proven by other languages with similar type systems and faster compile times, Rust's case is a matter of tooling, not language features.
              • _flux 4 hours ago
                Not sure if that's really a proof, as it could be the exact combination of language features that makes up the slowness. For example traits, non-ordered definitions in compilation units and monomorphization probably don't help. GHC also isn't a speed demon for compilation.

                But sure, LLVM and interfacing with it is quite possibly a big contributor to it.

                • pjmlp 3 hours ago
                  Haskell isn't the only language around with complex type systems.

                  However, it is actually a good example regarding tooling, as the Haskell ecosystem has interpreters and REPL environments available, for quick development and prototyping, something that is yet to be common among Rustaceans.

                  • neonsunset 3 hours ago
                    Rust has Cranelift: https://cranelift.dev
                    • pjmlp 3 hours ago
                      Indeed, but its compile times aren't much better than LLVM, at least one year ago.

                      Ideally we would be having the F# REPL/JIT, plus Native AOT for deployment, as comparable development workflow experience.

                      Naturally F# was chosen as example, because that's your area. :)

                      Not being negative per se, I also would like to have something like Haskell GHCi, or OCaml bytecode compiler, as options on rustup, so naturally something like this might eventually come.

            • 4ad 7 hours ago
              An expressive type system absolutely, positively, unequivocally does not imply slower build times (especially with a Church-style type system). There are plenty of programming languages with advanced type systems which compile extremely quickly, even faster than Go, for example OCaml.

              Don't make the fallacy of conflating Rust's slow compile time with its "advanced" (not really, it's 80's tech) type system. Rust compilation is slow for unrelated reasons.

              • taurknaut 6 hours ago
                Old doesn't mean non-advanced. GraalVM is based on a paper (Futamura) from fifty years ago. Off the top of my head I can't think of many language features younger than the eighties—maybe green threading? That would be surprising but might fit. I suppose you could also say gradual typing. Haskell has many recent innovations, of course, but very few of those have seen much use elsewhere. Scala has its implicits, I guess, that's another one.

                Personally, I write java at my day job and the type system there makes me loooong for rust.

                • pjmlp 2 hours ago
                  No need for Rust, when JVM has Haskell, Scala, Kotlin, Clojure, Common Lisp.
                  • taurknaut 1 hour ago
                    I prefer rust to all of them, but I also come from a very systemsy background. Plus it has the benefit of being much easier to embed inside or compose around basically any runtime you'd like than managed code, which is why I chose rust rather than basically any managed language.

                    But, it's just a tool, and the tools I choose reflect the type of stuff I want to build. The JVM is extremely impressive in its own right. You're just not going to to find any one runtime or ecosystem that hits every niche. I'm happy to leave the language favoritism to the junior devs—for the vast majority of situations, what you're building dictates which language makes the most sense, not vice versa.

            • neonsunset 7 hours ago
              As a start, Go could separate container and slice types, the way C# did it with T[]/List<T>/other and Span<T>/Memory<T>. No lengthy build process required.
              • jerf 4 hours ago
                I'm not deeply familiar with those C# types, but I think maybe it already does. An array, which includes the size of itself in its type so that a four-element array is not the same type as an eight-element array, is already in Go. Go's language affordances make it easy to just have slices without the underlying arrays, since they're generally not useful on their own, but you can take arrays and slice into them if you like.
              • rednafi 7 hours ago
                Yeah, but the at the same time, I find C# code a sigil soup. Go makes a different tradeoff.

                I've been involved in a few successful large scale projects and never felt like the type system of Go is holding me back too much. Sure the error handling could be better, union type would make interface munging easier, but the current state after generics isn't too bad.

                • neonsunset 7 hours ago
                  > Sigil soup

                  Last time I checked, C# had clean and focused syntax for working with collection types. Could you provide an example?

    • fmbb 5 hours ago
      Can you link some article about these common mistakes? Sounds like good learnings can be had.
  • WolfCop 8 hours ago
    (2009)
    • rednafi 7 hours ago
      It's Go we're talking about. Other than 64-bit the dominant word size, nothing much has changed.
      • mseepgood 7 hours ago
        It's still not news.
        • Cthulhu_ 6 hours ago
          HN is not primarily a news site though, despite the name.
      • 4ad 7 hours ago
        The interface layout has changed since the article (although this specific article doesn't mention interfaces, a later article in the series does). Additionally Go now has generics.

        It's true that little has changed, but very little is changing in the data representation of any language, really. Even ones that are evolving rapidly.

        • Cthulhu_ 6 hours ago
          This is a truth that's easily overlooked; most languages are several levels beyond basic types to the point that people forget about the low level constructs involved. This is one reason why I like Go, it exposes and educates on fairly low-level mechanisms that are not unfamiliar to anyone who's studied computer science, but at the same time you don't have to worry too much about the lower level stuff like memory, pointers, zeroing, etc. I think it's a good tradeoff.