Hitting every branch on the way down

(rachelbythebay.com)

160 points | by zdw 17 days ago

13 comments

  • rob74 17 days ago
    So, someone at some point in some commit that we will never see because it got squashed with other commits thought it would be cooler to use absl::StrCat() instead of the "+" operator, and in the process of doing that, they went "what's this useless code using angled brackets instead of quotes?! It works with quotes too, let's delete it!". Or maybe that part was difficult to test, so they simply deleted it to increase test coverage? Guess we will never know, but still, open source is now a bit shittier because of it. Thanks, anonymous clueless developer!

        --  std::string left = "\"";
        --  std::string right = "\"";
        --  if (use_system_include) {
        --    left = "<";
        --    right = ">";
        --  }
        --  return left + name + right;
        ++  return absl::StrCat("\"", basename, "\"");
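For contrast, a hedged reconstruction of what was lost (the function name, parameter names, and signature here are my own, not from the real protobuf source; plain std::string concatenation stands in for absl::StrCat so the sketch compiles without absl):

```cpp
#include <string>

// Sketch only: keeps the use_system_include switch the quoted diff
// deleted, while still collapsing to a single return statement.
std::string IncludeDirective(const std::string& basename,
                             bool use_system_include) {
  const char* left = use_system_include ? "<" : "\"";
  const char* right = use_system_include ? ">" : "\"";
  // absl::StrCat(left, basename, right) would express the same thing.
  return left + basename + right;
}
```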
    • plq 17 days ago
      In the chromium repository, the use of angle brackets in #include statements is banned -- they only use double quotes. They also don't use any system headers per se, since their flavor of clang comes with their flavor of libc++ vendored in.

      So if the chromium repo is representative of the state of C++ in rest of Google, they ditched it silently like this probably because it's so natural to them :)

      • RHSeeger 17 days ago
        So, instead of using

        > <> means system headers and "" means project headers (my naive understanding of the difference)

        they just convert everything to a project header? That seems bonkers to me. It is intentionally removing useful information.

        • plq 17 days ago
          First, here's the actual guide: https://google.github.io/styleguide/cppguide.html#Names_and_... . Of course they still have to use stuff like #include <windows.h> but it's very very limited.

          They also are not removing any info -- most of it IS project headers. To me that's the actual bonkers bit :)

          With GCC/Clang, headers found in paths passed with -isystem are headers that are immune to compiler arguments like -Werror because they are, by definition, out of your control. In Google's case, ALL code is already checked in the project repo, including language stdlib. So none of them are system headers "per se".

        • rileymat2 17 days ago
          The main difference is how the search for the file works. “” prefers local directory before the search path, <> goes to the search path first.
        • hoseja 17 days ago
          Hierarchy is problematic.
    • xorcist 17 days ago
      Why do you think it was squashed? This article is clear about this being a merge commit. This person got a merge conflict and decided that this was the best way to fix it. It probably worked on their machine. Perhaps not necessarily the optimal fix when you have thousands of users depending on this code, but what do I know?
      • Sharlin 17 days ago
        But according to the article, both of the parents contained the same old code. Could be that it was just collateral damage in a larger conflict, but still it seems that someone used the merge commit to make an off-topic, ad-hoc change that was not only superfluous but code-breaking.
      • rob74 17 days ago
        Ok, maybe I'm misunderstanding the line "There's no explanation or other context. Presumably that all got squashed out when it was exported from whatever they use internally." - I was thinking about git squash, but it could have also been some other step in the process of transferring the code from Google's internal systems to GitHub.
    • Applejinx 17 days ago
      It's such a spectacular Chesterton's Fence. Always so frustrating dealing with people who're like "from where I'm standing…" and then they go and make things better, meaning 'more abstract and/or fewer keystrokes'.
    • nebulous1 17 days ago
      Am I misunderstanding, or is this not on whoever abused the merge commit, whether they made the change personally or not?
      • nerdponx 17 days ago
        It's a shared responsibility.
  • saghm 17 days ago
    > I told it to install "protobuf" since I use that library in my build tool. That actually installed "protobuf-24.4,1" which is some insane version number I'd never seen before. All of my other systems are all running 3.x.x type versions.

    I was curious about this, so I took a look at the list of protobuf releases[0] and they're...confusing, to say the least. Chronologically, the most recent tags at the time I write this comment are:

    v5.27.0-rc1 v3.27.0-rc1 v27.0-rc1 v27-dev v26.1 v5.26.1 v3.26.1 v26.1 v26.0 v5.26.0 v3.26.0

    Does anyone here happen to know what's going on here? As best I can tell, they're simultaneously supporting 3 major versions while keeping their minor and patch versions in lockstep, and then having one major version be implicit?

    I can almost imagine a scenario where they started out with just major and minor version and then realized they wanted to make breaking changes, which led them down the path of adding a third separate number to the versions, but if they already were going down the path of assuming "wider" versions are newer, why not just stop using version numbers with only major-minor and instead just add a 0 or 1 to the front of all of the continuations of that branch? Also, why synchronize every single minor and patch version between all three major versions? I can understand why it might be useful to continue providing support for multiple major versions at the same time, but I'd expect that _sometimes_ there might be a bug or something in only one of them, and pushing out a release of the other two that don't contain any changes would be pretty strange.

    [0]: https://github.com/protocolbuffers/protobuf/tags

    • ori_b 17 days ago
      I think protobuf might just be a performance art piece, exploring the question of how much complexity it's possible to insert into a simple concept.
      • BobbyTables2 17 days ago
        Indeed. Having used it and examined the resulting binary formats it uses, I don’t get it. Google makes some nice things, but protobuf isn’t one of them.

        IIRC (5 yrs ago), it was somewhat self-describing with respect to struct offsets but had no (data format) versioning or type information. Never understood that design choice, even in mixed endian+architecture environments.

        Ends up being horribly inefficient for small/one-time instances, and not all that great for client/server use.

        We were looking at it to get away from C structs and handmade serialization — only to unexpectedly realize that our methods were still better. (We genuinely did not want this outcome!). JSON/yaml were not options for other reasons.

        Protobuf is the modern realization of “the king has no clothes”.

      • bitcharmer 16 days ago
        That depends. In low latency environments you will see some protobuf but not as much as you'd think. SBE is quite common with the odd exception of some bespoke serialization protocols with delta compression and such.
    • maeln 17 days ago
      It's because they changed the versioning format: https://github.com/protocolbuffers/protobuf/releases?page=5 / https://protobuf.dev/news/2022-05-06/

      But I suppose old version still receive bugfixes.

      • saghm 16 days ago
        I don't understand how either of those links clarify anything; the former shows that they had a version 3.y.z and a version 20.y, and the later shows a change from 3.20.x to 4.21.x, but none of it gives any explanation for why a new major version should "inherit" the old minor version instead of restarting from 0, or why they kept around `x.y` when updating to use `x.y.z` instead of just continuing the existing branch as `0.x+1.y` or `1.x+1.y`. If anything, the fact that they already seem to assume that new major versions should always inherit the minor version would make it _more_ consistent than having one "special" branch that has "narrower" versions.
      • BobbyTables2 17 days ago
        If only they had a language agnostic way of encoding version information…

        Perhaps someone wrote a library for that…

  • datascienced 17 days ago
    I won't claim to understand C and the reason why <> is better than “”. I assume it is.

    But the fact that a merge can have arbitrary changes in it always bothers me!

    This is a case for rebase over merge if there are conflicts.

    You could have a merge of 2 empty repo parents where the result is the complete source of the latest version of Kubernetes!
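The point that a merge commit can carry edits present in neither parent ("evil merges") can be shown with a throwaway repo; a sketch, assuming git 2.28+ for `init -b`:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"; git init -q -b main
git config user.email demo@example.com; git config user.name demo
echo original > file; git add file; git commit -q -m base
git checkout -q -b side
echo side > side.txt; git add side.txt; git commit -q -m "side work"
git checkout -q main
echo main > main.txt; git add main.txt; git commit -q -m "main work"
git merge --no-commit -q side      # stop before committing the merge
echo smuggled > file               # off-topic edit, in neither parent
git add file
git commit -q -m "Merge branch 'side'"
# Neither parent contains "smuggled", yet the merge result does:
git show HEAD^1:file HEAD^2:file   # both print "original"
cat file                           # prints "smuggled"
```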

    • Groxx 17 days ago
      Yep. Stuff like this is part of why I'm a rebaser.

      Rebase is simple. Always. The end result is obvious and clear and can only be interpreted in one way.

      Merge has lots of little sharp edges and surprises if you don't know every single tiniest detail.

      Almost nobody knows it in that level of detail, so it's a terrible choice for interacting with anyone else. If you're on your own, sure, do whatever - many things are not built solo though.

      • noirscape 17 days ago
        My personal preference is merge but using the --no-ff flag. That way you get all the advantages of a rebase (since all your original commits are rebased into the target branch) but you also get a merge commit to confirm that all those changes were a part of the same set of patches.

        That can often help a lot to figure out why something ended up the way it did, but you also don't turn your entire history into a flat pile of disparate commits.
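A minimal sketch of that flow in a throwaway repo (assumes git 2.28+ for `init -b`):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"; git init -q -b main
git config user.email demo@example.com; git config user.name demo
git commit -q --allow-empty -m base
git checkout -q -b feature
echo a > a.txt; git add a.txt; git commit -q -m "feature: step 1"
echo b > b.txt; git add b.txt; git commit -q -m "feature: step 2"
git checkout -q main
# A plain merge would fast-forward here and leave no grouping;
# --no-ff forces a merge commit that marks the two steps as one set.
git merge -q --no-ff -m "Merge branch 'feature'" feature
git log --oneline --first-parent   # only the merge commit and base
```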

        • Groxx 17 days ago
          Yeah, I do kinda like this setup too. You can have both readable (rewritten) history and structured sub-commits for "a change" rather than a totally flat stack. Plus I don't care about your local history, but I do care about the final history.

          It's definitely how I prefer to review code (big changes broken up to isolated portions that are easier to validate, and the whole thing at once so you don't get lost in the trees), so it's how I would prefer to read it later too.

          It does still have merge commits where stuff can hide though :/ and you've got to remember --first-parent :/ and all non-merge-focused tools also have problems with it :/

      • zilti 17 days ago
        "But it is littering the commit history with useless commits!" is what I always hear
        • skywal_l 17 days ago
          And the best answer is: "Why do you do useless commits?".

          With `git commit --amend` and `git commit --fixup` you can arrange your commits to be clean, properly documented and self-explanatory (and maybe atomic, but that's a little harder). It takes a little time but it is hugely beneficial to code reviews and bug investigation.
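A throwaway-repo sketch of that cleanup flow: `git commit --fixup` records a "fixup!" commit, and an autosquash rebase folds it into its target:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"; git init -q -b main
git config user.email demo@example.com; git config user.name demo
git commit -q --allow-empty -m base
echo v1 > f; git add f; git commit -q -m "Add feature"
echo v2 > f; git add f
git commit -q --fixup=HEAD          # records "fixup! Add feature"
# Accept the auto-generated, auto-reordered todo list unmodified:
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --quiet HEAD~2
git log --format=%s                 # "Add feature", then "base"
```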

          • wdfx 17 days ago
            Some people however see using features like amend, squash, and force push as potentially destructive actions in the hands of a novice, which can lead to loss of not only the author's work but also other people's. Using merge almost never results in any sort of loss and is easier to work with for those who still don't quite understand the risks.
            • skywal_l 17 days ago
              You are associating the use of `amend` and `fixup` with force push. It's perfectly fine to rework your commit history locally and even force push to your own local branch. It should never be possible (except to people administrating your repo) to force-push to any public or even shared branch.

              Nobody should be able to force-push to master (or any public branch) except on specific occasion. In that case, someone is authorized, performs their specific action and then get de-authorized.

              This is pretty basic.

            • pjc50 17 days ago
              "Force push" is something that should be restricted to a very few senior people anyway; once you do that, you can't rewrite shared history any more and a lot of the worries go away.
            • Izkata 17 days ago
              Meanwhile the biggest issues I actually see in novices are when their IDE presents them with buttons that don't coincide with a single version control action (can you guess what "sync repo" will do?) and using those puts their checkout into a weird state.

              Usually when using the commands directly they're more careful, but have the mindset "an IDE wouldn't intentionally break something so this button must be safe to click".

              • Groxx 17 days ago
                Custom terms on top of git is a bafflingly bad decision.

                IDEs in particular should expose the details all the time, and show the command that's being run, so it can teach people passively instead of misleading them and leaving them stuck when it breaks.

          • roodrax 17 days ago
            totally agree here. commits are not for saving "your current work". It's about marking a definite step of change in the realm of the project itself.

            making commits atomic is harder because we tend to just write code, without first breaking up the requirement into atomic pieces

            • xorcist 17 days ago
              Commits are for saving your current work. Commit early, commit often. Just clean them up when you're done!

              Don't push half-baked work on other people! You waste their compute cycles needlessly, from now until the end of time.

              • CoastalCoder 17 days ago
                I sometimes wish git supported hierarchical commits.

                I.e., git can retain two representations of a sequence of commits: the original sequence, and also a larger commit that (a) produces the exact same code change as the sequence, and (b) has its own commit message.

                • torstenvl 17 days ago
                  Isn't that what a branch and a merge commit do?
                  • Izkata 17 days ago
                    Yep, as long as you use "--no-ff" to force a merge commit (and presumably edit the merge commit message instead of just using the default).

                    For viewing history you can use "git log --first-parent" to view only those merge commits, which should satisfy the people who love a linear history, without actually losing the intermediate commits.

                • xorcist 17 days ago
                  I have entertained similar thoughts, but then on the other hand people already, and with some justification, criticize git for being too complex. It also requires careful assessment of where the wormhole ends: how many levels of grouped commits should exist.

                  Then I remember that I have enough trouble getting a few dozen people to write well-formed and understandable commit messages for one level of commit messages alone. This scheme would require people to expend more energy on constructing commits, which is at best something very few care about.

                  Then there are tickets and other corresponding information, but they could rot for all I care, as they so often do, unless a decent commit log is in place.

                • twic 17 days ago
                  FWIW, Mercurial has this, and calls it changeset evolution:

                  https://www.mercurial-scm.org/doc/evolution/

        • pjc50 17 days ago
          Merge does that, yes, hence the preference for rebase flows.

          (I'm surprised this got a downvote when that's how we got here: a situation in which a change was ""hidden"" in a merge commit that would have been explicit in a rebase workflow)

    • btilly 17 days ago
      One idiot with rebase destroys history with no trace. I worked with such an idiot in a parallel team. I can't say how many weeks of work randomly got destroyed by said idiot.

      I hate rebase on shared code. I don't care how clean it looks. Don't mess with history.

      • doix 17 days ago
        I really don't understand how you can lose weeks of work. The person that would have done the force push would have the original commit in their reflog. ORIG_HEAD would be set.

        Everyone else that had a copy of the repo would have had a copy of the "lost" commits.

        I really cannot imagine how many things would have to go wrong for weeks of work to be lost.

        • chaorace 17 days ago
          There is a hierarchy to these things:

          - Person who destroys git history

          - Person who hates destroying git history

          - Person who knows how to recover "destroyed" history

          - Person who knows how to truly destroy git history

          • pvdoom 17 days ago
            > Person who knows how to truly destroy git history

            The Gitsatz Haderach

          • pcl 17 days ago
            > - Person who knows how to truly destroy git history

            … tell me more!

            • dale_glass 17 days ago
              A rebase can be undone, because the old commits keep hanging around in the repo. Rebase doesn't delete or rewrite anything, it just creates new commits and adjusts branch pointers, so the old stuff is still there just hard to get at because nothing points at it anymore.

              You just need to find an old commit ID somewhere, normally the reflog.

              The old stuff will go away on its own eventually due to git's self-maintenance procedures removing unreachable commits, or it can be done forcefully by adjusting the gc parameters to get rid of it.
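A throwaway-repo sketch of that recovery path:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"; git init -q -b main
git config user.email demo@example.com; git config user.name demo
git commit -q --allow-empty -m base
echo work > f; git add f; git commit -q -m "weeks of work"
old=$(git rev-parse HEAD)
git reset -q --hard HEAD~1          # the commit looks gone...
git log --format=%s                 # only "base" now
git reflog | grep "weeks of work"   # ...but the reflog remembers it
git reset -q --hard "$old"          # and it can be restored wholesale
cat f                               # prints "work"
```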

            • doix 17 days ago
              I cannot imagine how one could _truly_ destroy git history. You could destroy it locally, sure no problem. You _might_ be able to destroy it on your remote, but if you're using something like Github/Gitlab/Bitbucket I'm sure they'll have a cache that isn't trivial to remove from. But even if you remove it locally and from remote, there's no way you're removing it from other peoples clones. And other people could have pushed to other remotes.

              Stuff "leaks" so much in git, that it's really hard to lose work. The only way I could see someone losing work is if they never commit or if they never push. But even if you don't push and just rebase, you're not losing work. You would have to go out of your way to delete git history locally.

        • btilly 17 days ago
          What went wrong was multiple teams on unsynchronized 2 week schedules, and a culture that said that we had to accept the force push when other teams released.

          So we released at the end of our cycle. It gets used for, if a vague memory serves, end of months billing. Meanwhile someone on another team "merged" our code, and actually randomly dropped a big chunk of our work. A week later they release. Some (but not all) of our features disappear. We pull that and don't notice because we're on a new sprint. Wait until the end of the month, users try to do billing. "Hey, why did you take away those features you built for us a month ago?" "What, we never...?"

          We had no clue what happened.

          This got to repeat a couple of times before we figured out what must be happening. We made changes to the release process so we could track what was actually released each time, with its history. We tracked down who we thought was making the mistake, but didn't have enough evidence to prove it to his manager. That didn't stop the idiot from making the mistake, but it did streamline the process of recovering it. Meaning we had the version with our feature, we had current code, and "just" had to sort out conflicts rather than rewrite from scratch.

          Now that you've heard the story, can you see how weeks of work could be lost before we figured it out? And can you understand how we could have lost history?

          This was a decade ago. At my next job we had more competent people. But there we had a huge debates between rebase and merge people. There are arguments on both sides. My conclusion was that about 90% of the time, rebase makes things simpler and easier. But that remaining 10% of the time makes the 90% not worth it.

          Just learn how to merge properly.

          • doix 17 days ago
            I'm pretty sure your code was still around at that point, by default git keeps stuff around for 90 days. Although to be fair I don't know if that's the case today nor if it was the case a decade ago.

            What you're describing does sound awful, but I'm pretty sure that idiot could have found a way to mess up a merge. The entire workflow sounds completely fucked, I'm not convinced it's entirely fair to blame rebase in that case.

            > Just learn how to merge properly.

            I know how to merge and rebase properly. My favorite PR merge strategy is rebase + merge --no-ff. So your master branch is nice and linear, but you can still see where your PR merges came in. Lets you have a "all PRs get squashed" view of the world by just adding '--first-parent' to your git commands, but also lets you have the inner details for when you're git bisecting or spelunking trying to figure out why a certain line exists.

            Most people hate what I describe though, similar to mixing spaces and tabs.

            • btilly 17 days ago
              My code may have been around somewhere. I suspect I'd done gc, in which case it wasn't. But my git skills then were certainly not as good as they are now. (I'd only recently switched from svn at that point.)

              I agree that the workflow was a mess in multiple ways. A lot of which were organizational decisions that I was in no position to influence.

              Your favorite PR strategy is fine if you're doing it locally. However when it is done on master, you're going to have to get master again by force. Because changed history creates conflicts. Which means that you're going to have to hope that everyone only did it your way, and no idiot created conflicts in some other stupid way that you'll suffer for later.

              I'd prefer to merge to head early. Merge to head often. Merge from head often. Don't have long-running shared branches. This does take some other forms of discipline though.

              • doix 17 days ago
                I've never worked anywhere on master directly. Always in feature branches that then get merged to master (ideally with my strategy). So basically master always moves forward and its history is never rewritten.

                Master is always locked down anyway by "something" - no idea what the technical term for Github/Gitlab/Bitbucket is. Stopping people from force pushing to master prevents the sort of stuff that happened to you. Even if you don't have any "idiots", you really don't want a poor intern accidentally slightly pissing off everyone.

                > I'd prefer to merge to head early. Merge to head often. Merge from head often. Don't have long-running shared branches. This does take some other forms of discipline though.

                I agree with everything there, except I rebase instead of merge. So when I merge my branch to master, it's a nice neat little package that sits on top of master. It doesn't have the history of 10 merges I did while I was developing because I don't see the value in those merges.

                But hey, to each their own. When I was younger, I used to get into heated debates about why I was right, now I don't really care. I'm either in a branch of my own and can do whatever I want, or working with someone and then I'll just copy whatever they do to not confuse them.

                • btilly 17 days ago
                  Unless the strategy is really bad, I'd prefer to go along with what everyone else does. When multiple people push their preferred optimum, the resulting inconsistency is clearly worse than a single suboptimal, but consistent, approach.
      • arghwhat 17 days ago
        One never rebases shared code. They rebase their own work branch. Messing with history of master/main/integration branches should be blocked.

        Rebase is a necessary part of a workflow even if you like merge commits. You're severely missing out if interactive rebases are not part of your toolbox.

      • Feathercrown 17 days ago
        I wouldn't consider rebasing your own local commits on top of a more recent remote master to be messing with history in any meaningful way, and that's the most useful method of rebasing.
        • quectophoton 17 days ago
          I can give an example scenario.

          Assuming "H" is the hash of the current state of the repository content, consider this initial state of the repository (most recent first):

              H(3) Implement feature B
              H(2) Implement feature A
              H(1) Initial commit
          
          Now you implement "shiny feature", so your history in your branch looks like this:

              H(5) Shiny feature, improvements.
              H(4) Shiny feature, initial implementation.
              H(3) Implement feature B
              H(2) Implement feature A
              H(1) Initial commit
          
          You tested H(4) and H(5), and everything looks good.

          Then you `git pull --rebase`, and your history looks like this:

              H(10) Shiny feature, improvements.
              H(9) Shiny feature, initial implementation.
              H(8) Pulled commit C
              H(7) Pulled commit B
              H(6) Pulled commit A
              H(3) Implement feature B
              H(2) Implement feature A
              H(1) Initial commit
          
          You test H(10) because it's the current state of your repo, looks good, and merge (or create PR, whatever).

          With the usual pull request flows, `H(9)` (i.e. anything between your new "base" and your most recent commit) usually stays untested, entirely ignored by the developers, and you would only ever find out if you ever need to bisect.

          Not usually a problem, unless you have a rule of "every commit should be verified/tested" and the untested commits have a change that doesn't prevent a build but still causes issues (e.g. something that's only visual, or a new config file was added to a "conf.d" directory and its presence changed some behavior, stuff like that).
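One mitigation, assuming your test suite can be run from a single command: `git rebase --exec` re-runs that command after replaying each commit, so H(9) gets exercised too, not just H(10). A throwaway-repo sketch, with an echo standing in for the real tests:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"; git init -q -b main
git config user.email demo@example.com; git config user.name demo
git commit -q --allow-empty -m "Initial commit"
git checkout -q -b shiny
echo a > a.txt; git add a.txt
git commit -q -m "Shiny feature, initial implementation."
echo b > b.txt; git add b.txt
git commit -q -m "Shiny feature, improvements."
git checkout -q main
echo c > c.txt; git add c.txt; git commit -q -m "Pulled commit A"
git checkout -q shiny
# Runs the command after each replayed commit; a failing command pauses
# the rebase at the offending commit instead of letting it hide:
git rebase main --exec 'echo tested: $(git log -1 --format=%s)'
```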

          • citrin_ru 17 days ago
            To avoid this you can squash H(9) and H(10) before pushing to a shared branch, this way only one tested commit will be added on top of existing commits.
        • lmm 17 days ago
          Rebasing unpushed commits is ok. But I have yet to see a workflow that provides good enough guardrails to make it something you can do safely.
          • saagarjha 17 days ago
            Protect your main branch?
            • lmm 17 days ago
              One of the great advantages of git is being able to pull from other people's feature branches, not just master. So protecting just master isn't good enough.
              • saagarjha 17 days ago
                Yeah so you have them go through the workflow that doesn’t ruin things, like pull requests?
                • lmm 17 days ago
                  I don't want to have to go back and forth with someone to pull their branch. I want to just be able to pull anything they've pushed.
                  • iainmerrick 17 days ago
                    Isn't "protect your main branch" still the answer to this?

                    Your two feature branches would be unprotected so you can merge away if you like. When one of you wants to commit something to master, that's when you'd check for dodgy merges.

                    Also, "git cherry-pick" is a good alternative to merging for this use case.

                    • otherjason 17 days ago
                      Protecting the main branch is definitely a good practice, but the other potential hazard is:

                      - Having a developer on your team that rebases their own feature branch

                      - Then tries to "git push", only for it to be rejected since a force push is required

                      - Then performs a "git push --force", which will force-push all of their local branches, including feature branches from other developers that they may have checked out previously

                      Our team uses merges because they are safe from this kind of problem, although a rebase workflow would have cleaner history. I wish that "git push --force" would not push all branches by default, and just fail unless a (remote, branch) pair or --all is given.
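Two guard rails worth knowing here: `push.default` has defaulted to `simple` since git 2.0, so under modern defaults a bare `git push --force` pushes only the current branch (the push-everything behavior needs the old `matching` setting), and `--force-with-lease` refuses to overwrite remote commits you haven't fetched. A sketch with a throwaway pair of clones:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q --bare -b main origin.git
git clone -q origin.git alice 2>/dev/null
(cd alice && git config user.email a@example.com && git config user.name a \
  && git commit -q --allow-empty -m base && git push -q origin HEAD:main)
git clone -q origin.git bob
(cd bob && git config user.email b@example.com && git config user.name b \
  && git commit -q --allow-empty -m "bob's work" && git push -q origin HEAD:main)
cd alice
git commit -q --amend --allow-empty -m "base, reworded"
# alice hasn't fetched bob's push; --force-with-lease notices her view
# of origin/main is stale and refuses, where --force would clobber it:
if git push --force-with-lease origin HEAD:main 2>/dev/null; then
  echo "unexpectedly overwrote the remote"
else
  echo "rejected: remote moved since last fetch"
fi
```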

                      • danaris 17 days ago
                        > - Then performs a "git push --force", which will force-push all of their local branches, including feature branches from other developers that they may have checked out previously

                        This is (part of) why, for most common operations, I use a Git GUI (SourceTree). Force pushing all branches can only be done by very explicitly selecting them all and initiating a force push; the default when pushing is to push only the currently active branch.

                        It's also overall much clearer and more intuitive to use than the Git CLI. I use it when I have to—there are things that I can't do through SourceTree, and a few things that are complicated enough that I just want to be 100% sure I know exactly what's happening—but for 99% of the Git operations I do, it handles them perfectly and without any worry that I've mistyped something or forgotten to specify a branch.

                    • lmm 17 days ago
                      > Isn't "protect your main branch" still the answer to this?

                      No, the feature branches need to be protected or something, to enforce that they only rebase locally and don't rebase the parts that I've merged into my branch (and vice versa).

                      > Also, "git cherry-pick" is a good alternative to merging for this use case.

                      No it isn't, it means you get multiple unrelated commits for the same change, which causes conflicts and can be disastrous if a commit is deliberately reverted.

                  • NewJazz 17 days ago
                    I'll often just do a

                       git reset --hard origin/branch-name
                    • lmm 17 days ago
                      Right but that doesn't help if you've done your own work on top of their changes.
                      • iainmerrick 17 days ago
                        I think rebase is generally the correct approach here. If you've done your own work on top of their old changes, rebase your work on top of their new changes.
                        • lmm 17 days ago
                          That's possible but it requires a bunch of manual tracking and results in wasted/duplicate effort with people resolving the same conflicts multiple times.
                      • cnity 17 days ago
                        Just use:

                        > git pull --rebase

                        • pjc50 17 days ago

                              git config --global pull.rebase true
                        • lmm 17 days ago
                          Right but assuming I have a branch that's diverged from theirs I have to do a fiddly git rebase --onto and likely resolve the same conflicts again.
                          • cnity 17 days ago
                            This to me is a sign that some commits should be squashed, because it implies the same lines have changed multiple times in the commits that are ahead of the remote branch. It's worth doing the rebase interactively and squashing them up.
                          • xorcist 17 days ago
                            If you find yourself fixing the same rebase conflicts over and over again, because you for some reason need to work on conflicting changes simultaneously (which is of course best avoided for other reasons), use "git rerere".
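For reference, rerere ("reuse recorded resolution") is off by default and is a one-line switch:

```shell
# Record every conflict resolution and replay it automatically the next
# time the identical conflict appears (e.g. on repeated rebases):
git config --global rerere.enabled true
```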
                            • lmm 17 days ago
                              I don't trust rerere, it can do major damage in cases like where a commit was reverted. And you still get multiple people solving the same conflicts, so even if each person only resolves each conflict once it's still wasteful compared to a merge workflow.
                      • phanimahesh 17 days ago
                        git pull --rebase --autostash
          • from-nibly 17 days ago
            --force-with-lease

            And only on working branches. I do this every single day.

            • lmm 17 days ago
              Not good enough, that can mean you rebase changes that someone else has based further work on (but hasn't pushed it yet, or has pushed it to a different branch).
              • from-nibly 15 days ago
                Why are you having people base their work off your in progress work? Git is not the issue with what you are describing.
                • lmm 15 days ago
                  > Why are you having people base their work off your in progress work?

                  To collaborate more closely and reduce (or get ahead of) conflicts. The whole point of using git at all is to be able to base your work off other people's in-progress work; if you're not interested in doing that then Subversion works better.

      • globular-toast 17 days ago
        Rebase cannot destroy "weeks of work". No git command can delete commits. Unless you have some insane garbage collection policy that is very far from any defaults. This is your fault for not understanding your tools.
      • ansible 17 days ago
        Didn't you guys have filesystem backups of a shared git repository?

        This is exactly what backups are for.

    • nvy 17 days ago
      I believe that the semantics of < > vs "" are actually compiler-dependent, but on every compiler that matters, #including with angle brackets means "the system header", whereas using quotes gives preference to files in your local source tree.

      So for example if you #include <foo> then the compiler (actually the preprocessor, but whatever) looks in the system's standard location, whereas if you #include "foo" then it looks in the local tree.

      • ephimetheus 17 days ago
        I think that’s just the ordering though. “” will also end up searching the system paths, it will just check the local paths first.
      • ripe 17 days ago
        You are right; a good explanation of the rules is in the C FAQ [1], which points to a newsgroup posting by Kaz Kylheku [2].

        I am posting the summary here, although please do read the original if you have time:

        The most portable thing to do is to use "" for including files within your project, and to use <> only for implementation supplied files.

        (Disclosure: I was one of the contributors to the C FAQ).

        [1] https://c-faq.com/cpp/inclkinds.html

        [2] https://c-faq.com/cpp/inclk.kaz.html

    • prerok 17 days ago
      What I find strange is that <> traditionally included system header files and "" included local files. They used different include paths, so you could have a header file in your sources with the same name as the system header file and then could control whether you are including one or the other based on using <> or "".

      Anyway, I thought the distinction was lost in later compilers in favor of a single include path, just taking the first file found when looking for potential matches along that path.

      It seems the author of that merge thought the same thing. So, the distinction is actually still used by compilers?

      • pavon 17 days ago
        With both GCC and Visual C++, the “” form first searches local paths and then system paths, while the <> form only searches system paths. Guess some BSDs are stricter about local paths.
    • Gibbon1 17 days ago
      > But the fact that a merge can have arbitrary changes in it always bothers me!

      After that xz thing where they were trying to install a back door, having changes that are hidden like this is a big red flag.

      In fact, changing include <something.h> to include "something.h" with a hidden commit like this isn't just a red flag, it's a big rotating alarm with a siren. Someone's trying to set things up to include malicious code via a faked system lib.

      • saagarjha 17 days ago
        Sadly, not all of us can live in the tech equivalent of Bond films. There are only so many xz backdoors to go around.
        • hoseja 17 days ago
          There could be thousands of similar Manchurian developers right now and it wouldn't even take a significant effort.
          • saagarjha 17 days ago
            Until they're activated how are you going to know?
    • arghwhat 17 days ago
      All commits but the first have parents. Every commit points to the state of the file tree at that point.

      A "merge commit" is nothing more than a commit claiming any number of parents greater than one. It is still its own file tree reference that decides how the tree looks, and nothing dictates that it should be related to the parents.
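You can see this directly in the object store: a merge commit is an ordinary commit object whose only distinction is a second "parent" line (demo repo below, all names invented):

```shell
cd "$(mktemp -d)" && git init -q -b main demo && cd demo
git config user.email demo@example.com && git config user.name demo
echo base > base.txt && git add base.txt && git commit -qm "base"

git checkout -q -b side
echo side > side.txt && git add side.txt && git commit -qm "side"

git checkout -q main
echo main > main.txt && git add main.txt && git commit -qm "mainline"
git merge -q --no-edit side

# Dump the raw commit object: one "tree" line (the full snapshot this
# commit points at) and two "parent" lines -- that is all a merge is.
git cat-file -p HEAD
```

Note that the "tree" line is a complete snapshot; nothing in the object format ties it to either parent's tree.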

      • tsimionescu 17 days ago
        If that were true, then git log -p would have worked. The reality is that merge commits are treated differently from other commits by many parts of git. Saying that they are "just a commit with multiple parents" gives people the wrong impression.

        Git is more than the data structure backing it. And many parts of git make all sorts of assumptions that treat things that are more or less identical in the data model as actually being different. Tags are not the same thing as branches, for example, even though they are stored in virtually the same way.

        • arghwhat 17 days ago
          Well, yes - git log does have special handling of commits with multiple parents, because everything it shows is a special-cased lie. Why? Because commits do not contain diffs or patches; they are full snapshots of the repository as a whole at a point in time.

          git log -p is a convenience tool that tries to show code progression, and so it comes up with these simple diffs. Showing a graph of N-way conflict resolutions would not help the user trying to get an overview. Other tools exist to track the origin of a particular change, such as git blame.

          It is important to understand what a git commit actually is exactly because of the caveats of such convenience interpretations. Otherwise you'd have no idea where a conflict resolution is stored, and you'll run into the surprises mentioned here.

          In my opinion, git also becomes a lot easier to work with once you understand the low-level details, as you realize which high-level tools are similar, compatible, or fit or unfit for a specific purpose.

          • tsimionescu 17 days ago
            What git log shows is not "a lie", it is a part of its data model. Git is all of its commands, not just the low level details. Commits are both snapshots of the entire repo, and diffs, and delta compressions - none of these is "a lie".
            • arghwhat 16 days ago
              > Commits are both snapshots of the entire repo, and diffs, and delta compressions - none of these is "a lie".

              Commits are never diffs. Commits are snapshots, and sometimes git computes a diff between two commits. Commits are also never delta compressions, but can be stored within a delta-compressed packfile.

              Whether you like it or not, git is primarily its low level details. The porcelain stacked on top changes, and differs depending on the user's client (e.g., a GUI using libgit2). And the fact that "git log -p" is part of git does not change that git log -p is not trying to convince you that commits are diffs and show you a true chronicle. It instead assumes that you know what commits are, and that you are asking for an easy to read overview of what has been going on.

              Accepting that commits are always solely snapshots will make the issues you run into when working with the porcelain easier to understand, especially when exposed to more than one client.

              (Knowing about packfiles and delta compression can also be useful when looking into performance/resource utilization.)

      • kccqzy 17 days ago
        You are right that conceptually this is okay. But it is a UI problem that the commands the author tried didn't manage to show the diff between the merge commit and any of its parents.
      • robin_reala 17 days ago
        Technically you can have multiple first-commits in a Git repository. For example, Linux had 4 initial commits in 2017: https://www.destroyallsoftware.com/blog/2017/the-biggest-and...
        • arghwhat 17 days ago
          Indeed, through commits with multiple parents (merges), you can end up having multiple orphan commits (initial commits).

          Multiple initial commits are a bit rarer, usually stemming from merging in entirely different git repos with their own separate history as part of consolidation.

    • devjab 17 days ago
      Wouldn’t you have the same amount of merge conflicts with rebase? Especially if you don’t do it often, which frankly you should be doing with merge too?

      I have to admit that I never really understood the advantages of rebase; what I mean is that I actually don’t understand how any of its advantages outweigh its dangers. Especially because one of the major advantages of merge is that you can squash your local commit history when you submit it to your main branch.

      What we do is that we tie every pull request to a relatively small feature task, and because we do this, we genuinely don’t care about the individual commits developers do. Which means they can commit really silly messages if they are heading to a meeting or if they are just tired at the end of the day. It also helps with them merging main into their branch often, because it doesn’t taint the history.

      The biggest advantage we’ve seen, that maybe we didn’t expect, is that nobody ever fucks up our tree in a way that needs someone who actually understands git to solve. We’ve also locked down the use of force push so that is not available to anyone unless it’s absolutely needed. Part of the reason I set this up initially was to protect myself from me, but it’s been a good thing since.

      But I’m actually curious if it’s wrong.

      • doix 17 days ago
        > Especially because on of the major advantages of merge is that you can squash your local commit history when you submit it to your main branch.

        Squashing is in no way limited to merging, and is actually done via an interactive rebase. Nothing is stopping you from squashing without creating a merge commit. It's entirely separate.

        If you're squashing everything anyway, what does merging even give you? Is your main branch just:

        * merge B

        * squashed commit B

        * merge A

        * squashed commit A

        If you didn't merge, you'd have:

        * squashed commit B

        * squashed commit A

        > What we do is that we tie every pull request to a relatively small feature task, and because we do this, we genuinely don’t care about the individual commits developers do.

        Except eventually there is a large feature task and then you end up with a giant commit that is annoying when git-bisecting.

        But at the end of the day, these things only matter if you are spelunking through git history and/or using things like git bisect. If your git history is "write-only & rollback", then none of this stuff matters.
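The interactive-rebase squash mentioned above can even be scripted, which shows it needs no merge commit at all (GNU sed is assumed for the todo-list editor; "fixup" keeps the first commit's message, so no message editor pops up):

```shell
cd "$(mktemp -d)" && git init -q -b main demo && cd demo
git config user.email demo@example.com && git config user.name demo
echo base > base.txt && git add base.txt && git commit -qm "base"

git checkout -q -b feature
echo 1 >  work.txt && git add work.txt && git commit -qm "wip 1"
echo 2 >> work.txt && git commit -qam "wip 2"
echo 3 >> work.txt && git commit -qam "wip 3"

# Collapse the three wip commits into one: every "pick" line after the
# first in the rebase todo list becomes a "fixup" into the commit above.
GIT_SEQUENCE_EDITOR='sed -i "2,\$s/^pick/fixup/"' git rebase -q -i main

git rev-list --count main..feature   # prints: 1
```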

      • xorcist 17 days ago
        > advantages of merge is that you can squash your local commit history

        No, it's the other way around. Squashing is a type of rebase.

        Most workflows involve both. Merges can also be fast-forward merges, which are indistinguishable from rebases. Choosing between a rebase and a merge operation is often the wrong question to ask. The question is what state you wish the repository to end up in.

        > I’m actually curious if it’s wrong

        Look at "git log". Is it readable and easy to understand? Is it obvious why each commit was made, and why alternative solutions were turned down?

        Are you able to use "git bisect" to track down problems?

        Then you're doing it right. If not, think about what a functional commit log would look like and how you would get there. Working together is culture, and what type of merges you decide to use is just a tiny part of that culture.

    • paulddraper 17 days ago
      But a rebased commit can also have arbitrary changes!

      ---

      P.S. Any commit can have any change. Or no change.

      A "commit" is a version... a message, a tree, some metadata, and 0 or more parents. In fact it's not even a change/diff/patchset per se, though tools will often compare it against its assigned parents. If it has multiple parents, you have to choose which to compare against. If it has zero parents, you can't compare against any parent.

      • ptsneves 17 days ago
        Yes, except with rebase git log will show all the commits that got into the branch, while with merge you need git log -m, otherwise there are invisible commits (and diffs) in a pretty common workflow. I don’t know why this is the default behaviour.

        Git log only shows one tree, not the parallel trees from the merge.

        • paulddraper 17 days ago
          ? Not sure what you mean.

          git log will show all ancestors.

          And git diff shows any difference between two refs.

          Nothing invisible unless you deliberately make it so.

          • PhilipRoman 17 days ago
            Git log (and many other tools as well) pretends that merge commits do not introduce changes. I learned about it the hard way when someone managed to implement an entirely new feature, contained within a hidden merge commit.

            It's only partially the fault of Git - the entire idea of a merge requires new concepts like 3-way diff, which are not needed for rebased commits. I'm not even sure that most software like GitHub can display such a diff.

          • tsimionescu 17 days ago
            The blog post explains it pretty clearly: git log -p doesn't show the diff for those merge commits like it does for a normal commit.
    • red_admiral 17 days ago
      The <> version searches only the system include path (usually /usr/include and friends, extendable with flags), whereas the "" version first searches the directory of the including file, then falls back to that same path.
    • thaumasiotes 17 days ago
      > I wont claim to understand C and the reason why <> is better than “”. I assume it is.

      That one's obvious, you can type <> and you can't type “”.

  • bananskalhalk 17 days ago

        $ git show  d85c9944c55fb38f4eae149979a0f680ea125ecb  | wc -l
        11067
        $
    
    From `man git-log`: "Note that unless one of --diff-merges variants (including short -m, -c, and --cc options) is explicitly given, merge commits will not show a diff, even if a diff format like --patch is selected, nor will they match search options like -S. The exception is when --first-parent is in use, in which case first-parent is the default format."

    Presumably the author would have been happier using the -m flag in addition to -p.
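A tiny repro of the "invisible change" and the flag that reveals it (everything below is a throwaway demo repo with invented names):

```shell
cd "$(mktemp -d)" && git init -q -b main demo && cd demo
git config user.email demo@example.com && git config user.name demo
printf 'one\n' > f.txt && git add f.txt && git commit -qm "base"

git checkout -q -b topic
printf 'one\ntopic\n' > f.txt && git commit -qam "topic change"

git checkout -q main
printf 'main\none\n' > f.txt && git commit -qam "main change"

# Merge, but smuggle an unrelated edit into the merge commit itself:
git merge -q --no-commit topic
printf 'main\none\ntopic\nsmuggled\n' > f.txt
git add f.txt && git commit -qm "merge topic"

# Plain -p prints no diff at all for the merge commit:
git log -p -1 | grep smuggled || echo 'invisible with plain -p'

# -m diffs the merge against each parent, exposing the smuggled line:
git log -p -m -1 | grep smuggled
```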

    • thriftwy 17 days ago
      And that's one of the reasons why I advocate against merges in the codebase on every project I work for and in every HN thread where the topic of merges is mentioned.
      • baud147258 17 days ago
        That might reveal the depths of my ignorance of Git, but how do you manage moving changes from one branch to the other if you don't use merge? Edit: continuing to read the discussion, it seems it's rebase? I have some reading to do
        • shoo 17 days ago
          rebase is git's swiss army chainsaw.

          i use rebase frequently, but i never remember which direction the operation goes in. do you need to checkout the source branch or target branch? truly, it is unknowable. my workflow is to type `man git rebase` and hit space to page through the manual until the first ascii tree surgery diagram appears. then i stare at it until i remember that i need to have checked out my feature branch and am meant to type `git rebase main`. i have trained myself to read the man page every time, perform the operation correctly, then immediately forget.
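For anyone else who forgets the direction: be ON the branch whose commits should move, and NAME the branch they should land on (scratch repo below):

```shell
cd "$(mktemp -d)" && git init -q -b main demo && cd demo
git config user.email demo@example.com && git config user.name demo
echo base > base.txt && git add base.txt && git commit -qm "base"

git checkout -q -b feature
echo feat > feat.txt && git add feat.txt && git commit -qm "feature work"

git checkout -q main
echo more > main.txt && git add main.txt && git commit -qm "mainline moved on"

# Check out the branch to be replayed (feature), then name the new base:
git checkout -q feature
git rebase -q main

git log --oneline    # feature's commit now sits on top of "mainline moved on"
```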

          https://git-scm.com/docs/git-rebase

          • thriftwy 17 days ago
            That's why I always use git cherry-pick for specific commits that I want.

            It's essentially a "cp thing-i-want ."

            Combined with git reflog your repo becomes as understandable as a floppy disk.

          • pjc50 17 days ago
            I use "branch.autosetuprebase=local" and/or --set-upstream-to ; then I can just type "git rebase" while on the thing I want rebased and not have to think about it. (Useful in the gerrit workflow which kind of forces you to have stacks of rebased changes)
        • iainmerrick 17 days ago
          When people talk about avoiding merges, I think they mean this: https://trunkbaseddevelopment.com

          The approach described on that site doesn't strictly rule out "git merge", but it emphasises short-lived branches and unidirectional commit flow. If you do things that way you find you just don't really need merges. The next step is to think "merges are rarely useful and sometimes dangerous, so let's just avoid them completely".

          • RHSeeger 17 days ago
            > Streaming small commits straight into the trunk

            Gotta say, I find that horrifying. What about peer reviews? What about breaking up a change into smaller commits, none of which make sense until they're all together (changing the signature of a method, then changing the places that call that method, etc)?

            It's worth noting that it mentions that work on a branch and the use of PRs are acceptable, but... the two statements appear to contradict each other. Why say "only do x" and "do thing that isn't x" on the same page?

            • msteffen 17 days ago
              I can’t find the quoted text in the parent post or article, but as someone who’s fought the short-lived branches crusade in the past (in favor of it) I’ll do my best to answer the criticism you raise :-)

              > What about breaking up a change into smaller commits, none of which make sense until they're all together (changing the signature of a method, then changing the places that call that method, etc)?

              The general-form solution to this is to make a three-part change (1. add the new code, 2. migrate all the callers, 3. delete the old code) in three separate PRs, or more, if migrating the callers takes several PRs or some of the old code can be deleted earlier than the rest. I believe arbitrarily large changes can be made this way, and as your organization grows, eventually all large changes _have_ to be made this way.

              Isn’t that a lot of extra work? IME it’s a lot less work (and risk!) than resolving a massive merge conflict.

              The problems with long-lived branches all derive from the basic problem that eventually the complexity of maintaining two parallel implementations affects the work of everyone at the company.

              New person joins before `Big_Refactoring` is merged? You’re either onboarding them twice into two branches, or they’re getting nothing done while they wait for the merge so they don’t have to learn a bunch of code that’s going away soon anyway.

              Someone else wants to make a significant change? They either carefully patch each PR into both branches, or they decide “screw it” and do all their work in `Big_Refactoring` anyway, diverging the branches _even more_ and creating more risk in the ultimate merge and more of an incentive for others to start developing in `Big_Refactoring` and make the problem even worse. Soon the feature branch is a de facto main with failing tests while there’s incredible pressure to just ram the merge through so all these changes can go out.

              The only way to make it work is to demand that only one person develop in `Big_Refactoring` and everybody else manually cherry-pick their changes into both branches (which quickly just means implementing them twice). IME everyone finds this so annoying that small branches, feature flags, and three-part changes (which makes code sharing between the old and new implementations much easier) become broadly preferred anyway.

              As far as PRs, I can’t speak to the linked article, but everywhere I’ve worked that implemented short-lived branches still did PR review. But the PRs had to be small and quick to review (which IME actually helps catch bugs too)

              • RHSeeger 16 days ago
                > Isn’t that a lot of extra work? IME it’s a lot less work (and risk!) than resolving a massive merge conflict.

                > The problems with long-lived branches all derive from the basic problem that eventually the complexity of maintaining two parallel implementations affects the work of everyone at the company.

                This seems to imply that there's a choice between A) committing to the main branch, and B) long lived branches. I always work on branches, almost always with multiple commits, and then merge it into the main development branch once the feature is complete. I almost never have to deal with complicated merge conflicts.

                That being said, you're talking about short lived branches. The article talked about committing directly in the main trunk/master branch; which is what horrified me.

        • BHSPitMonkey 17 days ago
          Or use squash merges.
  • throwawayffffas 17 days ago
    > 7764c864b and 0264866ce, right? I should be able to sync to those with git checkout and see which one dropped it, yeah? Well, I'll spare you the effort and just say that BOTH OF THEM have the old code in it.

    When you make a merge commit, the merge commit contains all the changes. That's what happens when you fix a merge conflict: the fix for the conflict exists only in the merge commit. Similarly, you can add whatever you want in the merge, and it won't appear in any other commit.

    • throwawayffffas 17 days ago
      > That actually installed "protobuf-24.4,1" which is some insane version number I'd never seen before. All of my other systems are all running 3.x.x type versions.

      They changed their version numbering convention at some point: since v21, protobuf releases carry a bare release number, so 24.4 is just a recent release (some language runtimes still prefix a major version, e.g. Java's 3.24.4), and the ",1" is FreeBSD's PORTEPOCH suffix, not part of the upstream version. It's not some ancient 2.4.

      Honestly, I see the shit she tripped on all the time. It doesn't even register anymore.

  • pjc50 17 days ago
    Reminds me of https://github.com/protocolbuffers/protobuf/issues/1491 , which has effectively been WONTFIX (why does github not have this useful distinction?) because Google are happy with how it works and it's really difficult to make this particular thing work with the (also broken) Python module import system.
    • squigz 17 days ago
      What distinction? Issues can have custom labels, and there is a default `wontfix` label
  • TheRealDunkirk 17 days ago
    Seems like precisely the sort of thing ESR's new de-autotools tool is designed to eliminate. https://gitlab.com/esr/autodafe
  • jeffrallen 17 days ago
    This reminds me of a comment my new boss made, "you like learning on hard mode". He meant that instead of following doc to learn, I want to go find out how it works from first principles and then follow the docs, maybe improving them, based on what I saw from "beneath them" looking up.
    • iainmerrick 17 days ago
      I like to think of that as "actually learning"
    • BirAdam 17 days ago
      If more people did this (across many different industries) life would likely be substantially better. However, humans always optimize for the wrong things.
    • xorcist 17 days ago
      That's just regular "learning". It's just that for some people it's a bit out of fashion.
  • microtherion 17 days ago
    Rather than the sed post-processing, the author could also have used -iquote for the place where protobuf is installed, which makes it findable by quoted includes.
    • lloydatkinson 17 days ago
      Is this a common solution or documented in obvious places?

      It wasn’t until I just read her article that I’d even considered some systems/distros doing weird things like rewriting C include syntax for questionable reasons.

      What a terrible thing to deal with, simply frustrating.

      • microtherion 16 days ago
        It is documented in the gcc and clang documentation. It's also described in the gcc manpage, but not the clang one.

        I don't know how "common" it is, but when dealing with third party code, I run across <> / "" confusion quite commonly, so when that's a significant part of your job, you'll probably stumble upon this flag eventually.

  • zokier 17 days ago
    I don't want to victim blame too much, but this line stood out to me

    > There's no "body" to this commit. It's just a "Merge:" and two other commits

    Commits are snapshots of repository state, and merges "obviously" have differences from their parents. So not having a "body" for a commit is a bit nonsensical in git (yes, technically you can make empty commits but that's a special case). These sorts of things are where having a good mental model of git is useful.

    Having had my share of hairy merges, I find it pretty intuitive that merge commits can, and in many cases need to, have changes that are not part of either parent.

    Maybe something like pijul (/darcs) would handle things differently here, but I believe that merges are fundamentally difficult problem.

    • planede 17 days ago
      Merges of course should have changes, but IMO they shouldn't have changes that are not resolving conflicts (either actual conflicts marked by git, or conflicts that manifest in failing tests, etc...). An entirely unrelated change stuffed into a merge commit is inappropriate.
    • tsimionescu 17 days ago
      > So not having "body" for a commit is bit nonsensical in git (yes, technically you can make empty commits but that's a special case). These sort of things are where having good mental model of git is useful.

      This isn't a problem with the author or her mental model, it is a problem with `git log -p`. The output she is describing is exactly how merge commits show up there, with no other flags.

    • thriftwy 17 days ago
      That's why reasonable projects limit their usage to the smallest reasonable scope, and use rebase/squash. You don't have to adopt a hard problem.

      It actually escapes me why Linus decided to make git a merge-first VCS in the first place. There aren't many projects which are more linear in nature than the Linux kernel.

    • nebulous1 17 days ago
      Merging a branch is one of the "special cases" where an empty merge commit can be used. Git will fast-forward by default, but if you want to preserve the history as a separate branch you can create an explicit merge commit instead (git merge --no-ff).
  • kidintech 17 days ago
    Piper and Protobufs, lmao. Based on personal experience, the two services are best used as litmus tests of a person’s character: the most insufferable googlers I've met are fans of both. Any other usage of either generally results in indescribable frustration.