It's been quite a while since GitHub started gating code search behind a login. However, they recently started gating other types of search as well. The worst part is, you don't notice it immediately for two reasons. First, it doesn't happen on all repositories. I haven't experienced this yet with NixOS/nixpkgs for example, possibly because the high volume of activity going on there prevented the switch. Second, the search results do show up right after you hit the search button. However, a login screen shows up as soon as you start to navigate around the search results. I can't help but feel like they're testing the waters by not making it immediately obvious that this is happening.
The inconvenience doesn't stop there. As you can see in the screen recording, once you get shown the login page, GitHub will continue to show the login page even if you hit the back button. On top of that, the search query is sometimes not included in the URL, making search results difficult to share.
I get GitHub wanting to require logins for code search, since it takes up computing resources. However, there's something to be said about gating search for issues and pull requests of open source projects without the project maintainers being notified.
This has been the case since the new code search replaced the old one ~9 months ago. The new code search is more resource intensive, so GitHub chose to make it available only to logged-in users. I agree that it sucks to not have code search available when logged out, but it's not a new change and I don't think it was done with malicious intent.
How does that work? You don't want to sign in to the site so now you will replace it with another site. Presumably you don't want to sign in to that other site as well, so what are you using the service for in the first place?
I was very unhappy when they did this. The new search btw is shittier than the old one and this was a classic case of breaking the rule of "If it ain't broke, don't fix it".
I’ve been running into this too and it’s very annoying. Although I don’t think it’s particularly new either.
Just feels like an unnecessary step. Sure I have an account and can log in, but why? I just want to know which file has X function, so I can read the implementation. I don’t want to have to download the repo or sign in.
Most likely anti-scraping measure. So they can detect and shut down bots or really anyone they feel like if activity looks nonstandard. Not suggesting it’s good, but it’s consistent with the in vogue trend to lock down recipient public APIs nowadays.
As an illustration, near the end of last year, bots from a renowned Email API provider spotted in less than 1 hour the leak of a public key from my public GitHub code repo. My account got suspended on their platform. It was stunning to see the speed at which they acted and automated the process to "lose" and "recover" reputation.
they have to serve the source without being logged in, otherwise gpl projects would just move (and we know gpl projects are the opensource trend setters).
so they will always allow you to download the source. and that's what i do all the time. git clone, grep, rm.
or if you are logged in, do it anyway, checking the code out with ssh, which is more expensive for them.
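For anyone who wants to script the clone, grep, rm routine described above, here is a minimal sketch in Python. The function names `search_tree` and `clone_and_search` are made up for illustration, and it assumes `git` is on your PATH and that a shallow clone (`--depth 1`) is enough for what you are looking for.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path


def search_tree(root, pattern):
    """Return (relative path, line number, stripped line) for each matching line."""
    regex = re.compile(pattern)
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in relative.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((relative.as_posix(), number, line.strip()))
    return hits


def clone_and_search(repo_url, pattern):
    """Shallow-clone repo_url into a temp dir, search it, then delete the clone."""
    workdir = tempfile.mkdtemp(prefix="codesearch-")
    try:
        subprocess.run(
            ["git", "clone", "--depth", "1", "--quiet", repo_url, workdir],
            check=True,
        )
        return search_tree(workdir, pattern)
    finally:
        # The "rm" step: always clean up, even if the clone or search failed.
        shutil.rmtree(workdir, ignore_errors=True)
```

Something like `clone_and_search("https://github.com/NixOS/nixpkgs", r"def main")` would be the whole routine in one call, though a repo that size makes for a slow clone even with a shallow checkout.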
remember kids, after Microsoft bought it, a github account is a social network account.
There are a lot more AI projects hungry for data to train their models on. This puts content companies in an uncomfortable situation: trademark infringement claims, loss of intellectual property, and more.
That's true, but there's an interesting parallel with GitHub's corporate parent, Microsoft, and Microsoft's other platform company LinkedIn[1]. LinkedIn sued scrapers for retrieving data from the site.
LinkedIn isn't a content company either, nor do they really own any content posted there (they don't, right?), but a large part of their business moat comes from the network of people posting content there. Scrapers and bots undermine this, something the AI boom facilitates.
There is a cost to serving up all that content, and if hundreds of AI start ups are all trying to pull in data, that can add up fast. It’s not typical user behavior.
If it’s just static content it wouldn’t really be that expensive. In reality egress traffic is extremely cheap compared to what Azure/AWS etc. are charging.
How often are you not signed in to GitHub? You’re presenting this as a practical pain, but I can’t remember the last time I wasn’t signed in to GitHub.
I don't have a work GitHub account and I keep personal stuff out of my work computer. The number of times I've wanted to search an open source codebase without cloning it yet is significantly nonzero.
I've also wanted to do code search on my phone, where I have no need to be logged in so I'm not.
Can they let their business customers know that? Companies making their employees create new accounts for their org is not uncommon. I’m not talking about Enterprise instances.
If I'm considering reporting a bug I came across in a free-software project while at work, that's already something I probably couldn't justify with a strict cost-benefit calculation for my employer.
In practice my employer would be happy for me to spend my time doing it anyway, because it's the right thing to do. But asking them to pay Github for the privilege is pushing it.
I don't want to log in to github.com when I'm on my work computer. That's going to take me one step closer to uploading company internal stuff by accident.
It's especially bad because I don't really remember my passwords, so I always have to reset my password when logging in again, and that refreshes the dev keys so my terminal git push also stops working - a complete PITA.
GitHub still knows all of your individual visits to GitHub, and which repos you viewed. Most of the time I don’t want to be browsing GitHub itself whilst logged in. I don’t like or trust Microsoft with my location or waking hours or browsing history.
For several months now I have refused to stay logged in to Github. I log in in an incognito window when I need something, and log out when I'm done. Github javascript is disabled outside of incognito windows.
This is the same playbook I have followed with success to disconnect from Facebook, Linkedin, Google, Reddit and Twitter. (And Quora. And Medium..) When it's inconvenient it just makes me try to avoid these sites.
It's unfortunate but it's not as bad as Twitter/other forced logins for features. The difference is that making an account is free, you don't have to pay anything and there are no ads, whereas Twitter has an incentive because it needs ad revenue and wants to look like it has more people on it.
Github was rate-limiting me for doing just basic searches. I page through results too fast, hit the limit, and it times me out. My account is a few years old, but admittedly, it lacks engagement à la social media (favorites, repos, forks, etc).
Anyway, I don't buy Github's excuse that the new search 'takes more CPU power'; this must be to prevent scraping data for LLMs. Have you hit the new search rate limits?
On another computer where I was not logged in, something else happened. It didn't ask me to log in; instead the search results were displayed and then one second later they were removed and replaced with an error message (which did not explain the problem, but did not ask me to log in, either). I was able to view the search results by pausing execution of scripts using the debugger, though.
I’ve always found GitHub to be quite generous with what functionality they give free users, especially visitors that don’t have an account. They kind of paved the way in that regard…
Some sites would probably put everything behind a login-wall. I can imagine some alternative version of git hosting with the following message:
This will greatly reduce GitHub's ability to be an App Store. Reminder that GitHub isn't banned* or censored by Microsoft in China, so it's a good way to download VPNs and other banned apps.
*There are frequent outages on an individual level, but if you try enough, it works.
EDIT : This doesn't affect search of repos. My bad.
They also completely removed the activity from people you follow from your homepage, claiming everything should be under "explore", while promoting their recommendations provided by an algorithm. They claimed it "used too many resources".
Everyone complained about this decision, fast-forward 6 months, Microsoft doesn't give a fuck.
I used the homepage to learn about new projects people I follow are working on or interested in, but now it's nearly impossible to find out.
At least for issues, their search got a lot worse recently - it doesn't do substring search in the same way (or maybe at all - I haven't checked in detail), so you can't search for non-initial words in compound words. This makes searching through German issues an exercise in frustration...
The old search was much better than the new search. The new search can never find exact strings in my repos, even when I have copy-pasted those strings from my repo to the search bar!
Here's a screen recording of that happening:
https://imgur.com/a/BT6uRIe
Or you can self host! Github's changes pushed me to self host recently.
https://voussoir.net/writing/git_dot_voussoir_dot_net
what other purpose could it hold other than to harvest your data for their own undisclosed purposes?
Maybe anti-bots or something, but there are other ways to do that. Besides, a bot might just make an account.
Wild!
If necessary, let's focus on the use case of searching a single repo.
1: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn
Telling employees to make their own personal account to do company work seems like a bad idea.
The use of a password manager makes re-logging-in effortless.
My account got flagged.
Can't help but feel that's what we're moving towards.
whoever fucked this up thinking he's doing good for humanity deserves to be hit with 65,535 lightning bolts in exponentially increasing amperages.
- It makes it simpler for millions of CIs to check out the code unauthenticated,
- It enables usecases such as “Here is public data as a JSON, it’s our list of IP addresses, just integrate that to your firewall”,
- NPM. NPM entirely.
- Isn’t it required for open-source? If they restricted it behind a login, could we still say we deliver the code to our customers?
Enshittification is real.
I think they really improved the code/repository search, but finding issues got much worse.
And IMO finding issues easily should be a top priority for a platform like GitHub.