Having a limit of 5 million files is perfectly reasonable. Failing to document that such a limit exists and refusing to publicly confirm it (which apparently is STILL the case) is extraordinarily poor customer service/communication.
Google KEEPS setting new records for poor customer communication, to the point where I (and much of the HN crowd) now expect it. Android developer banned from the app store? There is no meaningful way to appeal but you'll probably never be able to find out why. Your best hope is to post on HN and hope someone with power at Google notices.
Leadership at Google ought to recognize this; they ought to make an effort to improve the channels by which "customers" can communicate with Google. But I see no signs that they are even aware of the issue; I see no effort to change anything.
I would try to tell them but... there's no communication channel. Maybe I should post about it on HN.
Google could have eaten Amazon's lunch in cloud compute back when it was new (and many expected them to). However, Amazon actually built infrastructure around customer support, while Google famously hasn't. From what I gather Google cloud is technologically better, but no one wants to use a cloud from someone who doesn't do customer support as well.
I've got a tale of two clouds. I complained about an issue with AWS on Twitter and two hours later was on a call with the EKS product manager who connected me with someone on their team who documented the issue, gave me a workaround, and then pushed out a fix to everyone a few days later. It was super impressive!
I complained about a separate issue with GCP. The product manager found my tweet and told me it was my fault for using a service marked as "preview". He then told me I should have used a different service, which was also marked as "preview". They were rude and defensive and made no attempt to resolve my issue. We decided to just not use GCP for the project as a result.
Neither of these are isolated stories. I have multiple examples of GCP staff just being super rude for no real reason, while AWS regularly goes above and beyond with customer service.
This matches my experience. GCP was like IBM a decade earlier: they assumed the name alone would close the deal and we didn’t feel like paying more for the privilege of doing their jobs for them.
One time two very competent guys showed up on site, let me reproduce the issue, agreed it was a serious bug, and promised to take care of it.
Next time I got connected with what seemed to be two part-time students or something who didn't seem to have a clue about the product.
Then in a third case some technical architect or something looked into my Stack Overflow question and verified it was a known bug that would be fixed in the next monthly rollout, but I never saw a fix.
What's better is when their community forum "expert" redirects you to the first Stackoverflow link on Google - even though it is a band-aid solution that doesn't even solve your problem.
I had a similar experience, though it was years ago. Found a minor bug in Elastic Beanstalk and opened a ticket. They confirmed the bug and rolled out a fix in the afternoon. I was very impressed.
I've had very similar experiences with Google products. Their "support" for many products tends to be primarily community forums. And on those forums, "if you don't like it then don't use it" seems to be the standard type of answer.
> From what I gather Google cloud is technologically better, but no one wants to use a cloud from someone who doesn't do customer support as well
I have a hard time using any of google's stuff - cloud, whatever. Have had to deal with 'google search console' recently. "Give us your sitemap!". OK... "Could not fetch". The explanation for "could not fetch" might be "couldn't fetch" or "hasn't fetched yet" or "read but not processed" or "error reading" or.. anything else. On March 9, I had a sitemap entry listed as "couldn't fetch" updated to "last fetched date" of March 10 (9am UTC-4, so not even close to march 10 UTC time). It's just.... buggy. Colleagues currently moving stuff to google cloud (by edict of cto) are encountering bug after bug after slowness after flakiness. Google "support", to the extent they answer, says "we don't know".
Might it have been 'technically better' on day one? perhaps. But if it's buggy/flaky to deal with, and they have no support, not sure how you'd even verify the "technically better" part. How would you trust any numbers you see reported in their own tools?
Had to use GCP stuff about 6 years ago. It was flaky/slow and relatively unsupported then. Watching colleagues go through stuff today, in 2023, it seems no better.
I use both in my day to day work. My takes right now:
Roles/Permissions in GCP is just done better. The whole system of having to switch roles to be able to see stuff across multiple accounts in AWS is opaque and confusing. TrustPolicies are powerful but feel unnecessarily complex for almost every use case. Google has its own warts around permissioning (doing things often requires an opaque role that you have to look up, the error message is often unhelpful). However, it’s better than unraveling the series of permissions needed to, for instance, have an app pull from an S3 bucket in AWS.
AWS sucks at naming things. Everything is some nonsensical acronym that only makes sense to salespeople at Amazon. When you wonder what Google’s load balancer product is called, you look it up and it’s perhaps unsurprisingly called Elastic Load Balancer.
Another plus for Google: Having IAP as a first class citizen is a nice way to avoid having to set up bastions etc when prototyping.
On the other hand, we just spun up a Karpenter instance in EKS, and according to my colleague it’s much better than Google’s Autopilot product.
Also there is a whole industry around getting support from Google, lol. We use DoIT at my place of work, which is a company whose entire business is to pool together customer accounts for volume pricing and white glove support from Google. Interestingly, the cost savings from volume pricing are so significant there’s no cost to end users for using DoIT.
Elastic Load Balancer is in fact an AWS product. I believe GCP's equivalent is called Cloud Load Balancing.
Your point is valid and something I've felt many times as well (Route 53 means DNS somehow???), you just happened to choose literally the only AWS product that is well named.
True. But if you asked me what Route 53 does I wouldn’t be able to tell you without checking. Cloud DNS, OTOH,
is a pretty descriptive product name. I’m not saying you disagree, just reiterating the point that AWS naming is just a little too silly and clever for its own good ;)
I interviewed with Google Cloud a few years back and they straight up told me they were behind both AWS & Azure as a product and were lacking in user support. They said they were trying to build out an org to address it.
> From what I gather Google cloud is technologically better
It's a shitshow. Their design is ridiculous, overcomplicated and opinionated. It's clear they are run by engineers because no sane human would choose to use GCP knowing how bizarre and painful it is.
Creating an IAM role isn't strongly consistent. You have to build your own polling logic to wait for it to be done to attach a policy to it. Even using their own Terraform provider. If I worked there I'd be embarrassed.
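For illustration, here's the kind of polling you end up writing yourself — a minimal Python sketch, where lookup_role and attach_policy are placeholders and not real GCP API names; the point is just that the caller has to absorb the eventual consistency:

    import time

    def wait_until_ready(check, attempts=8, base_delay=1.0):
        # Poll a readiness check with exponential backoff until it passes
        # or we give up. "check" is whatever call confirms the newly
        # created IAM role/service account is actually visible yet.
        for attempt in range(attempts):
            if check():
                return True
            time.sleep(base_delay * (2 ** attempt))
        return False

    # Hypothetical usage (lookup_role / attach_policy are placeholders):
    # if wait_until_ready(lambda: lookup_role("my-new-role") is not None):
    #     attach_policy("my-new-role", "roles/storage.objectViewer")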
As someone now in a position to make such decisions on a larger scale, GCP isn't even on my radar for this reason. I've had enough experiences with Google's lack of support when needed that it would take years of turnaround to undo that black mark.
AWS and Amazon as a whole has always had stellar support. And they'll continue to receive my business in both as long as that holds true, even though we rarely use it.
Well, the problem with Google is that AFAIK they have no plans to onboard their important services like Search onto GCP. I believe they tried a few years ago but found the GCP tools too lackluster for internal use.
I think it’s basically the attitude “customer support is expensive and annoying, and since we’re Google and can do whatever we want, let’s skip it.” This kind of made sense when they had a massive tech advantage such that their products also needed less support.
Now they seem to have lost that edge and are just another competitor offering products that come with poor support.
The common practice among other companies is this:
- Give old users approaching or exceeding the limit an additional 10% or 20% buffer.
- Make sure to notify all users about the change.
- Update relevant support documentation.
Clear communication is crucial when introducing a new storage limitation. For example, organizations might be archiving all their emails in a single Google Drive for legal discovery, AI training, or similar purposes. These companies may quickly reach the 5 million limit, but the situation can be resolved by dividing the content across multiple Google Drives or utilizing alternative storage solutions like Amazon S3 or Dropbox. No big deal.
The crux of the matter lies in maintaining clear communication.
Agreed, it's absurdly low. My rootfs is ~1 million inodes, with ~61 million free too; I only reinstalled Gentoo a couple of months ago, so I'm only using 180GB / 1TB.
They're fully aware of how bad it is. They do not care. The Ad gravy train is still here, and they'll milk it until they can't anymore and then retire to their multiple homes (or go off and do it again to some other company).
I don't know why people expect leadership to do anything about a shitty company. They have literally no incentive to change. They'll always be hired somewhere else. Business as usual.
Essentially, Intel employees are 'chattle', and while communicating with them doesn't transfer ownership or control of them away from Intel, it does interfere with Intel's ongoing use of them.
The spirit is the same, but I don't think we're talking about the same thing. It's "chattel", and according to wikipedia, for whatever that's worth, it's "trespass to chattels".
I didn't see any reference there about treating people in a customer relationship as the "chattel" referred to in this law.
It wasn’t originally meant to be applied to employees. That’s a modern interpretation. It seems inevitable that some court will extend the interpretation to customers too.
I wonder if this message is getting through to Google, because the evidence suggests not. Any corp considering Google services has to seriously evaluate the risk that one day google will just decide to whack the service. Even in GCP, things that you might have thought would be solid for the long term, like IoT, can go away. Once you lose that trust, it's very hard to get it back. Azure and AWS I'm sure do their share of dumb things, but it really sticks to Google because everyone knows that they are infamous for killing products.
>Having a limit of 5 million files is perfectly reasonable
For ordinary users, this would pretty much equate to an unlimited account, much like AT&T's old unlimited bandwidth accounts. It's not until power users start hitting these ceilings that they realize unlimited doesn't mean what they think it means. I don't know if Googs ever used that kind of phrasing or not, but you can see how some dev never thought that limit would be achievable, so there was no reason to really mention it.
I don’t think it’s reasonable. I think it’s poor engineering.
Imagine if a hard drive had a limit on the number of files, not just the size. Would that be reasonable?
Or imagine if someone designed a storage protocol and submitted it to IETF with such a limitation saying “it doesn’t affect the vast majority of users so we set this arbitrary limit to make our lives easier.” It would not make it through review.
https://en.wikipedia.org/wiki/Comparison_of_file_systems almost every filesystem has a limit. Some document it, some don't. Some limit it per directory rather than globally. There are also path limits, which are effectively file number limits as well. But the limit itself is very normal and would make it through reviews without issues.
NTFS and ext4 have a 2^32 limit according to this table. The Reddit poster mentioned in the article [0] has 2 TB worth of space. I’m not sure if that’s decimal or binary TB, but if we assume binary, you get an average of 512 bytes per file, which is pretty reasonable (and on a real drive, you might be limited by sector size).
But even if you had a 20 TB hard drive, and you wanted a lot of files smaller than 5120 bytes, you can just create multiple partitions. Maybe a little inconvenient, but you can still use your drive to the fullest. Not so much with Google’s 5 million limit (for a 2 TB Google Drive, the minimum average file size is around 430 KB [2])
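For anyone who wants to check the arithmetic, a quick back-of-the-envelope in Python (binary TB assumed, as above):

    TB = 2 ** 40  # binary terabyte (TiB)
    for plan_tb in (2, 5):
        avg = plan_tb * TB / 5_000_000
        print(f"{plan_tb} TB plan: average file size must be >= {avg / 1024:.0f} KiB "
              f"to use the full quota under a 5M-file cap")
    # 2 TB plan: ~429 KiB per file; 5 TB plan: ~1074 KiB per file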
Sure, everything has some limit. But the lowest limit I see on that page is 2^32, which is just a touch higher than 5 million. (/s) And far more importantly, none of those filesystems just decided one day to reduce the limit, which is kind of a big deal. (Although funny, in a horrible way; imagine booting up your machine one day and it tells you that your home filesystem needs to drop a few million files before it can mount. Can you imagine?)
> none of those filesystems just decided one day to reduce the limit
Yes and no. This works differently from one filesystem to another, but search for "running out of inodes" to find many ways this can happen in reality, way below the technical limit of the filesystem itself. You can end up in situations where you can't create a new file and need to remove more than one to solve the situation. (Or even where removing many files doesn't seem to make a difference)
The thing about running out of inodes is that the format command scales them with the size of the drive, and the default is 1 inode per 16KB. So for a standard google account at 2TB or maybe 5TB you'd have 125 million or 312 million inodes.
They've basically gone and set it up like "largefile" mode, 1 inode per 1MB, forced upon all users. That's a bad default and an even worse mandate.
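A rough check of those figures (the exact numbers shift a bit depending on decimal vs binary units, but the ballpark holds):

    # mkfs.ext4 defaults to one inode per 16 KiB ("-i 16384");
    # the "largefile" profile uses one inode per 1 MiB.
    for tb in (2, 5):
        capacity = tb * 10 ** 12  # decimal TB, as drives are sold
        print(f"{tb} TB: {capacity // 16384:,} inodes by default, "
              f"{capacity // 2**20:,} with largefile")
    # 2 TB: ~122 million by default vs ~1.9 million with largefile
    # 5 TB: ~305 million by default vs ~4.8 million with largefile --
    # the largefile figure for 5 TB lands almost exactly on Google's 5M cap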
Of course, with a physical disk there are also things you can do to get around those limitations, like switch to a different filesystem or use multiple filesystems on one disk.
Using the cloud leaves you exposed to the whims of the cloud provider. They select and switch technologies based on their needs instead of yours.
If you manage to hit five billion files (which seems to be the standard limit), you might be in the market for object storage rather than new frontiers in file systems.
Then where are we getting 5 billion? The most common filesystem limits listed seem to be 2^32, which is a touch over 4 billion, and 2^64, which is Big.
(i.e. I can see now what I think you meant, but "5 billion" is a much closer match to "5 million" than 2^32)
> Of course, with a physical disk there are also things you can do to get around those limitations, like switch to a different filesystem or use multiple filesystems on one disk.
So, use another Google account? If someone's workflow depends on one Google account, sounds like they need to understand the concept of sharding.
Not a limit as small as Google set, and not a limit added overnight without any previous announcement or grace period. If you want to limit storing data in names of blank files, you could say “blank files, files smaller than 1024 bytes, and folders are counted as 1024 bytes for quota purposes”. A minimum billable size seems fair to me, and there is prior art (e.g. AWS S3 has minimum sizes for some storage classes).
On real filesystems, directories are effectively files and take up space. I see no reason not to have zero-byte files count against the storage limits, given the metadata they consume.
“a safeguard to prevent misuse of our system in a way that might impact the stability and safety of the system.”
Google: We have identified modern web development as a threat to our systems, and have taken measures to ensure npm users cannot store their node_modules directories on Google Drive. Please consider rewriting your nodejs projects in Go.
That's a bit unfair on Node and NPM. The modules directory rarely grows to more than 4 million files.
Exactly, I fell into this trap back when I was starting out as a developer. GDrive is a terrible backup solution and it'll overwrite and corrupt files there if you're actively working on them. As of a few years ago anyway.
It also doesn't help that there's no way to exclude a folder with wildcards so you can't blacklist all node_modules folders.
So they are not manually choosing what to sync; they just put all of the work and other files they want automatically synced inside that folder, which ends up including things like node_modules.
> a folder that auto syncs with Google Drive...Same way that Dropbox works
I’m on a Mac and find that google drive’s Mac client is so unreliable as to be useless. It frequently crashes mysteriously, and even when running it doesn’t sync reliably. It feels like nobody is working on the software, and after all the layoffs that could well be true.
Say what you want about Dropbox, but I’ve never had a problem with it.
> I’m on a Mac and find that google drive’s Mac client is so unreliable as to be useless
True.
I used Dropbox for many years without issue and moved because of the cost and the issue a few years back with not allowing it on certain Linux file systems.
Went to Google Drive and performance was shocking in comparison. At the time (don't know about now) it also felt like they had a deliberate policy to avoid showing cumulative file/folder sizes, meaning it was easier to pay for more storage than to find out what took the space and clean it up.
Since then I've spent a few years using PCloud, which was almost as fast as DropBox and more reliable at syncing than Google. Then I lost the ability to connect the client on a Mac for a couple of days, had a deeper look, and noticed a few files had had silent collisions between changes on Mac and Windows (not at the same time) leaving multiple versions (names suffixed with the machine they came from).
So I left there and went to OneDrive. To be honest I've found it okay, other than one issue on a Mac where, if I delete a file, then for some reason when it vanishes to the recycle bin it reappears in the OneDrive root on the laptop. Deleting immediately gets around that.
I'm now at the point where I use OneDrive for convenience/reliability but despite the cost have been considering a return to DropBox as cloud syncing seems to be impossible for anyone else to do properly. Is it still as reliable as it used to be?
I don't sync folders. I mount my accounts as network drives.
As for corruption or overwriting files, I have rarely encountered issues, but Drive has file revision histories as well as a Trash can, so if you notice the problem and it's not spread over 5M files, you can roll back to last-known-good versions.
> I don't sync folders. I mount my accounts as network drives.
Implementation-wise under the hood is it really much different?
In my simplistic, probably overly naive idea it seems that a “network drive” built on top of Google Drive would still cache the files in memory, and improper shutdown or loss of network access could still leave your files in an inconsistent state. Probably they have a lot of advanced logic on top, that could roll back partial updates, or keep track of things so that the updates are not applied before sync is complete.
But anyways from this point of view sync vs network mounted drive feels like sync is a kind of network mounted drive that flushes to local disk in addition to working with the remote.
One of these days I want to build such things from scratch myself so that I can experience first-hand the horrors that lie underneath sync / network disk service setups modelled on my idea of Dropbox / Google Drive etc
Not saying that the approach taken via G Drive is smart, but to your question: For a full backup of your application just storing package.json/package-lock.json is not enough. In case of things going south this might leave you in an unrecoverable state - imagine a package is removed for whatever reason from npm or npm becomes unreachable.
Now in your risk assessment the question of course is how likely it is that you lose the copies on your dev system, CI system, and production version at the same time that npm loses it, but the first three happening isn't as unlikely as it seems: you work on a major refactoring, and while deploying you run into a major problem and want to roll back, but artifacts from the old version might be gone. Of course there are ways to prevent that. But that's the risk assessment and prevention you have to do. A full backup including node_modules is one way of dealing with that.
I use both git and Dropbox, because I have multiple devices to sync: laptops, a desktop, and linux servers.
If you use a desktop and a laptop back and forth, git is quite inconvenient. You don't need to commit and push from the desktop to switch to the laptop with a folder level realtime sync solution.
Also, git only protects files once they're staged or committed, but Dropbox can restore any file to any point in its history (not periodic snapshots; it keeps every file event). So combining both, you are safe from all kinds of mistakes.
I use Dropbox because Google Drive sync wasn't stable enough for this kind of workflow but I am not sure if it has been improved lately.
Because git combined with Dropbox and the likes can be slow, lead to broken repositories, mad sync conflicts and so on? At least that is what I've witnessed with colleagues. This was years ago though, and I assume you dont have these issues or maybe a different workflow, so this is more like a word of warning for others: test this properly first before just diving into it.
Hmm, I haven't had significant issues for almost 10 years. And a few years ago Dropbox introduced a folder-level rewind feature so you can easily restore any broken folder.
Maybe if you shut a laptop while it's syncing and then start editing the same things on a different computer without considering that you should finish the sync, but that seems pretty niche.
Are you sure those people were using their own personal dropbox folders? If they were sharing with each other that's a different thing entirely and requires a lot more care to avoid problems.
> Are you sure those people were using their own personal dropbox folders?
Yes, one single account.
> while it's syncing and then start editing the same things on a different computer
IIRC this was one of the issues. Not necessarily because of shutting down, but just from using multiple computers one after the other with Dropbox not having synced in between. Not that niche depending on the line of work; think lab-like setups where one thing gets deployed to multiple machines and the deployment then gets tested. In one company, 'oh, but you have to wait for Dropbox to sync before doing that' followed by 'how?' answered with 'urm, yeah, check the timestamp on all files in that directory' was a rather common exchange, often leading to stupid issues. Put git in the picture there and things won't get any better.
Oh, using it for deployment, so you're moving directly from one computer to another right after making a change and then using those files? I could see how that causes problems sometimes. I think the use case of "here is my checkout, I edit it on multiple computers, I don't use it as anything else" is a lot more reliable.
I think you're on to something here. It might be both safe and convenient to use git with Dropbox iff you create a bare git repo that is not in a Dropbox folder, and then specify a separate worktree that is in a Dropbox folder. That might give you the best of both worlds, but I haven't tested the idea.
There aren't any fundamental issues with putting the git repo in Dropbox. Under normal circumstances, to cause a conflict you'd have to do something that writes to the git folder, then switch computers, pull up the same checkout, and do something else that writes to the git folder, all within less than half a minute.
And even then it's just something like having disagreement about an index file. Easy to resolve.
The fundamental issue is that Dropbox changes are non-atomic w.r.t. what git considers "atoms." See my other comment in this thread. You're right that there's usually not a problem with a single contributor, but if you ever have two or more contributors sharing a Dropbox-contained repo it will be bad news.
Sure, don't share. But there's a good use case for putting a single-user checkout into Dropbox, and that's because it adds convenience and gives you continuous backups.
Though outside of garbage collecting, git doesn't have many files that actually get changed. Mainly the ref pointers which are easy to resolve and the logs which aren't very important.
I've done this in the past but stopped. The problem is there are no atomic transactions in Dropbox. If two people are pushing changes to a repo at the same time using only git, it's not a problem because git understands atomic transactions. But if that repo happens to be in Dropbox all hell can break loose because the various files inside the .git directory are not being synched atomically.
A git repo in Dropbox can appear to work okay with a single contributor where that contributor is only working on one computer at a time. But as soon as you add a second contributor, you very much should take Dropbox out of the process.
There's a surprisingly large number of groups of people that use Google Drive, Dropbox, and similar for "source control" and/or collaboration on projects instead of real VCS.
I have worked with colleagues who expect to be able to keep their current work in Dropbox/GoogleDrive, so they don’t need to take their work laptop home every day but still have easy access to their current work from home.
“Hey, why can’t I get Dropbox to work on this laptop? I need it so I can do emergency bug fixes from my personal laptop at home!” “Yeah, nah. Remember that security policy you signed as part of your employment contract? The one that says exfiltrating company or client data or code to non-corporate systems is a fireable offence?”
I want to say that the sort of place that had GDrive as their "approved cloud solution" is unlikely to also be the sort of place with the kind of data that requires firing-offence rules for exfiltration to non-company systems, but I know that's not the world we actually live in...
I'm reminded that Lastpass got popped via an employee running a well out of date version of Plex with known RCE exploits and getting a keylogger installed:
“This was accomplished by targeting the DevOps engineer’s home computer and exploiting a vulnerable third-party media software package, which enabled remote code execution capability and allowed the threat actor to implant keylogger malware,” LastPass officials wrote. “The threat actor was able to capture the employee’s master password as it was entered, after the employee authenticated with MFA, and gain access to the DevOps engineer’s LastPass corporate vault.”
The hacked DevOps engineer was one of only four LastPass employees with access to the corporate vault.
I mean, I know exactly why. But why the fuck was the corporate LastPass vault available to a staff member's home machine running Plex? Is the expense of a corporate vpn-locked laptop and a pair of YubiKeys too much for a fucking _Password Vault_ senior developer??? "Should we spend a couple of grand buying a work laptop for this guy who literally has the keys to our kingdom? Or do we save a few bucks and get him to work on the machine he already owns, the same one he runs outdated media server software and probably bittorrents all his porn with? What could possibly go wrong?"
I don't do that, but I was once bitten by iCloud. I always put my code in ~/code and copy it over when I get a new machine. I once had the idea to just put it in ~/Documents/code, thinking it'd be automatically synced. Turns out that iCloud doesn't sync dotfiles.
I use Git for version control but for some of my personal projects I also have a copy in Dropbox, so I can make changes on my laptop downstairs, not be ready to commit it, save it, go upstairs, and continue work from my desktop pretty much seamlessly.
Some of my projects aren't set up this way, and it can get annoying to have made a few small changes and be forced to commit them in order to pull them from upstairs.
If we somehow changed the rules so that McDonald's and Starbucks were illegal, we would not end up with local artisanal coffeeshops on every corner; we would likely have a lot fewer places to get good coffee.
McDonald's is my favorite answer to the popularity argument. Popularity is a data point to factor in, but it's only one, and its meaning and value are entirely context-dependent. Basically means nothing by itself.
Get over yourself. McDonald's is far from "lowest common denominator." Go travel the world a bit and see how real people live and what they actually eat every day.
We had a drive thru grass fed burger place open up across the street from McDonald’s and it was gross. It lasted maybe a year. McDonald’s makes pretty good burgers, not what I want every day for sure but in a pinch they aren’t bad.
I use Google Workspaces at work, and I have configured my notebook more like a netbook. Every possible file and folder gets stashed in Drive, and the agent runs to integrate with the Windows filesystem. This is especially important for Company Data and Confidential Materials.
I have modest storage needs in my role, but imagine a developer, Big Data user, or video editor used it. As long as they could tolerate the latencies, they might stash quite a bit in there.
If I was teaching anybody programming I would teach them to copy the folder over every time they added a new feature, rather than teach them git at the same time.
With some C++ libraries or node_modules that can run up fast.
It's not teaching them the wrong way. It's teaching them a simple version of the right way.
You can't do everything at once, and git is counter-productive on day one; the most important thing is that they find programming remotely interesting at all.
The next most important thing is always working on a new copy.
How exactly you make those new copies barely matters at all compared to just having them at all.
Using something arcane and unforgiving and inscrutable and complex like git to accomplish those copies is way way down on the list.
The new student is barely interested in the actual coding to make a game, if you're lucky. They would rather mow lawns than learn administrivia like git.
It could even be argued as irresponsible to hand git to babies and expect everything to be fine. Just because we all have about 5 git commands we use 99.9% of the time, and have no problem 99% of the time because we learned how to "hold it right" and just always carefully do things in the right order, does not mean git can actually be made safe and simple. Those 5 or so commands aren't actually enough.
I thought about it more, and I'd say we actually agree on the principle of teaching simple and building up. It's more a question of how different teaching techniques present the value of source control.
The hard thing about source control is not the how — using the git CLI is only slightly more complicated than copying a folder. Git turns that into a very straightforward directed graph, and the CLI gives you a few commands to move around the graph, sprinkling some more edges and nodes wherever you like. It takes about an afternoon to figure out once you're onboard with the why.
The why of source control is the thing — and our industry is already full of people who don't understand it. That's how we get people who treat git as a "save" mechanism to be invoked whenever it's been a while since the last commit, or branches with one sloppy commit message after another [1], or entire repos that are just spaghettified carelessly-merged hairballs of commits.
Mastering the why of source control means learning when to cut a commit, what shape that commit should be, and why that is.
And I think that's more my objection to the teaching approach outlined here — without any other context, it reads to me like it's not quite focusing on the right things. At worst, it's inducting the student into the industry-standard cargo-cult approach to source control. [2] If the why isn't learned, then it doesn't really matter whether the specific motion is copying folders or typing in git CLI commands.
Ideally I think source-control techniques could be introduced right when they're going to be interesting or fun — when it's time to build something together with someone else. Then that set of lessons could start by copying folders and eventually building up to git — and each lesson could show how good, disciplined use of source control concretely improves the collaboration process.
i don't entirely agree with not teaching git to a new programmer, but if we look at the extreme it makes more sense. how would i teach a nine year old? exactly like that: copy the project.
actually, a good way to think of it is releases. even with git, i keep a copy of each project release, that itself is not stored in git.
A new programmer can just use a git UI like the Github app. It abstracts everything away and leaves you with the obvious benefit of being able to commit a succession of snapshots that represent logical groupings of changes.
I haven’t used cli git since uni. GUI is better for basically everything except for a few rare and advanced commands.
Yet that is still just one more impediment to getting someone off the ground. You are already throwing programming concepts (variables, and loops, and conditionals, oh my!) + big scary IDE + likely first-time command line, etc onto their plate. Invoking yet another weirdo program into the paradigm is not helping a beginner.
Jeremy Howard (of Fastai fame) has a good analogy that we do not make people sit in a classroom for a semester to learn all of the theory of baseball. Instead, we give them a bat and a ball, and layer on the instructions of how the game works. You can get a good approximation of the game with just a couple minutes of instruction and refine the understanding from there.
if someone is at that level they are learning the language and do tutorial or student projects where version control doesn't yet matter. which is fine. once they have done a few of those, it's time to introduce basic version control.
I managed to finish college without a real VCS. Would it have made some things easier? Sure, but again, add it to the pile of things I needed to learn and did not have enough time to study.
as far as college is expected to prepare students for work, version control should absolutely be part of that.
i don't expect a fresh graduate to be a skilled git user but if the nature of development at work is substantially different from what the student learned at school, then they will have a much harder time adjusting.
they should at least be familiar with all the concepts that they will encounter in a junior job and not be faced with having to relearn a completely different development process.
You can’t teach all of the things at the same time. People will just freak out, be confused, and learn nothing. There are a lot of things that you’ll teach “wrong” at the beginning. You’ll never talk about CI/CD at that stage, or deployment, packaging, etc. either, despite much of this being a requirement for a lot of software engineering projects.
As others have pointed out, it is not teaching others wrong things: copying the directory is (modulo optimization) what Git does. However if you dump people straight into everything at once they will become extremely confused and won't learn anything.
What I want to do is take one step at a time, and build on what they already know.
Then when they can program, you can show them how to use Git and the superpowers that come with it.
I always like the Odin Project way of teaching - you start with setting up virtual machines, installing an IDE, and learning git before you get to write your first line of code.
This sounds a bit like Lockhart's Lament, which argues that the way we teach math is why everyone hates math.
The analogy was imagine if we taught music that way, where you spent years learning fundamentals and theory before ever being allowed to touch an instrument.
His idea was to begin with one's voice as a musical instrument, so one would interact with music directly at first, learning the theory and the notation along the way.
The lady quoted in the article agrees with you but also seems to support why I thought of Kodály in the first place:
> "Eleven years of piano lessons taught me something about playing the piano
> but almost nothing about music," she has said. "I was skating on the
> surface. If a child is shown a written crotchet they have no physical
> understanding what’s behind that. Kodály musicianship puts petrol in the
> tank in that it gives them a profound experience of music-making, through
> the voice, building up a repertoire of songs and giving them the
> unconscious knowledge of pitch-matching, walking the pulse, rhythm,
> phrasing and improvising – before making it conscious."
So you will not be touching the piano before all of this takes place and you will be learning the fundamentals and theory, even if in a playful way.
Playing a classical instrument you would normally do that. Maybe you learn how to hold it, change reeds, and do scales first, but without theory you can't read notes or understand what to do with them. Without reading notes you can't play songs off of sheet music.
~/Development contains 3 projects I worked on this past week. Only 3 because this is a new laptop. I'd hate to think what this looks like on my old laptop. But I'll run it later.
Hmm, there was a HN thread about this a few days ago [1] where everyone seemed to attack people for even considering the idea of storing 5M files in a cloud storage solution, going so far as to argue that even disclosing such a limit would be unreasonable to expect.
In this thread, the prevailing thought seems to be that having a 5M file limit is unreasonable and adding it without disclosing it is egregious.
The two threads are talking past each other, not really debating the same thing. This thread is pretty much ignoring the potential unreasonableness of storing 5 million files on a free/cheap cloud service without other backups, which is the key point over there. The other thread is mostly ignoring the fact that Google seems to have imposed this limit arbitrarily and without notice, possibly causing data loss, which is the key point here.
Shows the power of framing -- focusing on different components of the same situation and facts can often lead to entirely opposite opinions or conclusions.
Very true, I agree - but before there's a comment, there's a headline.
"5M item limit for Google Drive: File unable to generate or upload due to 503" (mostly passive, it's a user problem) vs "Google Drive does a surprise rollout of file limits, locking out some users" (Evil Google secretly screwing customers over)
I agree, to the point that very frequently I just collapse the top comment immediately upon arriving on the HN discussion page because I want to get more perspective.
Interesting observation, I have noticed too that contradictory opinions are very frequent. My completely subjective impression has been that there is a loud contingent that just likes to have a contradictory opinion, a form of mansplaining that is frequent in engineering circles.
Or maybe it’s just that people in general are driven to give their opinion on things they disagree with.
> people in general are driven to give their opinion on things they disagree with
I’m definitely more likely to reply when I disagree than when I agree.
I do try to make a habit of giving positive replies also (and not just building on what was said before, as I’m doing here, nor just upvote silently), as constant contrarianism is… kinda dull, especially from the outside.
I pay for 5 TB and planned to use the drive to store a copy of my data.
Things I store that have lots of files:
- The frames for my Timelapse videos = 400,000 files
- The files in my Eagle app photo database = 400,000 files
- Other image files, my programming repositories, documents, music, stable diffusion Deforum frames = 400,000 files
80% of these files I've accumulated in the last 12 months and can see myself easily hitting this 5,000,000 file limit well before I run out of TB's
So now that I know I will never be able to use all the space I'm paying for, I'm going to stop uploading my files and instead search for a proper backup service, something I should have researched in the first place.
Anyone here have any recommendations for a backup service?
+1 for rsync. Have used them for a couple of years. I set up a script to backup my stuff there a couple of times a day. Their daily snapshots have also saved me from my own mistakes on several occasions.
This is the first time I hear about Jottacloud and I'm pleasantly surprised. Their combination of bucket-like storage + mobile app + web app + rclone support seems to tick all the boxes for a Google Drive replacement.
Dropbox’s omission is interesting. I’m using a large paid account to backup a few TB of data spanning many years. Is there something I should know? Losing it would be devastating.
Dropbox is an American company. Most hackers do not trust American hosting companies because of the NSA leaks confirming everyone's paranoia. I wouldn't touch Dropbox with a ten-foot pole.
I tried using this for a media streaming project, but I couldn’t get good enough bandwidth to stream high-bitrate media. I ended up paying more to stream files from a storage provider that caches on CDNs.
I do recommend it for other storage though: snapshots, SMB means you can mount it directly on Windows (and iOS!), cheap, and I personally like Hetzner.
Their Storage Share service (running on Nextcloud) is also good if you want managed syncing, a-la Dropbox. Same server as a Storage Box but with managed Nextcloud on top.
I was thinking the same thing but I tend to create a large compressed backup file and upload that instead. My backup file is currently under 100 gigs, the most critical files not my whole system's files. If I backed up my gaming account (I don't need to since I use Steam Cloud) and compressed it, it'd probably only be another 50 to 100 gigs. This way I should never hit the 5 million file limit.
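For anyone wanting to do the same, a minimal sketch of the bundling step in Python (the paths are made up, and a dedicated backup tool with deduplication would obviously do this better):

    import tarfile
    from pathlib import Path

    def make_archive(source_dir: str, dest: str) -> None:
        # Bundle a directory tree into a single compressed archive,
        # so the cloud side only ever sees one file.
        with tarfile.open(dest, "w:gz") as tar:
            tar.add(source_dir, arcname=Path(source_dir).name)

    # make_archive("/home/me/critical-files", "backup-2023-04.tar.gz")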
Having said that, I feel 5 million files is too low for a lot of people.
I've used Backblaze for years and never had a problem. There's no upload file/data limit, but downloading it back can be slow. There's an option for them to mail you a hard drive but I never tried it.
Backblaze is terrible, but you will likely only find out when you need to restore.
When the backup excluded some files, they blamed antivirus software and recommended going without it.
They actively deleted their only backup because the client couldn't read the original files (which was due to a disk error).
They admitted that there isn't any guarantee or even effort to make the backup consistent - after asking me to explain the concept.
The only reason I still have my data is because of the very expensive cleanroom disk rescue that they of course refused to pay for - because why try to do what you can to compensate for failing at your only job?
It sounds like these problems are related to their end-user backup solution - can't comment on that as I've never used it.
However, when referring to Backblaze, I think most people here refer to their nice (and cheap) S3-like cloud storage solution, which works perfectly with the likes of restic, rclone and friends. That's probably what you should use if you care about control.
At least 2-3 times per year Backblaze goes into a safety freeze, and the only way to unclog the works is reinstalling+inheriting the backup. Which involves re-uploading every single file.
Would not choose Backblaze if I was choosing today.
Due to the safety freezes and the high expense of Backblaze when backing up multiple machines, I have switched to iDrive. So far it seems to be working ok.
I've used it for many years. About 18 months ago had a complete motherboard failure on a MBP and had to restore a bit over 2TB to a new computer. I used the "mail me a drive" option and it worked fine. Only challenge was that it took about 1 week for them to create the disk from their backups.
I've had a good experience with Backblaze on MacOS. But I'd still consider it as a secondary backup (a backup-backup?) to use in addition to a local backup like Time Machine.
Time lapse videos are created from photos taken over a period of time. If you create a 30fps 10 minute video, that's already 18000 photos.
It's quite a pain in the ass to import, color grade, deflicker etc. So one would usually wait until a few projects have been shot, and process them all at once. If this is your hobby, you can easily hit 5,000,000 photos in a couple of years, especially if you're doing something like an ultra-long time lapse of, say, the construction of a building using two cameras.
If the question is "why don't you delete the photos after creating the time lapse video?" It's called photo hoarding haha.
Anyway that's irrelevant, because if I paid for a certain storage size to backup my hoarding habits, I expect to be able to use the capacity, and not have some random ass limits.
If the number of users affected is as 'vanishingly small' as a Google spokesman indicated then you'd think they'd be able to contact them - at least the paying customers?
Abusive behavior is almost always long tail (unless it becomes well known that folks can get away with it, then it ‘fattens up’).
They almost certainly did contact those specific customers, but are sending a warning out publicly for folks who would be a problem so they go somewhere else/don’t do it to begin with.
It’s the ‘police press release’ method of community moderation, like announcing a speed trap.
Cloud services like G Drive depend on oversubscription anyway. If every customer hit the limit, it would be a problem for them.
Faulty logic. Just because the user count is small doesn't mean the problem is.
Banning the letter ö from names on Hacker News would affect a vanishingly small number of users. But if threads needed 100 times more processing and were much slower to load whenever a user with an ö posted, the problem itself would be quite large.
Without knowing the full circumstances of why they're doing this, what problems it solves you can't really say if it's a good solution or not. We can however absolutely complain about the "no notice, no advice, no assistance" way it was implemented.
Probably a very small number of people abusing it somehow (how a lot of files can be abused I don't know), like how a small number of people abused the old GSuite unlimited plan to store petabytes for $12/m.
Then why doesn't Google use their standard solution and just ban those vanishingly few users, pointing to an ambiguous "terms of service violation" and then ignoring all their support requests for a lifetime?
I see it speculated, downthread, that this is a response to modern web-dev and node (?) creating millions of files, etc.
I can’t comment on that but I do know that modern, encrypted, archive tools such as duplicity and borg and restic “chunk” your source files into thousands (potentially millions) of little files.
We see tens of billions of files on our zpools and have “normal” customers breaking 1B … and the cause is typically duplicity or borg.
At a previous company the biggest cause of files in a particular context was that "oh my zsh" was installed for each of thousands of engineers by default. It often seems as though "modern framework" just means it is an abusive piece of junk.
Billions seems weird for normal customers. Duplicity appears to put its chunks into 25MB files, I know for sure that restic puts them into 4MB or more recently 16MB files, and Borg looks like it puts them into 500MB files.
Well I interpreted "normal" to mean "does not have many petabytes of data", and if those tools are reaching a billion files with smaller amounts of data then I'm very confused about how it's happening.
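The quick arithmetic behind that confusion, assuming the chunk sizes mentioned above are roughly right:

    files = 1_000_000_000
    for tool, chunk_mb in [("restic (16 MB packs)", 16),
                           ("duplicity (25 MB volumes)", 25),
                           ("borg (500 MB segments)", 500)]:
        print(f"{tool}: ~{files * chunk_mb / 1e6:,.0f} TB implied")
    # restic: ~16,000 TB; duplicity: ~25,000 TB; borg: ~500,000 TB
    # i.e. a billion chunk files implies petabytes, not a "normal" account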
Good reminder again that "the cloud is just someone else's computer"!
In my experience, GDrive is a piece of crap with a lot of weird behaviors and easy ways to lose your data if you sync your computer with it.
The worst part here, as multiple people have said, is not the existence of a limit. A limit on their service is fair. It is that this limit is undocumented, and that their key selling point is to shout everywhere that if you pay you will have "unlimited" storage, and that it will scale more easily than using your own "not cloud" backups.
Do they advertise unlimited storage? When I looked, the top plan was 30tb. To hit the file cap, you have to have 30tb of files averaging no more than 6mb.
And I did not search for an example, but I'm quite sure that they also advertised Google One like that: something like no need to keep files on your phone anymore, ever, unlimited storage in the cloud.
Drive is a consumer product. Having to explain to a user why they were charged an extra 5 cents because of a feature they don't even understand is not viable. If you want no limits and granular pricing, use S3 or the Google version of it.
They provide exactly the product you want, it just isn't Drive.
I've noticed over the years they have been cutting off all the unlimited stuff. Docs used to be unmetered and unlimited; now they actually count the doc size against your storage, which is still effectively unlimited under non-abusive use cases.
Once again: Don’t use Google for anything crucial or critical. Not Google Cloud, Google Docs, Google Drive, even Gmail is becoming a liability.
Real Engineering involves developing forward-looking designs and maintaining backwards compatibility. It involves a release schedule. It involves communication channels and release notes. It’s hard. It’s unsexy.
Google treats their product lineup with the seriousness of a social media platform. They don’t care about your puny business; even if it means the world to you, it means nothing to them.
"Vanishingly small": a number of users small enough to be downplayed, but large enough so that neither an individual approach to the problem would work, nor that the problems could be ignored. Suspected to be a complex number.
Does anyone know how this works legally? You buy a service, and suddenly, without notice, the service changes features. Does the small print allow for that? And how is this 'ok' in software but probably not anywhere else? (Pretty sure a service contract for an elevator doesn't allow the service company to just say "we're going to limit the number of times your elevator goes up and down to 100 times a day now".)
Some people will let technical limitations define a product. Others will have the product dictate the technical design. This, to me, is an example of the former.
I don't know the serverside implementation of Google Drive but imagine the files on your Drive correspond to files on something like an ext4 filesystem. In this scenario, each file has a cost (eg in the inode table) and there is wastage depending on what your block size is. Whatever the case, Drive seems to treat files as first-class objects.
Compare this to something like Git. In Git (and other DVCSs like Mercurial), you need to track changes between files so the base unit is not files, it's the repo itself. It maps to your local filesystem as files but Git is really tracking the repo as a whole.
So if you were designing Google Drive, you could seamlessly detect a directory full of small files and track that directory as one "unit" if you really wanted to. That would be the way you make the product dictate the design.
Very interesting that google chose to do this instead of fixing the software that caused the limitation. No wonder that their products are seen as a joke in the business world.
Blows my mind that people still trust them, TBH. How many years of poor behavior and bad decisions does it take for people in the know to move on? I understand there are a lot of non-tech-savvy users who wouldn't know, but I would expect Hacker News users to be better informed on average and to have moved on.
The challenge with running cloud storage is that you have to think around the corners for usage and shape customer behavior with pricing. Seems like Google didn't want to do this or was too lazy (sorry). Millions of files will always be a problem: the metadata costs more for these users, it's impossible to manage, hard to clean up, etc.
The problem with Google is that when they fuck up their service, they make it the customer's problem. At other places, a fuckup is viewed more as a one-way door: you can sunset old products (in this case, unlimited file counts), but you never put in a new restriction.
It's easy to hit that many files, and they made the change without warning.
If you're on the $10 2TB plan and your files average 100KB, 5 million files means you can only use a quarter of the space you're paying for.
And before you call that unrealistic, my system drive averages 200KB per file and my main personal data drive is close to 300KB per file. Both would hit the limit if I wasn't using something fancy to pack files together.
And then there's the $18 5TB plan with the same limit on file count. Even completely ignoring the option to pay for extra terabytes.
..could you not split it up over multiple, smaller drives then?
I get that this could be frustrating (being surprised specifically), but it seems a pretty reasonable restriction so it seems odd to criticize it like this.
Google Drive is not a device, it’s a cloud storage service. There’s no metaphorized “drive device” either, it’s all under the same virtual root folder.
> could you not split it up over multiple, smaller drives then?
Well, you could create multiple Google Accounts and create all the files in the same Drive with sharing enabled, because the limit isn’t on what is in one user’s Drive, but what one user can create across all Drives.
The problem is that some people actually were well above the 5m files limit, and this sudden update ends up functionally locking them out of their account. They'd have to delete millions of files in some cases if they just want to add more files.
That can be extremely disruptive especially for small businesses, who would be the most likely to depend on a Drive-based workflow.
That's not really a lot. Imagine a script that runs daily and processes demographic data, creating one output file per US zip code. That's over 40k files per run, so you'd hit 5 million files after just a few months.
I wonder if you could create a block-level virtual filesystem backed by Google Drive so that you could store many small logical files in one physical remote "block" (file).
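Something like this toy packer is what I have in mind (just a sketch, nothing Drive-specific; the names and block size are arbitrary, and a real version would need a FUSE layer, caching, and garbage collection):

    import json, os

    BLOCK_SIZE = 64 * 1024 * 1024   # 64 MiB per remote "block" file

    class BlockPacker:
        def __init__(self, out_dir):
            os.makedirs(out_dir, exist_ok=True)
            self.out_dir = out_dir
            self.index = {}          # logical path -> (block_no, offset, length)
            self.block_no = 0
            self.offset = 0

        def _block_path(self, n):
            return os.path.join(self.out_dir, f"block_{n:06d}.bin")

        def add(self, logical_path, data: bytes):
            # Start a new block once the current one would overflow.
            if self.offset + len(data) > BLOCK_SIZE:
                self.block_no += 1
                self.offset = 0
            with open(self._block_path(self.block_no), "ab") as f:
                f.write(data)
            self.index[logical_path] = (self.block_no, self.offset, len(data))
            self.offset += len(data)

        def read(self, logical_path) -> bytes:
            block_no, offset, length = self.index[logical_path]
            with open(self._block_path(block_no), "rb") as f:
                f.seek(offset)
                return f.read(length)

        def save_index(self):
            with open(os.path.join(self.out_dir, "index.json"), "w") as f:
                json.dump(self.index, f)

Only the block_*.bin files and index.json would ever be uploaded, so millions of tiny logical files collapse into a handful of remote objects.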
Maybe, but most storage providers don’t expose a seek/range API for their consumer stuff, so you’re doomed to fetch & upload the whole block every time.
For about a decade, Dropbox was the only one to offer delta sync, OneDrive added that a few years ago, and they’re the only two IIRC that do delta syncing. The rest basically redownload & reupload.
Seems like an Engineering issue more than a User issue. They could just take the node_modules folders and zip them up behind the scenes without changing the user interaction.
This is why despite G Suite being in many ways a superior product, it's made almost no inroads in Corporate America vs. Microsoft Office. Enterprises need to be able to specify a business workflow and depend on it, and if there are nasty surprises it fucks with their money.
Microsoft software is much worse than many competitors but it's documented, the behavior doesn't change suddenly, and it's backwards compatible.
Probably the usual corporate reason that the nail that sticks out gets hammered down and that the messenger is often shot. Especially with all the layoffs it's better to not be the one in that position. And even if you do convince them it's a bad idea and don't get punished for it what do you gain? A half sentence on your performance review? Risk not worth the potential reward.
I believe Google Drive for Workspace has always had a file count limit, and IIRC it's as low as 500k or something, despite having "unlimited" capacity.
To be totally fair to Google, I know this precisely because there are communities of data hoarders that actively abuse various cloud storage services. In Google Drive's case, there are ways to create "free" Google Workspace accounts by exploiting registration at various institutions, and people use them to store PB-level data.
(For the interested, there are also ways to get free MS developer accounts that are supposed to expire in 3 months but can be renewed indefinitely. These come with 5TB of "free" cloud storage times 5 (10?) separate sub-accounts.)
What if Microsoft changes their mind and you wake up one day with your cloud backup being nuked due to the account expiring or due to your use being non-development-related?
At this point people are just hoarding for the sake of hoarding. The data itself isn't really that important: collecting the entire Netflix catalog is pretty neat, but if it's gone it's fine, you can basically re-find anything in piracy world.
I have multiple backups, but they're all on drives in my house. If my house burns down, there go all my backups. That's why I keep all the important stuff on a cloud backup service as well.
I wonder what jury-rigged setup would lead to hitting the 5M limit? I can't believe it's just digital hoarding; in the end, hoarders know better and keep things in zip archives.
I do exactly that for storing the result thumbnails for some of the DBs in my reverse image search engine (SauceNAO). Uncompressed zip files allow quickly and easily seeking to and accessing component files without extraction. A few tens to hundreds of thousands per zip file works great. Millions would probably not be too different, but would use more resources and take more time when loading the zip file index.
Haven't looked into it, but it sounds like it would work similarly (with some nice benefits such as also being able to easily store other metadata/etc). Feasibility would depend on how quickly the indexes and such load, and the resource consumption associated with opening/closing dozens of them at a time 24/7.
In my screwy case there are hundreds of thousands of zip files which are randomly accessed on the fly to grab one or two thumbnails at a time. The random access speed on unloaded files is critical, and for zip files it's extremely quick.
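For anyone curious how the uncompressed-zip trick looks in practice, Python's zipfile does member-level random access out of the box; the archive and member names below are just placeholders:

    # Read a single member out of a zip without extracting anything else;
    # the zip central directory gives a direct offset to each entry.
    import zipfile

    with zipfile.ZipFile("thumbs_000123.zip") as zf:      # placeholder name
        data = zf.read("thumbnails/123456.jpg")           # placeholder member
    print(len(data), "bytes")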
.jar/.apk (internally a zip archive) comes to mind.
AppImage (internally either an ISO 9660 Rock Ridge or a SquashFS filesystem), .deb (internally an ar archive), and .rpm (internally a cpio archive) are, I think, relevant examples too.
I wonder how that works for companies using Google Workspace. My company has Workspace users close to 6 digits I believe, I'd think we collectively store way more than a few million files.
Please delete 2M files to continue using your Google Drive account - https://news.ycombinator.com/item?id=35395001 - March 2023 (109 comments)
5M item limit for Google Drive: File unable to generate or upload due to 403 - https://news.ycombinator.com/item?id=35329135 - March 2023 (133 comments)
I have a hard time using any of google's stuff - cloud, whatever. Have had to deal with 'google search console' recently. "Give us your sitemap!". OK... "Could not fetch". The explanation for "could not fetch" might be "couldn't fetch" or "hasn't fetched yet" or "read but not processed" or "error reading" or.. anything else. On March 9, I had a sitemap entry listed as "couldn't fetch" updated to "last fetched date" of March 10 (9am UTC-4, so not even close to march 10 UTC time). It's just.... buggy. Colleagues currently moving stuff to google cloud (by edict of cto) are encountering bug after bug after slowness after flakiness. Google "support", to the extent they answer, says "we don't know".
Might it have been 'technically better' on day one? Perhaps. But if it's buggy/flaky to deal with, and they have no support, not sure how you'd even verify the "technically better" part. How would you trust any numbers you see reported in their own tools?
Had to use GCP stuff about 6 years ago. It was flaky/slow and relatively unsupported then. Watching colleagues go through stuff today, in 2023, it seems no better.
Roles/Permissions in GCP is just done better. The whole system of having to switch roles to be able to see stuff across multiple accounts in AWS is opaque and confusing. TrustPolicies are powerful but feel unnecessarily complex for almost every use case. Google has its own warts around permissioning (doing things often requires an opaque role that you have to look up, the error message is often unhelpful). However, it’s better than unraveling the series of permissions needed to, for instance, have an app pull from an S3 bucket in AWS.
AWS sucks at naming things. Everything is some nonsensical acronym that only makes sense to salespeople at Amazon. When you wonder what Google's load balancer product is called, you look it up and it's, perhaps unsurprisingly, called Cloud Load Balancing.
Another plus for Google: Having IAP as a first class citizen is a nice way to avoid having to set up bastions etc when prototyping.
On the other hand, we just spun up a Karpenter instance in EKS, and according to my colleague it’s much better than Google’s Autopilot product.
Also there is a whole industry around getting support from Google, lol. We use DoIT at my place of work, which is a company whose entire business is to pool together customer accounts for volume pricing and white glove support from Google. Interestingly, the cost savings from volume pricing are so significant there’s no cost to end users for using DoIT.
Your point is valid and something I've felt many times as well (Route 53 means DNS somehow???), you just happened to choose literally the only AWS product that is well named.
It's a shitshow. Their design is ridiculous, overcomplicated and opinionated. It's clear they are run by engineers because no sane human would choose to use GCP knowing how bizarre and painful it is.
Creating an IAM role isn't strongly consistent. You have to build your own polling logic to wait for it to be done to attach a policy to it. Even using their own Terraform provider. If I worked there I'd be embarrassed.
AWS and Amazon as a whole have always had stellar support. And they'll continue to receive my business in both as long as that holds true, even though we rarely use it.
Otherwise what would they do when they inevitably get discontinued?
Now they seem to have lost that edge and are just another competitor offering products that come with poor support.
1. Offer a cheap (free) service
2. Gain significant market penetration
3. Provide little to no customer service
4. Raise prices
5. Let the product/services degrade until significant public outcry or regulator involvement
- Give old users who are approaching or exceeding the limit an additional 10% or 20% buffer.
- Notify all users about the change.
- Update the relevant support documentation.
Clear communication is crucial in addressing a new storage limitation. For example, organizations might be archiving all their emails in a single Google Drive for legal discovery, AI training, or similar purposes. These companies may quickly reach the 5 million limit, but the situation can be resolved by dividing the content across multiple Google Drives or using alternative storage such as Amazon S3 or Dropbox. No big deal.
The crux of the matter lies in maintaining clear communication.
Google does this all the time, anyone who uses GCP, particularly BigQuery, knows all too well about stealth changes that break things.
Is it? On this desktop I have 1.5 million files, "df -i" says I've used 1.1 million inodes and I have 61 million inodes free.
The 5 million file limit on Google Drive seems to be excessively low.
Note to Google: consider re-imagining your PhDs and MBAs as food service personnel.
EDIT: s/files/inodes
You have 400,000 files with 2 hard links? Or 100,000 files with 5 hard links each?
How? Why?
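If anyone wants to see where a files-vs-inodes gap like that comes from, counting multiply-linked inodes is straightforward; the scan root below is just a placeholder:

    # Count inodes that are reachable under more than one hard link.
    import os

    ROOT = "/home"                      # placeholder: whatever df -i was run against
    seen, multi = set(), 0
    for dirpath, dirnames, filenames in os.walk(ROOT):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key not in seen:
                seen.add(key)
                multi += 1
    print(multi, "multiply-linked inodes")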
They're fully aware of how bad it is. They do not care. The Ad gravy train is still here, and they'll milk it until they can't anymore and then retire to their multiple homes (or go off and do it again to some other company).
I don't know why people expect leadership to do anything about a shitty company. They have literally no incentive to change. They'll always be hired somewhere else. Business as usual.
https://digital.sandiego.edu/cgi/viewcontent.cgi?article=316...
Essentially, Intel employees are 'chattel', and while communicating with them doesn't transfer ownership or control of them away from Intel, it does interfere with Intel's ongoing use of them.
I didn't see any reference there about treating people in a customer relationship as the "chattel" referred to in this law.
I started a company in 2019 -> didn't choose G Suite, because I don't trust Google. Same for GCP.
For ordinary users, this would pretty much equate to an unlimited account, much like AT&T's old unlimited bandwidth plans. It's not until power users start hitting these ceilings that they realize unlimited doesn't mean what they think it means. I don't know if Google ever used that kind of phrasing, but you can see how some dev never thought the limit would be reachable and so saw no reason to mention it.
It’s not a recognition problem, there’s no “aha” moment that will come from the right person just explaining it to them.
They simply don’t care, and have no incentive to ever change.
Imagine if a hard drive had a limit on the number of files, not just the size. Would that be reasonable?
Or imagine if someone designed a storage protocol and submitted it to IETF with such a limitation saying “it doesn’t affect the vast majority of users so we set this arbitrary limit to make our lives easier.” It would not make it through review.
But even if you had a 20 TB hard drive and you wanted a lot of files smaller than 5120 bytes, you could just create multiple partitions. Maybe a little inconvenient, but you can still use your drive to the fullest. Not so much with Google's 5 million limit (for a 2 TB Google Drive, the minimum average file size is around 430 KB [2]).
[0] https://www.reddit.com/r/google/comments/123fjx8/google_has_...
[1] https://www.wolframalpha.com/input?i=%282%2F2%5E32%29+TiB+in...
[2] https://www.wolframalpha.com/input?i=%282%2F5000000%29+TiB+i...
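The two WolframAlpha links boil down to a couple of divisions:

    # Minimum average file size needed to actually fill 2 TiB
    # under a 2^32-file limit vs. under a 5,000,000-file limit,
    # plus the 20 TB / 2^32 case from the comment above.
    TIB = 2**40
    print(2 * TIB / 2**32)        # 512.0 bytes per file
    print(2 * TIB / 5_000_000)    # ~439,805 bytes, i.e. roughly 430 KB per file
    print(20 * TIB / 2**32)       # 5120.0 bytes per file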
Yes and no. This works differently from one filesystem to another, but search for "running out of inodes" to find many ways this can happen in reality, way below the technical limit of the filesystem itself. You can end up in situations where you can't create a new file and need to remove more than one to solve the situation. (Or even where removing many files doesn't seem to make a difference)
They've basically gone and set it up like "largefile" mode, 1 inode per 1MB, forced upon all users. That's a bad default and an even worse mandate.
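For scale, here's what common ext4 bytes-per-inode ratios give you on a 2 TB volume next to the 5M cap; the 16 KiB default and 1 MiB "largefile" ratios are the usual mke2fs.conf values, quoted from memory:

    # Inode budget for a 2 TB volume under different bytes-per-inode ratios.
    VOLUME = 2 * 10**12
    for name, ratio in [("ext4 default (16 KiB)", 16 * 1024),
                        ("largefile (1 MiB)", 1024 * 1024)]:
        print(f"{name:22s}: {VOLUME // ratio:>12,} inodes")
    print(f"{'Drive file cap':22s}: {5_000_000:>12,} files")
    # ~122 million inodes at the default ratio, ~1.9 million at largefile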
NPM developer?
<ducks>
Using the cloud leaves you exposed to the whims of the cloud provider. They select and switch technologies based on their needs instead of yours.
Article says 5 million? Still a big number, but smaller than 5 billion.
(i.e. I can see now what I think you meant, but "5 billion" is a much closer match to "5 million" than 2^32)
So, use another Google account? If someone's workflow depends on one Google account, sounds like they need to understand the concept of sharding.
“a safeguard to prevent misuse of our system in a way that might impact the stability and safety of the system.”
Google: We have identified modern web development as a threat to our systems, and have taken measures to ensure npm users cannot store their node_modules directories on Google Drive. Please consider rewriting your Node.js projects in Go.
That's a bit unfair on Node and NPM. The modules directory rarely grows to more than 4 million files.
The article says the limit is 5 million files, so... 2 NPM projects? That's not really better...
It also doesn't help that there's no way to exclude a folder with wildcards so you can't blacklist all node_modules folders.
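Until a wildcard exclude exists, one workaround is to generate the exclusion list yourself and paste it into the client; a small sketch, with the sync root as a placeholder:

    # Print every node_modules directory under a sync root so it can be
    # added to the sync client's folder-exclusion settings by hand.
    import os

    SYNC_ROOT = os.path.expanduser("~/GoogleDrive")   # placeholder path
    for dirpath, dirnames, filenames in os.walk(SYNC_ROOT):
        if "node_modules" in dirnames:
            print(os.path.join(dirpath, "node_modules"))
            dirnames.remove("node_modules")           # don't descend further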
Same way that Dropbox works
So they are not manually choosing what to sync; they just put all of the work and other files they want automatically synced inside that folder, which ends up including things like node_modules.
I’m on a Mac and find that google drive’s Mac client is so unreliable as to be useless. It frequently crashes mysteriously, and even when running it doesn’t sync reliably. It feels like nobody is working on the software, and after all the layoffs that could well be true.
Say what you want about Dropbox, but I’ve never had a problem with it.
True.
I used Dropbox for many years without issue and moved because of the cost and the issue a few years back with not allowing it on certain Linux file systems.
Went to Google Drive and performance was shocking in comparison. At the time (don't know about now) it also felt like they had a deliberate policy to avoid showing cumulative file/folder sizes, meaning it was easier to pay for more storage than to find out what took the space and clean it up.
Since then I've spent a few years using PCloud, which was almost as fast as DropBox and more reliable at syncing than Google. Then I lost the ability to connect the client on a Mac for a couple of days, had a deeper look, and noticed a few files had had silent collisions between changes on Mac and Windows (not at the same time) leaving multiple versions (names suffixed with the machine they came from).
So I left there and went to OneDrive. To be honest I've found it okay other than one issue on a Mac where if I delete a file for some reason then when it vanishes to the recycle bin it reappears in the OneDrive root on the laptop. delete-immediately gets around that.
I'm now at the point where I use OneDrive for convenience/reliability but despite the cost have been considering a return to DropBox as cloud syncing seems to be impossible for anyone else to do properly. Is it still as reliable as it used to be?
But much better than google which just silently fails.
As for corruption or overwriting files, I have rarely encountered issues, but Drive has file revision histories as well as a Trash can, so if you notice the problem and it's not spread over 5M files, you can roll back to last-known-good versions.
Implementation-wise under the hood is it really much different?
In my simplistic, probably overly naive idea it seems that a “network drive” built on top of Google Drive would still cache the files in memory, and improper shutdown or loss of network access could still leave your files in an inconsistent state. Probably they have a lot of advanced logic on top, that could roll back partial updates, or keep track of things so that the updates are not applied before sync is complete.
But anyways from this point of view sync vs network mounted drive feels like sync is a kind of network mounted drive that flushes to local disk in addition to working with the remote.
One of these days I want to build such things from scratch myself so that I can experience first-hand the horrors that lie underneath sync / network disk service setups modelled on my idea of Dropbox / Google Drive etc
Now in your risk assessment the question of course is how likely it is that you lose the copies on your dev system, CI system and production version at the same time that npm loses it, but the first three happening isn't as unlikely as it seems: you work on a major refactoring, and while deploying you run into a major problem and want to roll back; then artifacts from the old version might be gone. Of course there are ways to prevent that, but that's the risk assessment and prevention you have to do. A full backup including node_modules is one way of dealing with it.
I use both git and Dropbox, because I have multiple devices to sync: laptops, a desktop, and linux servers.
If you use a desktop and a laptop back and forth, git is quite inconvenient. You don't need to commit and push from the desktop to switch to the laptop with a folder level realtime sync solution.
Also, git only protects files that are staged, but Dropbox can restore any files to any point of file event (not periodic snapshot, but it keeps all file events). So combining both you are safe from any kinds of mistakes.
I use Dropbox because Google Drive sync wasn't stable enough for this kind of workflow but I am not sure if it has been improved lately.
Because git combined with Dropbox and the likes can be slow, lead to broken repositories, mad sync conflicts and so on? At least that is what I've witnessed with colleagues. This was years ago though, and I assume you dont have these issues or maybe a different workflow, so this is more like a word of warning for others: test this properly first before just diving into it.
Anyway, YMMV.
Are you sure those people were using their own personal dropbox folders? If they were sharing with each other that's a different thing entirely and requires a lot more care to avoid problems.
Yes, one single account.
> while it's syncing and then start editing the same things on a different computer
IIRC this was one of the issues. Not necessarily because of shutting down, but just when using multiple computers one after the other and Dropbox not having synced in between. Not that niche depending on line of work, think lab-like setups where one thing gets deployed to multiple machines and deployment then gets tested. In one company the 'oh but you have to wait for dropbox to sync before doing that' and 'how?' responded by 'urm, yeah, check the timestamp on all files in that directory' was a rather common thing, often leading to stupid issues. Put git in the picture there and things won't get any better.
And even then it's just something like having disagreement about an index file. Easy to resolve.
Though outside of garbage collecting, git doesn't have many files that actually get changed. Mainly the ref pointers which are easy to resolve and the logs which aren't very important.
A git repo in Dropbox can appear to work okay with a single contributor where that contributor is only working on one computer at a time. But as soon as you add a second contributor, you very much should take Dropbox out of the process.
“Hey, why can’t I get Dropbox to work on this laptop? I need it so can do emergency bug fixes from my personal laptop at home!” “Yeah, nah. Remember that security policy you signed as part of your employment contract? The one that says exfiltrating company or client data or code to non corporate systems is a fireable offence?”
I'm reminded that Lastpass got popped via an employee running a well out of date version of Plex with known RCE exploits and getting a keylogger installed:
“This was accomplished by targeting the DevOps engineer’s home computer and exploiting a vulnerable third-party media software package, which enabled remote code execution capability and allowed the threat actor to implant keylogger malware,” LastPass officials wrote. “The threat actor was able to capture the employee’s master password as it was entered, after the employee authenticated with MFA, and gain access to the DevOps engineer’s LastPass corporate vault.”
The hacked DevOps engineer was one of only four LastPass employees with access to the corporate vault.
I mean, I know exactly why. But why the fuck was the corporate LastPass vault available to a staff member's home machine running Plex? Is the expense of a corporate vpn-locked laptop and a pair of YubiKeys too much for a fucking _Password Vault_ senior developer??? "Should we spend a couple of grand buying a work laptop for this guy who literally has the keys to our kingdom? Or do we save a few bucks and get him to work on the machine he already owns, the same one he runs outdated media server software and probably bittorrents all his porn with? What could possibly go wrong?"
Some of my projects aren't set up this way, and it can get annoying to have made a few small changes and be forced to commit them in order to pull from upstream.
I’m not entirely happy with the idea of living in a world conquered by the lowest common denominator solutions.
I have modest storage needs in my role, but imagine a developer, Big Data user, or video editor used it. As long as they could tolerate the latencies, they might stash quite a bit in there.
https://support.tresorit.com/hc/en-us/articles/217103697-Exc...
With some C++ libraries or node_modules that can run up fast.
You can't do everything at once, and git is counter-productive on day one; the most important thing is that they find programming remotely interesting at all.
The next most important thing is always working on a new copy.
How exactly you make those new copies barely matters at all compared to just having them at all.
Using something arcane and unforgiving and inscrutable and complex like git to accomplish those copies is way way down on the list.
The new student is barely interested in the actual coding to make a game if you're lucky. They would rather mow lawns than learn administrivia like git.
It could even be argued as irresponsible to hand git to babies and expect everything to be fine. Just because we all have about 5 git commands we use 99.9% of the time, and have no problem 99% of the time because we learned how to "hold it right" and just always carefully do things in the right order, does not mean git can actually be made safe and simple. Those 5 or so commands aren't actually enough.
The hard thing about source control is not the how — using the git CLI is only slightly more complicated than copying a folder. Git turns that into a very straightforward directed graph, and the CLI gives you a few commands to move around the graph, sprinkling some more edges and nodes wherever you like. It takes about an afternoon to figure out once you're onboard with the why.
The why of source control is the thing — and our industry is already full of people who don't understand it. That's how we get people who treat git as a "save" mechanism to be invoked whenever it's been a while since the last commit, or branches with one sloppy commit message after another [1], or entire repos that are just spaghettified carelessly-merged hairballs of commits.
Mastering the why of source control means learning when to cut a commit, what shape that commit should be, and why that is.
And I think that's more my objection to the teaching approach outlined here — without any other context, it reads to me like it's not quite focusing on the right things. At worst, it's inducting the student into the industry-standard cargo-cult approach to source control. [2] If the why isn't learned, then it doesn't really matter whether the specific motion is copying folders or typing in git CLI commands.
Ideally I think source-control techniques could be introduced right when they're going to be interesting or fun — when it's time to build something together with someone else. Then that set of lessons could start by copying folders and eventually building up to git — and each lesson could show how good, disciplined use of source control concretely improves the collaboration process.
[1]: https://tbaggery.com/2008/04/19/a-note-about-git-commit-mess...
[2]: https://xkcd.com/1597/
happy path
Actually, a good way to think of it is releases. Even with git, I keep a copy of each project release that itself is not stored in git.
I haven’t used cli git since uni. GUI is better for basically everything except for a few rare and advanced commands.
Jeremy Howard (of Fastai fame) has a good analogy that we do not make people sit in a classroom for a semester to learn all of the theory of baseball. Instead, we give them a bat and a ball, and layer on the instructions of how the game works. You can get a good approximation of the game with just a couple minutes of instruction and refine the understanding from there.
I don't expect a fresh graduate to be a skilled git user, but if the nature of development at work is substantially different from what the student learned at school, then they will have a much harder time adjusting.
They should at least be familiar with all the concepts they will encounter in a junior job and not be faced with having to relearn a completely different development process.
What I want to do is take one step at a time, and build on what they already know.
Then when they can program, you can show them how to use Git and the superpowers that comes with it.
FWIW, my thoughts are basically that it's a question of sequencing the lessons and framing the overall purpose: https://news.ycombinator.com/item?id=35411405
The analogy was imagine if we taught music that way, where you spent years learning fundamentals and theory before ever being allowed to touch an instrument.
$ find .vim/plugged/coc.nvim/node_modules/ | wc
In this thread, the prevailing thought seems to be that having a 5M file limit is unreasonable and adding it without disclosing it is egregious.
Just a curious thing I noticed.
[1]: https://news.ycombinator.com/item?id=35329135
Shows the power of framing -- focusing on different components of the same situation and facts can often lead to entirely opposite opinions or conclusions.
"5M item limit for Google Drive: File unable to generate or upload due to 503" (mostly passive, it's a user problem) vs "Google Drive does a surprise rollout of file limits, locking out some users" (Evil Google secretly screwing customers over)
I’m definitely more likely to reply when I disagree than when I agree.
I do try to make a habit of giving positive replies also (and not just building on what was said before, as I’m doing here, nor just upvote silently), as constant contrarianism is… kinda dull, especially from the outside.
If your cloud storage account can't gracefully handle that and keep that handling totally transparent to you, you should move on.
I’m sure there must be some provider out there that can handle that uninteresting workload …
Things I store that have lots of files:
- The frames for my Timelapse videos = 400,000 files
- The files in my Eagle app photo database = 400,000 files
- Other image files, my programming repositories, documents, music, stable diffusion Deforum frames = 400,000 files
80% of these files I've accumulated in the last 12 months and can see myself easily hitting this 5,000,000 file limit well before I run out of TB's
So now that I know I will never be able to use all the space I'm paying for, I'm going to stop uploading my files and instead search for a proper backup service, something I should have researched in the first place.
Anyone here have any recommendations for a backup service?
https://backblaze.com
https://jottacloud.com
[1]: https://dropbox.com/backup
This is in Europe; maybe from America it gets a bit slow though.
I do recommend it for other storage though: snapshots, SMB means you can mount it directly on Windows (and iOS!), cheap, and I personally like Hetzner.
Their Storage Share service (running on Nextcloud) is also good if you want managed syncing, a-la Dropbox. Same server as a Storage Box but with managed Nextcloud on top.
Any gov or military contract data usually has specific requirements stating data storage requirements.
There are other data laws in countries outside the US that explicitly cover customers' data storage and its handling.
Two or more NASes, rotated out on a regular basis to satisfy the 3-2-1 Backup Rule.
Having said that, I feel 5 million files is too low for a lot of people.
We have a HN discount and we’ll get you up and running this weekend.
You can even direct transfer from google drive to your new account here - no reason to use your own bandwidth.
When the backup excluded some files, they blamed antivirus software and recommended going without it.
They actively deleted their only backup because the client couldn't read the original files (which was due to a disk error).
They admitted that there isn't any guarantee or even effort to make the backup consistent - after asking me to explain the concept.
The only reason I still have my data is because of the very expensive cleanroom disk rescue that they of course refused to pay for - because why try to do what you can to compensate for failing at your only job?
However, when referring to Backblaze, I think most people here refer to their nice (and cheap) S3-like cloud storage solution, which works perfectly with the likes of restic, rclone and friends. That's probably what you should use if you care about control.
Would not choose Backblaze if I was choosing today.
It's quite a pain in the ass to import, color grade, deflicker, etc., so one would usually wait until a few projects have piled up and process them all at once. If this is your hobby, you can easily hit 5,000,000 photos in a couple of years, especially if you're doing something like an ultra-long time lapse of, say, the construction of a building using two cameras.
If the question is "why don't you delete the photos after creating the time lapse video?" It's called photo hoarding haha.
Anyway that's irrelevant, because if I paid for a certain storage size to backup my hoarding habits, I expect to be able to use the capacity, and not have some random ass limits.
They almost certainly did contact those specific customers, but are sending a warning out publicly for folks who would be a problem so they go somewhere else/don’t do it to begin with.
It’s the ‘police press release’ method of community moderation, like announcing a speed trap.
Cloud services like G Drive depend on oversubscription anyway. If every customer hit the limit, it would be a problem for them.
The fact that threads like these even exist indicate that Google did not, in fact, tell their users:
- https://issuetracker.google.com/issues/268606830?pli=1
- https://forum.rclone.org/t/new-limit-unlocked-on-google-driv...
Banning the letter ö from names on Hacker News would affect a vanishingly small number of users. But if threads needed 100 times more processing and were much slower to load whenever a user with an ö posted, the problem itself would be quite large.
Without knowing the full circumstances of why they're doing this and what problems it solves, you can't really say whether it's a good solution or not. We can, however, absolutely complain about the "no notice, no advice, no assistance" way it was implemented.
Well, yeah, I imagine they’re moving elsewhere.
Seriously though, do people actually trust them not to randomly intentionally break stuff at this point?
Just look up the number of issues developers have with the Play Store.
I can’t comment on that but I do know that modern, encrypted, archive tools such as duplicity and borg and restic “chunk” your source files into thousands (potentially millions) of little files.
We see tens of billions of files on our zpools and have “normal” customers breaking 1B … and the cause is typically duplicity or borg.
What I mean is, they just have a plain old rsync.net account and pay no premium nor incur any penalty for using all of those inodes.
You are correct: the average customer does not use billions of inodes.
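The arithmetic behind those inode counts is simple; treating ~1 MiB as a typical average chunk size (an approximation, since the real tools use content-defined chunking with varying sizes):

    # Rough object counts produced by chunk-based backup tools.
    AVG_CHUNK = 1 * 1024 * 1024        # ~1 MiB average chunk (approximation)
    for tb in (1, 10, 100):
        chunks = tb * 10**12 // AVG_CHUNK
        print(f"{tb:>4} TB backed up -> about {chunks:,} chunk files")
    # 1 TB is already ~950k objects; 10 TB alone would blow past a 5M file cap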
In my experience, GDrive is a piece of crap with a lot of weird behaviors and easy ways to lose your data if you sync your computer with it.
The worst part here, as multiple people have said, is not that there is a limit; a limit on their service is fair. It's that the limit is undocumented, and that their key selling point is to shout everywhere that if you pay you will have "unlimited" storage, and that it will scale more easily than using your own "not cloud" backups.
https://support.google.com/a/answer/6034782?hl=en Unlimited storage With G Suite Business, each user in your organization can store unlimited Gmail messages, Google Photos, and files in Drive.
And I did not go searching for examples, but I'm quite sure they also advertised Google One like that: something like "no need to keep files on your phone anymore, ever; unlimited storage in the cloud."
They provide exactly the product you want, it just isn't Drive.
GDrive does have API rate limiting which either way is going to make it slow/useless for REALLY large data storage.
Real Engineering involves developing forward looking designs and maintaining backwards compatibility. It involves a release schedule. It involves communication channels and releases notes. It’s hard. It’s unsexy.
Google treats their product lineup with the seriousness of a social media platform. They don’t care about your puny business; even if it means the world to you, it means nothing to them.
Almost always, yes.
I don't know the serverside implementation of Google Drive but imagine the files on your Drive correspond to files on something like an ext4 filesystem. In this scenario, each file has a cost (eg in the inode table) and there is wastage depending on what your block size is. Whatever the case, Drive seems to treat files as first-class objects.
Compare this to something like Git. In Git (and other DVCSs like Mercurial), you need to track changes between files so the base unit is not files, it's the repo itself. It maps to your local filesystem as files but Git is really tracking the repo as a whole.
So if you were designing Google Drive, you could seamlessly detect a directory full of small files and track that directory as one "unit" if you really wanted to. That would be the way you make the product dictate the design.
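A toy version of that "treat a directory of tiny files as one unit" idea, purely illustrative and with made-up thresholds; this has nothing to do with how Drive actually works:

    # If a directory holds many small files, store it as a single tar blob
    # instead of one object per file.
    import io
    import os
    import tarfile

    SMALL = 64 * 1024      # arbitrary "small file" threshold
    MANY = 1000            # arbitrary "lots of files" threshold

    def maybe_pack(dirpath):
        files = [os.path.join(dirpath, f) for f in os.listdir(dirpath)
                 if os.path.isfile(os.path.join(dirpath, f))]
        if len(files) >= MANY and all(os.path.getsize(f) <= SMALL for f in files):
            buf = io.BytesIO()
            with tarfile.open(fileobj=buf, mode="w") as tar:
                for f in files:
                    tar.add(f, arcname=os.path.basename(f))
            return buf.getvalue()   # store this one blob instead of N objects
        return None                 # fall back to per-file storage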
I have to agree with Google in this case; 5 million files for one account is borderline abuse of an object storage system.
Like my dad has 300+ unread emails with who knows how many gigs of attachments.
Here is a thread discussing it on the rclone forum:
https://forum.rclone.org/t/new-limit-unlocked-on-google-driv...
It would be nice to have official confirmation of the limit rather than relying on speculation.
I'll never understand how such a large organisation can let this kind of stuff happen.
And they usually have multiple backups.
https://www.sqlite.org/sqlar.html
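If you just need "lots of tiny files in one file", the sqlar layout is easy to approximate from Python; this sketch follows the schema described on that page but skips sqlar's optional zlib compression:

    # Store small files as rows in a single SQLite database, sqlar-style.
    import os
    import sqlite3

    con = sqlite3.connect("archive.sqlar")     # placeholder filename
    con.execute("CREATE TABLE IF NOT EXISTS sqlar("
                "name TEXT PRIMARY KEY, mode INT, mtime INT, sz INT, data BLOB)")

    def add(path):
        data = open(path, "rb").read()
        st = os.stat(path)
        con.execute("REPLACE INTO sqlar VALUES(?,?,?,?,?)",
                    (path, st.st_mode, int(st.st_mtime), len(data), data))
        con.commit()

    def read(name):
        row = con.execute("SELECT data FROM sqlar WHERE name=?", (name,)).fetchone()
        return row[0] if row else None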
Yay.
I work at a company with many more than 5 million files overall where this has not been an issue.
https://github.com/awesome-selfhosted/awesome-selfhosted
I wonder what vanishingly small is. Even 0.001% of a billion is still ten thousand.
So yeah, S3 for the win.