Now that I'm working on my own company, I'm curious about alternatives to AWS.
I'm keeping things simple, so I've got mostly Go services, pg + caching, and a Svelte webapp. I deployed my Go services on a low-ish end bare metal provider, and for now it is fine. Deployments are triggered via scripts, and so far so good. Is it sexy, using all the latest and greatest tech? No, it's just simple shell scripts. But it works.
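For what it's worth, the whole "deployments triggered via scripts" setup can be tiny. A minimal sketch under assumptions not in the original post: a single Go service named "api", a hypothetical `deploy@myhost` target, and systemd on the server. The `DRY_RUN` default just prints what would happen, so you can sanity-check it first.

```shell
#!/usr/bin/env bash
# Minimal deploy sketch: build the binary, copy it over, restart the unit.
# "api" and "deploy@myhost" are hypothetical placeholders - adjust for your setup.
set -eu

HOST="${HOST:-deploy@myhost}"
SERVICE="${SERVICE:-api}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually deploy

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run go build -o "bin/$SERVICE" "./cmd/$SERVICE"
run scp "bin/$SERVICE" "$HOST:/opt/$SERVICE/$SERVICE.new"
# Swap the binary and restart the unit in one remote step.
run ssh "$HOST" "mv /opt/$SERVICE/$SERVICE.new /opt/$SERVICE/$SERVICE && sudo systemctl restart $SERVICE"
```

That's roughly the entire "CI/CD pipeline" a one-person company needs; everything past this is convenience.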
I also benchmarked each endpoint with tens of millions of records (not a whole lot but still) and I'm seeing a pretty good latency to throughput ratio. In fact, the performance is better than what we got during peak at work, and this setup is costing me tens of dollars a month.
That makes me wonder if I ever really need AWS. If I ever need to go multi-region, I can just spin up a new machine there. A CDN covers all static content.
Am I wrong to think that I could probably scale like crazy and avoid AWS completely with my stack? Why should I pay hundreds/thousands per month plus a premium for bandwidth? I'm also enjoying staying sane avoiding IAM.
Don't change a thing. This is perfect.
> Am I wrong to think that I could probably scale like crazy and avoid AWS completely with my stack? Why should I pay hundreds/thousands per month plus a premium for bandwidth? I'm also enjoying staying sane avoiding IAM.
You're not wrong at all. Check out the hardware stack for Stack Overflow as of 2016: https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...
Don't over think it. Focus on the software and its features. Focus on getting users and ramping up your MRR.
Good luck.
Don't be afraid to add a CDN/caching when it's appropriate.
You can always buy a beefier instance, or a dedicated box.
Deal with scale later, when it's truly needed. Give yourself permission, or plan to hire a team, to handle the scale issues later.
Being able to automatically build production from a recent snapshot is a profoundly important point.
When we bootstrapped on the cheap, we would always maintain a production infrastructure at one company and a test infrastructure at another, with relatively low-TTL DNS hosted at a third party like dnsmadeeasy. Backups would be pushed to the second site. We then would “build our demo” from the backup data (stripping PII at runtime). But the demo also served as a second, cloned production infrastructure. If the primary went down, it was always a quick restore and scale-up at the secondary location with recent backups!
Worked really well, but you needed to make sure it was automated or it would blow up.
Definitely cheap and works like a charm.
I’m sure that’s over complicated for a lot of people, but it’s really just a few scripts and a couple of days to set up, and you can be up and rolling with a working system in the time it takes to change DNS.
I would also point out that StackOverflow isn't an incredibly good performer compared to a number of other websites. Check it out on PageSpeed Insights. I also used a couple other tools that suggest that it loads slower in some regions than others.
I can get 60k IOPS on a nanode at $5/mo rate. You actually get a random amount between 20k-60k depending on the type of machine they provision your nanode on.
You want 60k IOPS on AWS? Be prepared to pay like 4-5k/mo. Want multiple systems with it so you can cluster a reasonably performant database? Pay for each, and you need to special-request the ability to go past 100k IOPS in a region.
Want the privilege of a dedicated vCPU? It's $1500/mo per region just to turn on the feature.
I tried to use AWS just as you mention here, and the cost was 100-1000x what it would cost me at Akamai cloud or Hetzner. All these fees I was unfamiliar with popped up, and the cost was crazy high.
The system I tried to provision was functionally similar to the bottom end dedicated akamai cloud offering and it was going to cost 15-20k/mo instead of $72. Over 200x. And that was just the hourly provisioned rate, not including egress and other bits...
Either you couple your whole system architecture to AWS, or you stay away.
It's a real problem if you build on a VPS as your base. I was asked to prep such a system for scale on AWS and choked on the pricing. So we settled for what they offer at reasonable pricing, and if further growth is needed, we'll have to hit eject on AWS and go elsewhere.
I recommend starting elsewhere and saving yourself the migration if you're just going to use EC2.
Most places will let you scale a VPS quite well, and there is a logical handoff to your own hardware, where you can scale to a system with 512 vCPUs and TBs of RAM for less than Amazon is charging for step two.
But the base configuration, for the money, is pretty sad compared to what you get else where. Scaling up is horrendously expensive compared to other providers.
I don't even know why you would bother with AWS for this. There is no draw to AWS if you are only using EC2. You couple your architecture to their specialized services and go all in, eating the platform risk and long-term cost.
Or... don't even bother with AWS. It makes no sense to use just EC2.
Yes, of course. Every cloud provider also does PaaS, but often it is also not the most cost effective option - often it is the most expensive one!
The cheapest option is often the option where you get reliant on cloud-specific SDKs, serverless, etc.
A few years ago, I was the VPoE for a small-ish startup in Denmark. I was the only ops person, and I was able to provide 100% uptime for a year, including a migration from k8s clusters, AWS AZ outages and whatnot. During that time, I also reduced our monthly spend from $15k to $5k. This was a fully auto-scaling, redundant and highly available system.
When I started, every night I got woken up by alerts (working alongside the CEO and another engineer to fix things). By the time I had stabilised things, the only thing waking us up were our providers having outages.
I used to run a similar business earlier, without AWS. We owned our own hardware and had ops people going to data centres. I can guarantee you we did not spend as little as $12-13k/month (the equivalent of the AWS fee plus my salary) at that company. Think closer to $100-150k/month.
AWS can be cheaper when you factor in the cost of employees. It can also be stupidly expensive when you use it the wrong way.
Another example: I host a small service that gets very occasional use. Maybe 20-50 people discover it and use it per month. I have the backend running on a Lambda, and it costs me about $0.50 per month. It took me 20 minutes to write the CloudFormation for that and push it through my CI pipeline to have it deployed. There is no way I could get cheaper hosting, uptime/availability, and faster time to market than with a Lambda.
If and when that service becomes more used where it warrants running full time, I’ll rewrite the request handling and throw it in my k8s cluster. But until then, I do believe this is the cheapest solution (for me).
I believe you, but most people don't do this, sadly. You very clearly _get_ Cloud, but most don't, and financial optimisations often come much later down the line.
That said, it was a long time ago, and I'm sure competition and prices have changed dramatically.
It’s not actually required. And I don’t mean the completely unrealistic “if you run stuff on your private network that never needs to connect to the internet, then you don’t need a NAT gateway!”. I just mean you can implement routing to the internet in cheaper ways.
If you don’t need 100 Gbps highly available outbound traffic routing like a NAT gateway provides… you can run a t3.nano instance doing forwarding and NAT that will sustain 30 Mbps and burst to 5 Gbps for $3.80/mo. Just on raw bandwidth, unless you need more than a t3.large can push (~500 Mbps continuous, 5 Gbps burst), it’s cheaper to run an EC2 instance as your gateway than to use a NAT gateway.
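For anyone wanting to try the NAT-instance approach, the instance-side setup is small. A sketch, assuming a single interface `eth0` and a `10.0.0.0/16` VPC CIDR (both assumptions; adjust for your VPC), with a placeholder instance ID:

```shell
# EC2 drops packets not addressed to the instance unless you disable the
# source/destination check (the instance ID here is a placeholder):
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 --no-source-dest-check

# On the instance itself: enable IPv4 forwarding persistently...
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-nat.conf
sudo sysctl --system

# ...and masquerade traffic from the private subnets out through eth0.
sudo iptables -t nat -A POSTROUTING -o eth0 -s 10.0.0.0/16 -j MASQUERADE
```

The private subnets' route tables then point 0.0.0.0/0 at this instance's network interface, exactly as they would at a managed NAT gateway.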
And yet Stack Overflow has such strong SEO that outright cloning their content on a malware page will get you top 10 results.
I think people seriously overestimate the value of Core Web Vitals. At the end of the day, a website is usually trying to deliver some value to a visitor: if you deliver more value by rejecting complexity, but sacrifice some speed to do it, users will still value the end result, and search engines will reward you.
Depending on your current provider, there may be advantages moving your existing bare metal server to EC2. For example, what happens if the drive on your server fails? You'll probably need to write a backup script and write documentation on how to restore from it. If that happened to an EC2, AWS would just restart your instance on a new host and your EBS volume would come with it, and automated snapshots can be set up with a single click. Security groups are simpler to set up than configuring a Linux firewall. Lastly, EC2 has support for automated horizontal scaling with on-demand pricing and Spot instances. But none of these are a must-have, they are conveniences, and you pay a premium cost for it. Hetzner has servers with 64 GB memory for €37 per month, and a similarly-specced EC2 will easily be 10x that cost.
I actually tried this, it's closer to 200-300x. The number of premium add on fees that appear as you build a higher performing EC2 server is just stupid.
It's not so bad if you scale out horizontally instead of vertically. But that complicates the architecture when other providers don't charge crazy pricing for vertical scaling.
Max IOPS without paying for provisioned IOPS is 3,000 (the gp3 stock setting). Much lower than other providers, and it will cap your database performance. That's combined, too: 1,500 read and 1,500 write. Throughput is 125 MB/s.
Hetzner gives you 45k if memory serves, Vultr gives you a ton, Akamai cloud gives you at least 40k, and some instances give you 125k. With throughput in the >3 GB/s range.
Provisioned IOPS cost thousands per month to reach these levels via an io2 volume. io2 is capped at 45k if memory serves, unless you have a bare metal instance, which is a whole new pricing tier I didn't investigate.
I believe it is cheaper per IOP to go wide with gp3, complete with additional compute, than to provision io2. Which really kills the whole "go tall not wide" architecture strategy.
Big price bump with dedicated vCPU encourages wide rather than tall also. You can buy a ton of compute for that $2/hr fee.
Moral of the story. Don't try to go tall on Amazon.
1: When you use a Dedicated Host, you won't see a "fee" on your bill, but you are charged more. As an example, in us-east-1, an m5 Dedicated Host costs 10% more than a m5.metal AKA m5.24xlarge. It's about $0.40/hr more expensive. You'd think it would get cheaper since you're buying in bulk, but I can understand this actually maybe is not preferable for AWS because it is difficult to find a completely unused server on demand.
Yeah... Complicated...
What do you do for disk IOPS? Seems like they make you sip through a straw unless you pay them big bucks.
And from my own experience, basically what all cloud providers are relying on is ops being lazy.
Set up some decent hardware in a good data center, set up a good platform on top of it, automate the hell out of it and with all current open source software you can run everything by yourself.
At least, that's something I am currently actively researching because our cloud bills are going through the proverbial roof and I rather would spend that money on our own stuff, than Jeff's next house.
This makes no sense to me. I've worked jobs where the app was hosted entirely on-prem and jobs where it's hosted in AWS. By far, the AWS shops required more work. There are so many more limitations, random footguns, arbitrary rate limits, capacity issues, having to do "cost optimization" because it happens that what is simple and maintainable is expensive. Once you have a setup you like on-prem, it will hum along until you decide to break it.
S3 is one of the seven technical wonders of the world, but I would rather run Postgres/Redis/Kafka/RabbitMQ on my own hardware than the AWS managed service, given the choice.
That's what I mean with that.
Funny thing is that you can easily do that on your own hardware using a solution such as KubeDB just as well these days, after you get through the pain of getting Kubernetes on bare metal operational.
This sticks out to me. IAM isn't crazy, and having access controls in place is going to be the very first thing you want to do when you bring someone else on board. Maybe you're not there yet, but it would be a wise time investment to understand how these work, to the point where they don't feel confusing.
If they're not doing that, and their one server is talking to AWS resources, it means it's using superadmin credentials. If that one server is compromised, can you see why that would be a bad idea?
It might be enough for now, but if you grow big enough to be a target it's very likely your home-spun lock won't stand up to professionals.
If it's only ever going to be a shed, there's no need for the infrastructure to support that many keys.
Most things have accounts (database, servers). Are you using separate ones or a global admin?
Any solution that you build is going to be more complicated and less secure than IAM. In IAM, your workload/server can have an identity. The software running on the server is issued temporary credentials as needed, and only has access to resources linked to the role. How do you do this without identity and access management? Roll your own because IAM is too crazy?
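For context on what "issued temporary credentials as needed" looks like in practice on EC2: the SDKs fetch them automatically from the instance metadata service, and you can see the mechanism with plain curl. A sketch, where `my-server-role` is a hypothetical role name:

```shell
# IMDSv2: fetch a session token first, then ask for the role's credentials.
# (SDKs do this for you; shown here only to illustrate the mechanism.)
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/my-server-role"
# Returns JSON with AccessKeyId, SecretAccessKey, Token, and an Expiration -
# short-lived keys scoped to whatever the role's policies allow, nothing more.
```

No long-lived secret ever lands on the box, which is the part that's genuinely hard to replicate with a home-grown scheme.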
Also, be careful with b) since you still need people that know how to do stuff, otherwise, you will experience either data breach or long downtimes.
What most people won't admit when recommending cloud is that, yes, cloud has all sorts of utility built into, like data replication, backups, scaling, etc. but things go wrong in cloud as well and if you don't have good knowledge of how things work, you will be in for a wild ride.
IMO, it is much better to learn to set things up yourself, even in a non-optimized way, and have some knowledge of what to do when stuff begins to break.
Cloud is for declaring the setup. If you don't care about that, or you don't know how, then avoid cloud.
On-premise, you can always find dodgy scripts with no copies, no backups, people praying every day that the service doesn't need a restart, things working but nobody knows why, and, worst of all, everyone afraid to update anything.
Couldn't it be the other way around? At lower scale, spending significant amounts of time just to save a few hundred or thousand might not be worth it. OTOH, if you're paying millions to AWS, hiring an extra person or two to save 20-50% (or whatever) might be a very good deal.
Remember to consider more annoying points that can often become hidden costs, or hidden risks:
- backups (and more importantly, restores :) )
- maintenance and security (you are in charge of the lower levels too)
- decent authentication, authorization, and accounting (AAA)
- SLA, both 'in theory' (what does the contract with your provider say) and 'in practice' (how much would an interruption cost you, and what happens if the provider doesn't respect the SLA at all)
- how you handle outages (ideally you want a plan for what to do when the obvious things go wrong, then for the non-obvious things)
- how you transfer knowledge to a potential new employee. Is there only one person who knows how something works? (called the 'bus factor')
And yes, most of those also apply if you use AWS, but typically they would cover different parts or have different risks. And there is usually some associated product you can pay for to manage that risk.
The appeal of AWS' "insane" bills are that they are not so "insane" when compared to the salaries required to maintain a home grown stack in-house, especially as complexity increases.
It's not just application complexity, it's organizational complexity. Doing everything yourself is fine until that complexity increases and you start finding out that all your dozens or hundreds of expensive employees are spending a lot of time on home grown overhead.
For example, when you use shell scripts for your deploys, you can't as easily prove things to auditors like you can with a CI/CD system that's tied in to role based access control. That sort of thing will be dealbreaker for customers who expect your company to maintain industry compliance standards.
When you use a product like RDS you can tell your customers that Amazon is the one responsible for handling backups, security patching, etc.
Also keep in mind that anyone with a significant amount of AWS infrastructure should be buying reservations and savings plans. The sticker price of AWS is not what corporations pay.
As you grow your company, I encourage you to focus on the activities that make your business profitable and deliver value to customers. Every hour you spend upgrading a database or patching an operating system is an hour you could have been spending on developing a unique feature in your product that nobody else has.
For now, I'm sure using this setup is much better than getting something too complex for your size, but that can change quickly if you're lucky enough to reach a larger scale.
Nothing described is that weird, nor should it have difficulty horizontally scaling when and if the time comes - add another node, and front it with a load balancer.
The problem is that DevOps has shifted so heavily to Dev that “I’m running services natively on a Linux box” is somehow seen as Byzantine and arcane.
> when you use shell scripts for deploys… can’t as easily prove things to auditors as you can with a CI/CD system
What do you think a CI/CD system is running? Also, if /var/log/auth.log isn’t enough, there are other auditing systems available that could make this more granular.
> industry compliance standards
IME, these are a joke, and auditors routinely miss a dizzying amount of glaring problems, because they don’t look beyond records that humans generated.
> every hour you spend upgrading a database or patching an OS…
Patching the OS should be automated. If it isn’t, that’s on you. Ansible isn’t hard to learn.
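As a concrete example of "patching should be automated": on Debian/Ubuntu it can be as small as this (a sketch; `dnf-automatic` is the rough equivalent on RHEL-family systems):

```shell
# Let the OS install security patches itself on a schedule,
# instead of someone SSHing in to run upgrades by hand.
sudo apt-get install -y unattended-upgrades
sudo dpkg-reconfigure -f noninteractive unattended-upgrades

# Or, with Ansible, one hypothetical play covers every host in inventory:
#   - hosts: all
#     become: true
#     tasks:
#       - ansible.builtin.apt:
#           upgrade: safe
#           update_cache: true
```

Either way, "patching the OS" stops being a recurring time cost, which is the point being made above.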
As to the database, if you can’t read docs (both MySQL and Postgres have excellent documentation on this procedure) and perform them as written, frankly you shouldn’t be dealing with RDS either. It’s not like AWS docs aren’t confusing, spread over a million pages, and sometimes contradictory.
Pardon the seething undertone, but as an ops-heavy SRE/DBRE, I’m very tired of seeing people lambast ops as being somehow beneath modern practices, not worthy of their time, or worse, not cost-effective. A well-written app can absolutely run just fine on a tiny server, and does not need the miasma of shit that is cloud-native. Computers are blindingly fast. Stop demanding infinitely-scaling vCPUs because optimization is hard. Stop pretending that your time is more valuable than minutiae like “my ORM is producing garbage queries resulting in hideous latency because I don’t understand SQL or schema design.”
If I throw my application on some PaaS thing and put the database on RDS, let’s say I can hire 5 developers who know almost nothing about infrastructure to develop and deploy the application. Each developer working on the application delivers valuable business logic. Let’s say each developer makes $200,000 total compensation and they bring in $500,000 in revenue. This setup makes me $1,500,000 in operating income per year.
Now you’re saying I should find someone who knows enough about infrastructure architecture, database administration, and all the other bare-metal Linux fundamentals to manage it in house rather than paying AWS all this wasteful money.
So now I have 4 developers and 1 Ops/DBA person. My AWS bill was eliminated, saving $100,000 thanks to the infrastructure and database expert doing a wonderful job cost optimizing.
But now my 4 developers have one less person to deliver features that motivate customers to sign deals with us, so now we’re making $1,300,000 operating income. (Losing $300k from the loss of a developer and gaining $100k back from cost savings.)
Obviously this is a made up scenario in my favor but that’s basically how it works. Businesses shopping for employees can’t get every skill they want in a job description. One of the most important parts of business strategy is making trade-offs.
This scenario reminds me of a landlord that I had who fixed a simple problem with my dishwasher by just replacing it with a new one. They didn’t give a shit that a little bit of extra knowledge and a spare part would have fixed it for no exaggeration 100x lower cost, because resolving the problem quickly with no risk was worth more to them. In any event, I was giving that landlord many times the cost of that dishwasher every month.
The compliance certifications are kind of like the real estate agent that has a big car payment on their Mercedes so that they can sell more expensive houses to wealthier clients. Yes, we all know that buying a Mercedes isn’t the most cost effective way to get from point A to B, but you can’t sell mansions if you show up in a Kia Rio.
I don’t think you _need_ a DBA/DBRE initially, but I do think if you’re in the range of hiring five devs, one of them should know about infrastructure.
The team of 5 sysadmins set up, installed, secured, and managed 2,000 such systems, mostly running SunOS 4.1.x, which didn't have nearly the automation that we have now on newer Unix-based OSes.
They used only the bash shell and the many scripts they wrote for managing the systems. Guess the profitability of that kind of setup? 1 sysadmin per 400 systems... the kind of overhead people think exists doesn't.
A small to medium business won't have the 500k to 1M per year to pay the sysadmins' salaries.
For them AWS or other managed services are the correct choice.
- Lower chance of fatal bugs/downtime
- More reliable load balancing
- Proven backup strategies
- Not having to learn everything
I've spent 7 years at a "unicorn" and the first five years we used Digital Ocean VPSs for everything. My main takeaways are that the smaller providers don't have "real" load balancing and that you absolutely should not manage your own databases or logging/metrics, it's a pain in the ass unless you have a team for it. I spent countless hours on learning infrastructure instead of building the product. It worked out in the end, but we would have gotten there much quicker if we'd paid the premium.
If I did it again I'd still use VPSs, but with one of the big players and pay for dbs and observability.
For me, the strength of AWS is in scalability and reproducibility. I use AWS CDK to define infra as code. Which allows me to make changes with confidence. Test changes in other environments. Use CI/CD and pipelines even on small projects.
I only needed to figure this out once. Then all of it is reused across any project; from first commit to deployed infra is a matter of minutes. When I’m done, I can tear it down in minutes.
If you’re concerned about security, spend some time learning SELinux. I’m assuming you’ve already done the reasonable defaults like public key required for ssh, no root login, fail2ban, etc.
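For anyone following along, those "reasonable defaults" are only a few lines. A sketch for Debian/Ubuntu (paths and the service name vary by distro):

```shell
# SSH hardening drop-in: key-only auth, no root login.
sudo tee /etc/ssh/sshd_config.d/10-hardening.conf <<'EOF'
PasswordAuthentication no
PermitRootLogin no
EOF
sudo systemctl reload ssh

# Ban IPs that repeatedly fail to authenticate.
sudo apt-get install -y fail2ban
```

Make sure your own key is in `~/.ssh/authorized_keys` and that a second session still works before you close the first one, or you can lock yourself out.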
If I can choose to not pay rent, I will choose to not pay rent.
A low-complexity org with a low-complexity tech stack (both good things) can serve a ton of users, and doesn't benefit much from the huge scale, elasticity, and features of AWS.
In particular network transfer fees on AWS are ridiculous, IMO.
I have scripts, but use Ansible to set up snapshots to spin new servers from, and Terraform to configure everything. The biggest downside, as I see it, is that _if_ I had to scale up the stack, there would be issues regarding auto-scaling and additional regions.
However, I'm not worried about that, and if it happens, I'll probably have the money to move back to AWS - at least in part.
So my advice is: if it works and the servers stay up, keep it. AWS is no silver bullet, and moreover, if you have little to no revenue, it's better to keep costs down. As a small company, I think you also get a little more leeway regarding server downtime.
You don't need AWS for that (although they do a good job convincing you otherwise).
AWS didn't invent anything. They just simplify (or complicate) systems in order to send you a hefty bill.
For the rest, of course. AWS, GCP, Azure, all too expensive.
Chances are very high you won't even need more than 3 machines, if you'd like to play it safe, 1 if downtime is acceptable.
The challenge is long term. You need to be able to switch easily after the EOL of your current OS version. Data is what's expensive: it requires constant backups and snapshots. If you can afford it, have a dedicated machine or cluster for data. This way you can swap the web/API layer when you receive better offers over the years (cheaper hardware, EOL OS, etc.). Do think in blocks or components and layers, but I'm guessing you do that anyway.
What cloud does is it adds convenience for ridiculous costs.
Personally, I have an incubation server where all trials and developing projects are hosted. If and when something takes off, it gets dedicated hardware. In the unlikely case that this dedicated setup isn't enough, I'll start thinking about cloud. But even in that case, I'm pretty sure a private cloud with dedicated personnel is cheaper than AWS etc.
When you’re spending over $1 million per year on either of those then you’ll save a lot of money by moving to on-prem.
The ideal cloud lifecycle is to iterate quickly on the cloud while you find product market fit, then when you know what you need start moving to on-prem to save money.
I don’t think you’ll ever need to migrate to AWS.
Either you have to scale, in which case you should have the revenue or VC funding to do so, or you don't need it.
A cloud provider is about where to spend your resources: do you want to build features that make money, or do you want to manage your deploy scripts and build monitoring etc. yourself?
But if you see yourself reimplementing AWS services on your own stack, you should fire up Excel and do the math which is more cost-effective.
Your business is delivering product to your customers, not building and maintaining the technology to do so.
But when the cost of you doing something (either direct or opportunity cost) becomes higher than that of using aws (or another cloud provider), you should use them.
Yes, you can get really really really far without needing to scale if you spend a lot of energy and effort on optimisation, but that's time you're not doing biz dev, or building out capability.
And even now, you are probably spending a lot of energy on things that are useless outside of saving the $ you'd pay to Amazon. As you grow, the cost:value of doing these things yourself or in-house changes.
If you're successful and scaling, it's often waaaaay easier and cheaper to throw $$ at a problem short term than getting engineers to actually prioritise and look at it.
As much as folks here might not want to hear it, throwing an army of mediocre (or, ideally, decent) developers at a problem and paying through the nose for managed infrastructure often winds up being a much (much!) cheaper way to arrive at a good outcome than a smaller number of more skilled engineers with a shoestring infrastructure budget.
Have you ever worked with mediocre developers? Have you ever been responsible for a built-out implementation of the sort you are talking about?
And, have you ever actually run intensive tasks on hardware where you owned the entirety of the system and could have visibility into the full OS? (Many VM providers oversubscribe their CPU and RAM, and, you have no visibility into memory bandwidth performance either.)
If you are worried about redundancy, get a second server at a different company with the server in a different part of the country, and back up to it every hour (incremental backups) with a full backup every night, or whatnot.
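A sketch of the hourly incremental part, using rsync hard-link snapshots: each backup is browsable as a full tree, but unchanged files cost no extra space. The host and paths here are hypothetical placeholders:

```shell
# Hourly snapshot to a second box. --link-dest makes unchanged files
# hard links into the previous snapshot, so only deltas use disk.
SRC=/srv/app/data
DEST=backup@otherhost:/backups/app
STAMP=$(date +%Y-%m-%dT%H%M)

rsync -a --delete \
  --link-dest=../latest \
  "$SRC/" "$DEST/$STAMP/"

# Point "latest" at the snapshot we just made (first run warns that
# ../latest doesn't exist yet and simply does a full copy).
ssh backup@otherhost "ln -sfn $STAMP /backups/app/latest"

# cron entry on the primary:
# 0 * * * * /usr/local/bin/backup.sh
```

The nightly full backup then just becomes whichever snapshot cron took at midnight; the important part, as others said, is testing the restore path, not the backup path.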
- how is the complexity & scale of your system going? Make projections
- identify potential breaking points in this
- figure out roughly how long it'd take to migrate to e.g. AWS
- the cost of migration (not only the diff between your current costs and AWS costs, but the opportunity costs all the projects you won't do while you're migrating)
- now you have your "buffer" figured out, as in, when you need to start acting.
Personally, I have used Heroku in production at more than one place, and plan on using it again.
You could also just use the wheel.
When I pay for cloud infrastructure myself, my first stop is Hetzner because they combine low prices with good service. I do see some complaints on HN occasionally about Hetzner, so do some research. I have never had problems myself with them.
I worked at Google for a while ten years ago, so out of nostalgia I sometimes use GCP but I watch my spend.
Though I'd comment that you don't need to go all-in on AWS; you can use EC2 instances the same way as bare metal servers from Hetzner and get some cool benefits if you need them (and are OK with 10x+ extra costs): easy backups/snapshots, migrations, better server access and management (I like having SSM to connect to the server with MFA).