I've contributed a few optimisations to some implementations in these benchmarks, but as I read the code of many other implementations (and some frameworks) I lost most of the trust I had in these benchmarks.
I knew that once a benchmark becomes famous, people start optimising for it or even gaming it, but I didn't realise how meaningless that made these benchmarks. Some frameworks were simply not production-ready, or had shortcuts added just for a benchmark case. Some implementations were supposed to use a framework, but the code was skewed in an unrealistic way. And sometimes the algorithm was different (IIRC, one implementation converted the "multiple SQL updates" requirement into a single complex UPDATE using CASE).
I would ignore the results in most cases, especially for emerging software, but the benchmarks did at least suggest orders of magnitude in a few cases: e.g. the speed of JSON serialization in different languages, or that PHP Laravel was roughly half as fast as PHP Symfony, which could in turn be half as fast as Rails.
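For anyone who hasn't seen the CASE trick mentioned above, here is a minimal sketch of the difference, using Python's built-in sqlite3 so it runs anywhere. The table and column names (`world`, `randomnumber`) and the helper functions are assumptions made for illustration, not code taken from any actual benchmark entry. Both functions leave the table in the same state, but the second one sends a single statement instead of one per row, which is presumably why it counts as a different algorithm.

```python
import sqlite3


def make_db():
    """Create a small in-memory table to update (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE world (id INTEGER PRIMARY KEY, randomnumber INTEGER)"
    )
    conn.executemany(
        "INSERT INTO world VALUES (?, ?)", [(i, 0) for i in range(1, 6)]
    )
    return conn


def update_one_by_one(conn, new_values):
    """Literal reading of 'multiple updates': one UPDATE statement per row."""
    statements = 0
    for row_id, value in new_values.items():
        conn.execute(
            "UPDATE world SET randomnumber = ? WHERE id = ?", (value, row_id)
        )
        statements += 1
    return statements


def update_with_case(conn, new_values):
    """The shortcut: a single UPDATE whose CASE picks a value per row."""
    if not new_values:
        return 0
    whens = " ".join("WHEN ? THEN ?" for _ in new_values)
    placeholders = ", ".join("?" for _ in new_values)
    sql = (
        f"UPDATE world SET randomnumber = CASE id {whens} END "
        f"WHERE id IN ({placeholders})"
    )
    params = []
    for row_id, value in new_values.items():
        params += [row_id, value]      # parameters for the WHEN/THEN pairs
    params += list(new_values)         # parameters for the IN (...) list
    conn.execute(sql, params)
    return 1


def snapshot(conn):
    """Return all rows, ordered by id, for comparing the two approaches."""
    return conn.execute(
        "SELECT id, randomnumber FROM world ORDER BY id"
    ).fetchall()
```

Calling both with something like `{1: 11, 3: 33, 5: 55}` on two fresh databases gives identical snapshots; only the number of statements executed differs.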
I always liked these benchmarks, and I've been following them since the earliest rounds.
One thing to note is how much things have improved over that time. Numbers that used to top the benchmarks would now be seen as "slow" compared to the top performers.
The other useful thing about these benchmarks is being able to easily justify the use of out-of-the-box ASP.NET Core.
For many languages, the best performers are custom frameworks and presumably have trade-offs versus better known frameworks.
For C# the best performing framework (at least for "fortunes") is aspnet-core.
That side-steps a lot of conversations that might otherwise drag us into "Should we use framework X or Y" and waste time evaluating things.
Are the benchmarks gamed? Yes, of course. The code might not even be recognisable as ASP.NET Core to me, but that doesn't really matter if I can use it as an authoritative source to fend off the "rewrite in Go" crowd. Nor does it matter that it is gamed, because the real-world load is many orders of magnitude less than what these benchmarks demonstrate is possible.
I really liked these benchmarks, and would check in with them from time to time.
No benchmark is perfect, but these cover such a wide variety of languages and frameworks that they're a good resource for getting a rough idea of the kind of performance a given stack is capable of.
I don't know much about TechEmpower the company. It seems to be a small consultancy, and maintaining this project probably takes a not insignificant amount of resources from them.
The end of the project seems kind of unceremonious, but they don't owe anything to anyone.
It's cool in a 'how much can you tune it' kind of way, but has little practical value. Most sites would be tickled with a 4 digit requests per second number, so does it matter if your chosen framework does 50k/sec or 3 million/sec? Not really.
I think the biggest problem was it just had too many entries, most of which seem tuned to cheating benchmarks. Would probably be more valuable just choosing the top 3 by popularity from the top 15 languages or so.
> too many entries, most of which seem tuned to cheating benchmarks
Even for entries that didn't cheat, the code was sometimes unidiomatic in the sense that "real programmers can write Fortran in any language".
This[0] article articulates the issue by highlighting an ASP.NET implementation that was faster than more 'honest' Java/Go implementations primarily by not using ASP.NET features, skirting some philosophical line of what it means to use something.
For me, the more interesting discussion of whether a language/library is faster/leaner than another exists in actual idiomatic use. In some languages you are actively sweating over individual allocations; in some you're encouraged to allocate collections and immediately throw them away. Being highly concerned with memory and performance in the latter type of language happens, but is seldom the dominant approach in the larger ecosystem.
For anyone wondering, the ASP.NET Core benchmark applications appear to be largely the same.
However, it also appears that as of the last benchmark (round 23), “aspnetcore” has fallen to 35th on the fortunes leaderboard. The code for that result really just uses Kestrel. It doesn’t even import any of the usual ASP.NET Core NuGet packages, just what’s provided by the web SDK. [0]
Sad to see this. I had so much fun implementing an HTTP server (called httpbeast) from scratch to get as far up these benchmarks as possible.
I do agree with others here that it was possible to game them, but it still gave a good indication of the performance bracket a language was in (and you could check if interpreted languages were cheating via FFI pretty easily).
Indeed. It's weird they write so much without addressing the elephant.
So let's discuss it...
From the start I thought that the TechEmpower Benchmarks were testing all the metrics the JVM is good at, and none that the JVM is bad at (mainly: memory usage, start-up time, container size). I got the idea back then that they were a JVM shop (could not confirm this on their current website).
Lately the JVM contenders are no longer at the top. And the benchmark contains many contenders with highly optimized implementations that do not reflect real-life use.
Well done to the TechEmpower team for the work done.
Though the benchmarks were not exactly 100% accurate, they gave good estimates of how different frameworks perform in handling web tasks.
They also helped people move to simpler / lighter web frameworks that are more performant, and kind of helped usher in the typical 'Sinatra/Express' handlers for most web frameworks, e.g. .NET Core.
They also showed the performance hit of ORMs vs raw queries. So yeah, well done.
Engineering has kind of moved on in a weird way from web frameworks. Now AI just writes document.getElementById('longVariableName') javascript and straight SQL without complaining at all. The abstraction isn't as important as it used to be because AI doesn't mind typing.
Hopefully an active fork emerges.
[0] https://dusted.codes/how-fast-is-really-aspnet-core
[0]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/57d9...
The fix would have been requiring tests to catch the cheating. There were suggestions but it didn't happen.
It was definitely possible to catch entries that didn't send Date headers (or that cached them), etc.
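As a rough idea of what such a check could look like, here is a small sketch in Python. It is not anything the benchmark's verifier actually did; the function name, the return strings, and the 2-second tolerance are all assumptions chosen for illustration. It flags a response whose Date header is missing, unparseable, or too far from the time the response was received, which is how a header cached for too long would show up.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def date_header_problem(headers, received_at, tolerance=timedelta(seconds=2)):
    """Return a short description of the problem, or None if the header looks fine.

    `headers` is a plain dict of response headers; `received_at` is a
    timezone-aware datetime recorded by the (hypothetical) verifier.
    """
    # Header names are case-insensitive, so look the key up without regard to case.
    raw = next((v for k, v in headers.items() if k.lower() == "date"), None)
    if raw is None:
        return "missing Date header"
    try:
        sent_at = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError here, newer ones ValueError.
        return "unparseable Date header"
    if sent_at.tzinfo is None:
        # Treat a value parsed without an offset as UTC so it can be compared.
        sent_at = sent_at.replace(tzinfo=timezone.utc)
    if abs(received_at - sent_at) > tolerance:
        return "stale Date header"
    return None


def example_headers(moment):
    """Build a header dict with a correctly formatted Date for `moment` (UTC)."""
    return {"Date": format_datetime(moment, usegmt=True)}
```

The tolerance matters: plenty of legitimate servers refresh a cached Date string about once per second, so a verifier would want to allow for that and only flag values that lag well beyond it.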
Feels like the end of an era.
We all know some of us take our language and framework choices as seriously as religion. I wouldn't be surprised if there was a lawsuit involved.
I got a newer model that bypasses all that. It takes out Wireshark and sends bytes straight.