I'm building feature flags, and I feel like the architecture decisions for this are a solved problem, so I'd love to hear anecdotes. How did you go about building feature flags at your company?
I work for Flagsmith these days, but I was formerly in charge of managing a lot of Trust and Safety related concerns for Clubhouse. This involved deleting a lot of records to appease regulators and building custom features to handle things like CSAM, and we used a lot of feature flags there to keep systems safe in case of an incident.
The most important architectural decision we made was pushing some of the feature flagging into the software layer. For example, every task had a module name and a task name that together formed the feature flag name, so out of the box any task could be disabled without adding further code. Combined with other good practices, it went a long way. Another good option is to enable local evaluation mode[0], which strikes a balance between keeping your feature flags up to date and avoiding frequent API calls.
[0] As an aside, I've worked on the implementation of local evaluation mode for one of our clients (I think Python?) at Flagsmith.
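A minimal Python sketch of the "module name + task name = flag name" idea, assuming tasks are plain functions and that a flag counts as off when its name is in a set. `flagged_task`, `DISABLED_FLAGS`, and `TaskDisabled` are hypothetical names for illustration, not part of Flagsmith or any task framework.

```python
import functools
from typing import Any, Callable, Set

# Hypothetical in-memory store of disabled flag names. In a real system this
# would be backed by your flag service (e.g. a locally evaluated snapshot).
DISABLED_FLAGS: Set[str] = set()


class TaskDisabled(Exception):
    """Raised when a task is called while its flag is switched off."""


def flagged_task(func: Callable[..., Any]) -> Callable[..., Any]:
    """Gate a task behind a flag named "<module>.<task>" with no extra code per task."""
    flag_name = f"{func.__module__}.{func.__name__}"

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Checked on every call, so flipping the flag takes effect immediately.
        if flag_name in DISABLED_FLAGS:
            raise TaskDisabled(flag_name)
        return func(*args, **kwargs)

    wrapper.flag_name = flag_name  # expose the derived name for admin tooling
    return wrapper


@flagged_task
def purge_user_records(user_id: int) -> str:
    return f"purged records for user {user_id}"


print(purge_user_records(42))  # runs normally while the flag is on
DISABLED_FLAGS.add(purge_user_records.flag_name)
try:
    purge_user_records(42)
except TaskDisabled as exc:
    print(f"skipped, flag {exc} is off")
```

Raising an exception rather than silently returning lets the caller (a queue worker, say) decide whether to retry, drop, or log the skipped task.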
I'm trying to onboard onto Flagsmith right now, and it seems like the identities feature [0] is exactly what I need, but I can't figure out how to enable it for an arbitrary string value. For example, I want to feed the SDK "mintlify" and get a response on whether "mintlify" should have a feature or not.
Fascinating. This is exactly what I'm juggling: whether to store the flag in our DB or evaluate it on the fly. Just signed up for a demo with Flagsmith :)
The easiest way is a DB entry, environment variable, or config value that turns a feature on and off; a central class/object that reads it and acts as the source of truth; and a bunch of "if" statements in the right places to hide the feature and make it unavailable via the APIs when it is turned off.
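A minimal sketch of that setup in Python, assuming flags come from environment variables named `FEATURE_<NAME>` with code-level defaults; `FeatureFlags` and `checkout_endpoint` are hypothetical names, and a DB or config file could replace `os.environ` behind the same interface.

```python
import os
from typing import Dict


class FeatureFlags:
    """Hypothetical central source of truth for on/off flags.

    Each flag is read from an environment variable named FEATURE_<NAME>,
    falling back to a default defined in code.
    """

    _TRUE_VALUES = {"1", "true", "yes", "on"}

    def __init__(self, defaults: Dict[str, bool]) -> None:
        self._defaults = dict(defaults)

    def is_enabled(self, name: str) -> bool:
        raw = os.environ.get(f"FEATURE_{name.upper()}")
        if raw is None:
            return self._defaults.get(name, False)  # unknown flags are off
        return raw.strip().lower() in self._TRUE_VALUES


flags = FeatureFlags({"new_checkout": False})


def checkout_endpoint() -> str:
    # The "if" that hides the feature, including from direct API calls.
    if not flags.is_enabled("new_checkout"):
        return "404 Not Found"
    return "new checkout flow"
```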
We do this at $BIG_TECH. Nothing too crazy. I should add that it's tied into our company user directory, though, so toggles can happen per user as well, and flags support multiple data types.
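A small sketch of per-user toggles with typed values, assuming users are identified by a directory ID string. The `Flag` class and the flag names are hypothetical; how the directory lookup happens is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Flag:
    """A flag with a typed default and per-user overrides.

    `user_overrides` is keyed by a user-directory ID (an assumption of this
    sketch); any value type works: bool, int, str, and so on.
    """

    default: Any
    user_overrides: Dict[str, Any] = field(default_factory=dict)

    def value_for(self, user_id: str) -> Any:
        return self.user_overrides.get(user_id, self.default)


FLAGS: Dict[str, Flag] = {
    "search_v2": Flag(default=False, user_overrides={"alice": True}),  # bool
    "max_upload_mb": Flag(default=25, user_overrides={"bob": 100}),  # int
    "banner_text": Flag(default="", user_overrides={"alice": "Beta tester"}),  # str
}
```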
Choosing a good solution depends heavily on whether your product is multi-tenant or not.
If it is multi-tenant, things get complicated quite quickly. It may be worth looking into existing packages/libraries for your stack. I would be careful about using a third-party service for this due to the latency it may introduce.
If not, just have a FEATURES constant. The value can be as primitive as an associative array, dict, hashmap, or struct; the keys are your flags and the values are booleans.
Not your forever choice, but very simple, quick, and easy to use.
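In Python the single-tenant FEATURES constant might look like the sketch below. Wrapping the dict in `MappingProxyType` is an optional extra, not part of the suggestion above: it makes the mapping read-only so nothing flips a flag at runtime.

```python
from types import MappingProxyType
from typing import Final, Mapping

# Keys are flag names, values are booleans. MappingProxyType gives a read-only
# view, so code cannot accidentally flip a flag at runtime.
FEATURES: Final[Mapping[str, bool]] = MappingProxyType({
    "dark_mode": True,
    "new_dashboard": False,
})


def render_dashboard() -> str:
    if FEATURES["new_dashboard"]:
        return "new dashboard"
    return "old dashboard"
```

Changing a flag means editing the constant and redeploying, which is exactly the trade-off that makes this "not your forever choice".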
Disclaimer: I work at Amplitude, where we offer a feature flagging and experimentation platform
Hey hahnbee, I echo a bunch of other folks’ sentiment that local evaluation mode is pretty important so you’re not constantly making network calls to figure out the state of your flags; that’s one of the biggest things.
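The core idea of local evaluation can be sketched as an in-memory snapshot refreshed in the background. This is loosely modeled on the concept only, not on how Flagsmith's or Amplitude's SDKs implement it; `fetch` stands in for whatever call downloads the full flag configuration and is hypothetical.

```python
import threading
from typing import Callable, Dict


class LocallyEvaluatedFlags:
    """Serve flag checks from an in-memory snapshot refreshed in the background.

    `fetch` stands in for whatever call downloads the full flag configuration
    (hypothetical); `is_enabled` never touches the network.
    """

    def __init__(self, fetch: Callable[[], Dict[str, bool]], refresh_seconds: float = 60.0) -> None:
        self._fetch = fetch
        self._refresh_seconds = refresh_seconds
        self._snapshot: Dict[str, bool] = fetch()  # one blocking load at startup
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()

    def _poll(self) -> None:
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._refresh_seconds):
            try:
                latest = self._fetch()
            except Exception:
                continue  # keep serving the last known-good snapshot
            self._snapshot = latest  # swap the whole dict; readers see old or new

    def is_enabled(self, name: str) -> bool:
        return self._snapshot.get(name, False)

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

The refresh interval is the knob that trades flag freshness against API call volume.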
The other thing I’d add is that it’s pretty important (especially for a homegrown system) to build user tracking into your flags, so that for a given flag you know which user saw which version at what time. This is critical for debugging and issue resolution so you’re not left guessing (and doubly important for experiments, where you want to run stats on different populations).
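A minimal sketch of that exposure tracking, assuming an in-memory list stands in for whatever database or analytics pipeline would really store events; `ExposureLog`, `Exposure`, and the injectable `clock` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Exposure:
    flag: str
    user_id: str
    variant: str
    timestamp: float  # seconds since the epoch


class ExposureLog:
    """In-memory record of which user saw which variant of a flag, and when.

    A real system would ship these events to a database or analytics pipeline.
    """

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self.events: List[Exposure] = []

    def evaluate(self, flag: str, user_id: str, assign: Callable[[str], str]) -> str:
        # Evaluate and log in one place so no exposure goes unrecorded.
        variant = assign(user_id)
        self.events.append(Exposure(flag, user_id, variant, self._clock()))
        return variant

    def history(self, flag: str, user_id: str) -> List[Exposure]:
        return [e for e in self.events if e.flag == flag and e.user_id == user_id]
```

Routing every flag check through one function that both evaluates and logs is the design choice that matters here; a check that bypasses it leaves a gap in the history.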
You probably will not be surprised to find out that we have this stuff built into our system, and that you can get started with our flags for free! So here’s my shameless plug: head over to Amplitude.com and check out our stuff.
We at https://flipt.io are putting on a buy vs build webinar in a couple of weeks to discuss this very thing as it's a common question that engineering teams seem to have.
I'm pretty happy with our setup, though we use flags mostly for feature releases and only have a few long-lived ones.
State kept in a database table indexed on customer. One row per customer/flag name, only there when the flag is set.
We keep active flag names as constants and put them into an array for easy looping in our admin UX. This makes it easy to find usages and clean them up after launch.
These are passed to the browser so the frontend can check flags, and via gRPC context to any downstream services.
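A sketch of that sparse per-customer table using SQLite, assuming a row's presence means "on" and absence means "off". The table, column, and flag names are hypothetical, and serializing the result for the browser or gRPC metadata is not shown.

```python
import sqlite3
from typing import Dict, List

# Active flag names as constants, collected in a list the admin UI can loop over.
FLAG_NEW_EDITOR = "new_editor"
FLAG_BULK_EXPORT = "bulk_export"
ACTIVE_FLAGS: List[str] = [FLAG_NEW_EDITOR, FLAG_BULK_EXPORT]


def create_schema(conn: sqlite3.Connection) -> None:
    # The composite primary key's index also covers lookups by customer_id.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS customer_flags (
               customer_id INTEGER NOT NULL,
               flag_name   TEXT    NOT NULL,
               PRIMARY KEY (customer_id, flag_name)
           )"""
    )


def set_flag(conn: sqlite3.Connection, customer_id: int, flag_name: str, enabled: bool) -> None:
    # A row exists only while the flag is set; turning it off deletes the row.
    if enabled:
        conn.execute(
            "INSERT OR IGNORE INTO customer_flags (customer_id, flag_name) VALUES (?, ?)",
            (customer_id, flag_name),
        )
    else:
        conn.execute(
            "DELETE FROM customer_flags WHERE customer_id = ? AND flag_name = ?",
            (customer_id, flag_name),
        )


def flags_for(conn: sqlite3.Connection, customer_id: int) -> Dict[str, bool]:
    rows = conn.execute(
        "SELECT flag_name FROM customer_flags WHERE customer_id = ?", (customer_id,)
    )
    enabled = {name for (name,) in rows}
    # Every active flag appears in the result; a missing row means "off".
    return {name: name in enabled for name in ACTIVE_FLAGS}
```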
There are Global, Org, Store, and User level settings, each of type <string, any>. User has the highest priority; values are resolved from highest to lowest priority into one final set of flags. When a page loads, we fetch them from cache (the browser, plus Redis on the backend) or from the DB.
This works well with two tables in the DB, CRUD APIs, and a UI for settings at each level.
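The priority resolution maps neatly onto Python's `collections.ChainMap`, which looks a key up in each mapping in order and returns the first hit. A minimal sketch, assuming each level has already been loaded into a dict; the browser/Redis caching layer is omitted and the setting names are made up.

```python
from collections import ChainMap
from typing import Any, Dict


def resolve_settings(
    global_settings: Dict[str, Any],
    org_settings: Dict[str, Any],
    store_settings: Dict[str, Any],
    user_settings: Dict[str, Any],
) -> Dict[str, Any]:
    # ChainMap looks keys up in the listed order, so user beats store,
    # store beats org, and org beats global.
    return dict(ChainMap(user_settings, store_settings, org_settings, global_settings))


final = resolve_settings(
    {"theme": "light", "max_items": 10, "beta_reports": False},  # global
    {"max_items": 20},  # org
    {"beta_reports": True},  # store
    {"theme": "dark"},  # user
)
```

Here `final` takes `theme` from the user level, `max_items` from the org level, and `beta_reports` from the store level.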
I worked at an adtech company that had built its own feature flag system. It was an absolute pain to work with. To be fair, it had been built 6 years prior, and I worked there 4 years ago, so there were fewer off-the-shelf solutions at the time.
Then I switched to a company that used an off-the-shelf solution, and it was 100x easier to work with.
You want to be able to enable features for certain customers. For example, you might want to roll out a feature only to the US or only to English-speaking users because you haven't internationalized it yet. Or you might want to enable it for select customers, among all sorts of other examples.
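A minimal sketch of that kind of targeting in Python, assuming each user carries a country, language, and customer ID; `User` and `TargetingRule` are hypothetical types, and `None` on a rule field means "don't filter on this attribute".

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class User:
    id: str
    country: str
    language: str
    customer_id: str


@dataclass(frozen=True)
class TargetingRule:
    # None means "any value" for that attribute.
    countries: Optional[FrozenSet[str]] = None
    languages: Optional[FrozenSet[str]] = None
    customers: Optional[FrozenSet[str]] = None

    def matches(self, user: User) -> bool:
        # Every constrained attribute must match (logical AND).
        return all((
            self.countries is None or user.country in self.countries,
            self.languages is None or user.language in self.languages,
            self.customers is None or user.customer_id in self.customers,
        ))


us_english_only = TargetingRule(countries=frozenset({"US"}), languages=frozenset({"en"}))
select_customers = TargetingRule(customers=frozenset({"acme"}))
```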
You may also want the ability to gradually roll out a new feature over time. This lets you test at small scale and also lets you ramp up load on new backend services.
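A common way to do gradual rollouts (one technique among several, not necessarily what any given product uses) is to hash the flag name and user ID into a stable bucket. A sketch, assuming SHA-256 and 0.01% granularity; `rollout_bucket` and `in_rollout` are hypothetical names.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> float:
    """Map (flag, user) to a stable number in [0, 100) with 0.01 granularity."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000 / 100


def in_rollout(flag_name: str, user_id: str, percentage: float) -> bool:
    # Raising `percentage` only adds users: anyone in at 10% stays in at 25%.
    return rollout_bucket(flag_name, user_id) < percentage
```

Including the flag name in the hash means different flags roll out to different slices of users, so the same early adopters don't get every new feature first.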
You can also schedule features to be released in advance. This helps you align releases with things like marketing or customer service training.
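Scheduling can be as simple as attaching a release window to a flag. A sketch, assuming timezone-aware datetimes and an optional end time; `ReleaseWindow` is a hypothetical name.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ReleaseWindow:
    """Flag is on from `start` until `end` (open-ended if end is None).

    Times must be timezone-aware so comparisons are unambiguous.
    """

    start: datetime
    end: Optional[datetime] = None

    def is_active(self, now: Optional[datetime] = None) -> bool:
        now = now if now is not None else datetime.now(timezone.utc)
        return self.start <= now and (self.end is None or now < self.end)


launch = ReleaseWindow(start=datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc))
```

Passing `now` explicitly keeps the check testable; in production the default reads the current UTC time.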
I only hit on a few points, but you can see it's a lot more than boolean flags.
That's not really the domain of feature flags in their typical use case. That sounds like you would need to deploy multiple versions of the app and switch between them at the load balancer level or something like that.
"We want to test modifying two lines of this header bidding library against 5% of traffic ONLY IF we bought that traffic from an Outbrain Ad, not a Facebook Ad. If revenue increases after 24 hours we'll enable it on traffic we bought from Outbrain and Facebook ads, but NOT Taboola. Oh and we only want to run it against international traffic, but only if their language setting IS NOT English."
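For a sense of what that request looks like as code, here is a sketch of it as a single targeting rule. Several details are assumptions not stated in the quote: "international" means outside a hypothetical `HOME_COUNTRY` of `"US"`, phase 2 means 100% of Outbrain and Facebook traffic, and the "if revenue increases" decision is made by a human switching `phase`, not by the code.

```python
import hashlib
from dataclasses import dataclass

HOME_COUNTRY = "US"  # assumption: "international" means outside this country


@dataclass(frozen=True)
class Visit:
    visitor_id: str
    traffic_source: str  # ad network the traffic was bought from, e.g. "outbrain"
    country: str  # ISO 3166 country code
    language: str  # browser language setting, e.g. "en-GB", "de"


def bucket(flag_name: str, visitor_id: str) -> float:
    # Stable per-visitor number in [0, 100), so a visitor stays in or out.
    digest = hashlib.sha256(f"{flag_name}:{visitor_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000 / 100


def header_bidding_patch_enabled(visit: Visit, phase: int) -> bool:
    """Phase 1: 5% of Outbrain traffic. Phase 2: all Outbrain and Facebook traffic."""
    allowed_sources = {"outbrain"} if phase == 1 else {"outbrain", "facebook"}
    if visit.traffic_source not in allowed_sources:  # Taboola is never included
        return False
    if visit.country == HOME_COUNTRY:  # international traffic only
        return False
    if visit.language.lower().startswith("en"):  # language must NOT be English
        return False
    return phase != 1 or bucket("header_bidding_patch", visit.visitor_id) < 5
```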
[0] https://docs.flagsmith.com/basic-features/managing-identitie...
https://docs.flagsmith.com/clients/server-side#get-flags-for...
If you're interested in attending the build vs. buy webinar, it's taking place on LinkedIn on April 17: https://www.linkedin.com/events/buildvs-buy-pickingafeaturef...
I am genuinely curious how essentially a boolean lookup can be hard to work with.
Stuff starts getting complicated.
There are millions of ways to implement a product, even within the same environment/infra constraints.
If you elaborate on your specific goals and product, that could be a very fruitful topic.