
Why Twilio Segment moved from microservices back to a monolith

281 points | 2 months ago | twilio.com
mjr002 months ago

> Once the code for all destinations lived in a single repo, they could be merged into a single service. With every destination living in one service, our developer productivity substantially improved. We no longer had to deploy 140+ services for a change to one of the shared libraries. One engineer can deploy the service in a matter of minutes.

If you must deploy every service because of a library change, you don't have services, you have a distributed monolith. The entire idea of a "shared library" which must be kept updated across your entire service fleet is antithetical to how you need to treat services.

wowohwow2 months ago

I think your point, while valid, is probably a lot more nuanced. From the post it's more akin to an Amazon shared build and deployment system than an "every library update needs to redeploy everything" scenario.

It's likely there's a single source of truth where you pull libraries or shared resources from. When team A wants to update the pointer to library-latest to 2.0 but the current reference of library-latest is still 1.0, everyone needs to migrate off of it, otherwise things will break due to backwards incompatibility or whatever.

Likewise, if there's a -need- to remove a version for a vulnerability or what have you, then everyone needs to redeploy, sure, but the centralized benefit of this likely outweighs the security cost and complexity of tracking the patching and deployment process for each and every service.

I would say those systems -are- and likely would be classified as micro services but from a cost and ease perspective operate within a shared services environment. I don't think it's fair to consider this style of design decision as a distributed monolith.

By that level of logic, having a singular business entity vs 140 individual business entities for each service would mean it's a distributed monolith.

mjr002 months ago

> It's likely there's a single source of truth where you pull libraries or shared resources from. When team A wants to update the pointer to library-latest to 2.0 but the current reference of library-latest is still 1.0, everyone needs to migrate off of it, otherwise things will break due to backwards incompatibility or whatever.

No, this misses one of the biggest benefits of services; you explicitly don't need everyone to upgrade library-latest to 2.0 at the same time. If you do find yourself in a situation where you can't upgrade a core library like e.g. SQLAlchemy or Spring, or the underlying Python/Java/Go/etc runtime, without requiring updates to every service, you are back in the realm of a distributed monolith.

rbranson2 months ago

This is explicitly called out in the blog post in the trade-offs section.

I was one of the engineers who helped make the decisions around this migration. There is no one size fits all. We believed in that thinking originally, but after observing how things played out, decided to make different trade-offs.

mjr002 months ago

> There is no one size fits all.

Totally agree. For what it's worth, based on the limited information in the article, I actually do think it was the right decision to pull all of the per-destination services back into one. The shared library problem can go both ways, after all: maybe the solution is to remove the library so your microservices are fully independent, or maybe they really should have never been independent in the first place and the solution is to put them back together.

I don't think either extreme of "every line of code in the company is deployed as one service" or "every function is an independent FaaS" really works in practice, it's all about finding the right balance, which is domain-specific every time.

wowohwow2 months ago

FWIW, I think it was a great write up. It's clear to me what the rationale was, and it had good justification. Based on the people responding to all of my comments, it is clear people didn't actually read it and are opining without appropriate context.

benregenspan2 months ago

Having seen similar patterns play out at other companies, I'm curious about the organizational dynamics involved. Was there a larger dev team at the time you adopted microservices? Was there thinking involved like "we have 10 teams, each of which will have strong, ongoing ownership of ~14 services"?

Because from my perspective that's where microservices can especially break down: attrition or layoffs resulting in service ownership needing to be consolidated between fewer teams, which now spend an unforeseen amount of their time on per-service maintenance overhead. (For example, updating your runtime across all services becomes a massive chore, one that is doable when each team owns a certain number of services, but a morale-killer as soon as some threshold is crossed.)

wowohwow2 months ago

I disagree. Both can be true at the same time. A good design should not point to library-latest in a production setting; it should point to a stable, known-good version via direct reference, i.e. library-1.0.0-stable.

However, in the world we live in, people choose to point to latest to avoid manual work and trust that other teams did their due diligence when updating to the latest version.

You can point to a stable version in the model I described and still be distributed and a micro service, while depending on a shared service or repository.

mekoka2 months ago

You're both right, but talking past each other. You're right that shared dependencies create a problem, but it can be the problem without semantically redefining the services themselves as a distributed monolith. Imagine someone came to you with a similar problem and you concluded "distributed monolith", which may lead them to believe that their services should be merged into a single monolith. What if they then told you that it's going to be tough because these were truly separate apps that just used the same OS-wide Python install: one ran on Django/Postgres, another on Flask/SQLite, and another on FastAPI/Mongo, but they all relied on some of the same frequently updated underlying libs? The more accurate finger should point to bad dependency management, and you'd tell them about virtualenv or Docker.

deaddodo2 months ago

The dependencies they're likely referring to aren't core libraries, they're shared interfaces. If you're using protobufs, for instance, and you share the interfaces in a repo, updating Service A's interface(s) necessitates updating every service that communicates with it as well (whether you utilize those changes or not). Generally this applies to larger systems, but for smaller/scrappier teams a true dependency management tree for something like this is out of scope, so they just redeploy everything in a domain.

lowbloodsugar2 months ago

Oh god no.

I mean I suppose you can make breaking changes to any API in any language, but that’s entirely on you.

aezart2 months ago

We had this problem, 119 services that all got their dependencies from a shared domain. Individual services had to depend on the exact version of libraries provided by the domain. It made updates essentially impossible.

Eventually we were forced by a licensing change to move to containers, which fixed that issue but substantially increased our resource usage: we went from 16 GB of RAM for the entire domain to 1.5 GB per service, with similar increases in CPU.

philwelch2 months ago

> If you do find yourself in a situation where you can't upgrade a core library like e.g. SQLAlchemy or Spring, or the underlying Python/Java/Go/etc runtime, without requiring updates to every service, you are back in the realm of a distributed monolith.

Show me a language runtime or core library that will never have a CVE. Otherwise, by your definition, microservices don’t exist and all service oriented architectures are distributed monoliths.

lelanthran2 months ago

Right, but there's a cost to having to support 12 different versions of a library in your system.

It's a tradeoff.

3rodents2 months ago

Yes, you’re describing a distributed monolith. Microservices are independent, with nothing shared. They define a public interface and that’s it, that’s the entire exposed surface area. You will need to do major version bumps sometimes, when there are backwards incompatible changes to make, but these are rare.

The logical problem you’re running into is exactly why microservices are such a bad idea for most businesses. How many businesses can have entirely independent system components?

Almost all “microservice” systems in production are distributed monoliths. Real microservices are incredibly rare.

A mental model for true microservices is something akin to depending on the APIs of Netflix, Hulu, HBO Max and YouTube. They’ll have their own data models, their own versioning cycles and all that you consume is the public interface.

makeitdouble2 months ago

I'm trying to understand what you see as a really independent service with nothing shared.

For instance, say company A uses the GCP logging stack, and company B does the same. GCP updates its product in a way that strongly encourages upgrading within a specific time frame (e.g. the price will drastically increase otherwise), so A and B do it at mostly the same time for the same reason.

Are A and B truly independent under your vision? Or are they a company-spanning monolith?

wowohwow2 months ago

This type of elitist mentality is such a problem and such a drain for software development. "Real microservices are incredibly rare." I'll repeat myself from my other post: by this level of logic, nothing is a microservice.

Do you depend on a cloud provider? Not a microservice. Do you depend on an ISP for Internet? Not a microservice. Depend on humans to do something? Not a microservice.

Textbook definitions and reality rarely coincide. Rather than taking such a fundamentalist approach that leads nowhere, recognize that for all intents and purposes, what I described is a microservice, not a distributed monolith.

AndrewKemendo2 months ago

And if my grandmother had wheels she would be a bike

There are categories, and ontologies are real in the world. If you create one thing and call it something else, that doesn't mean the definition of "something else" should change.

By your definition it is impossible to create a state based on coherent specifications because most states don’t align to the specification.

We know for a fact that’s wrong via functional programming, state machines, and formal verification

andrewmutz2 months ago

Needing to upgrade a library everywhere isn’t necessarily a sign of inappropriate coupling.

For example, a library with a security vulnerability would need to be upgraded everywhere regardless of how well you’ve designed your system.

In that example the monolith is much easier to work with.

mjr002 months ago

While you're right, I can only think of twice in my career where there was a "code red all services must update now", which were log4shell and spectre/meltdown (which were a bit different anyway). I just don't think this comes up enough in practice to be worth optimizing for.

wowohwow2 months ago

You have not been in the field very long, I presume? There are multiple per year that require all hands on deck, depending on your tech stack. Just look at the recent NPM supply chain attacks.

Aeolun2 months ago

We use pretty much the entire Node.js ecosystem, and only the very latest Next.js vulnerability was an all-hands-on-deck vulnerability. That's over the past 7 years.

zhivota2 months ago

I mean I just participated in a Next JS incident that required it this week.

It has been rare over the years but I suspect it's getting less rare as supply chain attacks become more sophisticated (hiding their attack more carefully than at present and waiting longer to spring it).

Aeolun2 months ago

NextJS was just bog standard “we designed an insecure API and now everyone can do RCE” though.

Everyone has been able to exploit that for ages. It only became a problem when it was discovered and publicised.

jameshart2 months ago

A library which patches a security vulnerability should do so by bumping a patch version, maintaining backward compatibility. Taking a patch update to a library should mean no changes to your code, just rerun your tests and redeploy.

If libraries bump minor or major versions, they are imposing work on all the consuming services to accept the version, make compatibility changes, test and deploy.
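
As a concrete illustration of that patch-versus-minor distinction, here is a minimal Python sketch using the packaging library; the version numbers are made up:

    from packaging.specifiers import SpecifierSet

    # A "compatible release" pin: accept any 1.4.x at or above 1.4.2, so
    # security patch releases flow in on the next build, while a minor or
    # major bump (new API surface) requires a deliberate upgrade.
    pin = SpecifierSet("~=1.4.2")

    print("1.4.9" in pin)  # True  - patch update, no code changes expected
    print("1.5.0" in pin)  # False - minor bump, consumers must opt in
    print("2.0.0" in pin)  # False - major bump, breaking changes possible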

VirusNewbie2 months ago

This is pedantic, but no, it doesn't need to be updated everywhere. It should be updated as fast as possible, but there isn't a dependency chain there.

mettamage2 months ago

Example: log4j. That was an update fiasco everywhere.

smrtinsert2 months ago

1 line change and redeploy

jabroni_salad2 months ago

Works great if you are the product owner. We ended up having to fire and replace about a dozen 3rd party vendors over this.

reactordev2 months ago

I was coming here to say this. The whole idea of a shared library couples all those services together. Sounds like someone wanted to be clever and then included their cleverness all over the platform, dooming all services together.

Decoupling is the first part of microservices. Pass messages. Use JSON. I shouldn't need your code to function. Just your API. Then you can be clever and scale out and deploy on Saturdays if you want to, and it doesn't disturb the rest of us.

xienze2 months ago

> Pass messages. Use json. I shouldn’t need your code to function. Just your API.

Yes, but there’s likely a lot of common code related to parsing those messages, interpreting them, calling out to other services etc. shared amongst all of them. That’s to be expected. The question is how that common code is structured if everything has to get updated at once if the common code changes.

reactordev2 months ago

Common code that’s part of your standard library, sure. Just parse the json. Do NOT introduce some shared class library that “abstracts” that away. Instead use versioning of schemas like another commenter said. Use protobuf. Use Avro. Use JSON. Use Swagger. Use something other than POCO/POJO shared library that you have to redeploy all your services because you added a Boolean to the newsletter object.

c-fe2 months ago

One way is by using schemas to communicate between them that are backwards compatible, e.g. with Avro it's quite nice.
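
For what it's worth, a small sketch of what that looks like with Avro schema resolution, assuming the fastavro package; the schemas are made up:

    import io
    import fastavro

    # Producer still writes with the old schema...
    writer_schema = fastavro.parse_schema({
        "type": "record", "name": "Event",
        "fields": [{"name": "user_id", "type": "long"}],
    })

    # ...while the consumer already reads with a newer schema. The added
    # field has a default, so old records still resolve cleanly.
    reader_schema = fastavro.parse_schema({
        "type": "record", "name": "Event",
        "fields": [
            {"name": "user_id", "type": "long"},
            {"name": "source", "type": "string", "default": "unknown"},
        ],
    })

    buf = io.BytesIO()
    fastavro.schemaless_writer(buf, writer_schema, {"user_id": 42})
    buf.seek(0)
    print(fastavro.schemaless_reader(buf, writer_schema, reader_schema))
    # {'user_id': 42, 'source': 'unknown'}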

andy_ppp2 months ago

It’s easy to say things like this but also incredibly difficult to know if you’ll introduce subtle bugs or incompatibilities between services. It’s an example of people following the microservices pattern and then being given additional risk or problems deploying that are not immediately obvious when buying into this!

So let’s say you have a shared money library that you have fixed a bug in… what would you do in the real world - redeploy all your services that use said library or something else?

mjr002 months ago

> It’s easy to say things like this but also incredibly difficult to know if you’ll introduce subtle bugs or incompatibilities between services.

You are right: it is difficult. It is harder than building a monolith. No argument there. I just don't think proper microservices are as difficult as people think. It's just more of a mindset shift.

Plenty of projects and companies continue to release backwards compatible APIs: operating systems, Stripe/PayPal, cloud providers. Bugs come up, but in general people don't worry about ec2:DescribeInstances randomly breaking. These projects are still evolving internally while maintaining a stable external API. It's a skill, but something that can be learned.

> So let’s say you have a shared money library that you have fixed a bug in… what would you do in the real world - redeploy all your services that use said library or something else?

In the real world I would not have a shared "money library" to begin with. If there were money-related operations that needed to be used by multiple services, I would have a "money service" which exposed an API and could be deployed independently. A bug fix would then be a deploy to this service, and no other services would have to update or be aware of the fix.

This isn't a theoretical, either, as a "payments service" that encapsulates access to payment processors is something I've commonly seen.
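
To make the shape of that boundary concrete, here is a minimal sketch assuming Flask; the endpoint, rates, and field names are hypothetical, not anything Segment or the commenter described:

    from decimal import Decimal
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Illustrative hardcoded rates; a real service would own this data.
    RATES = {("USD", "EUR"): Decimal("0.92")}

    # Money logic lives behind this API. A bug fix is one deploy of this
    # service; callers keep calling the same endpoint and never redeploy.
    @app.post("/v1/convert")
    def convert():
        body = request.get_json()
        rate = RATES[(body["from"], body["to"])]
        amount = Decimal(str(body["amount"])) * rate
        return jsonify({"amount": str(amount), "currency": body["to"]})

    if __name__ == "__main__":
        app.run(port=8080)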

necovek2 months ago

But really, a shared "money library" is exactly the same thing as a shared "money service" if everyone is using the same, latest version (which is easier to enforce with a networked "service").

The difference is in what's easy and what's hard. With a library, it's easy for everyone to run a different version, and hard for everyone to run the same version. With a service, it's easy for everyone to use the same version, and harder to use a different one (eg. creating multiple environments, and especially ephemeral "pull request" environments where you can mix and match for best automated integration and e2e testing).

But you can apply the same backwards-compatible API design patterns to a library that you would be applying to a service: no difference really. It's only about what's the time to detection when you break these patterns (with a library, someone finds out 2 years later when they update; with a service, they learn right away).
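
A small sketch of what those same patterns look like applied to a library, in Python; the function names are invented:

    import warnings

    # New API: explicit currency parameter.
    def format_money(amount_cents: int, currency: str = "USD") -> str:
        return f"{amount_cents / 100:.2f} {currency}"

    # Old API kept as a thin shim: existing callers keep working, and the
    # warning gives them a detection signal now rather than two years later.
    def format_usd(amount_cents: int) -> str:
        warnings.warn("format_usd() is deprecated; use format_money()",
                      DeprecationWarning, stacklevel=2)
        return format_money(amount_cents, "USD")

    print(format_usd(1999))  # "19.99 USD", plus a DeprecationWarning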

RaftPeople2 months ago

> In the real world I would not have a shared "money library" to begin with. If there were money-related operations that needed to be used by multiple services, I would have a "money service" which exposed an API and could be deployed independently.

Depending on what functionality the money service handles, this could become a problem.

For example, one example of a shared library type function I've seen in the past is rounding (to make sure all of the rounding rules are handled properly based on configs etc.). An HTTP call for every single low level rounding operation would quickly become a bottleneck.

mjr002 months ago

I agree an HTTP call for every rounding operation would be awful. I would question service boundaries in this case though. This is very domain-specific, but there's likely only a small subset of your system that cares about payments, calculating taxes, rounding, etc. which would ever call a rounding operation; in that case that entire subdomain should be packaged up as a single service IMO. Again, this gets very domain-specific quickly; I'm making the assumption this is a standard-ish SaaS product and not, say, a complex financial system.

AbstractH242 months ago

From the perspective of change management, what’s the difference between a shared library and an internal service relied on by multiple other services?

You still need to make sure changes don’t have unintended consequences downstream

andy_ppp2 months ago

Latency.

philwelch2 months ago

If there’s any shared library across all your services, even a third party library, if that library has a security patch you now need to update that shared library across your entire service fleet. Maybe you don’t have that; maybe each service is written in a completely different programming language, uses a different database, and reimplements monitoring in a totally different way. In that case you have completely different problems.

sethammons2 months ago

Everyone needing to update due to a security thing happens infrequently. Otherwise, coding and deploying may be happening multiple times a day.

We have had shared libraries. Teams updated to them when they next wanted to. When it was important, on call people made it happen asap. Zero issues.

__abc2 months ago

So you should re-write your logging code on each and every one of your 140+ services vs. leverage a shared module?

xboxnolifes2 months ago

You can keep using an older version for a while. You shouldn't need to redeploy everything at once. If you can't keep using the older version, you did it wrong.

And ideally, your logging library should rarely need to update. If you need unique integrations per service, use a plug-in architecture and keep the plug-ins local to each service.
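
One way to read that plug-in suggestion, sketched in Python with hypothetical names: the shared library exposes a tiny, stable surface, and each service wires in its own sink locally:

    from typing import Callable, Dict, List

    # --- shared library: small, stable, rarely needs a version bump ---
    LogSink = Callable[[str, Dict], None]
    _sinks: List[LogSink] = []

    def register_sink(sink: LogSink) -> None:
        _sinks.append(sink)

    def log(event: str, fields: Dict) -> None:
        for sink in _sinks:
            sink(event, fields)

    # --- per-service plug-in: lives in the service's own codebase/deploy ---
    def stdout_sink(event: str, fields: Dict) -> None:
        print(event, fields)

    register_sink(stdout_sink)
    log("user.signup", {"user_id": 42})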

__abc2 months ago

I wasn't taking into account the velocity of fleet-wide rollout; I agree you can migrate over time. However, I was focusing on the idea that any type of fleet-wide rollout for a specific change was somehow "bad."

duxup2 months ago

Yeah this seems very much not a microservices setup.

I don't pretend proper microservices are a magic solution... but if you break the rules / system of microservices, that's not "microservices" being bad, that's just creating problems for yourself.

ChuckMcM2 months ago

While I think that's a bit harsh :-) the sentiment of "if you have these problems, perhaps you don't understand systems architecture" is kind of spot on. I have heard people scoff at a bunch of "dead legacy code" in the Windows APIs (as an example) without understanding the challenge of moving millions of machines, each at different places in the evolution timeline, through to the next step in the timeline.

To use an example from the article, there was this statement: "The split to separate repos allowed us to isolate the destination test suites easily. This isolation allowed the development team to move quickly when maintaining destinations."

This is architecture bleed-through. The format produced by Twilio "should" be the canonical form, which is submitted to the adapter which mangles it into the "destination" form. Great: that transformation is expressible semantically in a language that takes the canonical form and spits out the special form. Changes to the transformation expression should not "bleed through" to other destinations, and changes to the canonical form should be backwards compatible to prevent bleed-through of changes in the source from impacting the destination. At all times, if something worked before, it should continue to work without touching it, because the architecture boundaries are robust.

Being able to work with a team that understood this was common "in the old days" when people were working on an operating system. The operating system would evolve (new features, new devices, new capabilities) but because there was a moat between the OS and applications, people understood that they had to architect things so that the OS changes would not cause applications that currently worked to stop working.

I don't judge Twilio for not doing robust architecture; I was astonished, when I went to work at Google, at how lazy everyone got when the entire system is under their control (like there are no third-party apps running in the fleet). There was a persistent theme of some bright person "deciding" to completely change some interface and Wham! every other group at Google had to stop what they were doing and move their code to the new thing. There was a particularly poor 'mandate' on a new version of their RPC while I was there. As Twilio notes, that can make things untenable.
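
One way to read the canonical-form-plus-adapter idea above, sketched in Python; the destination names and fields are invented. Each destination registers its own transform from the canonical event, so changing one transform cannot bleed into the others:

    from typing import Callable, Dict

    CanonicalEvent = Dict   # e.g. {"event": ..., "user_id": ..., "properties": {...}}
    Transform = Callable[[CanonicalEvent], Dict]
    _transforms: Dict[str, Transform] = {}

    def destination(name: str):
        """Register a per-destination transform; each one stays isolated."""
        def register(fn: Transform) -> Transform:
            _transforms[name] = fn
            return fn
        return register

    @destination("webhook_x")      # made-up destination
    def to_webhook_x(event: CanonicalEvent) -> Dict:
        return {"type": event["event"], "uid": event["user_id"]}

    @destination("analytics_y")    # made-up destination
    def to_analytics_y(event: CanonicalEvent) -> Dict:
        return {"name": event["event"], "props": event.get("properties", {})}

    def fan_out(event: CanonicalEvent) -> Dict[str, Dict]:
        # As long as the canonical form stays backwards compatible, touching
        # one transform never requires touching the others.
        return {name: fn(event) for name, fn in _transforms.items()}

    print(fan_out({"event": "signup", "user_id": 42, "properties": {"plan": "pro"}}))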

mlhpdx2 months ago

Agreed. It sounds like they never made it to the distributed architecture they would have benefited from. That said, if the team thrives on a monolithic one they made the right choice.

j452 months ago

Monorepos that are reasonably well designed and flexible enough to grow with you can increase development speed quite a bit.

threethirtytwo2 months ago

Then every microservice network in existence is a distributed monolith so long as they communicate with one another.

If you communicate with one another you are serializing and deserializing a shared type. That shared type will break at the communication channels if you do not simultaneously deploy the two services. The irony is to prevent this you have to deploy simultaneously and treat it as a distributed monolith.

This is the fundamental problem of microservices. Under a monorepo it is somewhat mitigated, because now you can have type checking and integration tests across what would otherwise be multiple repos.

Make no mistake, the world isn't just library dependencies. There are communication dependencies that flow through communication channels. A microservice architecture by definition has all its services depend on each other through these communication channels. The logical outcome of this is virtually identical to a distributed monolith. In fact, shared libraries don't do much damage at all if the versions are off. It is only shared types in the communication channels that break.

There is no way around this unless you have a mechanism for simultaneously merging and deploying code across different repos, which breaks the definition of what it is to be a microservice. Microservices always, and I mean always, share dependencies with everything they communicate with. All the problems that come from shared libraries are intrinsic to microservices EVEN when you remove shared libraries.

People debate me on this but it’s an invariant.
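
The specific failure mode being described is easy to reproduce. A tiny Python sketch, with made-up field names, of a consumer that was not redeployed after a producer renamed a field (the non-backwards-compatible case):

    import json

    # Consumer built against the old contract; it has not been redeployed.
    def handle(raw: bytes) -> int:
        return json.loads(raw)["amount"]

    old_message = json.dumps({"amount": 100}).encode()
    new_message = json.dumps({"amount_cents": 100}).encode()  # producer renamed the field

    print(handle(old_message))  # 100
    try:
        handle(new_message)
    except KeyError as err:
        print("old consumer broke at the wire:", err)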

ricardobeat2 months ago

I believe in the original amazon service architecture, that grew into AWS (see “Bezos API mandate” from 2002), backwards compatibility is expected for all service APIs. You treat internal services as if they were external.

That means consumers can keep using old API versions (and their types) with a very long deprecation window. This results in loose coupling. Most companies doing microservices do not operate like this, which leads to these lockstep issues.

threethirtytwo2 months ago

Yeah, and that's a bad thing, right? Maintaining backward compatibility until the end of time in the name of safety.

I'm not saying monoliths are better than microservices.

I'm saying that for THIS specific issue, you will not even need to think about API compatibility with monoliths. It's a concept you can throw out the window, because type checkers and integration tests catch this FOR YOU automatically and the single deployment ensures that the compatibility will never break.

If you choose monoliths you are CHOOSING this convenience; if you choose microservices you are CHOOSING the possibility for things to break, and AWS chose this and then chose to introduce a backwards compatibility restriction to deal with the problem.

I use "choose" loosely here. More likely AWS ppl just didn't think about this problem at the time. It's not obvious... or they had other requirements that necessitated microservices... The point is, this problem in essence is a logical consequence of the choice.

mjr002 months ago

> If you communicate with one another you are serializing and deserializing a shared type.

Yes, this is absolutely correct. The objects you send over the wire are part of an API which forms a contract the server implementing the API is expected to provide. If the API changes in a way which is not backwards compatible, this will break things.

> That shared type will break at the communication channels if you do not simultaneously deploy the two services.

This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."

This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility which exists when your code and your database do not match. The strategy might be a simple add new column->migrate code->remove old column, including some thought on how to deal with data added in the interim. It might be to use views. It might be some insane strategy of duplicating the full stack, using change data capture to catch changes and flipping a switch.[0] It doesn't really matter, the point is that even within a monolith, you have two separate services, a database and a backend server, and you cannot deploy them truly simultaneously, so you need to have some strategy for dealing with that; or more generally, you need to be conscious of breaking API changes, in exactly the same way you would with independent services.

> The logical outcome of this is virtually identical to a distributed monolith.

Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.

[0] I have seen this done. It was as crazy as it sounds.
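
For reference, a sketch of the simple add-column/migrate/remove strategy mentioned above, written as illustrative SQL inside Python; the table and column names are made up:

    # Expand/contract sketch for renaming users.fullname -> users.display_name
    # with zero downtime. Each step is its own deploy.

    STEP_1_EXPAND = """
    ALTER TABLE users ADD COLUMN display_name TEXT;
    UPDATE users SET display_name = fullname WHERE display_name IS NULL;  -- backfill
    """

    # Step 2: deploy application code that writes BOTH columns and reads the
    # new one. Old app instances still using `fullname` keep working while
    # the rollout is in flight.

    STEP_3_CONTRACT = """
    ALTER TABLE users DROP COLUMN fullname;  -- only once no running code reads it
    """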

Seattle35032 months ago

Managing two services is very different than managing 140. And databases have a lot of tooling, support, and documentation around migrations.

threethirtytwo2 months ago

>This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."

Agreed and this is a negative. Backwards compatibility is a restriction made to deal with something fundamentally broken.

Additionally, eventually in any system of services you will have to make a breaking change. Backwards compatibility is a behavioral coping mechanism to deal with a fundamental issue of microservices.

>This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility.

I believe you and am already aware. It's a limitation that exists intrinsically, so it exists because you have NO choice: a database and a monolith need to exist as separate services. The thing I'm addressing here is the microservices vs monolith debate. If you choose microservices, you are CHOOSING for this additional problem to exist. If you choose a monolith, then within that monolith you are CHOOSING for those problems to not exist.

I am saying regardless of the other issues with either architecture, this one is an invariant in the sense that for this specific thing, monolith is categorically better.

>Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.

No, you're categorically wrong. If they did this in ANY of the companies you worked at, then they are living with this issue. What I'm saying here isn't an opinion. It is a theorem-based consequence that will occur IF all the axioms are satisfied: namely, >2 services that communicate with each other and ARE not deployed simultaneously. This is logic.

The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.

procaryote2 months ago

You usually can't simultaneously deploy two services. You can try, but in a non trivial environment there are multiple machines and you'll want a rolling upgrade, which causes an old client to talk to a new service or vice versa. Putting the code into a monorepo does nothing to fix this.

This is much less of a problem than it seems.

You can use a serialisation format that allows easy backward compatible additions. The new service that has a new feature adds a field for it. The old client, responsibly coded, gracefully ignores the field it doesn't understand.

You can version the API to allow for breaking changes, and serve old clients old responses, and new clients newer responses. This is a bit of work to start and sometimes overkill, given the first point

If you only need very rare breaking changes, you can deploy new-version-tolerant clients first, then when that's fully done, deploy the new-version service. It's a bit of faff, but if it's very rare and internal, it's often easier than implementing full versioning.
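
A minimal sketch of the versioned-endpoint option, assuming Flask; routes and fields are made up:

    from flask import Flask, jsonify

    app = Flask(__name__)
    USERS = {42: {"fullname": "Ada Lovelace"}}

    # v1 stays frozen for old clients until they migrate off it.
    @app.get("/v1/users/<int:user_id>")
    def get_user_v1(user_id: int):
        return jsonify({"id": user_id, "fullname": USERS[user_id]["fullname"]})

    # v2 carries the breaking change (split name fields) for new clients.
    @app.get("/v2/users/<int:user_id>")
    def get_user_v2(user_id: int):
        first, last = USERS[user_id]["fullname"].split(" ", 1)
        return jsonify({"id": user_id, "first_name": first, "last_name": last})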

threethirtytwo2 months ago

> You usually can't simultaneously deploy two services

Yeah, it's a roundabout solution to create something to deploy two things simultaneously. Agreed.

> Putting the code into a monorepo does nothing to fix this.

It helps mitigate the issue somewhat. If it was a polyrepo you would suffer from an identical problem with the type checker or the integration tests. The checkers basically need all services to be at the same version to do a full and valid check, so if you have different teams and different repos the checkers will never know if team A made a breaking change that will affect team B, because the integration test and type checker can't stretch to another repo. Even if they could stretch to another repo you would need to do a "simultaneous" merge… in a sense polyrepos suffer from the same issue as microservices at the CI verification layer.

So if you have microservices and you have polyrepos you are suffering from a twofold problem. Your static checks and integration tests are never correct: they are always either failing and preventing you from merging, or deliberately crippled so as to not validate things across repos. At the same time your deploys are also guaranteed to be broken if a breaking API change is made. You literally give up safety in testing, safety in type checking and working deploys by going microservices and polyrepos.

Like you said, it can be fixed with backward compatibility, but it's a bad thing to restrict your code that way.

> This is much less of a problem than it seems.

It is not "much less of a problem than it seems", because big companies have developed methods to do simultaneous deploys. See Netflix. If they took the time to develop a solution, it means it's not a trivial issue.

Additionally are you aware of any api issues in communication between your local code in a single app? Do you have any problems with this such that you are aware of it and come up with ways to deal with it? No. In a monolith the problem is nonexistent and it doesn’t even register. You are not aware this problem exists until you move to micro-services. That’s the difference here.

> You can use a serialisation format that allows easy backward compatible additions.

Mentioned a dozen times in this thread. Backwards compatibility is a bad thing. It's a restriction that freezes all technical debt into your code. Imagine if Python 3 had stayed backward compatible with 2, or if the current version of macOS were still compatible with binaries from the first Mac.

> You can version the API to allow for breaking changes, and serve old clients old responses, and new clients newer responses. This is a bit of work to start and sometimes overkill, given the first point

Can you honestly tell me this is a good thing? The fact that you have to pay attention to this in microservices, while in a monolith you don't even need to be aware there's an issue, tells you all you need to know. You're just coming up with behavioral workarounds and coping mechanisms to make microservices work in this area. You're right, it does work. But it's a worse solution for this problem than monoliths, which don't have these workarounds because these problems don't exist in monoliths.

> If you only need very rare breaking changes, you can deploy new-version-tolerant clients first, then when that's fully done, deploy the new-version service. It's a bit of faff, but if it's very rare and internal, it's often easier than implementing full versioning.

It's only very rare in microservices because it's weaker. You deliberately make it rare because of this problem. Is it rare to change a type in a monolith? No, it happens on the regular. See the problem? You're not realizing it, but everything you're bringing up is a behavioral action to cope with an aspect that is fundamentally weaker in microservices.

Let me conclude by saying that there are many reasons why microservices are picked over monoliths. But what we are talking about here is definitively worse. Once you go microservices you are giving up safety and correctness and replacing them with workarounds. There is no trade-off for this problem; it is a logical consequence of using microservices.

kccqzy2 months ago

> That shared type will break at the communication channels if you do not simultaneously deploy the two services.

No. Your shared type is too brittle to be used in microservices. Tools like the venerable protobuf solved this problem decades ago. You have a foundational wire format that does not change. Then you have a schema layer that can change in backwards compatible ways. Every new addition is optional.

Here’s an analogy. Forget microservices. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with both the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.

This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.

threethirtytwo2 months ago

False. Protobuf solves nothing.

1. Protobuf requires a monorepo to work correctly. Shared types must be checked across all repos and services simultaneously. Without a monorepo or some crazy workaround mechanism this won't work. Think about it: these type checkers need everything at the same version to correctly check everything.

2. Even with a monorepo, deployment is a problem. Unless you do simultaneous deploys, if one team upgrades their service and another team doesn't, the shared type is incompatible, simply because you used microservices and polyrepos to allow teams to move async instead of in sync. It's a race condition in distributed systems and it's theoretically true. Not solved at all, because it can't be solved by logic and math.

Just kidding. It can be solved, but you're going to have to change the definitions of your axioms, aka of what is currently a microservice, monolith, monorepo and polyrepo. If you allow simultaneous deploys or pushes to microservices and polyrepos these problems can be solved, but then can you call those things microservices or polyrepos? They look more like monorepos or monoliths... hmm, maybe I'll call it a "distributed monolith"... See, we are hitting this problem already.

>Here’s an analogy. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.

You are just describing the problem I provided. We call "monoliths" monoliths, but technically a monolith must interact with a secondary service called a database. We have no choice in the matter. The monolith vs microservice debate of course does not refer to that boundary, which SUFFERS from all the same problems as microservices.

>This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.

No it's not. Not at all. It's a problem that's lived with. I have two modules in a monolith. ANY change that goes into the mainline branch or deploy is type checked and integration tested to provide maximum safety as integration tests and type checkers can check the two modules simultaneously.

Imagine those two modules as microservices. Because they can be deployed at any time asynchronously, and because they can be merged to the mainline branch at any time asynchronously, they cannot be type checked or integration tested together. Why? If I upgrade A, which requires an upgrade to B, but B is not upgraded yet, how do I type check both A and B at the same time? Axiomatically impossible. Nothing is solved. Just behavioral coping mechanisms to deal with the issue. That's the key phrase: behavioral coping mechanisms, as opposed to automated, statically checked safety based off of mathematical proof. Most of the arguments from your side will consist of this: "behavioral coping mechanisms".

klabb32 months ago

> Then you have a schema layer that could change in backwards compatible ways. Every new addition is optional.

Also known as the rest of the fucking owl. I am entirely in factual agreement with you, but the number of people who are even aware they maintain an API surface with backwards compatibility as a goal, let alone can actually do it well, is tiny in practice. Especially for internal services, where nobody will even notice violations until it's urgent, and at such a time, your definitions won't save you from blame. Maybe it should, though. The best way to stop a bad idea is to follow it rigorously and see where it leads.

I’m very much a skeptic of microservices, because of this added responsibility. Only when the cost of that extra maintenance is outweighed by overwhelming benefits elsewhere, would I consider it. For the same reason I wouldn’t want a toilet with a seatbelt.

wowohwow2 months ago

Bingo. Couldn't agree more. The other posters in this comment chain seem to view things from a dogmatic approach vs a pragmatic approach. It's important to do both, but individuals should call out when they are discussing something that is practiced vs preached.

procaryote2 months ago

If you've run a microservice stack or N at scale with good results, someone saying it's impossible doesn't look pragmatic

threethirtytwo2 months ago

I’m not commenting on the pragmatic part.

My thesis is logical and derived from axioms. You will have fundamental incompatibilities between APIs between services if one service changes the API. That's a given. It's 1 + 1 = 2.

Now I agree there are plenty of ways to successfully deal with these problems like api backwards compatibility, coordinated deploys… etc… etc… and it’s a given thousands of companies have done this successfully. This is the pragmatic part, but that’s not ultimately my argument.

My argument is that none of the pragmatisms and methodologies to deal with those issues need to exist in a monolithic architecture, because the problem itself doesn't exist in a monolith.

Nowhere did I say microservices can’t be successfully deployed. I only stated that there are fundamental issues with microservices that by logic must occur definitionally. The issue is people are biased. They tie their identity to an architecture because they advocated it for too long. The funniest thing is that I didn’t even take a side. I never said microservices were better or worse. I was only talking about one fundamental problem with microservices. There are many reasons why microservices are better but I just didn’t happen to bring it up. A lot of people started getting defensive and hence the karma.

threethirtytwo2 months ago

Agreed. What I'm describing here isn't solely pragmatic, it's axiomatic as well. If you model this as a distributed system graph, all microservices by definition will always reach a state where the APIs are broken.

Most microservice companies either live with that fact or they have roundabout ways to deal with it, including simultaneous deploys across multiple services and simultaneous merging, CI and type checking across different repos.

Aeolun2 months ago

Once all the code for the services lived in one repo there was nothing preventing them from deploying the thing 140 times. I’m not sure why they act like that wasn’t an option.

smrtinsert2 months ago

100%. It's almost like they jumped into it not understanding what they were signing up for.

echelon2 months ago

> If you must to deploy every service because of a library change

Hello engineer. Jira ticket VULN-XXX has been assigned to you as your team's on-call engineer.

A critical vulnerability has been found in the netxyz library. Please deploy service $foo after SHA before 2025-12-14 at 12:00 UTC.

Hello engineer. Jira ticket VULN-XXX has been assigned to you as your team's on-call engineer.

A critical vulnerability has been found in the netxyz library. Please deploy service $bar after SHA before 2025-12-14 at 12:00 UTC.

...

It's never ending. You get a half dozen of these on each on call rotation.

sethammons2 months ago

My experience doesn't align with yours. I worked at SendGrid for over a decade and they were on the (micro) service train. I was on call for all dev teams on a rotation for a couple of years and later just for my team.

I have seen like a dozen security updates like you describe.

echelon2 months ago

This was at a fintech and we took every single little vuln with the utmost priority. Triaged by severity of course, but everything had a ticking clock.

We didn't just have multiple security teams, we had multiple security orgs. If you didn't stay in compliance with VULN SLAs, you'd get a talking to.

We also had to frequently roll secrets. If the secrets didn't support auto-rotation, that was also a deployment (with other steps).

We also had to deploy our apps if they were stale. It's dangerous not to deploy your app every month or two, because who knows if stale builds introduced some kind of brittleness? Perhaps a change to some net library you didn't deploy caused the app not to tolerate traffic spikes. And it's been six months and there are several such library changes.

imtringued2 months ago

I don't know what a call rotation is, but I keep getting email flooded by half a dozen Linux vulnerabilities every day and it's getting old.

necovek2 months ago

Imagine your services were built on react-server-* components or used Log4J logging.

This is simply dependency hell exploding with microservices.

_pdp_2 months ago

In my previous company we did everything as a micro service. In the company before that it was serverless on AWS!

In both cases we had to come up with clever solutions to simply get by, because communication between services is a problem. It is difficult (not impossible) to keep all the contracts in sync, and deployment has to be coordinated in a very specific way sometimes. The initial speed you get is soon lost further down the path due to added complexities. There was fear-driven development at play. Service ownership is a problem. Far too many meetings are spent on coordination.

In my latest company everything is part of the same monolith. Yes, the code is huge, but it is so much easier to work with. We use a lot more unit tests than integration tests. Types make sense. Refactoring is just so easy. All the troubleshooting tools, including specialised AI agents built on top of our own platform, are part of the code-base, which is kind of interesting because I can see how this is turning into a self-improving system. It is fascinating!

We are not planning to break up the monolith unless we grow so much that it is impossible to manage from a single git repository. As far as I can tell this may never happen, as it is obvious that much larger projects are perfectly well maintained in the exact same way.

The only downside is that the build takes longer, but honestly we found ways around that as well in the past, and now, with further improvements in the toolchains delivered by the awesome open-source communities around the world, I expect to see at least a 10x improvement in deployment time in 2026.

Overall, in my own assessment, the decision to go for a monolith allowed us to build and scale much faster than if we had used micro services.

I hope this helps.

sethammons2 months ago

My experience is the opposite. I worked at SendGrid for a decade and we scaled the engineering org from a dozen to over 500 operating at scale sending billions of messages daily. Micro services. Well, services. The word micro messes people up.

I have also worked at half a dozen other shops with various architectures.

In every monolith, people violate separation of concerns and things are tightly coupled. I have only ever seen good engineering velocity happen when teams are decoupled from one another. I have only seen this happen in a (micro) service architecture.

Overall, in my own assessment, the decision to stick with a monolith has slowed down velocity and placed limits on scale at every other company I have been at, and required changes towards decoupled services to be able to ship with any kind of velocity.

The place I just left took 2 years, over 50 teams, and over 150 individual contributors to launch a product that required us to move an interface for sending messages over from ORM querysets to DTOs. We needed to unlock our ability to start rearchitecting the modules, because before that it was impossible to know the actual edges of the system and how it used the data. This was incredibly expensive and hard, and it would never have been necessary but for the ability to reach into others' domains and the assumptions that made things hard.

Don't couple systems. Micro services are the only arch I have seen successfully do this.

quails8mydog1 month ago

I wonder if people are talking about the same things in these discussions. 50 people working on the same deployable in the same repo is going to create friction. Similarly, having a few people work on 50 deployables across 50 repos will create challenges.

You need to scope services appropriately. A single small team shouldn't break their application down for the sake of doing microservices, but once you have multiple teams working on the same codebase splitting it up will probably help.

gorgoiler2 months ago

You say that every monolith you’ve seen has devolved into bad engineering — coupling, crossing boundaries. What was missing that could have stopped this? A missing service boundary you’d say, but also a lack of engineering leadership or lack of others’ experience? No code review? A commercial non-founder CEO pushing for results at the expense of codebase serviceability? Using a low-ceremony language (no types, no interfaces)?

You can stop coupling by enforcing boundaries. Repository boundaries are extremely solid ones. Too solid for some people, making it unnecessarily hard to coordinate changes on either side of the boundary. Barely solid enough for others, where it’s clearly too dangerous to let their devs work without solid brick walls keeping their hackers apart.

Coupling, smudged boundaries, and incoherence are symptoms of something more fundamental than simply we didn’t use services like we should have. If everyone’s getting colds off each other it’s because of bad hygiene in the office. Sure, you could force them to remain in their cubicles or stay at home but you could also teach them to wash their hands!

pdimitar2 months ago

Generalizations don't help almost any discussion. Even if that's 100% of what you ever saw, many people have seen mixed (or entirely in the other extreme).

> In every monolith, people violate separation of concerns and things are tightly coupled. I have only ever seen good engineering velocity happen when teams are decoupled from one another. I have only seen this happen in a (micro) service architecture.

I would write this off as indifferent or incompetent tech leadership. Even languages that people call obscure -- like Elixir that I mostly work with -- have excellent libraries and tools that can help you draw enforceable boundaries. Put that in CI or even in pre-commit hook. Job done.

Why was that never done?

Of course people will default to the easier route. It's on tech leadership to keep them to higher standards.

sethammons2 months ago

Funny you mention Elixir. At one company, we passed around Ecto querysets. It started when the company was smaller. Then someone needed a little bit of analytics. A few years of organic growth later, the system was bogged down. Queries were joining all over the place, and separating out the analytics from everything else was, again, a major undertaking.

I would love to see a counter example in real life at an org with over a dozen teams. A well working monolith and a well working monorepo are like unicorns; I don't believe they exist and everyone is talking about and trying to sell their mutant goat as one.

pdimitar2 months ago

I am not selling you anything so you're starting off from a wrong premise.

What I said is that you should consider your experience prone to a bubble environment and as such it's very anecdotal. So is mine (a mix), granted. Which only means that neither extreme dominates out there. Likely a normal bell curve distribution.

What I did say (along with others) was that a little bit of technical discipline -- accentuating on "little" here -- nullifies the stated benefits of microservice architecture.

And it seems to me that the microservice architecture was chosen to overcome organizational problems, not technical ones.

MrDarcy2 months ago

Reading it with hindsight, their problems have less to do with the technical trade off of micro or monolith services and much more to do with the quality and organizational structure of their engineering department. The decisions and reasons given shine a light on the quality. The repository and test layout shine a light on the structure.

Given the quality and the structure neither approach really matters much. The root problems are elsewhere.

CharlieDigital2 months ago

My observation is that many teams lack strong "technical discipline"; someone that says "no, don't do that", makes the case, and takes a stand. It's easy to let the complexity genie out of the bottle if the team doesn't have someone like this with enough clout/authority to actually make the team pause.

Aeolun2 months ago

I think the problem is that this microservices vs monolith decision is a really hard one to convince people of. I made a passionate case for ECS instead of lambda for a long time, but only after the rest of the team and leadership see the problems the popular strategy generates do we get something approaching uptake (and the balance has already shifted to kubernetes instead, which is at least better)

CharlieDigital2 months ago

    > I made a passionate case...
My experience is that it is less about passion and more about reason.

There's a lot of good research and writing on this topic. This paper, in particular has been really helpful for my cause: https://dl.acm.org/doi/pdf/10.1145/3593856.3595909

It has a lot going for it: 1) it's from Google, 2) it's easy to read and digest, 3) it makes a really clear case for monoliths.

Otek2 months ago

I 100% agree with you, but the sad fact is that it's easy to understand why people don't want to take this role. You can make enemies easily, you need to deliver "bad news" and convince people to put in more effort or prove that the effort they did put in was not enough. Why bother when you probably won't be the one that has to clean it up?

CharlieDigital2 months ago

    > You can make enemies easily...
Short term, definitely. In the long tail? If you are right more than you are wrong, then that manifests as respect.
AlwaysRock2 months ago

Ha! I wish I worked at the places you have worked!

panny2 months ago

>the quality and organizational structure of their engineering department

You're not kidding. I had to work with twilio on a project and it was awful. Any time there was an issue with the API, they'd never delve into why that issue had happened. They'd simply fix the data in their database and close the ticket. We'd have the same issue over and over and over again and they'd never make any effort to fix the cause of the problems.

iamflimflam12 months ago

This is probably the first time I’ve seen a human use the word “delve”.

It immediately triggered my "is this AI?" reflex.

SatvikBeri2 months ago

Maybe you just don't read many books written after the year 2000? It was a pretty common word even before ChatGPT: https://books.google.com/ngrams/graph?content=delve&year_sta...

But perhaps the most famous source is Tolkien: "The Dwarves tell no tale; but even as mithril was the foundation of their wealth, so also it was their destruction: they delved too greedily and too deep, and disturbed that from which they fled, Durin's Bane."

necovek2 months ago

As a non-native speaker, I read a lot of fantasy and science fiction books in English. I use "delve" regularly (I wouldn't say "frequently" though). Not sure if it's Terry Pratchett's Discworld influence, but plenty of archaic sounding words there.

I did not even know it was considered uncommon and archaic, tbh.

ramraj072 months ago

People from different countries, especially where English is not their first language, often have more esoteric words in their vocabulary.

nutjob22 months ago

I guess there may be regional differences but delve is a commonly used word for native speakers.

monkaiju2 months ago

Conway's Law shines again!

It's amazing how much explanatory power it has, to the point that I can predict at least some traits about a company's codebase during an interview process, without directly asking them about it.

machomaster2 months ago

In this case, the more applicable are:

1. "Peter principle": "people in a hierarchy and organizations tend to rise to 'a level of respective incompetence' "

2. "Parkinson's law": "Work expands to fill the available time".

So people are filling all the available time and working tirelessly to reach their personal and organizational levels of incompetence; working hard without stopping to think whether what they are doing should be done at all. And nobody is stopping them, nobody asks why (with a real analysis of positives, negatives, risks).

Incompetent + driven is the worst combination there can be.

necovek2 months ago

A few thoughts: this is not really a move to a monolith. Their system is still a SOA (service-oriented architecture), just like microservices (make services as small as they can be), but with larger scope.

Having 140 services managed by what sounds like one team reinforces another point that I believe should be well known by now: you use SOAs (including microservices) to scale teams, not services.

Eg. if a single team builds a shared library for all the 140 microservices and needs to maintain them, it's going to become very expensive quickly: you'll be using v2.3.1 in one service and v1.0.8 in another, and you won't even know yourself what API is available. Operationally, yes, you'll have to watch over 140 individual "systems" too.

There are ways to mitigate this, but they have their own trade-offs (I've posted them in another comment).

As per Conway's law, software architecture always follows the organizational structure, and this seems to have happened here: a single team is moving away from unneeded complexity to more effectively manage their work and produce better outcomes for the business.

It is not a monolith, but a properly-scoped service (scoped to the team). This is, in my experience, the sweet spot. A single team can run and operate multiple independent services, but as those services grow, the team will look to unify them, so you need to restructure the team if you don't want that to happen. This is why I don't accept "system architect" roles -- they don't give you the tools to really drive the architecture the way it can be driven -- and why I really got into "management" :)

rtpg2 months ago

I am _not_ a microservices guy (like... at all) but reading this the "monorepo"/"microservices" false dichotomy stands out to me.

I think way too much tooling assumes 1:1 pairings between services and repos (_especially_ CI work). In huge orgs Git/whatever VCS you're using would have problems with everything in one repo, but I do think that there's loads of value in having everything in one spot even if it's all deployed more or less independently.

But so many settings and workflows couple repos together that it's hard to even have a frontend and backend in the same place if the two teams manage those differently. So you end up having to mess around with N repos and can't easily send the one cross-cutting pull request.

I would very much like to see improvements on this front, where one repo could still be split up on the forge side (or the CI side) in interesting ways, so review friction and local dev work friction can go down.

(shorter: github and friends should let me point to a folder and say that this is a different thing, without me having to interact with git submodules. I think this is easier than it used to be _but_)

GeneralMayhem2 months ago

I worked on building this at $PREV_EMPLOYER. We used a single repo for many services, so that you could run tests on all affected binaries/downstream libraries when a library changed.

We used Bazel to maintain the dependency tree, and then triggered builds based on a custom GitHub Actions hook that would use `bazel query` to find the transitive closure of affected targets. Then, if anything in a directory was affected, we'd trigger the set of tests defined in a config file in that directory (defaulting to :...), each as its own workflow run that would block PR submission. That worked really well, with the only real limiting factor being the ultimate size limit of a repo on GitHub, but it of course took a fair amount of effort (a few SWE-months) to build all the tooling.
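A rough sketch of that idea (not the actual tooling described above, and assuming changed files map to Bazel packages by directory): diff against main, then ask `bazel query` for the reverse transitive closure (`rdeps`) and hand the result to CI.

    // affected.go -- rough sketch: find Bazel targets affected by a git diff.
    package main

    import (
        "fmt"
        "os/exec"
        "path/filepath"
        "strings"
    )

    // run executes a command and returns its whitespace-separated output tokens.
    func run(name string, args ...string) []string {
        out, err := exec.Command(name, args...).Output()
        if err != nil {
            panic(err) // error handling elided for brevity
        }
        return strings.Fields(string(out))
    }

    func main() {
        // 1. Files changed relative to the main branch.
        changed := run("git", "diff", "--name-only", "origin/main...HEAD")

        // 2. Map changed files to package labels by directory (ignores edge
        //    cases like files at the repo root or dirs without BUILD files).
        seen := map[string]bool{}
        var seeds []string
        for _, f := range changed {
            label := "//" + filepath.Dir(f) + ":all"
            if !seen[label] {
                seen[label] = true
                seeds = append(seeds, label)
            }
        }

        // 3. Everything that transitively depends on those packages.
        query := fmt.Sprintf("rdeps(//..., set(%s))", strings.Join(seeds, " "))
        for _, target := range run("bazel", "query", query) {
            fmt.Println(target) // CI would trigger the tests configured for these
        }
    }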

physicles2 months ago

We’re in the middle of this right now. Go makes this easier: there’s a go CLI command you can use to list a package’s dependencies, which can be cross-referenced with recent git changes (duplicating the dependency graph in another build tool is a non-starter for me). But there are corner cases that we’re currently working through.
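For the curious, a rough sketch of that cross-referencing, assuming a single-module monorepo with hypothetical ./cmd/* entrypoints and run from the repo root (error handling elided):

    // deploycheck.go -- rough sketch: flag services whose transitive deps changed.
    package main

    import (
        "fmt"
        "os/exec"
        "path/filepath"
        "strings"
    )

    func lines(name string, args ...string) []string {
        out, _ := exec.Command(name, args...).Output() // error handling elided
        return strings.Split(strings.TrimSpace(string(out)), "\n")
    }

    func main() {
        changed := lines("git", "diff", "--name-only", "origin/main...HEAD")

        // Hypothetical deployable entrypoints in the monorepo.
        services := []string{"./cmd/api", "./cmd/worker", "./cmd/ingest"}

        for _, svc := range services {
            // Directories of the service and all of its transitive dependencies.
            depDirs := map[string]bool{}
            for _, d := range lines("go", "list", "-deps", "-f", "{{.Dir}}", svc) {
                depDirs[d] = true
            }
            for _, f := range changed {
                abs, _ := filepath.Abs(filepath.Dir(f)) // assumes cwd == repo root
                if depDirs[abs] {
                    fmt.Printf("%s affected by %s -> rebuild and redeploy\n", svc, f)
                    break
                }
            }
        }
    }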

That, plus: if you want builds and deploys that are faster than doing them manually from your dev machine, you pay $$$ for either something like Depot or a beefy VM to host CI.

A bit more work on those dependency corner cases, along with an auto-sleeping VM, should let us achieve nirvana. But it’s not like we have a lot of spare time on our small team.

GeneralMayhem2 months ago

Go with Bazel gives you a couple options:

* You can use gazelle to auto-generate Bazel rules across many modules - I think the most up to date usage guide is https://github.com/bazel-contrib/rules_go/blob/master/docs/g....

* In addition, you can make your life a lot easier by just making the whole repo a single Go module. Having done the alternate path - trying to keep go.mod and Bazel build files in sync - I would definitely recommend only one module per repo unless you have a very high pain tolerance or actually need to be able to import pieces of the repo with standard Go tooling.

> a beefy VM to host CI

Unless you really need to self-host, GitHub Actions or GCP Cloud Build can be set up to reference a shared Bazel cache server, which keeps builds quite snappy since they don't have to rebuild any leaves that haven't changed.

Seattle35032 months ago

I've heard horror stories about Bazel, but a lot of them involve either not getting full buy-in from the developer team or not investing in setting Bazel up correctly. A few months of developer time upfront does seem like a steep ask.

carlm422 months ago

You're pointing out exactly what bothered me with this post in the first place: "we moved from microservices to a monolith and our problems went away"... except the problems didn't have much to do with the service architecture and everything to do with operational mistakes and insufficient tooling: bad CI, bad autoscaling, bad on-call.

maxdo2 months ago

Both approaches can fail. Especially in environments like Node.js or Python, there's a clear limit to how much code an event loop can handle before performance seriously degrades.

I managed a product where a team of 6–8 people handled 200+ microservices. I've also managed other teams at the same time, on another product where 80+ people managed a monolith.

What I learned? Both approaches have pros and cons.

With microservices, it's much easier to push isolated changes with just one or two people. At the same time, global changes become significantly harder.

That's the trade-off, and your mental model needs to align with your business logic. If your software solves a tightly connected business problem, microservices probably aren't the right fit.

On the other hand, if you have a multitude of integrations with different lifecycles but a stable internal protocol, microservices can be a lifesaver.

If someone tries to tell you one approach is universally better, they're being dogmatic/religious rather than rational.

Ultimately, it's not about architecture, it's about how you build abstractions and approach testing and decoupling.

dragonwriter2 months ago

> If your software solves a tightly connected business problem, microservices probably aren't the right fit.

If your software solves a single business problem, it probably belongs in a single (still micro!) service under the theory underlying microservices, in which the "micro" is defined in business terms.

If you are building services at a lower level than that, they aren't microservices (they may be nanoservices.)

Seattle35032 months ago

How do people usually slice a single business problem?

rozap2 months ago

To me this rationalization has always felt like duct tape over the real problem, which is that the runtime is poorly suited to what people are trying to do.

These problems are effectively solved on beam, the jvm, rust, go, etc.

strken2 months ago

Can you explain a bit more about what you mean by a limit on how much code an event loop can handle? What's the limit, numerically, and which units does it use? Are you running out of CPU cache?

BoorishBears2 months ago

Most people don't realize their applications are running like dogwater on Node because serverless is letting them smooth it over by paying 4x what they would be paying if they moved 10 or so lines of code and a few regexes to a web worker.

(and I say that as someone who caught themselves doing the same: serverless is really good at hiding this.)

joker6662 months ago

I assume he means how much work you let the event loop do without yielding. It doesn't matter if there are 200K lines of code if there's no real traffic to keep the event loop busy.

rubenvanwyk2 months ago

Wait, do people at scale use Node.js and Python for services? I always assumed it's Go, Java, C#, etc.

Nextgrid2 months ago

Depends on your definition of "scale", but yes. I ran an app serving ~1k requests/second from a Django monolith around 2017, distributed across ~20 Heroku "dynos". Nowadays a couple bare-metal servers will handle this.

waterproof2 months ago

The only rationale given for the initial switch to microservices is this:

> Initially, when the destinations were divided into separate services, all of the code lived in one repo. A huge point of frustration was that a single broken test caused tests to fail across all destinations.

You kept breaking tests in main so you thought the solution was to revamp your entire codebase structure? Seems a bit backward.

kgeist2 months ago

We had a similar problem in our monolith. Team #1 works on a feature, and their code breaks tests. Team #2 works on another feature; their code is OK, but they can't move forward because of the failing tests from team #1. Plus it often takes additional time to figure out whether the tests fail because of feature #1 or feature #2, and who must fix them.

We solved it by simply giving every team their own dev environment (before merging to main). So if tests break in feature #1, it doesn't break anything for feature #2 or team #2. It's all confined to their environment. It's just an additional VM + a config in CI/CD. The only downside of this is that if there are conflicts between features they won't be caught immediately (only after one of the teams finally merges to main). But in our case it wasn't a problem because different teams rarely worked on the same parts of the monorepo at the same time due to explicit code ownership.

wg02 months ago

Thanks. It was mostly a stupid idea for MOST shops. I think maybe it works for AWS, Google, and Netflix, but everywhere in my career I saw that 90% of the problems were due to microservices.

Dividing a system into composable parts is already a very, very difficult problem, and it is only foolish to introduce further network boundaries between them.

Next comeback I see is away from React and SPAs as view transitions become more common.

nyrikki2 months ago

> Once the code for all destinations lived in a single repo, they could be merged into a single service. With every destination living in one service, our developer productivity substantially improved. We no longer had to deploy 140+ services for a change to one of the shared libraries. One engineer can deploy the service in a matter of minutes.

This is the problem with the undefined nature of the term `microservices`. In my experience, if you can't develop in a way that allows you to deploy all services independently and without coordination between services, it may not be a good fit for your org's needs.

In the parent SOA(v2), what they described is a well known anti-pattern: [0]

    Application Silos to SOA Silos
       * Doing SOA right is not just about technology. It also requires optimal cross-team communications. 
    Web Service Sprawl
        * Create services only where and when they are needed. Target areas of greatest ROI, and avoid the service sprawl headache.
If you cannot, due to technical or political reasons, retain the ability to independently deploy a service -- no matter whether you choose to actually deploy independently -- you will not gain most of the advantages that were the original selling point of microservices, which had more to do with organizational scaling than technical concerns.

There are other reasons to consider the pattern, especially due to the tooling available, but it is simply not a silver bullet.

And yes, I get that not everyone is going to accept Chris Richardson's definitions[1], but even in more modern versions of this, people always seem to run into the most problems because they try to shove it in a place where the pattern isn't appropriate, or isn't possible.

But kudos to Twilio for doing what every team should be, reassessing if their previous decisions were still valid and moving forward with new choices when they aren't.

[0] https://www.oracle.com/technetwork/topics/entarch/oea-soa-an... [1] https://microservices.io/post/architecture/2022/05/04/micros...

yearolinuxdsktp2 months ago

I would caution that microservices should be architected with technical concerns first -- being able to deploy independently is a valid technical concern too.

Doing it for organizational scaling can lead to insular vision with turf defensive attitude, as teams are rewarded on the individual service’s performance and not the complete product’s performance. Also refactoring services now means organizational refactoring, so the friction to refactor is massively increased.

I agree that patterns should be used where most appropriate, instead of blindly.

What pains me is that a term like "Cloud-Native" has been usurped to mean microservices. Did Twilio just stop having a "Cloud-Native" product by shipping a monolith? According to the CNCF, yes. According to reason, no.

develatio2 months ago

can you add [2018] to the title, please?

andrewmuia2 months ago

No kidding, not cool to be rehashing an article that is 7 years old. In tech terms, that is antiquity.

pmbanugo2 months ago

have they reverted to microservices?

Towaway692 months ago

Mono services in a micro repository. /s

mlhpdx2 months ago

Wow. Their experience could not be more different from mine. As I'm contemplating the first year of my startup, I've tallied 6,000 deployments, 99.997 percent uptime, and a low single-digit rollback percentage (MTTR in low single-digit minutes, and fractional, single-cell impact for those so far). While I'm sure it's possible for a solo entrepreneur to hit numbers like that with a monolith, I have never done so, and haven't seen others do so.

Edit: I'd love to eat the humble pie here. If you have examples of places where monoliths are updated 10-20 times a day by a small (or large) team, post the link. I'll read them all.

AlotOfReading2 months ago

The idea of deploying to production 10-20 times per day sounds terrifying. What's the rationale for doing so?

I'll assume you're not writing enough bugs that customers are reporting 10-20 new ones per day, but that leaves me confused why you would want to expose customers to that much churn. If we assume an observable issue results in a rollback and you're only rolling back 1-2% of the time (very impressive), once a month or so customers should experience observable issues across multiple subsequent days. That would turn me off making a service integral to my workflow.

mlhpdx2 months ago

Speed is the rationale. I have zero hesitation to deploy and am extremely well practiced at decomposing changes into a series of small safe changes at this point. So maybe it's a single spelling correction, or perhaps it's the backend for a new service integration -- it's all the same to me.

Churn is kind of a loaded word, I'd just call it change. Improvements, efficiencies, additions and yes, of course, fixes.

It may be a little unfair to compare monoliths with distributed services when it comes to deployments. I often deploy three services (sometimes more) to implement a new feature, and that wouldn't be the case with a monolith. So 100% there is a lower number of deploys needed in that world (I know, I've been there). Unfortunately, there is also a natural friction that prevents deploying things as they become available. Google called that latency out in DORA for a reason.

et13372 months ago

If something is difficult or scary, do it more often. Smaller changes are less risky. Code that is merged but not deployed is essentially “inventory” in the factory metaphor. You want to keep inventory low. If the distance between the main branch and production is kept low, then you can always feel pretty confident that the main branch is in a good state, or at least close to one. That’s invaluable when you inevitably need to ship an emergency fix. You can just commit the fix to main instead of trying to find a known good version and patching it. And when a deployment does break something, you’ll have a much smaller diff to search for the problem.

AlotOfReading2 months ago

There's a lot of middle ground between "deploy to production 20x a day" and "deploy so infrequently that you forget how to deploy". Like, once a day? I have nothing against emergency fixes, unless you're doing them 9-19x a day. Hotfixes should be uncommon (neither rare nor standard practice).

sethammons2 months ago

Org size matters. A team of 500 should be deploying multiple times per day.

sriku2 months ago

In a discussion I was in recently, a participant mentioned "culture eats strategy for breakfast", which perhaps makes sense in this context. Be bold enough to do what makes the team and the product thrive.

TZubiri2 months ago

This is not the first time an engineer working at a big company thinks they are using a monolith when in reality they are a small team in charge of a single microservice, which in turn is part of a company that definitely does not run a monolith.

Last time it was an AWS engineer who worked on Route 53, and they dismissed microservices in a startup, claiming that at AWS they ran a monolith (as in the R53 DNS).

Everything is a monolith if you zoom in enough and ignore everything else. Which I guess you can do when you work at a big company and are in charge of a very specific role.

wiradikusuma2 months ago

> With everything running in a monolith, if a bug is introduced in one destination that causes the service to crash, the service will crash for all destinations

We can have a service with 100 features, but only enable the features relevant to a given "purpose". That way, we can still have "micro services" but they're running the same code: "bla.exe -foo" and "bla.exe -bar".
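A minimal sketch of that idea in Go (flag and route names made up): the same binary registers only the handlers selected on the command line, so "app -foo" and "app -bar" are separate deployments of one codebase, and a crash in one doesn't take the other down.

    // app.go -- one binary, many "purposes", selected by flags.
    package main

    import (
        "flag"
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        foo := flag.Bool("foo", false, "enable the foo feature set")
        bar := flag.Bool("bar", false, "enable the bar feature set")
        addr := flag.String("addr", ":8080", "listen address")
        flag.Parse()

        mux := http.NewServeMux()
        if *foo {
            mux.HandleFunc("/foo", func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintln(w, "handled by the foo feature")
            })
        }
        if *bar {
            mux.HandleFunc("/bar", func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintln(w, "handled by the bar feature")
            })
        }

        // Run "app -foo" and "app -bar" as two processes: same code, isolated blast radius.
        log.Fatal(http.ListenAndServe(*addr, mux))
    }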

chmod7752 months ago

In practice most monoliths turned into "microservices" are just monoliths in disguise. They still have most of the failure modes of the original monolith, but now with all the complexity and considerable challenges of distributed computing layered on top.

Microservices as a goal is mostly touted by people who don't know what the heck they're doing - the kind of people who tend to mistakenly believe blind adherence to one philosophy or the other will help them turn their shoddy work into something passable.

Engineer something that makes sense. If, once you're done, whatever you've built fits the description of "monolith" or "microservices", that's fine.

However if you're just following some cult hoping it works out for your particular use-case, it's time to reevaluate whether you've chosen the right profession.

Nextgrid2 months ago

Microservices were a fad during a period where complexity and solving self-inflicted problems were rewarded more than building an actual sustainable business. It was purely a career- & resume-polishing move for everyone involved.

Putting this anywhere near "engineering" is an insult to even the shoddiest, OceanGate-levels of engineering.

abernard12 months ago

I remember when microservices were introduced and they were solving real problems around 1) independent technological decisions with languages, data stores, and scaling, and 2) separating team development processes. They came out of Amazon, eBay, Google and a host of successful tech titans that were definitely doing "engineering." The Bezos mandate for APIs in 2002 was the beginning of that era.

It was when the "microservices considered harmful" articles started popping up that microservices had become a fad. Most of the HN early-startup energy will continue to do monoliths because of team communication reasons. And I predict that if any of those startups are successful, they will have need for separate services for engineering reasons. If anything, the historical faddishness of HN shows that hackers pick the new and novel because that's who they are, for better or worse.

gostsamo2 months ago

This is a horror story of being totally unable to understand your product and its behavior, and throwing people and resources into large rewrites only to learn that you still don't understand your product and its behavior. Badly done tests were used as justification to write multiple suites of badly done tests, and it is all blamed on the architecture.

tonymet2 months ago

Your “microservice” is just a clumsy and slow symbol lookup over the network, at 1000x the CPU and 10000x the latency.

AndrewKemendo2 months ago

> Microservices is a service-oriented software architecture in which server-side applications are constructed by combining many single-purpose, low-footprint network services.

Gonna stop you right there.

Microservices have nothing to do with the hosting or operating architecture.

Per Martin Fowler, who formalized the term, microservices are:

“In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery”

You can have an entirely local application built on the “microservice architectural style.”

Saying they are “often HTTP and API” is beside the point.

The problem Twilio actually describes is that they messed up service granularity and distributed-systems engineering processes.

Twilio's experience was not a failure of the microservice architectural style. This was a failure to correctly define service boundaries based on business capabilities.

Their struggles with serialization, network hops, and complex queueing were symptoms of building a distributed monolith, which they finally made explicit with this move. They accidentally built a system with the overhead of distribution but the tight coupling of a single application. Now they are making their architectural foundations fit what they built, likely because they planned it poorly.

The true lesson is that correctly applying microservices requires insanely hard domain modeling and iteration and meticulous attention to the "Distributed Systems Premium."

https://martinfowler.com/microservices/

Scubabear682 months ago

Please don’t fall into the Fowler-said-so trap.

Just because he says something does not mean Fowler "formalized the term". Martin wrote about every topic under the sun, and he loved renaming and/or redefining things to fit his world view -- and, incidentally, to drive people not just to his blog but also to his consultancy, Thoughtworks.

PS: The "single application" line shows how dated Fowler's views were then, and certainly are today.

abernard12 months ago

I've been developing under that understanding since before Fowler-said-so. His take is simply a description of a phenomenon predating the moniker of microservices. SOA with things like CORBA, WSDL, UDDI, Java services in app servers etc. was a take on service oriented architectures that had many problems.

Anyone who has ever developed in a Java codebase with "Service" and "ServiceImpl" classes everywhere can see the lineage of that model. Services were supposed to be the API, with the implementation provided in a separate process container. Microservices signalled a time when SOA without Java as a prerequisite had become successful in large tech companies. They had reached the point of needing even more granular breakout and less reliance on Java; HTTP interfaces were an enabler of that. 2010s-era microservices people never understood the basics, and many don't even know what they're criticizing.

Scubabear682 months ago

I think you are confusing limitations of Java at the time with something else. Interfaces everywhere and single implementation classes have nothing at all to do with microservices or SOA.

AndrewKemendo2 months ago

Thank you this is the point

honkycat2 months ago

I don't care how it is done; just don't rely on your database schema for data modeling and business logic.

majgr2 months ago

I have a feeling that microservices improve overall design when they can live on their own, perhaps as micro-apps, also with their own UI. What is the point of a service if it is not usable beyond its original design and is just bound to other similar services?

yieldcrv2 months ago

I feel like microservices have gotten a lot easier over the last 7 years from when Twilio experienced this, not just from my experience but from refinements in architectures

There are infinite permutations in architecture and we've collectively narrowed them down to things that are cheap to deploy, automatically scale for low costs, and easily replicable with a simple script

We should be talking about how AI knows those scripts too and can synthesize adjustments. Dedicated Site Reliability Engineers and DevOps are great for maintaining convoluted legacy setups, but irrelevant for doing the same thing from scratch nowadays.

eYrKEC22 months ago

You know what I think is better than a push of the CPU stack pointer and a jump to a library?

A network call. Because nothing could be better for your code than putting the INTERNET into the middle of your application.

--

The "micro" of microservices has always been ridiculous.

If it can run on one machine then do it. Otherwise you have to deal with networking. Only do networking when you have to. Not as a hobby, unless your program really is a hobby.

NeutralCrane2 months ago

Microservices have nothing to do with the underlying hosting architecture. Microservices can all run and communicate on a single machine. There will be a local network involved, but it absolutely does not require the internet or multiple machines.

yieldcrv2 months ago

it's not really "micro" but more so "discrete", as in special-purpose, one-off: to ensure consistent performance, as opposed to shared performance.

yes, networking is the bottleneck between the processes, while one machine is the bottleneck to end users

Nextgrid2 months ago

> one machine is the bottleneck to end users

You can run your monolith on multiple machines and round-robin end-user requests between them. Your state is in the DB anyway.
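A toy sketch of that setup (backend addresses are placeholders; in practice you'd reach for nginx, HAProxy, or a cloud load balancer rather than rolling your own):

    // proxy.go -- toy round-robin proxy in front of N identical monolith instances.
    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
        "sync/atomic"
    )

    func mustParse(raw string) *url.URL {
        u, err := url.Parse(raw)
        if err != nil {
            panic(err)
        }
        return u
    }

    func main() {
        backends := []*url.URL{
            mustParse("http://10.0.0.1:8080"), // placeholder monolith instances
            mustParse("http://10.0.0.2:8080"),
            mustParse("http://10.0.0.3:8080"),
        }

        var counter uint64
        proxy := &httputil.ReverseProxy{
            Director: func(r *http.Request) {
                // Pick the next backend in rotation; state lives in the shared DB,
                // so any instance can serve any request.
                b := backends[atomic.AddUint64(&counter, 1)%uint64(len(backends))]
                r.URL.Scheme = b.Scheme
                r.URL.Host = b.Host
            },
        }

        log.Fatal(http.ListenAndServe(":80", proxy))
    }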

yieldcrv2 months ago

I do bare metal sometimes, and I like the advances in virtualization for running many processes there too.

ikiris2 months ago

Not everything you think you know is right.

https://github.com/sirupsen/napkin-math

josephg2 months ago

Well implemented network hardware can have high bandwidth and low latency. But that doesn't get around the complexity and headaches it brings. Even with the best fiber optics, wires can be cut or tripped over. Controllers can fail. Drivers can be buggy. Networks can be misconfigured. And so on. Any request - even sent over a local network - can and will fail on you eventually. And you can't really make a microservice system keep working properly when links start failing.

Local function calls are infinitely more reliable. The main operational downside with a binary monolith is that a bug in one part of the program will crash the whole thing. Honestly, I still think Erlang got it right here with supervisor trees. Use "microservices". But let them all live on the same computer, in the same process. And add tooling to the runtime environment to allow individual "services" to fail or get replaced without taking down the rest of the system.
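A rough sketch of that supervisor idea, in Go rather than Erlang, with illustrative service names: each in-process "service" runs in its own goroutine, panics are recovered and logged, and only the failed service is restarted.

    // supervisor.go -- rough sketch of supervised in-process "services".
    package main

    import (
        "log"
        "time"
    )

    // supervise restarts fn whenever it panics or returns, with a small delay.
    func supervise(name string, fn func()) {
        go func() {
            for {
                func() {
                    defer func() {
                        if r := recover(); r != nil {
                            log.Printf("service %s crashed: %v (restarting)", name, r)
                        }
                    }()
                    fn()
                }()
                time.Sleep(time.Second) // crude backoff before restart
            }
        }()
    }

    func main() {
        supervise("billing", func() {
            for {
                time.Sleep(500 * time.Millisecond)
                // ... do billing work; a panic here only restarts "billing"
            }
        })
        supervise("email", func() {
            time.Sleep(500 * time.Millisecond)
            panic("simulated bug") // recovered and restarted; billing keeps running
        })
        select {} // block forever
    }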

moltar2 months ago

Do you have any recommended reading on the topic of refinements in architectures? Thank you.

otterley2 months ago

Discussion in 2018, when this blog post was published: https://news.ycombinator.com/item?id=17499137

bob10292 months ago

Monolith is definitely what you want to start with.

Being able to ~instantly obtain a perfect list of all references to all symbols is an extraordinarily powerful capability. The stronger the type system, the more leverage you get. If you have only ever had experience with weak type systems or poor tooling, I could understand how the notion of putting everything into one compilation context seems pointless.

sethammons2 months ago

/me raises hand. Any system that passes querysets around for one. Can't know who is using what.

axelthegerman2 months ago

> we had to spend time fixing the broken test even if the changes had nothing to do with the initial change. In response to this problem, it was decided to break out the code for each destination into their own repos.

One could also change the way tests are run or selected, or allow manual overrides to still deploy. Separating repos doesn't sound like the only logical solution.

andrewprock2 months ago

The whole point of micro-services is to manage dependencies independently across service boundaries, using the API as the contract, not the internal libraries.

Then you can implement a service in Java, Python, Rust, C++, etc, and it doesn't matter.

Coupling your Postgres DB to your Elasticsearch cluster via a hard library dependency is impossibly heavy. The same insight applies to your bespoke services.

pss3142 months ago

A recent blog post from Docker mentions Twilio and Amazon Prime Video seeing gains by moving away from microservices to a monolith:

You Want Microservices, But Do You Really Need Them? https://www.docker.com/blog/do-you-really-need-microservices...

doctor_phil2 months ago

I don't think this blog post reflects so well on this engineering team. Kudos to them for being so transparent about it, though. "We had so many flaky tests depending on 3rd parties breaking pipelines that we decided on microservices" is not something I would put on my CV, at least.

electromech2 months ago

That seems unfair. There's a lot we don't know about the politics behind the scenes. I'd bet that the individuals who created the microservice architecture aren't the same people who re-consolidated them into one service. If true, the authors of the article are being generous to the original creators of the microservices, which I think reflects well on them for not badmouthing their predecessors.

andrewstuart2 months ago

You can have infrastructure complexity (microservices) or trade it for development complexity (monolith).

Choose one.

akoumjian2 months ago

I humbly post this little widget to help your team decide if some functionality warrants being a separate service or not: https://mulch.dev/service-scorecard/

ShakataGaNai2 months ago

Too much of anything sucks. Too big of a monolith? Sucks. Too many microservices? Sucks. Getting the right balance is HARD.

Plus, it's ALWAYS easier/better to run v2 of something when you completely re-write v1 from scratch. The article could have just as easily been "Why Segment moved from 100 microservices to 5" or "Why Segment rewrote every microservice". The benefits of hindsight and real-world data shouldn't be undersold.

At the end of the day, write something, get it out there. Make decisions, accept some of them will be wrong. Be willing to correct for those mistakes or at least accept they will be a pain for a while.

In short: No matter what you do the first time around... it's wrong.

dev_l1x_be2 months ago

I can't believe how many times I have seen companies try to implement microservices with multiple repos, get lost in access management and versioning, and end up producing a fragile, overly complex hot mess.

brightstep2 months ago

They have a monolith but struggle with individual subsystem failures bringing down the whole thing. Sounds like they would benefit from Elixir’s isolated, fail-fast architecture.

renewiltord2 months ago

Is it 2018? Are you guys going to repost the MySQL DB as a queue story again? Perhaps an announcement that you’re migrating to Java 9 and what you learned about generics?

btown2 months ago

Some important context to this 2018 article is given here: https://www.twilio.com/en-us/blog/archive/2018/introducing-c...

TL;DR they have a highly partitioned job database, where a job is a delivery of a specific event to a specific destination, and each partition is acted upon by at-most-one worker at a time, so lock contention is only at the infrastructure level.
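Not Centrifuge itself, just a generic sketch of the "at most one worker per partition" idea using Postgres row locks (table and column names invented):

    // worker.go -- generic sketch: claim one partition at a time with SKIP LOCKED.
    package main

    import (
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq" // Postgres driver
    )

    func claimAndProcess(db *sql.DB) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op after a successful Commit

        // Lock one unclaimed partition; SKIP LOCKED makes concurrent workers
        // pick different partitions instead of contending on the same row.
        var partitionID int64
        err = tx.QueryRow(`
            SELECT id FROM partitions
            WHERE status = 'ready'
            LIMIT 1
            FOR UPDATE SKIP LOCKED`).Scan(&partitionID)
        if err == sql.ErrNoRows {
            return nil // nothing to do right now
        }
        if err != nil {
            return err
        }

        // While this transaction holds the row lock, this worker is the only one
        // delivering jobs for this partition (event x destination pair).
        log.Printf("processing partition %d", partitionID)
        // ... drain the partition's jobs, record progress ...

        return tx.Commit()
    }

    func main() {
        db, err := sql.Open("postgres", "postgres://localhost/jobs?sslmode=disable") // placeholder DSN
        if err != nil {
            log.Fatal(err)
        }
        for {
            if err := claimAndProcess(db); err != nil {
                log.Print(err)
            }
            time.Sleep(time.Second) // naive polling interval
        }
    }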

In that context, each worker can handle a similar balanced workload between destinations, with a fraction of production traffic, so a monorepo makes all the sense in the world.

IMO it speaks to the way in which microservices can be a way to enforce good boundaries between teams... but the drawbacks are significant, and a cross-team review process for API changes and extensions can be equally effective and enable simplified architectures that sidestep many distributed-system problems at scale.

abernard12 months ago

They also failed as a company, which is why that's on Twilio's blog now. So there's that. Undoubtedly their microservices architecture was a bad fit because of how technically focused the product was. But their solution with a monolith didn't have the desired effect either.

btown2 months ago

Failed? It was a $3.2B acquisition on a total of $283M raised. I don't see any way that's a failure.

That said I’m curious if you’re basing this on service degradation you’ve seen since the acquisition. We were thinking of starting to use them - is that a bad move?

abernard12 months ago

By all means use Segment. Segment was a great technology with an incredible technical vision for what they wanted to do. I was in conversations in that office on Market that went far beyond what they ended up doing post-acquisition.

But a company that can't stand on its own isn't a success in my opinion. Similar things can be said about companies that continue to need round after round of funding without an IPO.

My comment is of the "(2018)" variety: old news that didn't age well, like the people jumping on the "Uber: why we switched to MySQL from Postgres" post. (How many people would make that choice today?)

People tend to divorce the actual results of a lot of these companies from the gripes of the developers of the tech blogs.

readthenotes12 months ago

Some of this sounds like the journey to EJBs and back.

mikert892 months ago

"Microservices is the software industry’s most successful confidence scam. It convinces small teams that they are “thinking big” while systematically destroying their ability to move at all. It flatters ambition by weaponizing insecurity: if you’re not running a constellation of services, are you even a real company? Never mind that this architecture was invented to cope with organizational dysfunction at planetary scale. Now it’s being prescribed to teams that still share a Slack channel and a lunch table.

Small teams run on shared context. That is their superpower. Everyone can reason end-to-end. Everyone can change anything. Microservices vaporize that advantage on contact. They replace shared understanding with distributed ignorance. No one owns the whole anymore. Everyone owns a shard. The system becomes something that merely happens to the team, rather than something the team actively understands. This isn’t sophistication. It’s abdication.

Then comes the operational farce. Each service demands its own pipeline, secrets, alerts, metrics, dashboards, permissions, backups, and rituals of appeasement. You don’t “deploy” anymore—you synchronize a fleet. One bug now requires a multi-service autopsy. A feature release becomes a coordination exercise across artificial borders you invented for no reason. You didn’t simplify your system. You shattered it and called the debris “architecture.”

Microservices also lock incompetence in amber. You are forced to define APIs before you understand your own business. Guesses become contracts. Bad ideas become permanent dependencies. Every early mistake metastasizes through the network. In a monolith, wrong thinking is corrected with a refactor. In microservices, wrong thinking becomes infrastructure. You don’t just regret it—you host it, version it, and monitor it.

The claim that monoliths don’t scale is one of the dumbest lies in modern engineering folklore. What doesn’t scale is chaos. What doesn’t scale is process cosplay. What doesn’t scale is pretending you’re Netflix while shipping a glorified CRUD app. Monoliths scale just fine when teams have discipline, tests, and restraint. But restraint isn’t fashionable, and boring doesn’t make conference talks.

Microservices for small teams is not a technical mistake—it is a philosophical failure. It announces, loudly, that the team does not trust itself to understand its own system. It replaces accountability with protocol and momentum with middleware. You don’t get “future proofing.” You get permanent drag. And by the time you finally earn the scale that might justify this circus, your speed, your clarity, and your product instincts will already be gone."

-DHH

untwerp2 months ago

Also from DHH: microservices were a zero-interest-rate phenomenon https://youtu.be/iqXjGiQ_D-A?t=924

sethammons2 months ago

I left Twilio in 2018. I spent a decade at SendGrid. I spent a small time in Segment.

The shitty arch is not a point against (micro)services. SendGrid, another Twilio property, uses (micro)services to great effect. Services there were fully independently deployable.

joeyguerra2 months ago

Cool.

shoo2 months ago

Great writeup. Much of this is more about testing, how package dependencies are expressed, and many-repo/single-repo tradeoffs than about "microservices"!

Maintaining and testing a codebase containing many external integrations ("Destinations") was one of the drivers behind the earlier decision to shatter it into many repos: to isolate the impact of Destination-specific test suite failures, which occurred because some tests were actually testing integration with external 3rd-party services.

One way to think about that situation is in terms of packages, their dependency structure, how those dependencies are expressed (e.g. decoupled via versioned artefact releases, directly coupled via monorepo style source checkout), their rates of change, and the quality of their automated tests suites (high quality meaning the test suite runs really fast, tests only the thing it is meant to test, has low rates of false negatives and false positives, low quality meaning the opposite).

Their initial situation was one that rapidly becomes unworkable: a shared library package undergoing a high rate of change depended on by many Destination packages, each with low quality test suites, where the dependencies were expressed in a directly-coupled way by virtue of everything existing in a single repo.

There's a general principle here: multiple packages in a single repo with directly-coupled dependencies, where those packages have test suites of wildly varying quality, quickly become a nightmare to maintain. Packages with low-quality test suites that depend upon high-quality, rapidly changing shared packages generate spurious test failures that need to be triaged and slow down development. Maintainers of packages that depend upon a rapidly changing shared package but do not have high-quality test suites able to detect regressions may find their package frequently gets broken without anyone realising in time.

Their initial move solved this problem by shattering the single repo and trading directly-coupled dependencies for decoupled, versioned dependencies, decoupling the rate of change of the shared package from the per-Destination packages. That was an incremental improvement, but it added the complexity and overhead of maintaining multiple versions of the "shared" library and per-repo boilerplate, which grows over time as more Destinations are added or more changes are made to the shared library while the work to upgrade and retest Destinations is deferred.

Their later move was to reverse this, go back to directly-coupled dependencies, but instead improve the quality of their per-Destination test suites, particularly by introducing record/replay style testing of Destinations. Great move. This means that the test suite of each Destination is measuring "is the Destination package adhering to its contract in how it should integrate with the 3rd party API & integrate with the shared package?" without being conflated with testing stuff that's outside of the control of code in the repo (is the 3rd party service even up, etc).
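For flavour, a minimal sketch of record/replay HTTP testing (not Segment's actual harness, and keying on method+URL only is a simplification): the first run dumps each third-party response to a cassette file, later runs read it back, so Destination tests stop depending on the third party being reachable.

    // replaytest.go -- minimal sketch of a record/replay HTTP transport.
    package replaytest

    import (
        "bufio"
        "bytes"
        "crypto/sha1"
        "fmt"
        "net/http"
        "net/http/httputil"
        "os"
        "path/filepath"
    )

    type recordReplay struct {
        dir  string            // cassette directory, e.g. testdata/cassettes
        next http.RoundTripper // real transport, used only when recording
    }

    func (rr *recordReplay) RoundTrip(req *http.Request) (*http.Response, error) {
        key := fmt.Sprintf("%x", sha1.Sum([]byte(req.Method+" "+req.URL.String())))
        path := filepath.Join(rr.dir, key+".cassette")

        // Replay: if a recorded response exists, parse it and return it.
        if raw, err := os.ReadFile(path); err == nil {
            return http.ReadResponse(bufio.NewReader(bytes.NewReader(raw)), req)
        }

        // Record: hit the real service once and persist the raw wire response.
        resp, err := rr.next.RoundTrip(req)
        if err != nil {
            return nil, err
        }
        raw, err := httputil.DumpResponse(resp, true) // leaves resp.Body readable
        if err != nil {
            return nil, err
        }
        if err := os.WriteFile(path, raw, 0o644); err != nil {
            return nil, err
        }
        return resp, nil
    }

    // Client returns an *http.Client that records to / replays from dir; point a
    // Destination's HTTP client at this in its tests.
    func Client(dir string) *http.Client {
        _ = os.MkdirAll(dir, 0o755)
        return &http.Client{Transport: &recordReplay{dir: dir, next: http.DefaultTransport}}
    }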

blatherard2 months ago

(2018)

0xbadcafebee2 months ago

These "we moved from X to Y" posts are like Dunning-Kruger humblebrags. Yes, we all lack information and make mistakes. But there's never an explanation in these posts of how they've determined their new decision is any less erroneous than their old decision. It's like they threw darts at a wall and said "cool, that's our new system design (and SDLC)". If you have not built it yourself before, and have not studied in depth an identical system, just assume you are doing the wrong thing. Otherwise you are running towards another Dunning-Kruger pit.

If you have a company that writes software, please ask a professional software/systems architect to review your plans before you build. The initial decisions here would be a huge red flag to any experienced architect, and the subsequent decisions are full of hidden traps, and are setting them up for more failure. If you don't already have a very skilled architect on staff (99% chance you don't) you need to find one and consult with them. Otherwise your business will suffer from being trapped in unnecessary time-consuming expensive rework, or worse, the whole thing collapsing.

ram_shares2 months ago

The “distributed monolith” line is the key takeaway here.

Microservices only buy you something if teams can deploy, version, and reason about them independently. Once shared libraries or coordinated deploys creep in, you’ve taken on all the operational cost with none of the autonomy benefits.

I’ve seen monoliths with clear module boundaries outperform microservice setups by an order of magnitude in developer throughput.

YetAnotherNick2 months ago

If choosing microservices or a monolith is giving you an order-of-magnitude improvement in productivity, you are clearly doing something wrong or have terrible practices.