Today we are exploring the fancy topic of breaking the monolith to microservices in telecom. Sound familiar? See, there is a lot of 🦜parroting in the content marketing world. Once in a while a good text appears (usually the outcome of years of thorough and thoughtful research), then “everybody” starts writing about the same topic. For example, Google launched a vast “breaking down the monolith” content hype back in summer 2014 when Kubernetes appeared. Amazon reacted with its content backlash, quickly realizing where things are going for AWS.
On Sinning 😈 and Content Marketing 🤓
“Wait, are you trying to confess your content marketing sins at my reading expense?” a careful reader might ask. Well, yes and no. The “yes” part: to be frank, we need to sell our products (and feed our families). This content is helping us turn you into – or keep you as – a happy customer. (Or at least it’s helping us arouse your interest… the first step in your soon-to-emerge customer journey with PortaOne.) After all, you wouldn’t be reading this without that blessed (or damned) Google search results page.
The “no” part: we write these stories because we want to share them with you and because they pass the “fresh and worthy” test within our team. We honestly never write clickbait chunks. See, it’s 2021, not 2014. The “breaking down the monolith to microservices in… [type your industry here, like: telecom]” hype is long gone. It ended around September 2020, according to Google Trends. And yet here we are, long after, sharing our own “getting hands dirty” and “making mistakes” stories with you. But these are not some “general thoughts and ideas” aimed only at capturing hot search keywords. As you’re about to find out.
So, to get us started, please enjoy the following piece of historical context.
The New Rulers of Constantinople
If you lived in Europe during the Middle Ages, you would have heard of the Ottomans. After the conquest of Constantinople, the Ottoman Empire expanded all over Europe and the Maghreb: from the Ukrainian steppes and the Danube River to the lush castles of Montenegro, Rhodes, Algeria, and Egypt. The Ottoman fleet was instrumental in building and sustaining the empire’s influence all over the region. It consisted of mighty vessels: oarsmen-propelled galleys and sailing galleons.
Each galley had around 150 to 200 oarsmen, a crew of 25 to 30 sailors, and a unit of about 60 fighting men. In some cases, depending on the vessel’s size, the total number of people on board could range up to 1500. And the typical Ottoman naval expedition had a dozen or more galleys.
There was no way to stand against such a tremendous armada. Or so it seemed… And, indeed, the Ottomans scored many victories over Byzantium (the Eastern Roman Empire) and over the Venetian, Spanish, and Genoese navies from the XIV to the XVI centuries. These victories turned the Ottoman Empire into a regional superpower. But it was soon to face a different and unexpected enemy.
The Ottomans Get a Lesson in Agility
Life was tough for a medieval Ukrainian peasant. You spent your days working hard in the field, pausing now and then to celebrate the Greek Orthodox holidays. Then one day the nomads arrive from the south and take you and your family prisoner. They sell you at a slave market in Kaffa, Crimea, and you end up as an oarsman on that Ottoman galley, or as a sultan’s favorite wife. A brief clarification: this was before the tech cluster arrived in Ukraine with their outsourcing galleys (and clickbait content).
So what did you do to survive? You became a refugee, a constant nomad yourself. Maybe you’d settle on one of the many tiny islands in the lower reaches of the beautiful Dnipro River. Just to spend a cold winter there and then move on to another island. You became a fisherman and a warrior in one of the most unusual navies of its time.
Because Ottomans were the European pioneers in artillery, they equipped their ships with the best naval guns. But how can you use a gun against hundreds of small boats besieging you from all angles? The galleys were also among the fastest vessels in their class. Again, though: how can you pursue a “navy” that literally moves by land?
A Decentralized Navy with No Castles or Nobility
Ukrainian “peasant sailors” adopted the best Viking naval practices of their Norse ancestors and moved their Chaika boats on foot across the land and along the small rivers of the “Wild Fields”, which they knew well as local fishermen. A galley would simply get stuck in those shallow waters!
And they also found a way to avert any “revenge mission”. The people who would later be known as Zaporozhian Cossacks built one of the first European republics. (Even earlier than the Dutch did, actually.) See, how can you enslave your enemies and take their castle when there is no castle to take and no noblemen to bribe or expel?
They lived in small seasonal huts, which they did not care about losing that much. Their families and most valuable resources were hidden in dugouts and winter hideaways scattered across the region. As the Coalition forces would discover centuries later: it’s tough to win a battle against an enemy that simply isn’t there.
Why Are Microservices 🔥Hot in Telecom, on Netflix, and Everywhere?
Ok, by now, you’ve realized what this “history lesson” is doing here. Yes, microservices are like a fisherman navy – sometimes able to best even the mightiest of monolith ships. Netflix (distributor of The Ottomans miniseries) learned this lesson when the big service disruption of 2008 hit their IT infrastructure. By that time, their main business model consisted of sending out DVDs to people fast. But for several days that year, some clients could not get their next portion of binge-watching. What a disaster!
The Chaos Monkey and the Simian Army
So Netflix broke down their monolith, migrated to AWS (becoming their poster child) and introduced the microservice architecture to their IT cluster. Then, to ensure the “2008 blackout” would never happen again, Netflix built the 🙈Chaos Monkey. What’s that? “A tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact”, says Netflix.
Impressed? Well, so were the Netflix engineers. They followed it with the “Simian Army”, consisting of 🙉 Latency Monkey, 🐵 Conformity Monkey, 🐒 Security Monkey, 🙊 10–18 Monkey… You get the idea. There are more, but the monkey emoji set is more limited than the imagination of the Netflix engineering team. And if that’s not enough: there is also the 🦍Chaos Gorilla — a reliability microservice that “simulates an outage of an entire Amazon availability zone.”
Uber and Lyft Build Their Product with Microservices in Mind, While Telecom Starts Breaking Down Its Monoliths
Uber and arch-rival Lyft followed. But Uber took the path of building their own cloud. They leveraged different cloud providers, becoming a scarecrow for devotees of the AWS cult. Lyft, meanwhile, took a more classic “just leave it to Amazon” approach. See, Lyft started three years later than Uber. That meant their engineers could use Uber as inspiration, while taking advantage of the radical advances in cloud computing that had arrived by 2012. More advances, more space for innovation.
In telecom, our “home industry”, AT&T boasted of “already employing over 300 microservices” by the end of 2017, while Telefonica made a substantial microservices-based BSS platform replacement in Peru.
PortaSwitch to PortaOne iPaaS: Our Own Story of Breaking the Monolith to Microservices in Telecom
At this point, a reader could get the (wrong) impression that monolith software architecture is always “evil”. But each architecture is suitable for its age. It’s like playing Minecraft. If you’re five years old? Welcome to the world of logic and programming, friend! If you’re 25 (and don’t have a 5-year-old kid or aren’t a math teacher at a primary school or a software engineer at Mojang)? Huh, it might be time for you to pursue other interests. Similarly, we were more than happy with the PortaSwitch monolith architecture for over a decade. And our clients were too. Then we felt we needed a change.
During the “Monolith Age”, each time a customer asked us to implement a new payment processor or support for a new phone model, we needed to put that new code into the upcoming MR. That wasn’t cool: the customer had to wait 7+ weeks, our product and sales teams had priority feuds, and our developers growled that “the damn thing is getting fatter each day”.
Obviously, it was time for breaking our monolith to microservices in telecom. Underneath that, though, was our team’s desire “to play with something new”. But how could we when our entire PortaBilling backend was over a million lines of Perl code?
There was a third factor in our quest for a solution. Think: business continuity. Sure, “microservices are the future” talks 🇺🇳 and @channel Slack longreads are nice. But they’re difficult to implement when you have several hundred actually_paying_clients to support day-to-day. So while we desperately needed (and wanted) the new architecture, we had to design it around a monolithic architecture, which had worked “just fine” for over a decade.
The Solution: Don’t Wreck the Monolith, Shave and Tunnel While Using the “Fishbone” 🐟
Somewhere around 2002, Jeff Bezos issued his API Mandate. While some people doubt the very existence of this document, others cite it as the turning point for Amazon. Four years later, in 2006, Amazon launched Amazon Web Services. (Yes, we’ve already used the AWS acronym several times by now.) AWS is now one of the largest cloud computing providers in the world. But at PortaOne, we don’t have an API Mandate yet. Still, we’ve built (and documented) a microservices API, which we dubbed “the 🐟 Fishbone”. While an API itself is pretty common, we’d like to share our unique approach to building it. If you’re thinking about breaking the monolith to microservices in telecom, we think you’ll find it interesting.
The Common Approach
This is when the whole “transition” process is engineer-driven. It involves “blowing up” the monolith, especially the oldest parts (which often tend to be the “core”). After that, you create a “new and shiny” set of microservices in a shape the consumer can use. That always starts with a new architecture that uses the most remarkable technologies and most recent approaches. Fun! (At least it’s fun for the development team.)
But after the fanfare ends, you face the painful process of attaching pieces of the old functionality (that users still need and thus request) to that “new and shiny” microservice-based architecture. Then your software development follows the typical pattern of being “almost ready” for months while your customers (and your salespeople) get angrier and angrier. That’s the “moon chunks hitting the Earth” part, like in the above novel.
“Our Approach” and Why It’s Like a Visit to the Dentist
If someone asked us to explain the PortaOne approach to breaking the monolith to microservices in telecom in one sentence, it would be this: “Shave off the monolith pieces with the most friction first. Then do the rest”. Have you ever had a tooth filled at the dentist? Remember the carbon paper and the “tap-tap-tap” to find and polish any places of friction after the filling was set? For PortaOne, that “tap-tap-tap” was: (1) payment processors, (2) IP phone provisioning, (3) self-care portals, and (4) sending events to external systems upon data modification in PortaBilling.
Step 2: Digging Tunnels to the Monolith’s Core
It’s time to drill deeper into the monolith. But where to start? We want to “tunnel” to the places our customers are most interested in reaching. So, that’s where we’re beginning. Specifically: closing the billing period. Right now, this is a complex multi-step process that involves charging recurring fees, applying discounts, assessing taxes, charging credit cards, and so on.
So we’re creating an additional API for our 🐟 Fishbone. It will allow the connection of a custom code piece (running in a cloud somewhere) to any desired step of the process. Then, the results can be adjusted to pass on to the next step. For example, our customers may wish to insert a special promo discount for a specific end user before regular discounts kick in.
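To make the “insert a custom step” idea concrete, here is a minimal Python sketch. It is not the actual Fishbone API: the step names, the `BillingPeriodClosing` class, the `before()` method, and all the numbers (a 20.00 fee, a 10% discount, 20% tax, a 5.00 promo) are assumptions invented for illustration. It only shows the general pattern of letting customer code run before a chosen step and pass its adjusted result onward.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Invoice:
    customer: str
    subtotal: float = 0.0
    lines: List[str] = field(default_factory=list)


# A step takes the invoice in its current state and returns the adjusted one.
Step = Callable[[Invoice], Invoice]


def charge_recurring_fees(inv: Invoice) -> Invoice:
    inv.subtotal = round(inv.subtotal + 20.0, 2)  # hypothetical flat fee
    inv.lines.append("recurring fee +20.00")
    return inv


def apply_regular_discounts(inv: Invoice) -> Invoice:
    discount = round(inv.subtotal * 0.10, 2)  # hypothetical 10% discount
    inv.subtotal = round(inv.subtotal - discount, 2)
    inv.lines.append(f"regular discount -{discount:.2f}")
    return inv


def assess_taxes(inv: Invoice) -> Invoice:
    tax = round(inv.subtotal * 0.20, 2)  # hypothetical 20% tax
    inv.subtotal = round(inv.subtotal + tax, 2)
    inv.lines.append(f"tax +{tax:.2f}")
    return inv


class BillingPeriodClosing:
    """Runs the built-in steps in order, calling custom hooks before each."""

    def __init__(self, steps: List[Tuple[str, Step]]) -> None:
        self.steps = steps
        self.hooks: Dict[str, List[Step]] = {}

    def before(self, step_name: str, hook: Step) -> None:
        # Refuse hooks for steps that do not exist, so typos fail loudly.
        if step_name not in {name for name, _ in self.steps}:
            raise KeyError(step_name)
        self.hooks.setdefault(step_name, []).append(hook)

    def run(self, inv: Invoice) -> Invoice:
        for name, step in self.steps:
            for hook in self.hooks.get(name, []):
                inv = hook(inv)  # custom code adjusts the intermediate result
            inv = step(inv)
        return inv


def promo_for_alice(inv: Invoice) -> Invoice:
    # Customer-supplied code: a special promo for one specific end user.
    if inv.customer == "alice":
        inv.subtotal = round(inv.subtotal - 5.0, 2)
        inv.lines.append("promo discount -5.00")
    return inv


def build_pipeline() -> BillingPeriodClosing:
    closing = BillingPeriodClosing([
        ("recurring_fees", charge_recurring_fees),
        ("regular_discounts", apply_regular_discounts),
        ("taxes", assess_taxes),
    ])
    closing.before("regular_discounts", promo_for_alice)
    return closing


if __name__ == "__main__":
    for line in build_pipeline().run(Invoice("alice")).lines:
        print(line)
```

In this sketch the hook runs in-process. In the scenario described above, the custom piece would run in a cloud somewhere, so the hook would presumably be a network call rather than a local function.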
Step 3: Replacing “the Core” and Thus Completely Breaking the Monolith to Microservices in Telecom
After we’ve tested the customer flow with this functionality, we’ll redesign the entire provisioning framework throughout 2021 and early 2022. In 2022, we plan to tunnel to the most critical processes (e.g., the aforementioned complete closing of a billing period). Only then, after our customers give their feedback, will we proceed to replacing the “monolith’s core”.
The “core replacement” stage is when our engineering team can start replacing the major technology pieces within our system. We expect the whole process to be complete by 2024. However, the work we did on the previous two steps, especially the unit tests produced, will allow us to perform the entire replacement process in a controlled manner and at the desired pace.
Our Lessons (So Far) in Breaking the Monolith to Microservices in Telecom
Lesson 1. View the Transition from the Business Perspective and from Where the Most Friction Occurs
Drive the “engineering dreams” closer to reality. Let the customer and the sales team take the lead. It’s like boring the tunnels toward the core (see: Jules Verne, above). Instead of spending tremendous effort on “removing” the soil (most of which you won’t use), you establish agile teams of business developers and salespeople, coupled with product people and software architects. These small teams compete among themselves for who will bore the “best tunnel”. Remember our peasant fleet analogy. They did burn and sink those mighty galleys, after all.
Lesson 2. Always Ask Yourself: “Will My Product Survive the Long and Costly Redesign Process?”
Most software vendors will not survive if they stop developing their “cash-cow” product or if their new version doesn’t meet customer expectations. So the goal is to shorten the time the cash-cow spends in the 🦋 butterfly cocoon. When we have microservices in place, along with customers who are using them, only then can we afford to organize teams of developers to completely redesign our architecture from monolithic to microservice-based.
Lesson 3. The Coolest Modern Technology Is Great… When Your Customers Can Use It
While designing iPaaS, we used gRPC – the brand-new and ultra-advanced communication protocol at that time. gRPC delivered all the benefits promised and gave our developers something to brag about over beers with their friends. But it also created a problem: many external systems (their developers, to be precise) could not easily use it. So we created an API gateway that allows connection via “traditional” REST to those not (yet) ready for gRPC.
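The gateway idea can be sketched in a few lines of Python. This is an illustration only, not our gateway and not the gRPC library: `ProvisioningService` is a plain-Python stand-in for a gRPC stub, and the method names (`ActivateSim`, `GetSim`), the routes, and the payload fields are all hypothetical. The point is simply that a REST-style method plus path plus JSON body gets translated into a call on an RPC-style method.

```python
import json
from typing import Any, Callable, Dict, Optional, Tuple


class ProvisioningService:
    """Stand-in for an RPC service stub; every name here is hypothetical."""

    def __init__(self) -> None:
        self._sims: Dict[str, str] = {}

    def ActivateSim(self, request: Dict[str, Any]) -> Dict[str, Any]:
        iccid = request.get("iccid")
        if not iccid:
            raise ValueError("iccid is required")
        self._sims[iccid] = "active"
        return {"iccid": iccid, "status": "active"}

    def GetSim(self, request: Dict[str, Any]) -> Dict[str, Any]:
        iccid = request["iccid"]
        if iccid not in self._sims:
            raise KeyError(iccid)
        return {"iccid": iccid, "status": self._sims[iccid]}


def _match(template: str, path: str) -> Optional[Dict[str, str]]:
    """Match '/sims/{iccid}' against '/sims/8901'; return path params or None."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return None
    params: Dict[str, str] = {}
    for t, p in zip(t_parts, p_parts):
        if t.startswith("{") and t.endswith("}"):
            params[t[1:-1]] = p
        elif t != p:
            return None
    return params


class RestGateway:
    """Translates (method, path, JSON body) into a call on the RPC service."""

    def __init__(self, service: ProvisioningService) -> None:
        self.routes: Dict[Tuple[str, str], Callable[[Dict[str, Any]], Dict[str, Any]]] = {
            ("POST", "/sims"): service.ActivateSim,
            ("GET", "/sims/{iccid}"): service.GetSim,
        }

    def handle(self, method: str, path: str, body: str = "") -> Tuple[int, str]:
        for (route_method, template), rpc in self.routes.items():
            params = _match(template, path)
            if route_method != method or params is None:
                continue
            try:
                request = json.loads(body) if body else {}
            except json.JSONDecodeError:
                return 400, json.dumps({"error": "invalid JSON"})
            if not isinstance(request, dict):
                return 400, json.dumps({"error": "JSON object expected"})
            request.update(params)  # path parameters join the RPC request
            try:
                return 200, json.dumps(rpc(request))
            except ValueError as exc:
                return 400, json.dumps({"error": str(exc)})
            except KeyError as exc:
                return 404, json.dumps({"error": f"not found: {exc.args[0]}"})
        return 404, json.dumps({"error": "no such route"})


if __name__ == "__main__":
    gateway = RestGateway(ProvisioningService())
    print(gateway.handle("POST", "/sims", '{"iccid": "8901"}'))
    print(gateway.handle("GET", "/sims/8901"))
```

A real gateway would also have to deal with authentication, streaming calls, and mapping RPC status codes to HTTP ones. None of that is shown here.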
Lesson 4. Use Workflow Programming Instead of Programming Workflows in Breaking the Monolith to Microservices in Telecom
When a new customer signs up, the telco has to activate their SIM card in the mobile network’s core. Then the sales system creates a shipping order for the warehouse to deliver the physical SIM and chosen mobile phone to the customer. “Before”, when a developer needed to write a script for provisioning some elements accessible via the network, she had to handle unexpected situations in her code. For instance, one of the components might be temporarily unavailable or inaccessible via the network. The more complex the system, the wider the variety of these potential network delays.
Consider “What-If” Situations
Given these various risk factors, any application that has not been written to handle all these “what-if” situations will routinely fail when deployed on a real-life network. And the operator will end up with a bunch of half-provisioned customer records, which will have to be sorted out and fixed manually. On the other hand, it’s costly for a developer to write complex code that could handle any unexpected situation. And that’s before all the lengthy debugging and QA.
How Does Temporal Help?
The “base layer” by Temporal.io (the technology partner behind our new “service provisioning paradigm”) arrived as a huge relief. Now a developer simply writes code as if it runs on a “perfect” network with no delays or outages. The Temporal server “underneath” handles things when a server containing an essential element of the entire architecture becomes unavailable. Temporal will pause the process and persist in trying to reach that server. Only after it does (or when the system owner changes it to cure the breakage) will the whole provisioning process continue. That’s a real help with breaking the monolith to microservices in telecom! Thank you, Temporal.
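Here is a toy Python sketch of that division of labor. To be clear, it does not use the Temporal SDK or imitate its API: `run_until_done`, `FlakyNetworkCore`, and `provision_customer` are names made up for this example, and the backoff values are arbitrary. It only illustrates the idea that the “business” code reads as if the network were perfect, while a layer underneath keeps retrying an unreachable component.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TemporarilyUnavailable(Exception):
    """Raised when a component cannot be reached right now."""


def run_until_done(
    activity: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
) -> T:
    """The 'layer underneath': pause and retry until the activity succeeds."""
    delay = initial_delay
    while True:
        try:
            return activity()
        except TemporarilyUnavailable:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped


class FlakyNetworkCore:
    """Simulated mobile core that is unreachable for the first few calls."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success
        self.active_sims: List[str] = []

    def activate_sim(self, iccid: str) -> str:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise TemporarilyUnavailable("core unreachable")
        self.active_sims.append(iccid)
        return f"SIM {iccid} active"


def provision_customer(
    core: FlakyNetworkCore,
    iccid: str,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """The 'workflow': straight-line code with no error handling of its own."""
    log = []
    log.append(run_until_done(lambda: core.activate_sim(iccid), sleep=sleep))
    log.append(run_until_done(lambda: f"shipping order created for {iccid}", sleep=sleep))
    return log


if __name__ == "__main__":
    # The core fails twice, then recovers; short delays keep the demo fast.
    print(provision_customer(FlakyNetworkCore(2), "8901", sleep=lambda s: time.sleep(s / 100)))
```

One big thing this sketch leaves out: it keeps everything in memory, so if the process running `provision_customer` dies, the progress is lost. Surviving that kind of failure is exactly what a dedicated workflow server is for.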
Lesson 5. Stop the Holy Wars on Technology, or the “Canonic Approach” to Coding Your System
This lesson is more related to the past experiences of one of our founders and the current CEO. Andriy Zhylenko recalls: “After several local acquisitions, Telenor Czechia ended up with two major coding offices: the one in Prague [which Andriy led] and the one in Brno. The Prague hub wrote code in Perl, the Brno hub wrote code in Python.” Each day, the two teams spent half the day writing actual code, and the other half in a heated email debate over “why Python is better than Perl” and which single language everyone should be using.
Now, this situation could be “settled forever” through one or two online standups. But, Andriy concludes, back then “we would simply agree on the API, and then each team would use the programming language of their preference.” So now we can allow for “technology diversity”. In our case: the PortaBilling backend is all Perl because of Perl’s “stickiness”. Our iPaaS components, meanwhile, have GoLang, Python, Perl, and even some Boomi low-code – whichever suits better. Each language is our own Chaika boat, making us a pretty nimble little army.