Dual Version appeared as part of the natural evolution of our own internal software distribution process. The “big launches” of the 2000s had created a somewhat ceremonial attitude among customers and developers towards launching new software. The older software distribution model (sending out CDs with the install files) contributed to that as well.
Back then, the typical consumer eagerly awaited the next release of new commercial software, such as Microsoft Windows, Office, MacOS, or Adobe’s products. But that was definitely not the case with B2B clients, telecom in particular. (See our last post for a refresher on why.)
Why Customers Ignore Software Updates (Especially in Telecom)
A telecom operations team (the people who handle installed software) can be pretty motivated to make as few changes as possible. Preparation for an update requires major testing efforts, creating a huge drain on scarce resources. Typically, it’s the sales and marketing teams who want the update, but they don’t have the direct ability to make it happen. The result? A lot of internal struggle. So it’s not surprising that there’s a natural tendency to procrastinate.
The Great MR Gap: Its Risks and Inconveniences
Remember the MR Gap from our “Agile retro” post? For any readers lazy enough to skim this blog: the MR Gap arises when a client falls chronically behind on their MR updates. This can happen for a couple of reasons:
- They’re afraid their customer base won’t like the change. (Sometimes people get upset not because things no longer work, but simply because they work differently!)
- They don’t have the internal engineering resources to prepare and execute the update properly.
The “Great” MR Gap (our nod to the Great Barrier Reef) appeared when PortaOne started to produce new software releases on a seven-week Agile schedule. We were turning out new updates fast, but our customers stuck to the habit of one update every one or two years. By the time we rolled out MR70, more than half of our customers still had a pre-MR55 version installed. At that point it was at least three years old!
Did Other People Already Solve This? (And How?)
We weren’t the only market player facing this technology-upgrade cycle issue. Around that time, our “good old friends” at Cisco were struggling with much greater upgrade lags. Their then vice president of technology and sales for worldwide channels even said: “Anything that’s more than five years old is now somewhat obsolete.” OK, Cisco, not “that obsolete” 😂 if you issue updates once every year or two, compared to our seven-week MR cycle.
So, we had an idea. What if we made it really easy for customers to upgrade? What if we could eliminate marketing’s fear of negative impacts on existing customers, and operations’ lack of testing time? And what if our own product could enable gradual, smooth migration, including a range of options for A/B testing, launch cohorts, feature toggles, and so on? Welcome to Dual Version by PortaOne.
Dark Launching and Dual Version
Long story short: in spring 2020 we wrote an (almost scientific) white paper on Dual Version. In it, we analyzed the concept and existing technology solutions, and explained how Dual Version works with PortaSwitch. On the off chance you don’t feel like reading 20 pages of highly nuanced technology-related text tonight 🧐, here’s the gist: Dual Version allows gradual migration of customer batches to a new release, avoiding the risk of unexpected downtime or impaired user experience.
The Architecture of Dual Version
Dual Version has two major components:
- Porter agent: A piece of code that reads the data from the old version (“source”), applies data transformation rules, and stores the resulting data in the new version (“target”). This lets you jump across virtually any gap in software versions. And if things don’t go as planned, the rollback only affects the customers in a specific customer batch.
- DSBC (dispatching session border controller): A piece of software that stands “between” the end user and instances of PortaSwitch. During a migration, it determines which end user gets into which system (source or target). This allows the operator to perform the migration without the customer needing to change any configuration settings on their side.
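To make the dispatching idea concrete, here is a minimal sketch in Python. It is not PortaOne code: the `Dispatcher` class, its method names, and the "old"/"new" labels are all hypothetical, and a real DSBC works on signaling traffic rather than on plain user IDs. It only illustrates the routing decision described above, including a rollback that touches a single batch.

```python
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Toy stand-in for the DSBC routing decision (hypothetical, not PortaOne code)."""
    source: str                                  # label of the old system
    target: str                                  # label of the new system
    migrated: set = field(default_factory=set)   # IDs of users already moved to target

    def migrate_batch(self, user_ids):
        """Mark every user in the batch as living on the target system."""
        self.migrated.update(user_ids)

    def rollback_batch(self, user_ids):
        """Send only this batch back to the source system."""
        self.migrated.difference_update(user_ids)

    def route(self, user_id):
        """Decide which system should serve this user."""
        return self.target if user_id in self.migrated else self.source


dsbc = Dispatcher(source="old", target="new")
dsbc.migrate_batch(["alice", "bob"])
```

In this sketch the end user never changes anything on their side: `route("alice")` simply starts returning the target once her batch is migrated, while users outside the batch keep going to the source.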
This approach delivers three essential benefits:
- Sales and marketing can start selling new services while the migration is still in progress.
- There is plenty of room for feature experimentation, A/B and cohort testing, and dark launching (i.e., delivering a new feature to customers in targeted batches rather than one “big bang” update). Any negative impact is limited to the customers in the batch.
- You can turn your end users into testers of strange and uncommon scenarios – AKA “long tail”. This helps you uncover any rare or unforeseen issues. (Think: a problem with a web page developed by a reseller three years ago.) It also gives you way more time to fix these issues before they start affecting a broader circle of end users.
“Great Product Idea, a Few Problematic Pieces of Architecture” — Our Lessons with Dual Version v1
Winston Churchill (or was it Amine A. Ayad?) once said, “Be humble enough to see your mistakes, courageous enough to admit them, and wise enough to correct them.” After coming up with the DSBC architecture in 2016, we “rushed to implement” it. We did some internal testing, then sold three discounted pilots to our actual clients to try it out in real-life conditions. The original project estimate for these pilots was six to nine months in early 2018. Just over a year later, we began to realize our approach needed to change. Things were not turning out the way we thought.
The Support Nightmare: Dual Version Backports
MR55 (launched in May 2016) was our most popular release at the time we started developing the Dual Version project in 2017. This meant that any changes we made to accommodate the new approach had to find their way into code that was already written, tested, and deployed.
Backporting is a practice well known in software engineering and cybersecurity. It consists of applying a patch taken from a recent software version to an older version of the same software. While the term itself is elegant, the reality behind it is not. Backporting takes tremendous effort from the dev and QA teams, consisting mostly of running newer pieces of code in an older environment, and fixing the resulting errors one by one. And to make it worse, it’s very hard to split backporting into parallel tasks, because each fix tends to build on the previous one. To avoid conflicts, the code adaptation has to happen mostly one step at a time. This heavily impacts the time frame and makes it less predictable.
API Was a Nice, Automated Way of Migrating Data… in Theory
Here was our plan: Porter reads the customer data from the older release, then writes it to the new one, using the standard PortaBilling API. But this plan faced the rough reality of legacy data inconsistencies. Take this example: in those “ancient” days, an application, writing directly to the database, would put “USA” (instead of “United States”) in the country name field. Then, when we implemented data validation in the API, it “grandfathered” that old customer record.
That old customer record did keep working. But when we tried to insert that data into the new release, the API there rightfully complained and rejected it. So, after each Porter run – migrating, say, 1,000 customers – Porter would report a dozen or more “data migration issues”. Then our support team had to manually analyze each one to make the client database consistent. That took a lot of man hours, rendering migration via API useless for someone who had “old” data in the system.
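The failure mode is easy to reproduce in a few lines of Python. This is only a sketch under stated assumptions: `api_create_customer`, `migrate_via_api`, and the `VALID_COUNTRIES` set are hypothetical stand-ins, not the PortaBilling API. The point is that a record the old system tolerated gets rejected by validation on the target, and every rejection lands on someone's desk.

```python
VALID_COUNTRIES = {"United States", "Canada", "Ukraine"}  # illustrative subset


class ValidationError(ValueError):
    """Raised when a record fails the target release's validation."""


def api_create_customer(target_db, record):
    """Hypothetical stand-in for a validating API call on the target release."""
    if record["country"] not in VALID_COUNTRIES:
        raise ValidationError(f"unknown country: {record['country']!r}")
    target_db[record["id"]] = record


def migrate_via_api(source_records, target_db):
    """Copy records one by one; collect the ones the API rejects."""
    issues = []
    for record in source_records:
        try:
            api_create_customer(target_db, record)
        except ValidationError as exc:
            issues.append((record["id"], str(exc)))
    return issues


source = [
    {"id": 1, "country": "United States"},
    {"id": 2, "country": "USA"},  # legacy value written directly to the database
]
target = {}
issues = migrate_via_api(source, target)
```

Here customer 1 migrates cleanly, while customer 2 ends up in `issues` and has to be fixed by hand before it can be retried.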
Plan Assessments Need to Be Realistic
Unexpected things kept on popping up during migrations. “We just remembered there is this product we initially kind of forgot.” Or: “By the way, it uses a VoIP gateway that was not in the original network diagram.” On one pilot with a multisite architecture, the plan was to move the same IP address between the sites using an external router. But during the development stage, we had assumed each site would have its own IP address. Add DSBC and the need for two complete sets of the source and target systems, and a “quick MVP pilot” became a full-scale implementation project. And, as it turned out, one of the toughest, architecture-wise, in our entire corporate history. Obviously, the client didn’t know about these kinds of details, and we absorbed all the unexpected migration costs.
Dual Version v2 and @Madball
We learned from our mistakes. The most important lesson? Dual Version required a leader and a single responsible project manager (PM). Talented and experienced former support engineer (and then-PM in our Chernihiv office) Oleksandr Zalugovskiy – Slack handle @Madball – became this leader.
Out of three pilot projects, we stopped one. Then Oleksandr and his team carried the remaining two “on their shoulders” to a successful completion. Why didn’t we stop all three? Because the remaining two allowed us to try the concept of DSBC in a real client environment. That let us confirm the approach was viable, and fix any issues we discovered. It was our Falcon 1 project, setting the groundwork for the smooth landing of a future Falcon 9! While the support team under Oleksandr’s command did the migration, the developers redesigned Porter. They improved the migration procedures to lessen the load on our support engineers, and made the whole migration process more predictable.
Porter v2 now performs data migration on the database level, using the same “pre-packaged” SQL queries we use for the standard update procedure. (Hence, they have been battle-tested on hundreds of updates already.) This ensures that Porter processes all existing data properly (even that “USA” in the place of “United States”), and no one has to deal with any import errors manually. It’s also much faster, especially for customers with lots of subordinate accounts underneath.
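A rough illustration of the database-level approach, using Python’s built-in `sqlite3` module. Everything here is an assumption made for the example: the table names, the `batch` column, and the `MIGRATE_BATCH` query are invented, and the real Porter queries are not shown in this post. The sketch only demonstrates the principle that a transformation written in SQL can normalize legacy values while it copies a batch, so nothing is rejected along the way.

```python
import sqlite3

# In-memory database with a toy "source" and "target" table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_customers (id INTEGER PRIMARY KEY, batch INTEGER, country TEXT);
    CREATE TABLE target_customers (id INTEGER PRIMARY KEY, country TEXT);
    INSERT INTO source_customers VALUES
        (1, 1, 'United States'),
        (2, 1, 'USA'),
        (3, 2, 'USA');
""")

# Copy one batch and normalize the legacy country value in the same statement.
MIGRATE_BATCH = """
    INSERT INTO target_customers (id, country)
    SELECT id,
           CASE country WHEN 'USA' THEN 'United States' ELSE country END
    FROM source_customers
    WHERE batch = ?
"""


def migrate_batch(connection, batch):
    """Migrate a single batch inside one transaction."""
    with connection:  # commits on success, rolls back if the statement fails
        connection.execute(MIGRATE_BATCH, (batch,))


migrate_batch(conn, 1)
rows = conn.execute(
    "SELECT id, country FROM target_customers ORDER BY id"
).fetchall()
```

After migrating batch 1, `rows` holds customers 1 and 2, both with “United States” as the country, and customer 3 stays on the source until its own batch is migrated.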
Delivering Features via Hotfixes Instead of Backporting
The PM, QA and dev teams championed the use of the hotfix procedure for Dual Version v2. Unlike backports, hotfixes are a standard support procedure, so any engineer can use them correctly. So yes, it requires more work from dev and QA to create the standardized code and test it, but the support engineers simply run a few commands and the customer system is up to date. This improvement required a thorough analysis of the Dual Version pilots, along with creating a full-scale testing environment. As a result, most of the upgrade and DSBC procedures are now standardized.
Pre-migration Analysis for Dual Version
Here are a couple of (seemingly) easy questions that puzzle most migration engineers and PMs on the customer side: Which systems will interconnect with PortaSwitch? And what customer batch from which product are we going to migrate first into the target system? To answer these questions, at the pre-migration analysis stage we always align expectations and book adequate resources on both sides.
Usually, this stage takes two weeks. We go in with four main deliverables for the customer:
- A high-level diagram of involved network components
- A tentative migration timeline
- A migration matrix (so we know how subscribers of each product are split into batches)
- A sequence of actions and responsibilities (AKA, what will be done by us and what will be done by the customer)
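As a rough idea of what a migration matrix can look like as data, here is a small Python sketch. The function name, the product names, and the fixed `batch_size` are assumptions made for illustration; a real matrix would also reflect business priorities, not just a size limit.

```python
from collections import defaultdict


def build_migration_matrix(subscribers, batch_size):
    """Group subscriber IDs by product, then split each group into batches."""
    by_product = defaultdict(list)
    for subscriber_id, product in subscribers:
        by_product[product].append(subscriber_id)
    return {
        product: [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
        for product, ids in by_product.items()
    }


# Hypothetical subscribers as (id, product) pairs.
subscribers = [
    (1, "residential"),
    (2, "residential"),
    (3, "business"),
    (4, "residential"),
]
matrix = build_migration_matrix(subscribers, batch_size=2)
```

With this input, the residential product is split into two batches and the business product into one, which is the kind of per-product split the matrix is meant to capture.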
“A decade ago, when I just arrived at PortaOne, we had a practice that I still remember,” recalls Oleksandr. The PM of some feature would invite the developers, the QA team, and a few random support engineers to “role play” a specific telco project. They would assign responsibilities (e.g., someone changes the configuration of a customer via UI, someone runs the actual update, and someone plays the role of the end customer, making phone calls and getting upset when something doesn’t work). Afterward, he says, the team shared pizza, asked questions, and offered feedback. Mashups are indeed an internationally recognized ideation method. In Oleksandr’s words, “Support folk are those who will ultimately drive the project forward to success. If they get the product, the customer will get it too.”
The Dude with World Class Self-Irony
“I had to apply three times and they still did not accept me,” says “@Madball” on a Zoom call. “But I am as persistent as a ram, so they ultimately had to hire me.” Oleksandr’s persistence is definitely cooler than his alleged dumbness (despite his humble talk, he is actually super sharp). Those smarts got him slowly but steadily promoted from junior support engineer to one of the company’s most revered project managers.
The Girl from Ipanema
“Another funny story is how I met my wife…”
PortaOne had an “old style” update (the type that requires a lot of preparation work) for an operator in São Paulo, Brazil. Two of our engineers were sent on-site to assist. Oleksandr arrived together with Olena, our application engineer, who was also from Chernihiv. They took a bus to the client’s office and reported to their reception. The secretary led them to the conference room, which was booked for the training session. Somehow, things got mixed up and they ended up waiting in this room together for several hours.
Forgotten at the Conference Room
“I am not a big lady-talker. And we never really talked to each other back home. She was just ’that beautiful lady from the application engineering department.’ But after half an hour of uneasy silence, even Zalugovskiy can start talking to a lady. It was an awkward situation. She was nervous because that was one of her first customer training sessions,” remembers Oleksandr.
“So, I tried to support her. Just some empty talk. Then we went walking around the city late at night. When two people from the same company in Chernihiv arrive in São Paulo, there’s nothing criminal if they go for a walk together, right? It was purely a business relationship. And it took us another six months to start dating.”
Never Give Up!
Olena returned to PortaOne from maternity leave last year. “Returned” is a relative term in our modern age of hybrid work. Both Zalugovskiys now work from home (and take work breaks to play with Sviatoslav).
Oleksandr’s advice to young PMs out there? “Well, never give up, and always stay proactive. Just make sure you read the documentation well and that your mates understand the next step in the timing. If you discover that something ’isn’t good’ – don’t walk away. Think about how you can help. Accept responsibility and help others. Also: treat failures as a normal thing. Only people who don’t do anything never make mistakes.”
Dual Version v2
This summer we launched Dual Version v2, reimagined and championed by Oleksandr Zalugovskiy and his team. Three pilot projects are already underway. (Spoiler: all are going much, much smoother!) If you want to save your engineers’ time and protect your customer experience once Dual Version v2 reaches “general availability”, it’s already time to schedule your pre-migration analysis. Please contact us. During 2021–2022, we plan on launching the wider version of Dual Version, extending it to outside products that are relevant to telecom and the daily work of our customers.