Future Tech

Techie's enthusiasm for decluttering fails to spark joy

Tan KW
Publish date: Mon, 06 May 2024, 05:35 PM

Who, me? Welcome once again, dear readerfolk, to the sanctuary of Who, Me? in which Register readers can recount the times when their technical skills abandoned them, even if momentarily, without fear of judgment.

This week, meet a reader we'll Regomize as "Monty" who, many years ago, was employed by a manufacturer of sensors and measurement systems. The small but close-knit software team was responsible for designing, developing, testing, installing and supporting all the products of the biz - a challenging and interesting environment, sure, and sometimes also confusing.

Late one Friday, a customer's system had to be patched. This particular installation was a remotely managed system composed of several nodes located abroad, which were continuously streaming data to the customer's central offices. As one might imagine, downtime outside of the announced maintenance windows was never well received, as a matter of principle.

The standard solution for installing the software had recently transitioned from "copy the repository and build locally" to .deb packaging - which allowed proper versioning, dependency management and uninstall capabilities. Everything was finally automated and error-proof - or it should have been.

Dramatic foreshadowing.

So our self-confident Monty (who was also responsible for the packaging) connected to each of the remote nodes, pulled the new package, installed the update, rebooted, and waited for the data to start flowing again. While this deployment part could have been automated as well, the installation was relatively recent and doing some manual monitoring during maintenance windows allowed him to gather some feedback on how the system was performing.

As the end of the working day approached, it was the turn of the last node to be updated - a node that had a bit of a special history. There were tiny differences between it and the other nodes - differences that were never documented, nor even properly understood.

As you well know, gentle readerfolk, for a big thing to go wrong, several smaller things have to go wrong. In this case you had the aforementioned special installation, an engineer eager for a drink - but actually in desperate need of coffee - and an idiosyncratic OS. After each update, the OS popped up a reminder that there were unused packages, and would you like to uninstall them?

Monty, tidy person that he was, and worried that local disk space could become an issue, clicked yes - as he had done many, many times in his life.

Too late, he realized that the list of "unused" packages contained a few obscure names of seldom-mentioned software with no practical relevance on a modern system. Names like, oh, for example, "Python".

Which was being efficiently uninstalled before his very eyes, together with all the software that depended on it - including the OS itself.

The realization was quickly followed by one CTRL+C and then by fear. The system was not responding. Nor could a new connection be established.

His last chance was a cry for help to his software lead, who had just swapped his wizard robe for a leather jacket and was about to leave. After hearing of the installation snafu, the software lead donned the robe again, called home to postpone dinner, and sat down at the console.

The system was unreachable via the usual channel. One magic spell proved unsuccessful, then another, and a third. But maybe … if they could route a tunnel through another node and the wired LAN … with a small detour here and a special parameter there …

And there they were, logged in to the severely crippled system.

Most of the files were indeed gone, but the system was still somehow running - and, most importantly, the update utilities were still available. Time for reinstalling the missing packages! A quick script was prepared to check for missing software, reinstall everything that should have been there, apply the planned update and then reboot. The script was launched and it seemed to do its job, to the huge relief of the duo.

But it got worse.

Just when the reinstallation process was about to complete, the remote connection froze again - this time for good, and for all systems. It transpired that while there was a LAN, outside access for maintenance and updates was happening through a mobile hotspot powered by a local SIM card - whose monthly data cap had just been reached with the big reinstall.

In other words, the script may well have completed successfully, or it may not - there was just no way to tell. They had done everything possible, and organizing an immediate field trip was not justified without first discussing it with the bosses and the client - which would have been an awkward conversation indeed. The only thing to do was head home for the weekend and prepare for a challenging Monday.

Before confessing to the big bosses, first thing on Monday morning Monty tried connecting to all the systems - to find that they were all merrily sending data packets to the central bucket.

As it transpired, luck was on his side. For one thing, the download had miraculously completed just kilobytes before the connection froze, and the system rebooted as planned. For another, it was the very end of the month, so during the weekend a new monthly cap on that SIM card had started from zero.

And as it turned out, the client had a completely unrelated IT problem elsewhere on their system at the time, so hadn't noticed any outage at all.

(As a postscript, Monty later learned that his estimates were wrong, and disk space was definitely not an issue - so there was absolutely no need to play Marie Kondo on a production system. Another lesson learned.)

Have you ever been rescued from a stuff-up by the fickle hand of fate, like good old Monty here? Tell us all about it in an email to Who, Me? and we might immortalize your adventure some future Monday. ®

 

https://www.theregister.com//2024/05/06/who_me/
