When AI helps you code, who owns the finished product?

Opinion I've been writing software for almost half a century, and my recent experiences with AI suggest that we developers may soon find ourselves in a very sticky situation.

I say that having started with 8085 assembly code, then moving on to C, then C++, then Java. Once the web came along I learned the three Ps: Perl, PHP and Python.

Python stuck - more than two decades later it remains my go-to language. I'm far from alone; these days, many introductory computing courses teach Python. This means most scientists and engineers have at least a passing familiarity with it, so when they need to code something, they use Python. Enormous libraries of Python “solutions” can therefore be found online. If you have a coding problem, chances are that someone else has already solved it.

This explains why Python became the de facto language for machine learning and artificial intelligence; researchers working on ML algorithms want to test their hypotheses and optimize their approaches - without having to sweat the details of the code. With Python, researchers don't have to put a lot of effort into their code; instead, they can focus on the problem they're solving. That kicked off a virtuous cycle of development: pretty much everything in artificial intelligence today - except for the lowest-level, tightest loops of bit-banging and matrix multiplications - is written in Python.

Recently, an attorney who specializes in intellectual property law requested my assistance prototyping a tool he'd dreamed up - using generative AI to automate some of the boring, fiddly bits of research that IP lawyers do daily. I leapt at the opportunity to get my hands into a bit of “product-oriented” AI coding, and realized I could even take advantage of a bit of AI myself, using OpenAI's GPT-4.

All five of the big foundation models (GPT-4, Microsoft Copilot, Google Gemini, Anthropic Claude and Meta AI) were fed trillions of “tokens” of text during their lengthy training, including pretty much every last example of source code that could be scraped off the open web and open-source code repositories.

A lot of that code is Python, which means all these models can do a decent job writing Python.

Knowing this, I wanted to learn if I could use AI to '10x' myself: could I write software ten times faster using AI than I could with my wetware alone?

I soon learned that to test this idea, I'd have to adapt my playful coding approach into something more rigorous. Did I understand the problem I wanted to solve? Could I express it clearly? Could I communicate my understanding to the AI in a prompt direct and unambiguous enough to generate the response I sought?
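
For a sense of what that rigour looks like in practice, here is a minimal sketch using OpenAI's Python client (openai 1.x) - the task in the prompt is a made-up illustration, not the attorney's tool:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A vague ask ("clean up this data") invites vague code. Spelling out
    # inputs, outputs, and constraints is what makes the difference.
    prompt = (
        "Write a Python function named summarize_filings that takes a list of "
        "dicts with keys 'title' (str) and 'filed' (an ISO-8601 date string), "
        "sorts them newest-first, and returns a list of strings formatted as "
        "'filed: title'. Use only the standard library; include a docstring."
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)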

That was my first big 'aha' moment: to realise the benefits of AI, I'd have to completely rework my workflow into something much more formal, considered, and structured - a process a lot less fun than idly flipping from editor to command line. Working with an AI as an accelerator transforms the work.

If I hadn't already been coding for nearly half a century, it would have taken me a lot longer to intuit how I needed to change my practice, conforming myself to what the AI demands; as it is, I see what I need to do - though I am resisting. It feels less fun that way. Then again, that's always going to be the nature of the trade-off - sure, you can work faster, but you probably won't enjoy the process.

However, when faced with writing a function to extract a set of relevant data from a huge and deeply nested XML document, I relished the assistance of GPT-4. I could have spent a day writing bits of code, exploring Python's xml module. I did throw an hour at the problem before deciding this work could be better performed by the AI. It took me a few minutes to structure an effective prompt, which I fed in along with an example of the XML file - a 'one-shot' prompt. The AI quickly gave me a function that fit the bill perfectly and even ran first time. But after a few modifications, it became clear that I hadn't fully understood the structure of the XML document - and the AI-generated code faithfully reflected my poor understanding. That led to my second 'aha' moment: garbage in, garbage out.
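
For flavour, the function GPT-4 handed back looked something like this - a hedged reconstruction, with 'claim' as a hypothetical stand-in for tag names that are the attorney's business:

    import xml.etree.ElementTree as ET

    def extract_claims(xml_path):
        """Return the id and full text of every <claim>, however deeply nested."""
        tree = ET.parse(xml_path)
        root = tree.getroot()
        records = []
        # iter() visits every descendant, so the depth of nesting doesn't matter
        for claim in root.iter("claim"):
            records.append({
                "id": claim.get("id"),
                "text": "".join(claim.itertext()).strip(),
            })
        return records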

I prompted GPT-4 to modify the function to reflect my deeper understanding, and it generated a new version. I pasted that into my code, then made single-line additions, tuning it to my specific needs. I got it to a point where around 80 percent of the output was AI-generated and 20 percent was my own work. That's when I had my third and biggest 'aha': whose code is this?
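
To make that 80/20 split concrete - again a hypothetical illustration, reusing the sketch above, with the single-line human tweaks marked in the comments:

    import xml.etree.ElementTree as ET

    def extract_claims(xml_path):
        """Return the id and normalised text of every <claim> element."""
        tree = ET.parse(xml_path)                  # AI-generated
        root = tree.getroot()                      # AI-generated
        records = []                               # AI-generated
        for claim in root.iter("claim"):           # AI-generated
            text = "".join(claim.itertext())       # AI-generated
            if not text.strip():                   # mine: skip empty claims
                continue
            records.append({
                "id": claim.get("id"),             # AI-generated
                "text": " ".join(text.split()),    # mine: collapse whitespace
            })
        return records                             # AI-generated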

In the age of AI, the legal system has so far been unambiguously clear on one thing: the ownership of AI-generated content. As it has not been created by a human being, it cannot be copyrighted. The AI doesn't own it, the AI's creators don't own it, and whoever prompted the AI to generate the content doesn't own it either. That code cannot have an owner.

Who owns this code that I've written for this attorney? I've plastered a copyright notice at the top of the source - as I've always done - but does that mean anything? A core function in this code is largely AI-generated; and while the rest of my code may be artisanal, bespoke, human-crafted Python, any coder working in an IDE hooked into GitHub Copilot, or getting help from GPT-4, will likely create code containing so many AI-written tidbits that it's very hard to know where the human ends and the machine begins.

Is any of that code copyrightable? Or is all the software we're writing today so thoroughly compromised it might no longer be defensible as a work protected by copyright?

I asked the attorney who'd brought me in to solve his problem. "Well, you own the copyright over the compilation," he said, pointing to a recent ruling in which an author was awarded copyright over an AI-generated collection of texts - because of their role as curator of that collection.

What does that mean for source code, where one line may be human-written (and therefore protected) while the next line may be AI-generated (and unprotected)? "It's a mess," the attorney admitted.

We've run into this mess at ludicrous speed, blithely unaware that using these AI-powered coding tools turns the copyright protections every software firm takes for granted into a sort of Swiss cheese of loopholes, exceptions, and issues that will eventually be tested in court.

It seems unlikely that commercial organisations will turn their backs on the productivity increases promised by AI coding tools. The allure of 10x-ing an engineering team will almost certainly drown out the warnings of any legal department that urges caution.

Blasting ahead at full speed will work until it doesn't - when some major software company learns that its crown jewels have been ablated away by the consistent incorporation of generative AI. By then, it will be far too late for everyone else. ®

https://www.theregister.com/2024/05/15/ai_coding_complications/
