There’s something of a sea change underway in the global AI debate, and it’s taking place in the UK of all places. It’s not a subtle shift, either. Members of Parliament are finally pushing back on one of the tech industry’s most beloved pastimes: running AI algorithms over huge swaths of online content without much regard for who actually owns it.
Their solution is simple, almost obvious. If an AI model is trained on someone’s content, they should probably have to pay for it.
At the moment, a UK parliamentary committee is calling on the government to implement what it calls a “licensing-first” model, under which companies would need permission before using copyrighted works to train AI models. That covers everything from books and journalism to music, art and photography: basically all the raw material making up the web.
It’s not hard to understand why.
If you’ve followed the rise of AI at all, you may have encountered the term “text and data mining.” It sounds obscure, maybe even innocuous. But it basically means what it says on the tin: algorithms scouring huge amounts of web content in order to understand patterns. That’s how AI learns to generate text, images, summaries and conversations.
It’s clever stuff, certainly.
But there’s a part of the equation that some in the tech industry are occasionally reluctant to discuss. Much of that material is owned by people: authors, musicians, photographers and journalists who often spend decades producing it.
And, understandably, they’re none too pleased about serving as the unpaid teaching assistants in AI’s classroom.
“The potential damage that could be inflicted on creators by the widespread use of generative AI without proper copyright permissions or payment of fair remuneration is clear and present,” the House of Lords Communications and Digital Committee warned in a briefing to the UK government. “If this happens, the creative industries which play such an important part in the success of the UK economy could be very seriously damaged.”
You can practically hear the resentment from creators on the topic.
Imagine spending years writing a book, or an album, or a photography portfolio, only to discover that AI has somehow absorbed your style along the way. It’s not plagiarizing in the classical sense, perhaps, but it’s close enough to raise some eyebrows. But here’s the kicker: the artist would never even know.
Which is why some policymakers believe the default should be reversed. The onus should be on the AI provider to demonstrate it has licensed the material it used. Where did we get this data? How did we get it? Let’s make this transparent.
It sounds straightforward. In practice, it’s tricky.
But it’s an idea that’s gaining traction. The UK isn’t the only country that’s grappling with the issue. Most countries are trying to figure out how to control AI without strangling its development.
It’s a delicate dance.
The European Union, for example, recently put forth its own proposal, the EU Artificial Intelligence Act, which aims to increase the accountability and transparency of AI systems. It’s far from a cure-all, but it demonstrates that governments are serious about AI governance.
But here’s the thing.
When one jurisdiction gets serious, others often follow. Tech companies are global and don’t respect borders, so a decision made in London or Brussels can affect how AI is developed in California, Toronto or Singapore.
So while this may seem like a UK issue, it’s really part of a broader international tug-of-war.
If the UK does ultimately decide to require licenses, AI developers may have to completely reconsider how they acquire their training data. That could create all-new industries: companies that license data, publishers and news organizations that partner with AI providers, entire businesses that spring up just to supply AIs with material to learn from.
The data dispute could be a business opportunity.
Unsurprisingly, the tech community isn’t too sanguine about the prospect. It argues that requiring licenses for all the information an AI system learns from could hinder innovation or make it more expensive. Training large AI models already costs a fortune. Sometimes millions. Sometimes billions of dollars.
If you tack licensing fees onto that, it could get dicey.
But the Wild West approach of grabbing as much data as possible now and worrying about the legal issues later may be coming to an end.
Regardless of whether you’re an AI enthusiast, a tech worker or simply a curious human being who’s ever wondered why chatbots seem to be getting a little too good at mimicking you, the training data debate is shaping up to be one of the major flashpoints of the AI age.
And if the UK’s rhetoric is any indication, it’s a fight that’s just beginning.