Griffin on Tech: AI and copyright - Big Tech wants a free pass to mine data

21 Apr

For all their potential to spit out inaccurate information and even start hallucinating, the likes of ChatGPT and Google Bard are pretty remarkable at stringing together coherent thoughts on nearly any topic you throw at them.

But that’s because the companies behind them, OpenAI and Google respectively, have harvested vast amounts of information from the internet to train the large language models that underpin them.

That includes copyrighted material that is freely available on the web from news outlets, research institutions and companies. This material gives generative AI systems the context needed to make them useful.

But content creators, visual artists in particular, are not happy about this. If you can tell DALL-E or Midjourney to create a street art painting in the style of Banksy, those systems will analyse thousands of images of Banksy paintings to serve up a version that reflects the artist’s style. Shouldn’t Banksy get a clip of the ticket if that artwork is then used commercially?

Given the anonymous status of the elusive artist, it might be hard to know where to send the royalty cheque. The social network Reddit said this week that it will begin charging companies for API access to the conversations in its millions of threads, if those companies are mining the data for use in generative AI systems.

Lawsuit time

Meanwhile, Elon Musk has threatened to sue Microsoft, alleging that the software giant has harvested Twitter conversations to train its own AI model.

“They trained illegally using Twitter data,” Musk tweeted. “Lawsuit time.”

Big Tech companies in the generative AI space are trying to head off this type of response at the pass by lobbying for copyright law to be changed to make their blanket harvesting of data lawful.

Google has told the Australian Government that its copyright laws are stifling innovation by not protecting companies that want to mine data far and wide to build AI systems.

It urged the Australian Government to “review its existing copyright flexibilities and consider introducing fair dealing exceptions including for Text and Data Mining (TDM),” in a submission on the Copyright Enforcement Review Issues Paper.

“The lack of such copyright flexibilities means that investment in and development of AI and machine-learning technologies is happening and will continue to happen overseas,” said Google, rolling out the well-used argument that legislation is a brake on innovation.

Google has history in this area. It fought a major legal battle over Google Books, which saw it scan the contents of millions of books to feature snippets from them online. In 2005, the Authors Guild of America brought a class action lawsuit against Google, which led to a proposed US$125 million settlement for authors whose work had been scanned and digitised by Google without their consent. Google Books has languished as a product since then, too problematic to try and monetise in any serious way.

A brake on innovation?

The Comms Alliance, which represents the Big Tech companies in Australia, wants a version of the Digital Millennium Copyright Act built into the Australian regime.

“Copyright law in Australia should be updated and brought in line with more expansive safe harbour schemes in jurisdictions such as the US which has the Digital Millennium Copyright Act (DMCA),” the Comms Alliance said.

The DMCA effectively puts the onus on copyright holders to identify content they believe is being used inappropriately and ask for its removal. It’s a full-time job for movie studios and music publishers to keep their copyrighted works off YouTube and other platforms.

In the world of AI, is it realistic to expect copyright holders to participate in a DMCA-type system? It would be a very hard ask. Why can’t tech companies create an opt-in system, asking website owners and content producers whether they want their work included - and pay a licensing fee for the privilege?

That’s the scenario Big Tech wants to avoid. It was forced to the negotiating table in Australia to hammer out deals with news publishers for featuring snippets of their content on social media platforms.

The precedent has been set - copyright still matters in the digital age. AI is a massive revenue opportunity for companies like Microsoft, which will sell more Microsoft 365 and Azure subscriptions based on the value of its models.

But its models are filled with info gathered from all over the web. It’s only fair that content creators get a clip of the ticket, or at least can decide whether they want to be included in the massive data harvest.

Peter Griffin
