Microsoft CEO of AI: Your online content is 'freeware' fodder for training models

Tan KW

Publish date: Sat, 29 Jun 2024, 11:10 AM

Mustafa Suleyman, the CEO of Microsoft AI, said this week that machine-learning companies can scrape most content published online and use it to train neural networks because it's essentially "freeware."

Shortly afterwards the Center for Investigative Reporting sued OpenAI and its largest investor Microsoft "for using the nonprofit news organization’s content without permission or offering compensation."

The suit follows one filed in April by eight newspapers that accused OpenAI and Microsoft of content misappropriation, and one filed by the New York Times four months before that.

Then there are the two authors who sued OpenAI and Microsoft in January alleging that they trained AI models on the authors' works without permission. Also, in 2022, several unidentified developers sued OpenAI and GitHub based on claims that the organizations used publicly posted programming code to train generative models in violation of software licensing terms.

Asked in an interview with CNBC’s Andrew Ross Sorkin at the Aspen Ideas Festival whether AI companies have effectively stolen the world's intellectual property, Suleyman acknowledged the controversy and attempted to draw a distinction between content people put online and content backed by corporate copyright holders.

"I think that with respect to content that is already on the open web, the social contract of that content since the 1990s has been it is fair use," he opined. "Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding."

Suleyman did allow that there's another category of content, the stuff published by companies with lawyers.

"There's a separate category where a website or publisher or news organization had explicitly said, 'do not scrape or crawl me for any other reason than indexing me,' so that other people can find that content," he explained. "But that's the gray area. And I think that's going to work its way through the courts."

That's putting it mildly. While Suleyman's remarks seem certain to offend content creators, he's not entirely wrong: it's not clear where the legal lines are with regard to AI model training and model output.

Most people posting content online as individuals will have compromised their rights in some way by accepting the Terms of Service agreements offered by major social media platforms. Reddit's decision to license its users' posts to OpenAI wouldn't have happened if the social media giant thought its users had a valid claim to their memes and manifestos.

The fact that OpenAI and others making AI models are striking content deals with major publishers shows that a strong brand, deep pockets, and a legal team can bring large technology operations to the negotiating table.

In other words, those creating content and posting it online make freeware unless they retain, or can attract, attorneys willing to challenge Microsoft and its ilk.

In a paper distributed via SSRN last month, Frank Pasquale, professor of law at Cornell Tech and Cornell Law School in the US, and Haochen Sun, associate professor of law at The University of Hong Kong, explore the legal uncertainty surrounding the use of copyrighted data to train AI and whether courts will find such use fair. They conclude that AI has to be dealt with at a policy level, because current laws are ill-suited to answer the questions that now need to be addressed.

"Given that there is substantial uncertainty over the legality of AI providers’ use of copyrighted works, legislators will need to articulate a bold new vision for rebalancing rights and responsibilities, just as they did in the wake of the development of the Internet (leading to the Digital Millennium Copyright Act of 1998)," they argue.

The authors suggest that the continued uncompensated harvesting of creative works threatens not just writers, composers, journalists, actors, and other creative professionals, but generative AI itself, which will end up being starved of training data. People will stop making work available online, they predict, if it just gets used to power AI models that reduce the marginal cost of content creation to zero and deprive creators of the possibility of any reward.

That's the future Suleyman anticipates. "The economics of information are about to radically change because we can reduce the cost of production of knowledge to zero marginal cost," he said.

All this freeware that you perhaps helped create can be yours for a small monthly subscription fee. ®

https://www.theregister.com//2024/06/28/microsoft_ceo_ai/
