Google Search results polluted by buggy AI-written code frustrate coders

Tan KW

Publish date: Wed, 01 May 2024, 04:21 PM

Analysis: Google has indexed inaccurate infrastructure-as-code samples produced by Pulumi AI - a developer that uses an AI chatbot to generate infrastructure - and the rotten recipes are already appearing at the top of search results.

This mess started with Pulumi's decision to publish the result of its users' prompts on a curated AI Answers page. Google's crawlers indexed the resulting robo-responses - but when users find them, the AI answers are often inaccurate.

"It has happened," wrote developer Arian van Putten in a social media post over the weekend. "The number one Google result was an official Pulumi documentation page that was clearly written by an LLM (it had a disclaimer that it was) and hallucinated an AWS feature that didn't exist. This is the beginning of the end."

As The Register opined in 2022 and reported in January this year, search quality has declined because search engines index low-quality AI-generated content and present it in search results. This remains an ongoing area of concern.

Pulumi AI and its online archive of responses, AI Answers, are a case in point. Google's search crawler indexes the output of Pulumi's AI and presents it to search users alongside links to human-authored content. Software developers have found some of the resulting AI-authored documentation and code inaccurate or even non-functional.

The problem was noted on March 21, 2024 by developer Pete Nykänen in a GitHub Issues post to the Pulumi AI code repository. "Today I was googling various infrastructure related searches and noticed a worrying trend of Pulumi AI answers getting indexed and ranking high on Google results, regardless of the quality of the AI answer itself or if the question involved Pulumi in the first place. This happened with multiple searches and will probably get even worse as the time goes on."

Others have also raised the issue.

A rising tide of muck

Nykänen told The Register in an email that he began noticing Pulumi AI search result issues around the time he posted to GitHub last month.

"As an engineer, I spend a lot of time searching for answers online and it was not difficult to notice the AI answers rising to the top of the search results overnight, even for keywords unrelated to Pulumi itself," he noted. "I filed the issue and hoped that Pulumi would rectify the situation (which they promised to do) but sadly the issue still persists."

"Documentation, especially infrastructure related, is already often incorrect, hard to find, outdated or otherwise missing. While tools like Pulumi AI can provide value to some, filling the internet with unconfirmed, possibly hallucinated, answers is actually pretty malicious. And the longer it goes on, the worse it gets."

Nykänen said that, with AI content already appearing at the top of search results and more companies creating content generation tools, he hopes those involved in AI will consider how their work affects the integrity of the web.

"I don't think it's too late for Pulumi either and hopefully they will decide to hide their AI generated content from search engine scrapers," he suggested.

Aaron Friel, an AI engineer at Pulumi, acknowledged Nykänen's concerns, responding the following day that Pulumi has "taken steps to remove more than half (almost two thirds) of AI Answers, and we plan to continue to ensure that these AI answers are complementary to our existing documentation."

Friel noted that Pulumi also plans to make sure its site mentions real APIs and upstream documentation. Testing generated code is also on the to-do list.

Hello? Google?

That was a month ago, and Google hasn't yet gotten the memo. When The Register tried the keywords cited by Nykänen on Monday - "aws lightsail xray" - Pulumi AI's answer was the second search result. And when we tried again on Tuesday, it ranked at the top of the page - above the official AWS documentation.

We asked Google what it thought of the situation and a company spokesperson told us it "always aims to surface high quality information, but on some niche topics or unusual queries, there may not be a lot of high quality content available to rank highly in Search."

The search giant also reminded us that its policies mean "Low value content that’s created at scale to manipulate Search rankings is spam, however it is produced", and that recent updates to its tech "reduced low quality, unoriginal content on Search by 45 percent, and aim to tackle unhelpful content that’s designed to rank well in Search."

Microsoft's Bing search engine could be ahead of the game in filtering AI-generated material: it did not have this problem for the same query. The results it produced did, however, include a Chat button that launched an AI-generated response if you took the bait and clicked rather than just hitting return to submit the query. Brave Search also omitted the Pulumi AI response. DuckDuckGo, meanwhile, returned the Pulumi AI result as the fourth item on its search results page for the query.

Another GitHub Issue post on Monday, referring to van Putten's complaint, has asked for the removal of Pulumi AI's answer about AWS EBS direct APIs - which Pulumi evidently does not support.

Several AI hallucinations flagged in March have already been dealt with.

In an email to The Register, Pulumi co-founder and CEO Joe Duffy defended his firm's AI effort - but allowed that more drastic intervention might be called for if the issue can't be adequately addressed.

"Pulumi AI has transformed how most of our customers work, enabling them to navigate a sea of hundreds of clouds with the myriad ways you can use all of their services," Duffy explained. "We processed a 50 percent increase in prompts quarter on quarter, which is a testament to how useful our customers are finding it to their daily work."

A startup that promises to do better ...

Duffy claimed that Pulumi has tested and improved its code quality over time and has seen a double-digit improvement in the success rates for code examples quarter over quarter.

"That said, we know these aren't perfect," he conceded. "Because our AI answers are indexable by Google, they show up in search results. I'll be the first to admit, I was surprised at how highly Google is ranking these pages, since in general they have no inbound links - a far cry from how PageRank used to work - and I would have expected it to prefer our older, more mature content."

Asked when Pulumi first realized its AI had issues, Duffy acknowledged Pulumi has been aware its AI isn't perfect since it launched last year, and has invested to improve its quality.

"We have a new typechecker loop that feeds back into the AI and improves our results," he explained. "We've tweaked it to be better at Python, and we've taught it about our cloud SDKs. All of these have had material increases in quality - and it will just keep getting better from here. Although there's been some negative sentiment on social media, far and away the feedback we get directly is that the AI is helpful, especially when just getting started in the cloud - it truly is daunting to even get started navigating hundreds of clouds each with tens of thousands of services."

Duffy revealed that Pulumi has already removed 100,000 AI answers and will take down more in future.

Despite the challenges, Duffy expects AI will improve over time. "We move fast and try innovative new ideas regularly - and sometimes they just don't work out the way we intended," he admitted. "If we can’t get to a good place quickly, we will absolutely consider delisting all of them and building back up more slowly."

Duffy added that Pulumi's AI Answers clearly state that they're the product of AI. "Despite the hallucinations, we regularly hear 'Even if imperfect, we prefer to have something 80 percent correct, [rather] than nothing at all'." ®

https://www.theregister.com//2024/05/01/pulumi_ai_pollution_of_search/
