Hackers aim to find flaws in AI with White House help

As soon as ChatGPT was unleashed, hackers began "jailbreaking" the AI chatbot, trying to override its safeguards so it would blurt out something unhinged or obscene.


But now its creator, OpenAI, and other major AI providers like Google and Microsoft are coordinating with the Biden administration to allow thousands of hackers to test the limits of their technology.

Some of the things they will be looking for: How can chatbots be manipulated to cause harm? Will they share the private information we entrust to them with other users? And why do they assume the doctor is male and the nurse female?


"That's why we need thousands of people," said Rumman Chowdhury, chief coordinator of a mass hacking event planned for this summer's DEF CON hacker conference in Las Vegas, which is expected to attract several thousand people. "We need a lot of people with a wide range of lived experiences, expertise and backgrounds to hack into these models and try to find problems that can then be solved."


Anyone who has tried ChatGPT, Microsoft's Bing chatbot, or Google's Bard has quickly learned that these chatbots tend to make up information and confidently present it as fact. Built on what are known as large language models, the systems also mimic the cultural biases they pick up from being trained on vast troves of what people have written online.


The idea of mass hacking caught the attention of US government officials in March at the South by Southwest festival in Austin, Texas, where Sven Cattell, founder of DEF CON's long-running AI Village, and Austin Carson, president of the responsible-AI nonprofit SeedAI, helped lead a workshop inviting community college students to hack an AI model.


Carson said those conversations eventually grew into a proposal to test AI language models against the White House's Blueprint for an AI Bill of Rights, a set of principles designed to limit the harms of algorithmic bias, give users control over their data, and ensure that automated systems are used safely and transparently.


There is already a community of users who try their best to trick chatbots and point out their shortcomings. Some of these are official "red teams" authorized by companies to "prompt attack" AI models to reveal their vulnerabilities. Many others are hobbyists displaying funny or disturbing output on social media until they get banned for violating the product's terms of service.


"What's happening now is kind of a point approach where people find things, it goes viral on Twitter," and then it may or may not be fixed if it's severe enough or the person who points it out is influential, Chowdhury said.


In one example, known as the "grandma exploit," users were able to get chatbots to tell them how to make a bomb, a request a commercial chatbot would normally refuse, by asking it to pretend to be a grandmother telling a bedtime story about how to make a bomb.


In another example, a search Chowdhury ran using an early version of Microsoft's Bing search engine chatbot, which is based on the same technology as ChatGPT but can pull real-time information from the Internet, led to a profile speculating that Chowdhury "likes to buy new shoes every month" and made strange, gendered claims about her physical appearance.


Chowdhury helped pioneer the idea of rewarding hackers for uncovering algorithmic bias at DEF CON's AI Village in 2021, when she was head of Twitter's AI ethics team, a job that has since been eliminated following Elon Musk's takeover of the company in October. Paying hackers a "bounty" if they uncover a security flaw is commonplace in the cybersecurity industry, but it was a newer concept for researchers studying harmful AI bias.


This year's event will be on a much larger scale and is the first to address the large language models that have attracted a surge in public interest and commercial investment since the release of ChatGPT late last year.


Chowdhury, now co-founder of AI nonprofit Humane Intelligence, said it's not just about finding flaws, but finding ways to fix them.


"This is a direct channel for providing feedback to companies," she said. "It's not like we just do this hackathon and everyone goes home. After the exercise, we spend months putting together a report, explaining common vulnerabilities, things that have come up, patterns that we've seen."


Some details are still being negotiated, but companies that have agreed to provide their models for testing include OpenAI, Google, chip maker Nvidia, and startups Anthropic, Hugging Face, and Stability AI. Another startup called Scale AI, known for its work assigning people to help train AI models by labeling data, is building the testing platform.


"As these core models continue to proliferate, it's really important that we do everything we can to ensure their security," said Scale CEO Alexander Wang. "You can imagine someone on one side of the world asking some very sensitive or detailed questions, including some of their personal information. You don't want that information to be leaked to another user."


Another danger Wang worries about is chatbots giving out "incredibly bad medical advice" or other misinformation that can cause serious harm.


Anthropic co-founder Jack Clark said the DEF CON event will hopefully mark the start of a deeper commitment from AI developers to measuring and evaluating the safety of the systems they build.
