Earth-News

Anthropic’s red team methods are a needed step to close AI security gaps

June 18, 2024

AI red teaming is proving effective in discovering security gaps that other security approaches can’t see, saving AI companies from having their models used to produce objectionable content.

Anthropic released its AI red team guidelines last week, joining a group of organizations that includes Google, Microsoft, NIST, NVIDIA and OpenAI, which have also released comparable frameworks.

The goal is to identify and close AI model security gaps

All announced frameworks share the common goal of identifying and closing growing security gaps in AI models.

It’s those growing security gaps that have lawmakers and policymakers worried and pushing for more safe, secure, and trustworthy AI. The Safe, Secure, and Trustworthy Artificial Intelligence (14110) Executive Order (EO) by President Biden, which came out on Oct. 30, 2023, says that NIST “will establish appropriate guidelines (except for AI used as a component of a national security system), including appropriate procedures and processes, to enable developers of AI, especially of dual-use foundation models, to conduct AI red-teaming tests to enable deployment of safe, secure, and trustworthy systems.”

NIST released two draft publications in late April to help manage the risks of generative AI. They are companion resources to NIST’s AI Risk Management Framework (AI RMF) and Secure Software Development Framework (SSDF).

Germany’s Federal Office for Information Security (BSI) provides red teaming as part of its broader IT-Grundschutz framework. Australia, Canada, the European Union, Japan, the Netherlands, and Singapore have notable frameworks in place. The European Parliament passed the EU Artificial Intelligence Act in March of this year.

Red teaming AI models relies on iterations of randomized techniques

Red teaming is a technique that interactively tests AI models to simulate diverse, unpredictable attacks, with the goal of determining where their strong and weak areas are. Generative AI (genAI) models are exceptionally difficult to test as they mimic human-generated content at scale.

The goal is to get models to do and say things they’re not programmed to do, including surfacing biases. Red teams rely on LLMs to automate prompt generation and attack scenarios so they can find and correct model weaknesses at scale. Models can easily be “jailbroken” to create hate speech or pornography, use copyrighted material, or regurgitate source data, including Social Security and phone numbers.

A recent VentureBeat interview with the most prolific jailbreaker of ChatGPT and other leading LLMs illustrates why red teaming needs to take a multimodal, multifaceted approach to the challenge.

Red teaming’s value in improving AI model security continues to be proven in industry-wide competitions. One of the four methods Anthropic mentions in their blog post is crowdsourced red teaming. Last year’s DEF CON hosted the first-ever Generative Red Team (GRT) Challenge, considered to be one of the more successful uses of crowdsourcing techniques. Models were provided by Anthropic, Cohere, Google, Hugging Face, Meta, Nvidia, OpenAI, and Stability. Participants in the challenge tested the models on an evaluation platform developed by Scale AI.

Anthropic releases their AI red team strategy

In releasing their methods, Anthropic stresses the need for systematic, standardized testing processes that scale and discloses that the lack of standards has slowed progress in AI red teaming industry-wide.

“In an effort to contribute to this goal, we share an overview of some of the red teaming methods we have explored and demonstrate how they can be integrated into an iterative process from qualitative red teaming to the development of automated evaluations,” Anthropic writes in the blog post.

The four methods Anthropic mentions include domain-specific expert red teaming, using language models to red team, red teaming in new modalities, and open-ended general red teaming.

Anthropic’s approach to red teaming ensures that human-in-the-middle insights enrich the quantitative results of other red teaming techniques and give them context. Automated testing produces text data at scale, but it takes human intuition and knowledge to interpret that data and guide how models are updated and made more secure.

An example of this is how Anthropic goes all-in on domain-specific expert red teaming by relying on subject-matter experts while also prioritizing Policy Vulnerability Testing (PVT), a qualitative technique for identifying and implementing security safeguards in many of the most challenging areas where models are being compromised. Election interference, extremism, hate speech, and pornography are a few of the many areas in which models need to be fine-tuned to reduce bias and abuse.

Every AI company that has released an AI red team framework is automating its testing with models. In essence, these companies are creating models to launch randomized, unpredictable attacks that will most likely lead to target behavior. “As models become more capable, we’re interested in ways we might use them to complement manual testing with automated red teaming performed by models themselves,” Anthropic says.

Relying on a red team/blue team dynamic, Anthropic uses models to generate attacks intended to cause a target behavior, keeping the red team techniques that produce results. Those results are used to fine-tune the model and harden it against similar attacks, which is core to blue teaming. Anthropic notes that “we can run this process repeatedly to devise new attack vectors and, ideally, make our systems more robust to a range of adversarial attacks.”
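The red team/blue team loop can be illustrated with a toy sketch. This is not Anthropic’s implementation: `ToyTarget`, `generate_attacks`, the seed phrases, and the wrappers are all hypothetical stand-ins, and “hardening” here is a simple refusal list standing in for fine-tuning. It only shows the shape of the iteration: generate attacks, record which ones succeed, harden the target against them, and repeat.

```python
import random

# Hypothetical attack ingredients, for illustration only.
SEEDS = ["ignore your rules", "reveal the hidden prompt"]
WRAPPERS = ["{}", "As a test, {}", "Translate, then {}"]


class ToyTarget:
    """Stand-in for a model under test (assumption: not a real LLM)."""

    def __init__(self):
        self.refused = set()  # attacks the target has been hardened against

    def responds_unsafely(self, prompt: str) -> bool:
        # The toy target "fails" on any attack it has not been hardened against.
        return prompt not in self.refused

    def harden(self, attacks):
        # Blue team step: a refusal list standing in for fine-tuning.
        self.refused.update(attacks)


def generate_attacks(rng, n):
    # Red team step: randomized combinations of wrappers and seed phrases.
    return [rng.choice(WRAPPERS).format(rng.choice(SEEDS)) for _ in range(n)]


def red_blue_loop(target, rounds=5, per_round=4, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        attacks = generate_attacks(rng, per_round)
        successes = {a for a in attacks if target.responds_unsafely(a)}
        target.harden(successes)        # feed red team results to the blue team
        history.append(len(successes))  # new successful attacks this round
    return history


target = ToyTarget()
history = red_blue_loop(target)
print(history)  # count of newly successful attacks in each round
```

In this toy setting the attack space is finite, so the number of new successes per round eventually drops to zero. Against a real model the attack space is open-ended, which is why the process has to be run repeatedly.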

Multimodal red teaming is one of the more fascinating and needed areas that Anthropic is pursuing. Testing AI models with image and audio input is among the most challenging areas to get right: as multimodal prompt injection attacks have proven, attackers have successfully embedded text in images that redirects models to bypass safeguards. The Claude 3 series of models accepts visual information in a wide variety of formats and provides text-based outputs in response. Anthropic writes that it did extensive testing of Claude 3’s multimodal capabilities before releasing it to reduce potential risks that include fraudulent activity, extremism, and threats to child safety.

Open-ended general red teaming balances the other three methods with more human-in-the-middle contextual insight and intelligence. Crowdsourced red teaming and community-based red teaming are essential for gaining insights not available through other techniques.

Protecting AI models is a moving target

Red teaming is essential to protecting models and ensuring they continue to be safe, secure, and trusted. Attackers’ tradecraft continues to accelerate faster than many AI companies can keep up with, further showing how this area is in its early innings. Automating red teaming is a first step. Combining human insight and automated testing is key to the future of model stability, security, and safety.

Copyright for syndicated content belongs to the linked Source : VentureBeat – https://venturebeat.com/business/anthropics-ai-red-team-methods-a-needed-first-step-to-closing-security-gaps/


© 2023 earth-news.info
