Name: Jailbreak Attack on a Multi-Agent LLM Defense System
Start: 2025-02-21T09:30:00+0000
End: 2025-02-21T11:00:00+0000

The 10th International Congress on Information and Communication Technology (ICICT 2025), held concurrently with the ICT Excellence Awards, will take place in London, United Kingdom, February 18 - 21, 2025.

Friday February 21, 2025 9:30am - 11:00am GMT

Virtual Room E

Authors - Junichiro Ando, Satoshi Okada, Takuho Mitsunaga
Abstract - Large Language Models (LLMs) such as ChatGPT and Claude have demonstrated exceptional capabilities in content generation but remain vulnerable to adversarial jailbreak attacks that bypass safety mechanisms and elicit harmful content. This study introduces a novel jailbreak method targeting Autodefense, a multi-agent defense framework designed to detect and mitigate such attacks. By combining obfuscation techniques with the injection of harmless plaintext, our proposed method achieved a high attack success rate (ASR) of up to 95.3% across different obfuscation methods, a significant increase over the ASR of 7.95% without our proposed method. Our experiments demonstrate the effectiveness of the proposed method in bypassing the Autodefense system.

Paper Presenters

Junichiro Ando

Japan
