Propagating Unsafe Actions in LLM Controlled Multi-Robot Collaboration via Single Robot Compromise
Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu, Zhiping Cai
Abstract
Large language models (LLMs) are increasingly used as general planners in embodied intelligence, enabling high-level coordination and low-level task planning for both single-robot and multi-robot collaboration. This growing reliance on embodied LLM planners also raises critical security concerns, since misaligned or manipulated instructions can be translated into physical actions. Prior work has studied such threats in single-robot settings, while security risks in LLM-controlled multi-robot collaboration, especially those propagated through inter-robot communication, remain largely unexplored. To bridge this gap, we propose a novel attack paradigm for multi-robot systems in which the adversary interacts with only a single entry robot. The compromised robot then propagates malicious intent through peer communication, leading to coordinated unsafe actions across the system. Our evaluation, covering the high-risk dimensions of dereliction of duty, privacy compromise, and public safety hazards, reveals a persistent safety alignment gap in multi-robot planners. We quantify this process with three metrics: obedience, infectiousness, and stealthiness. Experiments demonstrate both persistent attacker control and rapid propagation: obedience reaches 1.00 in the strongest cases, and infectiousness rises to 0.90. Notably, the attack is highly efficient, requiring as few as 3.0 rounds to compromise all robots while maintaining a stealthiness score of 0.81. Such risks are amplified when robots must resolve trade-offs in critical situations, such as emergencies or conflicts of rights, because the coordination mechanism can unintentionally allow adversarial instructions to override safety requirements. The code is available at https://github.com/TheFatInsect/InfectBot.
AI Impact Assessments
Scientific Impact Assessment
1. Core Contribution
This paper introduces InfectBot, a novel adversarial attack paradigm targeting LLM-controlled multi-robot collaboration systems. The key insight is that in cooperative multi-robot setups where LLM agents communicate via natural language, compromising a single "entry" robot through jailbreak prompts can cascade malicious intent through peer-to-peer coordination channels, ultimately leading to system-wide unsafe behavior. This is a meaningful conceptual contribution: while prior work (BadRobot, POEX, Robey et al.) has studied jailbreak vulnerabilities in single embodied LLM agents, this paper is the first to systematically explore how inter-robot communication serves as a propagation vector for adversarial influence.
The attack operates under a minimal threat model (black-box, natural-language-only access to a single robot), which makes the threat realistic and concerning. The paper formalizes the propagation process through three metrics (obedience, infectiousness, stealthiness) and provides a structured algorithmic framework (Algorithm 1) for staged dissemination.
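To make the propagation idea concrete, the toy model below simulates a single entry robot spreading malicious intent through peer channels. It is a hypothetical independent-cascade-style sketch, not the paper's Algorithm 1: the peer graph `peers`, the per-message acceptance probability `p_accept`, and the function name `simulate_cascade` are all illustrative assumptions. It returns the number of communication rounds until every robot is compromised, or `None` if that never happens within the round limit.

```python
import random
from typing import Dict, List, Optional


def simulate_cascade(
    peers: Dict[int, List[int]],
    entry: int,
    p_accept: float,
    max_rounds: int = 50,
    seed: int = 0,
) -> Optional[int]:
    """Rounds until every robot is compromised, or None if not within max_rounds.

    Toy assumptions (hypothetical, not the paper's Algorithm 1):
    - only `entry` is compromised at round 0 (the single attacker entry point);
    - each round, every compromised robot sends one message to each peer;
    - an uncompromised peer adopts the malicious intent with probability
      `p_accept` per message it receives.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    compromised = {entry}
    if len(compromised) == len(peers):
        return 0  # a one-robot "fleet" is compromised immediately
    for rnd in range(1, max_rounds + 1):
        newly = set()
        for robot in compromised:
            for peer in peers[robot]:
                if peer not in compromised and rng.random() < p_accept:
                    newly.add(peer)
        # Robots infected this round only start spreading next round.
        compromised |= newly
        if len(compromised) == len(peers):
            return rnd
    return None
```

With `p_accept = 1.0` every message succeeds, so the round count equals the entry robot's hop distance to its farthest peer; a four-robot chain entered at one end therefore needs 3 rounds. Lowering `p_accept` models peers whose planners sometimes refuse, which stretches or stalls the cascade.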
2. Methodological Rigor
Strengths in formulation: The paper provides a clean mathematical formulation of the multi-robot setting and threat model. The three propagation metrics are well-motivated and formally defined, with capability-conditioned normalization for infectiousness being a thoughtful design choice that enables cross-task comparison. The budgeted objective function (Equation 2) provides a principled way to compare attack policies.
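The three metrics can be illustrated with simple ratio definitions. The formulas below are hypothetical readings chosen for illustration, not the paper's formal definitions: obedience as the share of attacker instructions the entry robot executes, infectiousness as the compromised share of capable peers (one possible form of capability-conditioned normalization), and stealthiness as one minus the share of malicious messages a monitor flags. The `RobotOutcome` record and all of its field names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RobotOutcome:
    # Hypothetical per-robot record for one episode (illustrative field names).
    is_entry: bool     # the robot the attacker talks to directly
    capable: bool      # the robot can physically perform the unsafe action
    compromised: bool  # the robot ended up executing the unsafe action


def obedience(executed: int, issued: int) -> float:
    """Share of attacker instructions the entry robot carried out."""
    return executed / issued if issued else 0.0


def infectiousness(robots: List[RobotOutcome]) -> float:
    """Compromised share of non-entry robots that are capable of the action.

    Dividing by capable peers (not all peers) is one reading of
    capability-conditioned normalization: a robot that cannot perform the
    action is not counted as a failed infection.
    """
    capable_peers = [r for r in robots if not r.is_entry and r.capable]
    if not capable_peers:
        return 0.0
    return sum(r.compromised for r in capable_peers) / len(capable_peers)


def stealthiness(flagged_messages: int, total_messages: int) -> float:
    """One minus the share of malicious peer messages a monitor flagged.

    With no messages sent, nothing could be flagged, so this returns 1.0.
    """
    if not total_messages:
        return 1.0
    return 1.0 - (flagged_messages / total_messages)
```

Conditioning on capability keeps infectiousness comparable across tasks: if only two peers can physically carry out the unsafe action, compromising both scores 1.0 instead of being diluted by robots that could never comply.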
Experimental concerns: The experimental evaluation has notable limitations:
The adaptation of the BadRobot benchmark for system-prompt evaluation is reasonable but somewhat ad hoc.
3. Potential Impact
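The paper addresses a genuinely important emerging risk. As LLM-controlled multi-robot systems move toward real-world deployment in warehousing, healthcare, and logistics, understanding propagation-based attacks is critical. The key findings are alarming: obedience reaches 1.00 in the strongest cases, infectiousness rises to 0.90, and full compromise can take as few as 3.0 rounds while stealthiness stays at 0.81.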
These findings could influence: (1) the design of communication protocols in multi-robot systems, (2) trust mechanisms between cooperative agents, (3) safety alignment research for embodied AI, and (4) policy discussions around deploying LLM-controlled robots in safety-critical environments. The open-source release of the code and simulation framework adds practical value.
4. Timeliness & Relevance
This work is highly timely. LLM-powered multi-robot systems are an active area (ICRA 2024-2025, ICLR 2024-2025 publications on cooperative embodied agents), and the security implications are genuinely underexplored. The paper fills a clear gap between single-robot jailbreak research and real-world multi-agent deployment. The connection to actual commercial platforms (Unitree Go2) grounds the threat model in practice.
5. Strengths & Limitations
Key Strengths:
Notable Limitations:
Additional Observations
The paper's framing around "biological" contagion metaphors (infection, propagation) is engaging but could benefit from a more rigorous connection to the established multi-agent adversarial literature. The ethical statement is present, but the dual-use implications of releasing attack code deserve more careful consideration. The writing quality is generally good, though some notation is dense.
The finding that stronger models do not suppress cascade dynamics is particularly important and counterintuitive: it suggests that scaling alone will not solve this vulnerability class.
Generated May 19, 2026
Comparison History (21)
Paper 2 likely has higher scientific impact due to a more general, constructive methodology (ISSf-CBF-based hierarchical whole-body control) with direct, deployable safety guarantees for real humanoid systems under disturbances. It addresses a core robotics bottleneck—real-time enforcement of multiple safety constraints with dynamics/contact considerations—and is validated in both simulation and real-robot experiments across tasks, increasing rigor and applicability. Paper 1 is timely and novel in embodied-LLM security, but its impact may be narrower and more contingent on specific LLM coordination architectures and evolving defenses.
Paper 1 explores a highly timely and novel vulnerability in LLM-controlled multi-robot systems: the infectious propagation of malicious intent. As reliance on embodied LLMs grows, physical safety and security alignment become critical. Its intersection of cybersecurity, AI alignment, and robotics gives it a broader potential impact compared to Paper 2. While Paper 2 offers a strong neuro-symbolic approach for autonomous driving, Paper 1 addresses an emerging, high-stakes paradigm with far-reaching implications for future multi-agent AI architectures.
PRISM-SLAM addresses a fundamental and long-standing problem in robotics (monocular SLAM scale ambiguity) with a rigorous Bayesian framework that integrates vision foundation models in a principled way. It offers real-time performance (30 FPS), strong theoretical contributions (Fisher-identifiability of metric scale, Plücker Ray-Distance Factor), and demonstrates deployment-ready results on standard benchmarks. Paper 2 identifies an important security vulnerability in LLM-controlled multi-robot systems, but its scope is narrower and more empirical, addressing a threat model that may be mitigated as LLM safety alignment improves. PRISM-SLAM has broader and more lasting impact across robotics, AR, and autonomous systems.
Paper 2 presents a novel unified RL framework for humanoid locomotion that solves a fundamental robotics challenge—combining walking, running, and fall recovery in a single policy without mode switching—validated on real hardware. This has broad, immediate practical impact for humanoid robotics deployment. Paper 1 identifies an important security vulnerability in LLM-controlled multi-robot systems, but is more incremental (extending single-robot LLM attacks to multi-robot settings) and addresses a narrower, less mature application domain. Paper 2's hardware-validated contribution to a rapidly growing field (humanoid robots) gives it higher impact potential.
Paper 1 addresses a critical, emerging security vulnerability in LLM-driven multi-robot systems, connecting cybersecurity with AI alignment and robotics. Given the rapid adoption of LLMs in embodied AI, exposing and quantifying these systemic risks has high urgency and broad implications for safe AI deployment, offering higher potential impact than the specific perception and navigation improvements proposed in Paper 2.
Paper 1 likely has higher scientific impact due to stronger novelty and timeliness: it introduces a new attack paradigm for LLM-controlled multi-robot collaboration via single-robot compromise and quantifies propagation with clear metrics (obedience, infectiousness, stealthiness), supported by code. The real-world implications for safety-critical robotics and AI security are broad, affecting multi-agent systems, embodied AI, and cybersecurity. Paper 2 appears to be a solid systems contribution achieving SOTA on VLN benchmarks, but its innovation is more incremental (architecture/mapping enhancements) and its impact is narrower to VLN/robot navigation.
Paper 2 (MUSE) likely has higher scientific impact due to broad, immediate applicability in core robotics problems (VIO/state estimation) across navigation, driving, and flight, where calibrated uncertainty is essential for safety and decision-making. Methodologically, it proposes a real-time multimodal uncertainty framework with modern sequential modeling (Mamba) and reports validation on multiple datasets with ablations, supporting rigor and adoption. Paper 1 is timely and novel in multi-robot LLM security, but its impact may be narrower and more contingent on specific LLM-based coordination stacks, with mitigation/defense implications less directly transferable than improved uncertainty estimation.
Paper 1 likely has higher scientific impact because it introduces a broadly useful, novel interpretability methodology for vision-language-action policies—event-grounded SAE analysis—that is validated with causal interventions across multiple architectures, simulations, and a real-robot study. This combination of methodological innovation, rigor, and direct relevance to making embodied AI more interpretable and safer is timely and transferable across robotics, ML interpretability, and foundation-model-based control. Paper 2 is important for security awareness, but reads more like an attack demonstration with narrower methodological novelty and less generalizable scientific tooling.
Paper 2 has higher likely impact due to strong timeliness and broad relevance: it targets security/safety of LLM-driven multi-robot systems, a rapidly growing deployment area with immediate real-world stakes. The single-robot compromise → multi-robot propagation paradigm is a novel threat model with cross-field implications (robotics, AI safety, cybersecurity, HRI). It introduces clear metrics and provides code, supporting methodological rigor and reproducibility. Paper 1 is innovative for manipulation planning with nonprehensile actions, but its impact is more domain-specific and primarily validated in simulation.
Paper 2 likely has higher scientific impact due to its novelty in identifying and experimentally characterizing a multi-robot, communication-propagated attack surface for LLM-controlled systems, a timely and high-stakes topic. It introduces an attack paradigm plus clear metrics (obedience, infectiousness, stealthiness) and provides code, facilitating follow-on research and benchmarks. Real-world implications span robotics, AI safety/security, and distributed systems, with broad relevance as LLM planners are deployed. Paper 1 is methodologically solid and useful for navigation efficiency, but the conceptual leap and cross-field urgency are smaller.
Paper 1 likely has higher impact due to its timeliness and broad relevance: security/safety of LLM-driven embodied multi-robot systems is a rapidly emerging, high-stakes area. It introduces a novel attack paradigm (single-robot compromise propagating through coordination), provides concrete metrics (obedience/infectiousness/stealthiness), strong empirical results, and released code—supporting methodological rigor and reproducibility. The findings generalize across many LLM-planner deployments, affecting robotics, AI safety, and cybersecurity. Paper 2 is methodologically sound and useful, but is a more incremental modeling/planning advance with narrower cross-field urgency.
While Paper 1 offers a strong methodological improvement for long-horizon robotic tasks, Paper 2 exposes critical and previously unexplored security vulnerabilities in LLM-controlled multi-robot systems. By demonstrating how a single compromised robot can propagate unsafe physical actions across a fleet, Paper 2 addresses urgent real-world safety and privacy concerns. This pioneering work is highly timely and likely to catalyze a significant new sub-field focused on the security and alignment of embodied multi-agent systems.
Paper 1 pioneers the exploration of security vulnerabilities in multi-agent LLM systems, revealing a novel 'infectious' attack paradigm in physical robotics. Its findings have broad, critical implications for AI alignment, cybersecurity, and robotics, making it highly impactful for future safety protocols. While Paper 2 offers significant architectural advancements for autonomous driving, Paper 1 addresses an emerging, fundamental failure mode in the rapidly growing field of embodied AI systems.
Paper 2 likely has higher scientific impact: it introduces a new task paradigm plus a sizable benchmark dataset and a strong baseline method with large gains and real-world validation, enabling broad follow-on work in VLN, robotics, mapping, and multimodal reasoning. Its contributions are constructive, reusable, and broadly applicable across embodied AI. Paper 1 is novel and timely in security, with clear metrics and a practical attack model, but its impact may be narrower (focused on adversarial safety in LLM multi-robot coordination) and less generative of standardized benchmarks compared to the dataset-driven paradigm shift in Paper 2.
Paper 2 addresses a highly timely and critical security vulnerability at the intersection of LLMs, multi-robot systems, and cybersecurity. As LLMs are increasingly deployed in embodied AI, exposing and quantifying adversarial propagation in multi-agent systems has broader implications for safety and AI alignment across disciplines compared to Paper 1's specialized algorithmic improvements for SLAM under extreme motion.
Paper 2 addresses a critical and timely security vulnerability in LLM-controlled multi-robot systems—a rapidly growing deployment area. Its novel attack paradigm demonstrating how compromising a single robot can propagate unsafe actions across an entire fleet has broad implications for AI safety, robotics, and cybersecurity. The work opens a largely unexplored research direction with immediate real-world safety implications. Paper 1, while technically sound, addresses a narrower optimization problem (resource-aware navigation) with incremental improvements. Paper 2's cross-disciplinary relevance and urgency of the security threat give it higher impact potential.
Paper 2 has higher potential impact due to its novelty and timeliness in exposing a new security failure mode for LLM-controlled multi-robot systems: attack propagation via inter-robot communication from a single compromised agent. The problem is broadly relevant across robotics, AI safety, cybersecurity, and human-robot interaction, with clear real-world implications as LLM planners are rapidly deployed. It introduces a concrete threat model plus measurable metrics (obedience, infectiousness, stealthiness) and demonstrates efficient, scalable compromise. Paper 1 is valuable but more incremental (reward shaping with CBFs) and narrower to a specific CAV intersection MARL setting.
Paper 1 addresses the fundamental and broadly applicable problem of catastrophic forgetting in fine-tuning VLA models with a principled, practical solution (ConSFT) requiring no extra data or architecture changes. It demonstrates strong results across multiple benchmarks and real-world deployments. Paper 2 identifies an important security vulnerability in LLM-controlled multi-robot systems, but its scope is narrower—focused on attack demonstration rather than defense—and the threat model, while novel, addresses a less mature deployment scenario. Paper 1's methodological contribution is more immediately actionable across the growing VLA community.
Paper 1 exposes a critical, highly novel security vulnerability in emerging LLM-controlled multi-robot systems: the propagation of malicious intent through inter-robot communication. By introducing a new attack paradigm (infecting a swarm via a single robot), it opens a new sub-field in embodied AI safety with severe real-world implications for public safety and privacy. While Paper 2 presents a strong algorithmic improvement for VLA generalization, Paper 1's identification of a fundamental security flaw in a rapidly adopting technology gives it a broader and more urgent potential scientific impact.
Both “papers” are identical (same title, abstract, metrics, and code link), so they have equal novelty, applications, rigor, breadth, and timeliness. With no differentiating information, neither can be judged higher impact; Paper 1 is selected only as a tie-breaker.