Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

2026-06-16, Cryptography and Security

Cryptography and Security, Artificial Intelligence, Machine Learning
AI summary

The authors study how to help computer defense systems predict attacker actions when the defender cannot observe everything happening in the network. They propose a learning method that imitates attacker behavior using only what can be observed: network observations and the defender's own actions. The method is used alongside defense agents that combine rules and learning to protect networks. It handles different attacker policies and predicts attacker actions with high accuracy across a range of simulated attack scenarios.

Reinforcement Learning, Imitation Learning, Cyber-defense, Partially Observable Systems, Neurosymbolic Approaches, Behavior Trees, Discrete States and Actions, Autonomous Networks, Policy Learning, Intrusion Detection
Authors
Ankita Samaddar, Sandeep Neema, Daniel Balasubramanian, Xenofon Koutsoukos
Abstract
With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neurosymbolic approaches such as behavior trees with learning-enabled components (LECs) to learn, reason, adapt, and implement security rules while maintaining critical operations. However, these autonomous networks are partially observable systems, i.e., the cyber-attacker's (red agent's) actions are not observable, making it difficult for the defender to predict red actions, learn red policies, or assess the attacker's intrusion levels. To address this, we propose a Policy Learning Technique using imitation learning to learn policies for partially observable RL agents with discrete states and discrete actions. We apply this technique in an autonomous cyber environment to predict the red agent's actions from network observations and defender actions. Integrated with a neurosymbolic cyber-defense agent, our method effectively handles different red policies and achieves high prediction accuracy across diverse simulated scenarios.
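To make the idea of imitation learning over discrete states and actions concrete, the following is a minimal sketch of tabular behavior cloning: it counts which red action followed each (observation, defender action) pair in logged traces and predicts the most frequent one. It is an illustration under stated assumptions, not the authors' Policy Learning Technique. The class name `TabularRedPolicy`, the observation labels, and the action names are all hypothetical, and the sketch assumes labeled red actions are available at training time (for example, from a simulator).

```python
from collections import Counter, defaultdict


class TabularRedPolicy:
    """Hypothetical tabular behavior-cloning model of a red agent.

    Maps a discrete (observation, defender action) pair to the red action
    most often seen after it in the training traces.
    """

    def __init__(self):
        # counts[(obs, blue_action)][red_action] -> number of occurrences
        self.counts = defaultdict(Counter)
        # Overall red-action frequencies, used as a fallback for unseen pairs
        self.prior = Counter()

    def fit(self, demonstrations):
        """demonstrations: iterable of (obs, blue_action, red_action) tuples."""
        for obs, blue_action, red_action in demonstrations:
            self.counts[(obs, blue_action)][red_action] += 1
            self.prior[red_action] += 1
        return self

    def predict(self, obs, blue_action):
        """Return the most frequent red action for this pair."""
        if not self.prior:
            raise ValueError("policy has not been fitted on any demonstrations")
        seen = self.counts.get((obs, blue_action))
        source = seen if seen else self.prior
        return source.most_common(1)[0][0]


# Hypothetical traces: (network observation, defender action, red action).
demos = [
    ("host_scanned", "monitor", "exploit"),
    ("host_scanned", "monitor", "exploit"),
    ("host_scanned", "monitor", "exploit"),
    ("host_scanned", "monitor", "scan"),
    ("host_compromised", "restore", "scan"),
    ("host_compromised", "monitor", "escalate"),
]

policy = TabularRedPolicy().fit(demos)
seen_prediction = policy.predict("host_scanned", "monitor")
unseen_prediction = policy.predict("no_activity", "monitor")
print(seen_prediction, unseen_prediction)
```

A lookup table like this only works when the observation space is small enough to enumerate; a learned classifier over the same inputs would be the natural replacement for larger networks.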