Nets🕸️vs👾Automata

Environment:

Entropy: 0.0
Agent Position: (0, 0)

Reward:

Last Reward: 0.0
Total Reward: 0.0

Actor (Policy Network):

Policy Entropy: 0.0
Action Probabilities:
Last Action: None
Last Write Pattern:

Critic (Value Network):

Value Estimate: 0.0
Advantage: 0.0

RL Details:

Learning Rate:
Training Steps: 0

Each Reinforcement Learning Game consists of a Cellular Automata grid (implicitly on a torus) that updates according to a CA rule, and an RL Agent that sees globally (all n×n cells) but acts locally, within a 2x2 window.
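A minimal sketch of one toroidal CA update, using Conway's Game of Life rules (B3/S23) as an example; `np.roll` gives the wrap-around neighborhoods for free. The function name is illustrative, not the project's actual API.

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life step on a torus (B3/S23)."""
    # Sum the 8 toroidal neighbors by rolling the grid in each direction.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(np.uint8)

# A pattern near the edge wraps around instead of falling off the grid.
g = np.zeros((8, 8), dtype=np.uint8)
g[0, 1] = g[1, 2] = g[2, 0] = g[2, 1] = g[2, 2] = 1  # glider at the top edge
g = life_step(g)
```

Other rule sets (Seeds, Maze, custom birth/survival rules) only change the birth/survival conditions in the return expression.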

Project goal: To create an RL agent that learns to interact with a cellular automata environment to play a game. Game ideas: 1) destroy a stable 2x2 target in Game of Life; 2) act as Maxwell's Demon, reducing 'entropy' by separating on and off cells to the left and right; 3) reduce 'entropy' by making the grid mostly on or off; 4) or achieve *something* (like a stable pattern)??🤔
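As a concrete (and hedged) take on game idea 3, one candidate reward is the per-step *decrease* in the binary entropy of the grid's cell density: positive when the agent pushes the grid toward all-on or all-off. The function names are placeholders, not project code.

```python
import numpy as np

def grid_entropy(grid: np.ndarray) -> float:
    """Binary entropy (in bits) of the fraction of 'on' cells."""
    p = float(grid.mean())
    if p in (0.0, 1.0):
        return 0.0  # fully uniform grid carries zero entropy
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def entropy_reward(prev_grid: np.ndarray, next_grid: np.ndarray) -> float:
    # Positive reward for a step that made the grid more uniform.
    return grid_entropy(prev_grid) - grid_entropy(next_grid)
```

Idea 2 (Maxwell's Demon) would need a different statistic, e.g. the difference in on-cell density between the left and right halves of the grid.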

🕸️ & 👾RL Training Details

  • RL Actor Agent Policy Net 🕸️:
    • Game State Input Tensor Shape = (n, n, 2); NxN Cellular Automata channel & NxN Agent Position channel
    • Output Layer Shape = (1, 21), a distribution to sample over 5 + 16 actions: move Up, Down, Left, or Right; Do Nothing; or write one of the 16 possible 2x2 patterns
    • Current Games with Reward Functions (needs more consideration): Choose between hitting a target, separating on/off cells, reducing entropy
  • RL Critic Value Net 🕸️:
    • Input State Tensor Shape: (n, n, 2); NxN Cellular Automata channel & NxN Agent Position channel
    • Output: Scalar State-Value Estimate V(s), the expected discounted sum of future rewards from state s
  • Cellular Automata Environment 👾:
    • Multiple Rule Sets: Conway's Game of Life, Seeds, Maze, and custom rules
    • The agent is overlaid on top of the CA environment but can affect it by writing a 2x2 pattern
    • Torus topology: the CA grid wraps at the left/right and top/bottom edges, so there are no boundaries (special toroidal padding is used in the ConvNet)
    • Custom CA Rules: Define your own birth/survival rules
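The "special toroidal padding" mentioned above can be sketched as wrapping the (n, n, 2) state tensor at its edges before each convolution, so filters see the same torus neighborhoods the CA uses. This is a numpy illustration under that assumption; the actual nets likely do the equivalent inside their framework (e.g. circular padding in a conv layer).

```python
import numpy as np

def toroidal_pad(state: np.ndarray, pad: int = 1) -> np.ndarray:
    """Wrap-pad an (n, n, c) state tensor along both spatial axes."""
    # mode="wrap" copies rows/columns from the opposite edge,
    # matching the torus topology of the CA grid.
    return np.pad(state, ((pad, pad), (pad, pad), (0, 0)), mode="wrap")

n = 4
state = np.zeros((n, n, 2), dtype=np.float32)
state[0, 0, 1] = 1.0            # agent-position channel: agent at (0, 0)
padded = toroidal_pad(state)    # shape (n + 2, n + 2, 2)
# The agent cell now also appears as a wrapped copy near the far corner,
# so a 3x3 conv filter centered at the opposite edge still "sees" it.
```

Both the actor and the critic consume the same (n, n, 2) input, so a shared padded-conv trunk with two heads (21-way softmax for the policy, scalar for V(s)) is one natural architecture.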

🕹️Manual Play

  • Choose A Game:
    • Manual Control: Use WASD keys + Space (do nothing) + G (Write Pattern)

Dev Thought: I'm still looking for interesting, tractable, bit-flipping games for the agent to play... Got any ideas?

Dev Thought: Should we have multiple agents?

Dev Thought: How does this relate to neural cellular automata that 'grow' and persist images they were trained on? This agent is weaker: it can only influence one 2x2 region at a time, whereas an NCA network defines the update rule itself.

Dev Thought: What's the hyperparameter search plan? Or should it be left to the user, giving them control of the learning rates, net architectures, etc.?

Dev Thought: Should we add another table to our SQL database for storing manual game results, for Supervised Learning pretraining?

Dev Thought: Should we make it so the agent's actions are not about moving, and it can instead write to any one 2x2 region per step?

Ready to start