A Mac Mini captures the screen, sends it to Claude Opus 4.6 via API, receives action decisions, and executes them with human-like input simulation — all running autonomously.
This is an autonomous AI agent that plays Old School RuneScape entirely through vision, running 24/7 on a Mac Mini. It doesn't read game memory, inject code, or use any plugins — it simply looks at the screen and decides what to do, just like a human player would.
Every few seconds, the Mac Mini captures a screenshot of the game and sends it to Claude Opus 4.6 (Anthropic's most powerful AI model), which returns a structured decision: what it sees, what it thinks, and what actions to take. Those actions — mouse clicks, keyboard presses, minimap navigation — are executed with human-like Bezier curve mouse paths and randomized timing.
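For illustration, a single decision might look like the JSON below. The field names and action schema here are a sketch, not the project's exact format:

{
  "observation": "Oak tree north-east of the Lumbridge gate; inventory has 12 free slots.",
  "thinking": "Woodcutting is the top priority and a tree is in reach.",
  "actions": [
    {"type": "click", "x": 612, "y": 344},
    {"type": "wait", "min": 1.2, "max": 2.0},
    {"type": "click_minimap", "x": 48, "y": 27}
  ],
  "remember": "Oak trees cluster north-east of the Lumbridge gate."
}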
The agent has persistent memory powered by a vector database. It remembers where it's been, what worked, what failed, and what killed it. These memories are fed back into every decision, so it learns from experience across hundreds of ticks.
Everything streams live to this dashboard — the AI's thoughts, its actions, player stats, a world map tracking its location, screenshots of what it sees, and achievement milestones. You're watching an AI explore Gielinor in real time.
Capture the game window
Claude Opus 4.6 analyzes the scene
Choose actions from 30+ types
Human-like Bezier mouse paths
Store experience in vector DB
Loop forever, learn always
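Stripped to its skeleton, the whole agent is one loop. A minimal Python sketch of the six steps above, where capture, decide, execute, and the memory interface are illustrative names rather than the project's actual APIs:

import time

def run(agent):
    """Simplified perception-action loop; the real pipeline adds goals,
    stuck detection, and telemetry on top of these six steps."""
    while True:
        frame = agent.capture()                    # 1. screenshot the game window
        memories = agent.memory.recall(frame)      # retrieve relevant experience
        decision = agent.decide(frame, memories)   # 2-3. LLM returns structured actions
        for action in decision.actions:
            agent.execute(action)                  # 4. Bezier-path mouse and keyboard
        agent.memory.store(decision)               # 5. persist the experience
        time.sleep(agent.tick_interval)            # 6. loop forever, learn always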
Real output from the perception-action loop. Every tick, the agent observes, reasons, and acts — here's what that looks like under the hood.
Real-time stats extracted from the game via vision AI. Every tree chopped, every monster slain, every level gained — tracked live.
Every component is engineered for one goal: a self-sufficient agent that plays RuneScape like a human, learns from experience, and never stops.
Sends raw screenshots to Claude Opus 4.6. The LLM observes the game world, identifies objects, NPCs, and interfaces, then decides what to do next — pure vision, zero game API access. (Claude Opus 4.6)
Mouse movements follow Bezier curves with de Casteljau interpolation, ease-in-out acceleration, random jitter, and occasional hesitation pauses. Indistinguishable from a real human player. (Bezier Curves)
ChromaDB vector database stores 12 types of memory: observations, combat knowledge, navigation paths, NPC encounters, death learnings, and more. Semantic search retrieves relevant past experiences. (ChromaDB + Embeddings)
Tree-based goal system with prerequisites, priority ordering, auto-cascading completion, and retry logic. Goals decompose from "Complete Tutorial Island" into atomic sub-tasks. (Goal Tree)
Three-level detection: repeated clicks, scene-stuck keywords, and area-stuck monitoring. Automatic recovery nudges force new approaches, exploration, and activity variation. (Self-Recovery)
OSRS-themed transparent overlay shows the agent's real-time thoughts, reasoning, and actions. OBS WebSocket integration enables 24/7 autonomous livestreaming with status overlays. (OBS Integration)
Clean separation of concerns allows each subsystem to evolve independently. The game loop orchestrates all components through a unified tick-based pipeline.
From Bezier curves to semantic memory, every line is crafted for autonomous gameplay.
import math
import random

def _bezier_points(
    self,
    start: Point,
    end: Point,
    control_points: int = 2,
    num_steps: int = 50,
) -> list[Point]:
    """Generate points along a Bezier curve from start to end.
    Creates natural-looking curved mouse trajectories."""
    points = [start]

    # Generate random control points that create a natural arc
    cps = [start]
    for _ in range(control_points):
        mid_x = (start.x + end.x) / 2
        mid_y = (start.y + end.y) / 2
        dist = math.hypot(end.x - start.x, end.y - start.y)
        spread = dist * 0.3
        cp = Point(
            int(mid_x + random.uniform(-spread, spread)),
            int(mid_y + random.uniform(-spread, spread)),
        )
        cps.append(cp)
    cps.append(end)

    # De Casteljau's algorithm for smooth Bezier interpolation
    for i in range(1, num_steps + 1):
        t = i / num_steps
        t = self._ease_in_out(t)  # Slow start, fast middle, slow end
        points.append(self._de_casteljau(cps, t))
    return points

def _ease_in_out(self, t: float) -> float:
    """Ease-in-out for natural acceleration/deceleration."""
    if t < 0.5:
        return 2 * t * t
    return -1 + (4 - 2 * t) * t
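The method above relies on self._de_casteljau, which the excerpt doesn't show. A standard evaluation by repeated linear interpolation, written against the same Point type, would look like this:

def _de_casteljau(self, cps: list[Point], t: float) -> Point:
    """Evaluate the Bezier curve at parameter t by repeatedly lerping
    between adjacent control points until a single point remains."""
    pts = [(float(p.x), float(p.y)) for p in cps]
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return Point(int(round(pts[0][0])), int(round(pts[0][1])))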
"""The LLM receives a comprehensive gameplay instruction set.""" OSRS_SYSTEM_PROMPT = """ You are an autonomous OSRS player. Analyze each screenshot and return JSON actions. ## CRITICAL RULES - ALWAYS return 2-3 actions per turn - Your LAST action should be a click_minimap to keep moving - NEVER open the Settings menu — it blocks the game view - FINISH what you start: combat, tree chopping, conversations ## ACTIVITY PRIORITY 1. CHOP TREES — click any tree to gather logs 2. COMBAT — attack nearby creatures, pick up drops 3. TALK TO NPCs — engage in conversation 4. PICK UP ITEMS — bones, coins, weapons, everything! 5. EXPLORE — walk to new towns, castles, bridges ## AVAILABLE ACTIONS - click(x, y) — left-click game coordinates - right_click(x, y) — context menu - click_minimap(x, y) — navigate via minimap - type_text(text) — type in chat - press_key(key) — press keyboard key - click_inventory_slot(slot) — interact with item - rotate_camera(direction, duration) — camera control - wait(min, max) — brief pause """
from enum import Enum

class MemoryType(Enum):
    """12 categories of persistent agent memory."""
    OBSERVATION = "observation"    # What was seen on screen
    ACTION = "action_result"       # Successful action outcomes
    LOCATION = "location"          # Map knowledge & landmarks
    NPC = "npc"                    # NPC behavior & dialogue
    QUEST = "quest"                # Quest progress
    SKILL = "skill"                # Training techniques
    ITEM = "item"                  # Item properties
    COMBAT = "combat"              # Monster knowledge
    DEATH = "death"                # Learn from mistakes
    FAILURE = "failure"            # Failed attempts
    NAVIGATION = "navigation"      # Path instructions
    STRATEGY = "strategy"          # General approaches

def get_context_for_situation(self, query: str) -> str:
    """Retrieve semantically relevant memories for the LLM."""
    results = self.collection.query(
        query_texts=[query],
        n_results=settings.MEMORY_TOP_K,  # Top 10 matches
    )
    # Format memories as context for the LLM prompt
    memories = []
    for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
        memories.append(f"[{meta['type']}] {doc}")
    return "\n".join(memories)
from dataclasses import dataclass
from typing import Optional

@dataclass
class Goal:
    """Hierarchical goal with prerequisites and retry logic."""
    id: str
    name: str
    description: str
    status: GoalStatus           # PENDING | ACTIVE | COMPLETED | FAILED
    priority: int                # 1-10, higher = more urgent
    parent_id: Optional[str]
    children_ids: list[str]
    prerequisites: list[str]     # Goal IDs that must complete first
    attempts: int = 0
    max_attempts: int = 10

def get_next_goal(self) -> Optional[Goal]:
    """Select highest-priority goal with met prerequisites."""
    candidates = [
        g for g in self.goals.values()
        if g.status == GoalStatus.PENDING
        and all(
            self.goals[p].status == GoalStatus.COMPLETED
            for p in g.prerequisites
            if p in self.goals
        )
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g.priority)

def complete_goal(self, goal_id: str):
    """Complete a goal. Auto-cascades to parent if all children done."""
    goal = self.goals[goal_id]
    goal.status = GoalStatus.COMPLETED
    # Check if parent should auto-complete
    if goal.parent_id and goal.parent_id in self.goals:
        parent = self.goals[goal.parent_id]
        if all(
            self.goals[c].status == GoalStatus.COMPLETED
            for c in parent.children_ids
        ):
            self.complete_goal(parent.id)  # Recursive cascade
Claude Opus 4.6: Primary vision LLM
Python: Core application language
ChromaDB: Vector database for memory
PyAutoGUI: Input automation
Image processing
OBS WebSocket: Livestream integration
Thought overlay UI
Anthropic: Claude model provider
Mathematical operations
Mac Mini: 24/7 hardware host
The agent implements a real-time perception-action loop running at sub-3-second tick intervals. Each cycle captures a lossless screenshot of the game window, applies JPEG compression with perceptual quality optimization, and encodes it as base64 for multimodal inference. The frame is dispatched to Claude Opus 4.6 via Anthropic's vision API, which performs scene decomposition — identifying entities, UI state, spatial relationships, and interactive elements. The model returns structured JSON containing semantic observations, chain-of-thought reasoning, coordinate-precise actions, and memory annotations. Zero game memory access, zero client injection — pure visual grounding.
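A minimal sketch of that dispatch using the Anthropic Python SDK; the model ID and prompt wiring are placeholders, not the project's exact configuration:

import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def decide(jpeg_bytes: bytes, memory_context: str) -> str:
    """Send one compressed frame plus retrieved memories; return the raw JSON decision."""
    response = client.messages.create(
        model="claude-opus-4-6",  # placeholder; use the deployed model ID
        max_tokens=1024,
        system=OSRS_SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode(),
                }},
                {"type": "text", "text": f"Relevant memories:\n{memory_context}\n\nDecide your next actions."},
            ],
        }],
    )
    return response.content[0].text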
No. The architecture is deliberately non-invasive. ClaudeScape operates as an entirely separate process with no hooks, no bytecode injection, no memory reads, and no client-side modifications. Screen capture uses the OS display pipeline (identical to screen recording software), and input simulation uses standard HID event dispatch via PyAutoGUI with Bezier-curve humanization. The game client remains 100% vanilla — the agent has no more access than a human sitting at the keyboard.
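The execution side can be sketched in a few lines of PyAutoGUI, walking the Bezier path from the code section above with jittered pacing (humanized_click and its timing constants are illustrative):

import random
import time
import pyautogui

def humanized_click(self, target: Point) -> None:
    """Move along a Bezier path with randomized pacing, then click."""
    start = Point(*pyautogui.position())
    for p in self._bezier_points(start, target):
        pyautogui.moveTo(p.x, p.y, _pause=False)  # raw move, no built-in delay
        time.sleep(random.uniform(0.001, 0.004))  # randomized inter-step timing
    if random.random() < 0.1:                     # occasional hesitation pause
        time.sleep(random.uniform(0.05, 0.25))
    pyautogui.click()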
ClaudeScape runs 24/7 on a single Mac Mini — a deliberate architectural choice. The Mac Mini handles native game rendering, display-pipeline screen capture, local input execution, ChromaDB vector storage, PostgreSQL telemetry pushing, and stuck-detection signal processing. Only the LLM inference is offloaded to Anthropic's servers. This edge-compute model eliminates cloud VM costs, reduces latency on the capture-to-action loop, and enables unattended autonomous operation with automatic session recovery. Total infrastructure: one machine + one API key.
The agent implements retrieval-augmented generation (RAG) over its own experiential history. Every tick, observations, action outcomes, navigation routes, combat encounters, deaths, and strategic insights are encoded as vector embeddings and stored in ChromaDB with typed metadata across 12 memory categories. Before each decision cycle, the agent performs semantic similarity search against its memory store, retrieving the most relevant past experiences for the current context. The result: emergent long-term learning without fine-tuning — the agent genuinely improves over time by recalling what worked and what didn't.
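The write side of that loop is a plain ChromaDB add with typed metadata. A sketch, where the ID scheme and metadata fields are illustrative:

import uuid

def store_memory(self, memory_type: MemoryType, text: str, tick: int) -> None:
    """Persist one experience; ChromaDB embeds the document on insert."""
    self.collection.add(
        ids=[str(uuid.uuid4())],
        documents=[text],
        metadatas=[{"type": memory_type.value, "tick": tick}],
    )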
Stuck detection uses a three-layer signal fusion approach. Layer 1: NumPy-based minimap frame differencing with a calibrated threshold to detect movement cessation. Layer 2: NLP-based scene analysis scanning recent observations for obstacle keywords (fences, gates, walls). Layer 3: action sequence pattern matching to detect repetitive click loops. The critical innovation is context-aware suppression — the system distinguishes between "stuck" and "intentionally stationary" by checking combat state, skilling activity, and dialogue engagement. If a force-unstick triggers and combat begins, the override is immediately cancelled. Five-tick threshold, not three — tuned through extensive real-world testing.
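Layer 1 reduces to a mean absolute difference between consecutive minimap crops. A sketch, with an illustrative threshold rather than the calibrated one:

import numpy as np

def minimap_moved(prev: np.ndarray, curr: np.ndarray, threshold: float = 6.0) -> bool:
    """True if consecutive minimap frames differ enough to indicate movement."""
    # Widen the dtype before subtracting so uint8 pixels don't wrap around
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)).mean()
    return diff > threshold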
Every third tick, the agent serializes its full state — screenshot, observations, vitals, inventory, counters, location, active goals, memory distribution, and action history — into a JSONB blob and upserts it to a PostgreSQL instance on Railway. The Vercel-hosted dashboard polls the REST API every 2 seconds, rendering the live feed terminal, world map with GPS-style tracking, player vitals, activity counters, XP tracker, and session history. Single-row upsert means zero table growth — only the latest state matters. The full pipeline: Mac Mini → PostgreSQL (Railway) → REST API → Vercel CDN → browser.
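The single-row upsert is ordinary SQL. A sketch using psycopg2, with illustrative table and column names:

import json
import psycopg2

def push_state(conn, state: dict) -> None:
    """Upsert the latest snapshot; the table never grows past one row."""
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO agent_state (id, payload, updated_at)
            VALUES (1, %s::jsonb, NOW())
            ON CONFLICT (id) DO UPDATE
               SET payload = EXCLUDED.payload,
                   updated_at = EXCLUDED.updated_at
            """,
            (json.dumps(state),),
        )
    conn.commit()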
Fully open source. The entire system — vision engine, action executor, memory store, goal planner, stuck detector, input humanization, desktop overlay, database layer, live dashboard, and deployment config — is available on GitHub. 3,500+ lines of Python, zero proprietary dependencies.