A Mac Mini captures the screen, sends it to Claude Opus 4.6 via API, receives action decisions, and executes them with human-like input simulation — all running autonomously.
This is an autonomous AI agent that plays Old School RuneScape entirely through vision, running 24/7 on a Mac Mini. It doesn't read game memory, inject code, or use any plugins — it simply looks at the screen and decides what to do, just like a human player would.
Every few seconds, the Mac Mini captures a screenshot of the game and sends it to Claude Opus 4.6 (Anthropic's most powerful AI model), which returns a structured decision: what it sees, what it thinks, and what actions to take. Those actions — mouse clicks, keyboard presses, minimap navigation — are executed with human-like Bezier curve mouse paths and randomized timing.
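For illustration, a single decision might look like the JSON below. The field names and action schema here are a sketch, not the project's exact format:

{
  "observation": "Oak tree north-east of the Lumbridge gate; inventory has 12 free slots.",
  "thinking": "Woodcutting is the top priority and a tree is in reach.",
  "actions": [
    {"type": "click", "x": 612, "y": 344},
    {"type": "wait", "min": 1.2, "max": 2.0},
    {"type": "click_minimap", "x": 48, "y": 27}
  ],
  "remember": "Oak trees cluster north-east of the Lumbridge gate."
}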
The agent has persistent memory powered by a vector database. It remembers where it's been, what worked, what failed, and what killed it. These memories are fed back into every decision, so it learns from experience across hundreds of ticks.
Everything streams live to this dashboard — the AI's thoughts, its actions, player stats, a world map tracking its location, screenshots of what it sees, and achievement milestones. You're watching an AI explore Gielinor in real time.
Capture the game window
Claude Opus 4.6 analyzes the scene
Choose actions from 30+ types
Human-like Bezier mouse paths
Store experience in vector DB
Loop forever, learn always
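Stripped to its skeleton, the whole agent is one loop. A minimal Python sketch of the six steps above, where capture, decide, execute, and the memory interface are illustrative names rather than the project's actual APIs:

import time

def run(agent):
    """Simplified perception-action loop; the real pipeline adds goals,
    stuck detection, and telemetry on top of these six steps."""
    while True:
        frame = agent.capture()                    # 1. screenshot the game window
        memories = agent.memory.recall(frame)      # retrieve relevant experience
        decision = agent.decide(frame, memories)   # 2-3. LLM returns structured actions
        for action in decision.actions:
            agent.execute(action)                  # 4. Bezier-path mouse and keyboard
        agent.memory.store(decision)               # 5. persist the experience
        time.sleep(agent.tick_interval)            # 6. loop forever, learn always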
Real output from the perception-action loop. Every tick, the agent observes, reasons, and acts — here's what that looks like under the hood.
Real-time stats extracted from the game via vision AI. Every tree chopped, every monster slain, every level gained — tracked live.
Every component is engineered for one goal: a self-sufficient agent that plays RuneScape like a human, learns from experience, and never stops.
Sends raw screenshots to Claude Opus 4.6. The LLM observes the game world, identifies objects, NPCs, and interfaces, then decides what to do next — pure vision, zero game API access. (Claude Opus 4.6)
Mouse movements follow Bezier curves with de Casteljau interpolation, ease-in-out acceleration, random jitter, and occasional hesitation pauses. Indistinguishable from a real human player. (Bezier Curves)
ChromaDB vector database stores 12 types of memory: observations, combat knowledge, navigation paths, NPC encounters, death learnings, and more. Semantic search retrieves relevant past experiences. (ChromaDB + Embeddings)
Tree-based goal system with prerequisites, priority ordering, auto-cascading completion, and retry logic. Goals decompose from "Complete Tutorial Island" into atomic sub-tasks. (Goal Tree)
Three-level detection: repeated clicks, scene-stuck keywords, and area-stuck monitoring. Automatic recovery nudges force new approaches, exploration, and activity variation. (Self-Recovery)
OSRS-themed transparent overlay shows the agent's real-time thoughts, reasoning, and actions. OBS WebSocket integration enables 24/7 autonomous livestreaming with status overlays. (OBS Integration)
Clean separation of concerns allows each subsystem to evolve independently. The game loop orchestrates all components through a unified tick-based pipeline.
From Bezier curves to semantic memory, every line is crafted for autonomous gameplay.
import math
import random

def _bezier_points(
    self,
    start: Point,
    end: Point,
    control_points: int = 2,
    num_steps: int = 50,
) -> list[Point]:
    """Generate points along a Bezier curve from start to end.
    Creates natural-looking curved mouse trajectories."""
    points = [start]

    # Generate random control points that create a natural arc
    cps = [start]
    for _ in range(control_points):
        mid_x = (start.x + end.x) / 2
        mid_y = (start.y + end.y) / 2
        dist = math.hypot(end.x - start.x, end.y - start.y)
        spread = dist * 0.3
        cp = Point(
            int(mid_x + random.uniform(-spread, spread)),
            int(mid_y + random.uniform(-spread, spread)),
        )
        cps.append(cp)
    cps.append(end)

    # De Casteljau's algorithm for smooth Bezier interpolation
    for i in range(1, num_steps + 1):
        t = i / num_steps
        t = self._ease_in_out(t)  # Slow start, fast middle, slow end
        points.append(self._de_casteljau(cps, t))
    return points

def _ease_in_out(self, t: float) -> float:
    """Ease-in-out for natural acceleration/deceleration."""
    if t < 0.5:
        return 2 * t * t
    return -1 + (4 - 2 * t) * t
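The method above relies on self._de_casteljau, which the excerpt doesn't show. A standard evaluation by repeated linear interpolation, written against the same Point type, would look like this:

def _de_casteljau(self, cps: list[Point], t: float) -> Point:
    """Evaluate the Bezier curve at parameter t by repeatedly lerping
    between adjacent control points until a single point remains."""
    pts = [(float(p.x), float(p.y)) for p in cps]
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return Point(int(round(pts[0][0])), int(round(pts[0][1])))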
"""The LLM receives a comprehensive gameplay instruction set.""" OSRS_SYSTEM_PROMPT = """ You are an autonomous OSRS player. Analyze each screenshot and return JSON actions. ## CRITICAL RULES - ALWAYS return 2-3 actions per turn - Your LAST action should be a click_minimap to keep moving - NEVER open the Settings menu — it blocks the game view - FINISH what you start: combat, tree chopping, conversations ## ACTIVITY PRIORITY 1. CHOP TREES — click any tree to gather logs 2. COMBAT — attack nearby creatures, pick up drops 3. TALK TO NPCs — engage in conversation 4. PICK UP ITEMS — bones, coins, weapons, everything! 5. EXPLORE — walk to new towns, castles, bridges ## AVAILABLE ACTIONS - click(x, y) — left-click game coordinates - right_click(x, y) — context menu - click_minimap(x, y) — navigate via minimap - type_text(text) — type in chat - press_key(key) — press keyboard key - click_inventory_slot(slot) — interact with item - rotate_camera(direction, duration) — camera control - wait(min, max) — brief pause """
from enum import Enum

class MemoryType(Enum):
    """12 categories of persistent agent memory."""
    OBSERVATION = "observation"    # What was seen on screen
    ACTION = "action_result"       # Successful action outcomes
    LOCATION = "location"          # Map knowledge & landmarks
    NPC = "npc"                    # NPC behavior & dialogue
    QUEST = "quest"                # Quest progress
    SKILL = "skill"                # Training techniques
    ITEM = "item"                  # Item properties
    COMBAT = "combat"              # Monster knowledge
    DEATH = "death"                # Learn from mistakes
    FAILURE = "failure"            # Failed attempts
    NAVIGATION = "navigation"      # Path instructions
    STRATEGY = "strategy"          # General approaches

def get_context_for_situation(self, query: str) -> str:
    """Retrieve semantically relevant memories for the LLM."""
    results = self.collection.query(
        query_texts=[query],
        n_results=settings.MEMORY_TOP_K,  # Top 10 matches
    )
    # Format memories as context for the LLM prompt
    memories = []
    for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
        memories.append(f"[{meta['type']}] {doc}")
    return "\n".join(memories)
from dataclasses import dataclass
from typing import Optional

@dataclass
class Goal:
    """Hierarchical goal with prerequisites and retry logic."""
    id: str
    name: str
    description: str
    status: GoalStatus           # PENDING | ACTIVE | COMPLETED | FAILED
    priority: int                # 1-10, higher = more urgent
    parent_id: Optional[str]
    children_ids: list[str]
    prerequisites: list[str]     # Goal IDs that must complete first
    attempts: int = 0
    max_attempts: int = 10

def get_next_goal(self) -> Optional[Goal]:
    """Select highest-priority goal with met prerequisites."""
    candidates = [
        g for g in self.goals.values()
        if g.status == GoalStatus.PENDING
        and all(
            self.goals[p].status == GoalStatus.COMPLETED
            for p in g.prerequisites
            if p in self.goals
        )
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g.priority)

def complete_goal(self, goal_id: str):
    """Complete a goal. Auto-cascades to parent if all children done."""
    goal = self.goals[goal_id]
    goal.status = GoalStatus.COMPLETED
    # Check if parent should auto-complete
    if goal.parent_id and goal.parent_id in self.goals:
        parent = self.goals[goal.parent_id]
        if all(
            self.goals[c].status == GoalStatus.COMPLETED
            for c in parent.children_ids
        ):
            self.complete_goal(parent.id)  # Recursive cascade
Claude Opus 4.6: Primary vision LLM
Python: Core application language
ChromaDB: Vector database for memory
PyAutoGUI: Input automation
Image processing
OBS WebSocket: Livestream integration
Thought overlay UI
Anthropic: Claude model provider
Mathematical operations
Mac Mini: 24/7 hardware host
The agent implements a real-time perception-action loop running at sub-3-second tick intervals. Each cycle captures a lossless screenshot of the game window, applies JPEG compression with perceptual quality optimization, and encodes it as base64 for multimodal inference. The frame is dispatched to Claude Opus 4.6 via Anthropic's vision API, which performs scene decomposition — identifying entities, UI state, spatial relationships, and interactive elements. The model returns structured JSON containing semantic observations, chain-of-thought reasoning, coordinate-precise actions, and memory annotations. Zero game memory access, zero client injection — pure visual grounding.
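A minimal sketch of that dispatch using the Anthropic Python SDK; the model ID and prompt wiring are placeholders, not the project's exact configuration:

import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def decide(jpeg_bytes: bytes, memory_context: str) -> str:
    """Send one compressed frame plus retrieved memories; return the raw JSON decision."""
    response = client.messages.create(
        model="claude-opus-4-6",  # placeholder; use the deployed model ID
        max_tokens=1024,
        system=OSRS_SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode(),
                }},
                {"type": "text", "text": f"Relevant memories:\n{memory_context}\n\nDecide your next actions."},
            ],
        }],
    )
    return response.content[0].text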
No. The architecture is deliberately non-invasive. ClaudeScape operates as an entirely separate process with no hooks, no bytecode injection, no memory reads, and no client-side modifications. Screen capture uses the OS display pipeline (identical to screen recording software), and input simulation uses standard HID event dispatch via PyAutoGUI with Bezier-curve humanization. The game client remains 100% vanilla — the agent has no more access than a human sitting at the keyboard.
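The execution side can be sketched in a few lines of PyAutoGUI, walking the Bezier path from the code section above with jittered pacing (humanized_click and its timing constants are illustrative):

import random
import time
import pyautogui

def humanized_click(self, target: Point) -> None:
    """Move along a Bezier path with randomized pacing, then click."""
    start = Point(*pyautogui.position())
    for p in self._bezier_points(start, target):
        pyautogui.moveTo(p.x, p.y, _pause=False)  # raw move, no built-in delay
        time.sleep(random.uniform(0.001, 0.004))  # randomized inter-step timing
    if random.random() < 0.1:                     # occasional hesitation pause
        time.sleep(random.uniform(0.05, 0.25))
    pyautogui.click()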
ClaudeScape runs 24/7 on a single Mac Mini — a deliberate architectural choice. The Mac Mini handles native game rendering, display-pipeline screen capture, local input execution, ChromaDB vector storage, PostgreSQL telemetry pushing, and stuck-detection signal processing. Only the LLM inference is offloaded to Anthropic's servers. This edge-compute model eliminates cloud VM costs, reduces latency on the capture-to-action loop, and enables unattended autonomous operation with automatic session recovery. Total infrastructure: one machine + one API key.
The agent implements retrieval-augmented generation (RAG) over its own experiential history. Every tick, observations, action outcomes, navigation routes, combat encounters, deaths, and strategic insights are encoded as vector embeddings and stored in ChromaDB with typed metadata across 12 memory categories. Before each decision cycle, the agent performs semantic similarity search against its memory store, retrieving the most relevant past experiences for the current context. The result: emergent long-term learning without fine-tuning — the agent genuinely improves over time by recalling what worked and what didn't.
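The write side of that loop is a plain ChromaDB add with typed metadata. A sketch, where the ID scheme and metadata fields are illustrative:

import uuid

def store_memory(self, memory_type: MemoryType, text: str, tick: int) -> None:
    """Persist one experience; ChromaDB embeds the document on insert."""
    self.collection.add(
        ids=[str(uuid.uuid4())],
        documents=[text],
        metadatas=[{"type": memory_type.value, "tick": tick}],
    )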
Stuck detection uses a three-layer signal fusion approach. Layer 1: NumPy-based minimap frame differencing with a calibrated threshold to detect movement cessation. Layer 2: NLP-based scene analysis scanning recent observations for obstacle keywords (fences, gates, walls). Layer 3: action sequence pattern matching to detect repetitive click loops. The critical innovation is context-aware suppression — the system distinguishes between "stuck" and "intentionally stationary" by checking combat state, skilling activity, and dialogue engagement. If a force-unstick triggers and combat begins, the override is immediately cancelled. Five-tick threshold, not three — tuned through extensive real-world testing.
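Layer 1 reduces to a mean absolute difference between consecutive minimap crops. A sketch, with an illustrative threshold rather than the calibrated one:

import numpy as np

def minimap_moved(prev: np.ndarray, curr: np.ndarray, threshold: float = 6.0) -> bool:
    """True if consecutive minimap frames differ enough to indicate movement."""
    # Widen the dtype before subtracting so uint8 pixels don't wrap around
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)).mean()
    return diff > threshold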
Every third tick, the agent serializes its full state — screenshot, observations, vitals, inventory, counters, location, active goals, memory distribution, and action history — into a JSONB blob and upserts it to a PostgreSQL instance on Railway. The Vercel-hosted dashboard polls the REST API every 2 seconds, rendering the live feed terminal, world map with GPS-style tracking, player vitals, activity counters, XP tracker, and session history. Single-row upsert means zero table growth — only the latest state matters. The full pipeline: Mac Mini → PostgreSQL (Railway) → REST API → Vercel CDN → browser.
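The single-row upsert is ordinary SQL. A sketch using psycopg2, with illustrative table and column names:

import json
import psycopg2

def push_state(conn, state: dict) -> None:
    """Upsert the latest snapshot; the table never grows past one row."""
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO agent_state (id, payload, updated_at)
            VALUES (1, %s::jsonb, NOW())
            ON CONFLICT (id) DO UPDATE
               SET payload = EXCLUDED.payload,
                   updated_at = EXCLUDED.updated_at
            """,
            (json.dumps(state),),
        )
    conn.commit()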
Fully open source. The entire system — vision engine, action executor, memory store, goal planner, stuck detector, input humanization, desktop overlay, database layer, live dashboard, and deployment config — is available on GitHub. 3,500+ lines of Python, zero proprietary dependencies.