MiroFish-Offline, and How I Decide a Repo Is Worth Building On

Developer Workbench · Codebase Intelligence · Lesson 01

I've been keeping a list. Every week a few open-source repos cross my desk that look, at a glance, like more than a clever weekend project. Most aren't. But every so often one runs a whole architecture I recognize, end to end, on a laptop, no cloud account required. When that happens I flag it, map it, and ask the harder question: does this demonstrate something Totem Protocol is supposed to do?

MiroFish-Offline is the first repo to clear that bar cleanly. So it's the first case study for a small tool I'm building — the v0.1 codebase modeling agent skill — and the first post in this Developer Workbench series. The repo is the example. The system for picking the repo is the point.

What MiroFish-Offline Actually Does

Upload a document — a press release, a policy draft, an earnings report. MiroFish generates hundreds of AI agents, each with its own personality, opinion bias, reaction speed, influence level, and persistent memory. They post on a simulated social platform, argue, and shift positions over simulated hours. When the run finishes, you can interview any single agent and ask why it posted what it posted.

The original MiroFish was built for the Chinese market: Chinese UI, Zep Cloud for graph memory, DashScope for inference. The nikmcfly/MiroFish-Offline fork strips out every cloud dependency. Neo4j Community Edition 5.15 replaces Zep. Ollama (qwen2.5, with nomic-embed-text for embeddings) replaces the hosted LLM. The frontend was translated into English string by string. Setup takes three steps: run docker compose up -d, pull the two models, and open localhost:3000. Underneath is OASIS from the CAMEL-AI team, a simulation substrate built for up to a million concurrent agents.

According to the fork's README, it puts a clean abstraction between the app and the graph database: a GraphStorage interface you can re-implement against any graph backend, dependency injection through Flask's app.extensions instead of global singletons, and a hybrid retriever that weights vector similarity against keyword search. That discipline matters more to me than the demo. A demo impresses you once. An abstraction you can swap is a component you can build on.
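To make the swappable-backend idea concrete, here is a minimal Python sketch of the pattern. It assumes nothing about the fork's actual code: the method names on GraphStorage (add_node, add_edge, nodes), the InMemoryGraphStorage backend, HybridRetriever, and the 0.7 weight are all hypothetical. The keyword half is plain token overlap rather than whatever keyword search the fork uses, and embeddings are passed in as short lists instead of coming from nomic-embed-text.

```python
from __future__ import annotations

import math
from abc import ABC, abstractmethod


class GraphStorage(ABC):
    """Hypothetical backend interface; the fork's real method names may differ."""

    @abstractmethod
    def add_node(self, node_id: str, text: str, embedding: list[float]) -> None: ...

    @abstractmethod
    def add_edge(self, src: str, rel: str, dst: str) -> None: ...

    @abstractmethod
    def nodes(self) -> dict[str, tuple[str, list[float]]]: ...


class InMemoryGraphStorage(GraphStorage):
    """Toy backend; a Neo4j-backed class would implement the same three methods."""

    def __init__(self) -> None:
        self._nodes: dict[str, tuple[str, list[float]]] = {}
        self._edges: list[tuple[str, str, str]] = []

    def add_node(self, node_id: str, text: str, embedding: list[float]) -> None:
        self._nodes[node_id] = (text, embedding)

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        self._edges.append((src, rel, dst))

    def nodes(self) -> dict[str, tuple[str, list[float]]]:
        return dict(self._nodes)


def cosine(a: list[float], b: list[float]) -> float:
    """Vector-similarity half of the hybrid score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Keyword half: fraction of query tokens that appear in the node text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


class HybridRetriever:
    """Weights vector similarity (alpha) against keyword overlap (1 - alpha)."""

    def __init__(self, storage: GraphStorage, alpha: float = 0.7) -> None:
        self.storage = storage  # injected by the caller, not a module-level global
        self.alpha = alpha

    def search(self, query: str, query_embedding: list[float], k: int = 3) -> list[str]:
        scored = []
        for node_id, (text, emb) in self.storage.nodes().items():
            score = (self.alpha * cosine(query_embedding, emb)
                     + (1 - self.alpha) * keyword_score(query, text))
            scored.append((score, node_id))
        scored.sort(reverse=True)  # highest combined score first
        return [node_id for _, node_id in scored[:k]]


storage = InMemoryGraphStorage()
storage.add_node("pr", "press release on pricing", [1.0, 0.0])
storage.add_node("policy", "policy draft on privacy", [0.0, 1.0])
print(HybridRetriever(storage).search("pricing update", [0.9, 0.1], k=1))  # ['pr']
```

Because HybridRetriever receives its storage through the constructor, a Neo4j-backed class implementing the same three methods could replace the in-memory one without touching the retriever. In a Flask app, that instance would be registered once in app.extensions and looked up from there, which is the injection style the README describes.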

The Pipeline I've Built Three Times Now

Here's why this one stopped me. MiroFish's five stages map almost one-to-one onto the pipeline I keep rebuilding in different clothes.

Graph Build extracts entities and relationships into Neo4j. Environment Setup instantiates the agent population from that graph. Simulation runs them through 23 distinct social actions against an algorithmic feed, so agents react to each other and to what the feed surfaces. A ReportAgent interviews a focus group, queries the graph for evidence, and writes a structured analysis. Interaction lets you talk to the survivors.
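The Simulation stage is the least familiar of the five, so here is a toy Python sketch of its core loop. It is not OASIS's model: the 23 social actions collapse into a single "post", the feed ranks purely by author influence, and the opinion update is a simple pull toward the feed's average stance followed by a pull back toward each agent's bias. Agent, Post, rank_feed, step, simulate, and the k and rate parameters are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    opinion: float         # current stance on the document: -1 (against) to +1 (for)
    bias: float            # fixed prior stance the agent keeps drifting back toward
    influence: float       # how strongly the feed promotes this agent's posts
    reaction_speed: float  # probability of acting in any given tick
    memory: list[str] = field(default_factory=list)


@dataclass
class Post:
    author: Agent
    stance: float


def rank_feed(posts: list[Post], k: int) -> list[Post]:
    """Toy 'algorithmic feed': surface the k posts from the most influential authors."""
    return sorted(posts, key=lambda p: p.author.influence, reverse=True)[:k]


def step(agents: list[Agent], posts: list[Post], rng: random.Random,
         k: int = 3, rate: float = 0.2) -> list[Post]:
    """One simulated tick: active agents read the feed, update, and post."""
    feed = rank_feed(posts, k)
    new_posts = []
    for agent in agents:
        if rng.random() >= agent.reaction_speed:
            continue  # this agent sits out the tick
        seen = sum(p.stance for p in feed) / len(feed) if feed else agent.opinion
        agent.opinion += rate * (seen - agent.opinion)        # pulled by the feed
        agent.opinion += rate * (agent.bias - agent.opinion)  # pulled back by bias
        agent.opinion = max(-1.0, min(1.0, agent.opinion))
        agent.memory.append(f"saw feed stance {seen:+.2f}, posted {agent.opinion:+.2f}")
        new_posts.append(Post(agent, agent.opinion))
    return posts + new_posts


def simulate(agents: list[Agent], ticks: int, seed: int = 0) -> list[Post]:
    rng = random.Random(seed)  # seeded so runs are reproducible
    posts = [Post(a, a.opinion) for a in agents]  # initial posts = initial stances
    for _ in range(ticks):
        posts = step(agents, posts, rng)
    return posts
```

Each agent's memory list records what it saw and what it posted, which is the minimal raw material an interview step would need to answer "why did you post that?"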

Extract, structure, store, query, present. I've built that with InfraNodus and Claude Code skills for intelligence work. I built it again this month inside the Totem Persona intake pipeline, where a forked myKG induces a confidence-scored ontology from a document corpus and exports it as a navigable knowledge graph. Same shape, different domain. MiroFish runs that shape for crowd simulation. The recurrence is the signal. When the same architecture keeps showing up across simulation, intelligence, and identity modeling, that's a structural fact about the work, not a coincidence.
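The shared shape is easier to see with the domain stripped out. Below, the five verbs become five small Python functions. This is a sketch under loud assumptions: a regex over three relation verbs stands in for LLM extraction, a dict stands in for the graph database, and "confidence" is simply each triple's frequency divided by that of the most frequent triple, which is not how myKG scores its ontology.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict

# Rule-based stand-in for LLM extraction: "<subject> <relation> <object>" sentences.
PATTERN = re.compile(r"^\s*([\w\s-]+?)\s+(uses|replaces|extends)\s+([\w\s-]+?)\s*$",
                     re.IGNORECASE)


def extract(text: str) -> list[tuple[str, str, str]]:
    """Extract: one (subject, relation, object) triple per matching sentence."""
    triples = []
    for sentence in text.split("."):
        match = PATTERN.match(sentence)
        if match:
            subj, rel, obj = (g.strip() for g in match.groups())
            triples.append((subj, rel.lower(), obj))
    return triples


def structure(triples: list[tuple[str, str, str]]) -> dict[tuple[str, str, str], float]:
    """Structure: score each distinct triple by frequency relative to the most common one."""
    counts = Counter(triples)
    top = max(counts.values(), default=1)
    return {triple: n / top for triple, n in counts.items()}


def store(scored: dict[tuple[str, str, str], float]) -> dict[str, list[tuple[str, str, float]]]:
    """Store: adjacency list keyed by subject (a dict standing in for a graph DB)."""
    graph: dict[str, list[tuple[str, str, float]]] = defaultdict(list)
    for (subj, rel, obj), conf in scored.items():
        graph[subj].append((rel, obj, conf))
    return graph


def query(graph: dict[str, list[tuple[str, str, float]]], subject: str,
          min_conf: float = 0.5) -> list[tuple[str, str, float]]:
    """Query: outgoing edges of a subject that clear a confidence threshold."""
    return [(rel, obj, conf) for rel, obj, conf in graph.get(subject, []) if conf >= min_conf]


def present(subject: str, edges: list[tuple[str, str, float]]) -> list[str]:
    """Present: render edges as readable lines."""
    return [f"{subject} --{rel}--> {obj} ({conf:.2f})" for rel, obj, conf in edges]


text = "MiroFish uses Neo4j. Neo4j replaces Zep. MiroFish uses Neo4j. MiroFish uses Ollama."
graph = store(structure(extract(text)))
print(present("MiroFish", query(graph, "MiroFish", min_conf=0.6)))
# ['MiroFish --uses--> Neo4j (1.00)']
```

Swapping the domain means swapping what extract and present do; structure, store, and query keep the same contract, which is the recurrence the paragraph above describes.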

What I Was Missing in 2007, and What I Have Now

Totem Protocol's first artifacts came out of work I was doing in 2007 on complex-adaptive systems, cybernetics, ecological restoration, and resource modeling. I cut my teeth on the COUGAAR agent system — a real agent society with blackboard architectures and dynamic replanning, not a chatbot — and on Petri Nets and the Actor Model, which gave me a formal vocabulary for concurrency and isolated message-passing state before I had a use that justified it.

The agents I designed back then ran inside a handful of formally specified ontologies. That made them legible and rigid in equal measure. Every new domain cost months of knowledge engineering before a single agent could move, because the behavior and the environment were hand-built. If you'd told me in 2007 that nineteen years out an LLM would do the hardest of that work — the behavior, the environment, the natural-language reasoning — I'd have been wide-eyed. It would have meant I could stop grinding on specification and spend my attention deciding where one new capability does the most good, and building it there.

That's what MiroFish is. OASIS supplies the concurrency substrate Petri Nets taught me to want. The LLM supplies the behavioral specification that used to eat the calendar. Neo4j holds the knowledge graph. The old architecture, brought back with renewed power because the expensive part is now cheap.

The System Behind the List

The repo is a case study. The thing I'm actually building is the judgment that flagged it.

For about a week I've been formalizing how I decide whether an open-source library qualifies as a modular component or a case-study example for Totem Protocol. It started as taste and is becoming a skill. The v0.1 codebase modeling agent starts with a codebase-mapping pass — the cartographer workflow, where a team of reader agents and a synthesizer produce a CODEBASE_MAP.md that travels with the repo, so future agents grasp the architecture without re-reading every file. I ran exactly that on the myKG fork: 58 source files across six subsystems, mapped in one pass.
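A deterministic stand-in shows the shape of the cartographer pass: many readers, one synthesizer, one Markdown artifact. In this hypothetical sketch, read_file summarizes a file by its first non-blank line where a real reader agent would write an LLM summary, and synthesize treats each top-level directory as a subsystem, which is an assumption about how a map should group files. The suffix list is illustrative too.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path, PurePosixPath

SOURCE_SUFFIXES = {".py", ".js", ".ts", ".vue"}  # illustrative, not exhaustive


def read_file(source: str) -> str:
    """Reader stand-in: summarize a file by its first non-blank line.
    A real reader agent would write an LLM summary here."""
    for line in source.splitlines():
        if line.strip():
            return line.strip()[:80]
    return "(empty)"


def synthesize(summaries: dict[str, str]) -> str:
    """Synthesizer stand-in: group files by top-level directory into one Markdown map."""
    subsystems: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for path, summary in sorted(summaries.items()):
        parts = PurePosixPath(path).parts
        subsystem = parts[0] if len(parts) > 1 else "(root)"
        subsystems[subsystem].append((path, summary))
    lines = ["# Codebase Map", "",
             f"{len(summaries)} source files across {len(subsystems)} subsystems.", ""]
    for name, files in subsystems.items():
        lines.append(f"## {name}")
        lines.extend(f"- `{path}`: {summary}" for path, summary in files)
        lines.append("")
    return "\n".join(lines)


def map_repo(root: Path) -> str:
    """Run every reader, then the synthesizer, over the source files under root."""
    summaries = {
        p.relative_to(root).as_posix(): read_file(p.read_text(encoding="utf-8", errors="replace"))
        for p in root.rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES
    }
    return synthesize(summaries)
```

Writing the result next to the code, for example with Path("CODEBASE_MAP.md").write_text(map_repo(Path("."))), leaves later agents the same map without another pass.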

On top of the map sits the evaluation. The criteria I'm converging on:

- It solves a self-contained knowledge-work function that I'd otherwise hand-build.
- It's architecturally composable: clean interfaces, swappable backends, dependency injection over global state.
- It carries knowledge-graph or ontology structure, because that's the substrate everything else in Totem hangs from.
- It runs local-first and provider-agnostic, so no engagement depends on someone else's billing.
- It ships enough documentation to be reproduced, not just admired.
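The list reads naturally as a rubric, and one way to encode it in Python follows. The 0 to 2 scale, the field names, and the thresholds in classify are hypothetical, chosen for illustration rather than taken from the v0.1 skill: a repo counts as a modular component only with full marks on every criterion, and anything else scoring 6 or more is kept as a case study. The MiroFish-Offline scores in the usage line are illustrative, marking the ontology criterion down because its structure is implicit.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class RepoScore:
    """One reviewer's 0-2 score per criterion (hypothetical scale)."""
    self_contained_function: int  # does a knowledge-work job I'd otherwise hand-build
    composable: int               # clean interfaces, swappable backends, DI
    graph_or_ontology: int        # knowledge-graph or ontology structure
    local_first: int              # runs locally, provider-agnostic
    reproducible_docs: int        # documented well enough to reproduce

    def __post_init__(self) -> None:
        for f in fields(self):
            if getattr(self, f.name) not in (0, 1, 2):
                raise ValueError(f"{f.name} must be 0, 1, or 2")

    def total(self) -> int:
        return sum(getattr(self, f.name) for f in fields(self))

    def gaps(self) -> list[str]:
        """Criteria short of full marks: where a composition would extend the repo."""
        return [f.name for f in fields(self) if getattr(self, f.name) < 2]


def classify(score: RepoScore) -> str:
    """Hypothetical thresholds: full marks for a component, 6+ for a case study."""
    if not score.gaps():
        return "modular component"
    if score.total() >= 6:
        return "case study"
    return "pass"


mirofish = RepoScore(2, 2, 1, 2, 2)  # illustrative scores, not a published evaluation
print(classify(mirofish), mirofish.gaps())  # case study ['graph_or_ontology']
```

The verdict is the least interesting output. The gaps list is the useful one, because it names the criterion a composition would have to supply.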

MiroFish-Offline scores high on every line, with one qualification on the ontology criterion: it carries a knowledge graph but no formal specification. There's no Petri Net or OWL layer underneath; the structure is implicit in the code rather than declared. That gap is useful. It tells me exactly where a Totem composition would extend it rather than reinvent it.

The Lineage This Sits In

None of this is new. It's old ideas finally meeting hardware that can run them.

Vannevar Bush described the Memex in 1945: a personal knowledge system you navigate by associative trails rather than hierarchical filing. MiroFish's agents traverse a knowledge graph by exactly that kind of association. Doug Engelbart turned the Memex into the Dynamic Knowledge Repository — a group's living, evolving knowledge base — which is what MiroFish's persistent agent memory becomes when the ReportAgent integrates what the population produced. Bret Victor's work on humane interfaces marks the edge MiroFish hasn't reached: a thousand-agent simulation is still legible mostly as scrolling text, and making systems like this genuinely seeable is one of the open problems I care most about.

Sense Collective exists to supply the missing piece in that lineage: the infrastructure, tools, skills, and education that turn the recently possible into the immediately useful. Getting good at AI and agentic systems transfers to almost any field you're already in. And there's a quieter benefit I notice every working day: when the people around you genuinely buy in, a layer of tedious friction drops away. You don't notice how much that friction was costing you until it's gone.

The Learning Path

This post opens a series, and the series has a Learning Path behind it — the curriculum Sense Collective runs through the Pathfinder app, weekly reading groups, workshops, guided tutorials, use-case demos, capstone case studies, and hackathons. The path this post anchors runs Multi-Agent Systems → Simulation → Predictive Intelligence, and we'll work MiroFish-Offline as the hands-on artifact: stand it up locally, read the architecture against the criteria above, and trace where a Totem composition would extend it.

I'll keep posting from the workbench — short weekly entries on what I built, longer pieces like this one on the sources that gave Totem Protocol its shape over a decade of slow progress and then a sudden sprint. If you want to build alongside us, come to a reading group.

The Details

Repository: github.com/nikmcfly/MiroFish-Offline · AGPL-3.0

Stack: Python / Vue / Neo4j CE 5.15 / Ollama / OASIS (CAMEL-AI)

Hardware: 16 GB RAM minimum, 24 GB VRAM recommended for qwen2.5:32b

Learning Path: Multi-Agent Systems → Simulation → Predictive Intelligence