The Repo That Passed Every Test: MiroFish-Offline and How I Decide What's Worth Keeping

Audience: technical · Domain: codebase_intelligence · Status: published
What this gives you

A repeatable way to judge whether an open-source repo is a Totem Protocol component.

— Developer Workbench · Codebase Intelligence · Lesson 01 —

I've been keeping a list. Every week a few open-source repositories cross my desk that look, at a glance, like they might be more than a clever weekend project. Most aren't. But every so often one shows up that runs a whole architecture I recognize, end to end, on a laptop, with no cloud account required. When that happens I flag it, map it, and ask a harder question: does this demonstrate something Totem Protocol is supposed to do?

MiroFish-Offline is the first repo to clear that bar cleanly. So it's the first case study for a small tool I'm building, the v0.1 codebase modeling agent skill, and the first post in this Developer Workbench series. The repo is the example. The system for picking the repo is the point.

What MiroFish-Offline Actually Does

Upload a document — a press release, a policy draft, an earnings report. MiroFish generates hundreds of AI agents, each with its own personality, opinion bias, reaction speed, influence level, and persistent memory. Those agents post on a simulated social platform. They argue. They shift positions over simulated hours. When the run finishes, you can interview any single agent and ask why it posted what it posted.
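
To make the agent description concrete, here is a minimal sketch of what one such profile could look like as a data structure. The class name, field names, value ranges, and personality labels are all assumptions for illustration; they are not taken from MiroFish's code, and the `explain` method only stands in for the real LLM-backed interview.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    """One simulated participant. Field names are illustrative, not MiroFish's schema."""
    agent_id: int
    personality: str
    opinion_bias: float    # -1.0 (strongly against) .. 1.0 (strongly for)
    reaction_speed: float  # chance of reacting in a given simulated hour
    influence: float       # 0.0 .. 1.0, weight others give this agent's posts
    memory: list[str] = field(default_factory=list)  # persistent activity log

    def remember(self, event: str) -> None:
        self.memory.append(event)

    def explain(self) -> str:
        """Stand-in for the post-run interview: replay the agent's own memory."""
        return " | ".join(self.memory) if self.memory else "(no activity recorded)"


def build_population(size: int, seed: int = 0) -> list[AgentProfile]:
    """Generate a reproducible population with varied traits."""
    rng = random.Random(seed)
    personalities = ["skeptic", "enthusiast", "analyst", "contrarian"]  # hypothetical labels
    return [
        AgentProfile(
            agent_id=i,
            personality=rng.choice(personalities),
            opinion_bias=rng.uniform(-1.0, 1.0),
            reaction_speed=rng.uniform(0.05, 0.9),
            influence=rng.random(),
        )
        for i in range(size)
    ]


population = build_population(200, seed=42)
population[0].remember("hour 1: posted a skeptical reply to the press release")
print(len(population), population[0].explain())
```

Seeding the generator keeps a run reproducible, which matters once you want to compare two simulations of the same document.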

The original MiroFish was built for the Chinese market: Chinese UI, Zep Cloud for graph memory, DashScope for inference. The nikmcfly/MiroFish-Offline fork strips out every cloud dependency. Neo4j Community Edition 5.15 replaces Zep. Ollama replaces the hosted LLM, with qwen2.5 for generation and nomic-embed-text for embeddings. The frontend was translated string by string into English. Setup is three steps: run `docker compose up -d`, pull the two models, and open localhost:3000. The simulation substrate underneath is OASIS from the CAMEL-AI team, built for up to a million concurrent agents.

The fork's README puts a clean abstraction between the app and the graph database: a `GraphStorage` interface you can re-implement against any other graph backend, dependency injection through Flask's `app.extensions` instead of global singletons, and a hybrid retriever weighting vector similarity against keyword search. That discipline matters more to me than the demo. A demo impresses you once. An abstraction you can swap is a component you can build on.
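
A rough sketch of those three ideas together, under stated assumptions: the method names on `GraphStorage`, the `"graph_storage"` key, the `App` stand-in, and the 0.7 weight are all hypothetical, chosen only to show the pattern. A real Flask app exposes the same kind of `extensions` dict, but nothing below is copied from the fork.

```python
from typing import Protocol


class GraphStorage(Protocol):
    """Hypothetical shape of the storage interface; method names are assumptions."""
    def add_edge(self, source: str, relation: str, target: str) -> None: ...
    def neighbors(self, node: str) -> list[tuple[str, str]]: ...


class InMemoryGraphStorage:
    """Toy backend; a Neo4j-backed class would expose the same two methods."""
    def __init__(self) -> None:
        self._edges: dict[str, list[tuple[str, str]]] = {}

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self._edges.setdefault(source, []).append((relation, target))

    def neighbors(self, node: str) -> list[tuple[str, str]]:
        return list(self._edges.get(node, []))


class App:
    """Stand-in for a Flask app: only the `extensions` dict matters here."""
    def __init__(self) -> None:
        self.extensions: dict[str, object] = {}


def create_app(storage: GraphStorage) -> App:
    app = App()
    app.extensions["graph_storage"] = storage  # injected, not a module-level global
    return app


def hybrid_score(vector_sim: float, keyword_score: float, vector_weight: float = 0.7) -> float:
    """Weighted blend of two scores in [0, 1]; the 0.7 default is an assumption."""
    return vector_weight * vector_sim + (1.0 - vector_weight) * keyword_score


app = create_app(InMemoryGraphStorage())
storage = app.extensions["graph_storage"]
storage.add_edge("Acme Corp", "ANNOUNCED", "Q3 earnings")
print(storage.neighbors("Acme Corp"))
print(hybrid_score(0.8, 0.5))
```

Because `create_app` takes the backend as an argument, swapping storage means passing a different object, and tests can run against the in-memory version without a database.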

The Pipeline I've Built Three Times Now

Here's why this one stopped me. MiroFish's five stages map almost one-to-one onto the pipeline I keep rebuilding in different clothes.

Graph Build extracts entities and relationships into Neo4j. Environment Setup instantiates the agent population from that graph. Simulation runs them through 23 distinct social actions against an algorithmic feed, so agents react not only to each other but to what the feed surfaces. A ReportAgent then interviews a focus group, queries the graph for evidence, and writes a structured analysis. Interaction lets you talk to the survivors.

Extract, structure, store, query, present. I've built that with InfraNodus and Claude Code skills for intelligence work. I built it again this month inside the Totem Persona intake pipeline, where a forked version of myKG induces a confidence-scored ontology from a document corpus and exports it as a navigable knowledge graph. Same shape, different domain. MiroFish runs that shape for crowd simulation. The recurrence is the signal. When the same architecture keeps showing up across simulation, intelligence, and identity modeling, it's a structural fact about the work, not a coincidence.
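
The "same shape, different domain" claim can be sketched as five swappable callables behind one `run` method. This is a toy under loud assumptions: the `Pipeline` class, the type signatures, and every helper below are hypothetical, and the capitalized-word extractor is a placeholder for what an LLM or a real entity extractor would do.

```python
from dataclasses import dataclass
from typing import Callable

Edge = tuple[str, str]
Graph = dict[str, set[str]]


@dataclass
class Pipeline:
    """Five swappable steps; each domain supplies its own callables."""
    extract: Callable[[str], list[str]]
    structure: Callable[[list[str]], list[Edge]]
    store: Callable[[list[Edge]], Graph]
    query: Callable[[Graph, str], set[str]]
    present: Callable[[str, set[str]], str]

    def run(self, document: str, question: str) -> str:
        entities = self.extract(document)
        edges = self.structure(entities)
        graph = self.store(edges)
        hits = self.query(graph, question)
        return self.present(question, hits)


def extract_capitalized(document: str) -> list[str]:
    """Toy extractor: treat capitalized words as entities."""
    return [w.strip(".,") for w in document.split() if w[:1].isupper()]


def link_adjacent(entities: list[str]) -> list[Edge]:
    """Toy structurer: connect each entity to the next one mentioned."""
    return list(zip(entities, entities[1:]))


def store_adjacency(edges: list[Edge]) -> Graph:
    """Toy store: undirected adjacency sets, skipping self-links."""
    graph: Graph = {}
    for a, b in edges:
        if a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def query_neighbors(graph: Graph, node: str) -> set[str]:
    return graph.get(node, set())


def present_line(node: str, hits: set[str]) -> str:
    return f"{node}: {', '.join(sorted(hits)) or 'no links'}"


toy = Pipeline(extract_capitalized, link_adjacent, store_adjacency, query_neighbors, present_line)
print(toy.run("Acme hired Rivera. Rivera left Borealis.", "Rivera"))
```

Moving to another domain means replacing the five callables, not the `run` method, which is the recurrence the paragraph above describes.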

What I Was Missing in 2007, and What I Have Now

Totem Protocol's first artifacts came out of work I was doing in 2007 on complex-adaptive systems, cybernetics, ecological restoration, and resource modeling. I cut my teeth on the COUGAAR agent system — a real agent society with blackboard architectures and dynamic replanning, not a chatbot — and on Petri Nets and the Actor Model, which gave me a formal vocabulary for concurrency and isolated message-passing state before I had a use that justified it.

The agents I designed back then ran inside a handful of formally specified ontologies. That made them legible and rigid in equal measure. Every new domain cost months of knowledge engineering before a single agent could move, because behavioral specification and environment construction were hand-built. If you'd told me in 2007 that nineteen years out an LLM would do the hardest of that work — the behavior, the environment, the natural-language reasoning — I'd have been wide-eyed. It would have meant I could stop grinding on specification and spend my attention on the one thing that actually compounds: finding the highest point of leverage in the system and putting a capability there.

That's what MiroFish is. OASIS supplies the concurrency substrate Petri Nets taught me to want. The LLM supplies the behavioral specification that used to eat the calendar. Neo4j holds the knowledge graph. The old architecture, brought back with renewed power because the expensive part is now cheap.

The System Behind the List

The repo is a case study. The thing I'm actually building is the judgment that flagged it.

For about a week I've been formalizing how I decide whether an open-source library qualifies as a modular component or a case-study example for Totem Protocol. It started as taste and is becoming a skill. The v0.1 codebase modeling agent leans on a codebase-mapping pass — I've been using the cartographer workflow, where a team of reader agents and a synthesizer produce a `CODEBASE_MAP.md` that travels with the repo so future agents understand the architecture without re-reading every file. I ran exactly that on the myKG fork: 58 source files across six subsystems, mapped in one pass.
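
As a rough illustration of the synthesizer half of that pass, the sketch below groups file paths by top-level directory and renders one markdown map. It is not the cartographer workflow itself: the function names, the directory-as-subsystem heuristic, the example paths, and the summary strings are all assumptions, with hand-written summaries standing in for what reader agents would produce.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_subsystem(paths: list[str]) -> dict[str, list[str]]:
    """Bucket source files by top-level directory (a crude proxy for 'subsystem')."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        parts = PurePosixPath(p).parts
        subsystem = parts[0] if len(parts) > 1 else "(root)"
        groups[subsystem].append(p)
    return dict(groups)


def render_map(groups: dict[str, list[str]], summaries: dict[str, str]) -> str:
    """Synthesizer step: merge per-subsystem notes into one markdown document."""
    lines = ["# CODEBASE_MAP", ""]
    for subsystem in sorted(groups):
        files = sorted(groups[subsystem])
        lines.append(f"## {subsystem} ({len(files)} files)")
        lines.append(summaries.get(subsystem, "No summary yet."))
        lines.extend(f"- `{f}`" for f in files)
        lines.append("")
    return "\n".join(lines)


# Hypothetical paths and summaries, for illustration only.
example_paths = [
    "ingest/loader.py",
    "ingest/chunker.py",
    "ontology/induce.py",
    "export/graph_json.py",
    "README.md",
]
example_summaries = {
    "ingest": "Reads the corpus and splits it into chunks.",
    "ontology": "Induces entity and relation types from chunks.",
}

codebase_map = render_map(group_by_subsystem(example_paths), example_summaries)
print(codebase_map)
```

Writing the result to `CODEBASE_MAP.md` at the repo root is what lets it travel with the code.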

On top of the map sits the evaluation. The criteria I'm converging on:

- It solves a self-contained knowledge-work function that I'd otherwise hand-build.
- It's architecturally composable: clean interfaces, swappable backends, dependency injection over global state.
- It carries knowledge-graph or ontology structure, because that's the substrate everything else in Totem hangs from.
- It runs local-first and provider-agnostic, so no engagement depends on someone else's billing.
- It ships enough documentation to be reproduced, not just admired.

MiroFish-Offline scores high on every line except formal specification: there's no Petri Net or OWL layer underneath, and the structure is implicit in the code rather than declared. That gap is itself useful. It tells me precisely where a Totem composition would extend it rather than reinvent it.
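
One way the rubric could be written down so that gaps fall out mechanically is sketched here. The criterion keys, the 0 to 3 scale, the threshold, and above all the numbers are placeholders invented for illustration; they encode only the qualitative statement above (high everywhere except formal specification), not a measured result.

```python
from dataclasses import dataclass

# Criterion keys are hypothetical shorthand for the list above,
# plus the formal-specification line mentioned in the prose.
CRITERIA = (
    "self_contained_function",
    "composable_architecture",
    "graph_or_ontology_structure",
    "local_first_provider_agnostic",
    "reproducible_documentation",
    "formal_specification",
)


@dataclass
class Evaluation:
    repo: str
    scores: dict[str, int]  # assumed scale: 0 (absent) .. 3 (strong)

    def __post_init__(self) -> None:
        missing = set(CRITERIA) - set(self.scores)
        if missing:
            raise ValueError(f"unscored criteria: {sorted(missing)}")

    def gaps(self, threshold: int = 2) -> list[str]:
        """Criteria scoring below the threshold, in rubric order."""
        return [c for c in CRITERIA if self.scores[c] < threshold]


# Placeholder scores, not measurements.
mirofish = Evaluation(
    repo="nikmcfly/MiroFish-Offline",
    scores={
        "self_contained_function": 3,
        "composable_architecture": 3,
        "graph_or_ontology_structure": 3,
        "local_first_provider_agnostic": 3,
        "reproducible_documentation": 3,
        "formal_specification": 0,
    },
)
print(mirofish.gaps())
```

Requiring a score for every criterion keeps an evaluation from passing simply because a line was left blank.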

The Lineage This Sits In

None of this is new. It's old ideas finally meeting hardware that can run them.

Vannevar Bush described the Memex in 1945: a personal knowledge system you navigate by associative trails rather than hierarchical filing. MiroFish's agents traverse a knowledge graph by that kind of association. Doug Engelbart carried the idea forward into the Dynamic Knowledge Repository, a group's living, evolving knowledge base, which is what MiroFish's persistent agent memory becomes when the ReportAgent integrates what the population produced. Bret Victor's work on humane interfaces marks the edge MiroFish hasn't reached: a thousand-agent simulation is still legible mostly as scrolling text, and making systems like this genuinely seeable is one of the open problems I care most about.

Sense Collective exists to supply the missing piece in that lineage — the infrastructure, tools, skills, and education that let people actually build the recently-possible into the immediately-useful. The advantage you get from leveling up on AI and agentic systems transfers to almost any field you're already in. And there's a quieter benefit I notice every working day: when the people around you genuinely buy in, a layer of tedious friction drops away. It's the hum outside the window you stopped hearing until it stops, and suddenly the room is quiet.

The Learning Path

This post opens a series, and the series has a Learning Path behind it — the curriculum Sense Collective runs through the Pathfinder app, weekly reading groups, workshops, guided tutorials, use-case demos, capstone case studies, and hackathons. The path this post anchors runs Multi-Agent Systems → Simulation → Predictive Intelligence, and we'll work MiroFish-Offline together as the hands-on artifact: stand it up locally, read the architecture against the criteria above, and trace where a Totem composition would extend it.

I'll keep posting from the workbench — short weekly entries on what I built, longer pieces like this one on the sources that gave Totem Protocol its shape over a decade of slow progress and then a sudden sprint. If you want to build alongside us, come to a reading group. The curriculum has an entry point wherever your work already is.

The Details

Repository: github.com/nikmcfly/MiroFish-Offline · AGPL-3.0

Stack: Python / Vue / Neo4j CE 5.15 / Ollama / OASIS (CAMEL-AI)

Hardware: 16 GB RAM minimum, 24 GB VRAM recommended for qwen2.5:32b

Learning Path: Multi-Agent Systems → Simulation → Predictive Intelligence

Tags: developer-workbench · totem-protocol · codebase-modeling · multi-agent · case-study · learning-path