Mondher
Mondher is a research-grade Formal Concept Analysis platform. It reads formal contexts in five formats, computes formal concepts, mines implication bases and association rules, and renders lattice diagrams as SVG and TikZ.
Mondher is built in Rust with a modular monolith architecture. The same engine powers the command-line tool, the HTTP API, the Python bindings, and the web frontend.
Project status
Phase 1 of the 12-month build plan is in progress. As of Day 91, Mondher ships:
- Five input/output formats: Burmeister .cxt, CSV/TSV, FIMI .dat, FCA-XML, native JSON
- NextClosure concept enumeration
- Duquenne-Guigues canonical implication base
- Association rule mining
- Sugiyama-style lattice layout
- SVG and TikZ rendering
- Cross-validated correctness against fcaR
Where to next
- New to Mondher? Start with the quickstart.
- Need a particular file format? See the formats reference.
- Want every command and flag? See the CLI reference.
- Curious about how correctness is verified? See the validation methodology.
Quickstart
This is the five-minute tour. By the end you'll have computed a concept lattice, mined some implications, and exported an SVG diagram.
Install
If you have Rust installed:
git clone https://github.com/Feudjio-Anthony/mondher
cd mondher
cargo install --path crates/mondher-bin
This installs mondher to ~/.cargo/bin/. For other ways to install,
see the installation page.
Your first context
Mondher reads formal contexts. Create a file birds-fish-dogs.cxt:
B
3
3
bird
fish
dog
can-fly
swims
warm-blooded
X.X
.X.
..X
Each X means "this object has this attribute"; each . means it
doesn't. So bird has can-fly and warm-blooded, fish has only
swims, and dog has only warm-blooded.
Compute the lattice
mondher compute birds-fish-dogs.cxt
You'll see five formal concepts printed, each with its extent and intent, plus the Hasse diagram's covering edges.
Mine the implication base
mondher implications birds-fish-dogs.cxt
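You'll see the canonical Duquenne-Guigues base: a minimum-size set of
implications from which every implication valid in the context follows.
For this tiny context the default output is just one rule,
can-fly → warm-blooded. Vacuous implications (those with support 0) are
hidden unless you pass --include-vacuous.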
Export a lattice diagram
mondher export --format svg birds-fish-dogs.cxt > lattice.svg
Open lattice.svg in a browser. Five circles connected by five lines
— the Hasse diagram of the lattice.
For LaTeX papers, use TikZ instead:
mondher export --format tikz birds-fish-dogs.cxt > lattice.tex
The output goes directly into a \begin{tikzpicture} block ready
for \input{lattice.tex} in your .tex source.
What next
- Other input formats: see formats.
- Every command and option: see the CLI reference.
- Mine partial rules (with confidence < 1): run mondher rules --help.
Installation
Mondher is in active development; Phase 1 reaches v0.1 around Day 98. Until then, the recommended install is from source.
From source
Requires Rust 1.75 or later. Install Rust via rustup if you don't have it.
git clone https://github.com/Feudjio-Anthony/mondher
cd mondher
cargo install --path crates/mondher-bin
This compiles Mondher in release mode and installs the mondher binary
to ~/.cargo/bin/. Make sure that directory is in your PATH:
echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
Verify:
mondher --version
Shell completions
Bash:
mondher completions bash > ~/.local/share/bash-completion/completions/mondher
Zsh:
mondher completions zsh > ~/.zsh/completions/_mondher
Fish:
mondher completions fish > ~/.config/fish/completions/mondher.fish
You may need to start a new shell session for the completions to take effect.
Coming in v0.1 (Day 98)
- Prebuilt binaries for Linux, macOS, and Windows
- Docker image: docker run feudjio-anthony/mondher
- Homebrew tap (macOS / Linux)
Context formats
Mondher reads and writes five formats. The CLI auto-detects the format from the file extension, falling back to content sniffing on the first bytes when the extension is not recognized.
| Format | Extension | Notes |
|---|---|---|
| Burmeister | .cxt | The FCA standard. Plain-text, named objects and attributes. |
| CSV / TSV | .csv, .tsv | Header row, one object per data row. |
| FIMI | .dat | Transactional format from the FIMI benchmark repository. |
| FCA-XML | .xml | Legacy XML used by ConExp and related tools. |
| Native JSON | .json | Mondher's diff-friendly interchange format. |
Burmeister .cxt
The de facto FCA standard. Plain text:
B
<n_objects>
<n_attributes>
<object names>
<attribute names>
<incidence rows: X = present, . = absent>
A 3×3 example:
B
3
3
bird
fish
dog
can-fly
swims
warm-blooded
X.X
.X.
..X
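The layout above is simple enough to parse by hand. The following is a minimal sketch, not Mondher's actual reader: the `ParsedContext` type and the `parse_burmeister` function are hypothetical names, and the sketch assumes the compact layout shown here (blank lines are skipped, and no object or attribute name is empty).

```rust
/// Hypothetical in-memory context; not Mondher's actual `Context` type.
#[derive(Debug, PartialEq)]
pub struct ParsedContext {
    pub objects: Vec<String>,
    pub attributes: Vec<String>,
    /// `incidence[g][m]` is true when object `g` has attribute `m`.
    pub incidence: Vec<Vec<bool>>,
}

/// Reads the next non-blank line as a count.
fn next_count<'a>(
    lines: &mut impl Iterator<Item = &'a str>,
    what: &str,
) -> Result<usize, String> {
    let line = lines.next().ok_or_else(|| format!("missing {what} count"))?;
    line.parse::<usize>()
        .map_err(|e| format!("bad {what} count {line:?}: {e}"))
}

pub fn parse_burmeister(text: &str) -> Result<ParsedContext, String> {
    // Assumption: blank lines carry no information and can be skipped.
    let mut lines = text.lines().map(str::trim).filter(|l| !l.is_empty());

    if lines.next() != Some("B") {
        return Err("expected 'B' on the first line".to_string());
    }
    let n_objects = next_count(&mut lines, "object")?;
    let n_attributes = next_count(&mut lines, "attribute")?;

    let objects: Vec<String> = lines.by_ref().take(n_objects).map(String::from).collect();
    let attributes: Vec<String> = lines.by_ref().take(n_attributes).map(String::from).collect();
    if objects.len() != n_objects || attributes.len() != n_attributes {
        return Err("file ended before all names were read".to_string());
    }

    let mut incidence = Vec::with_capacity(n_objects);
    for g in 0..n_objects {
        let row = lines
            .next()
            .ok_or_else(|| format!("missing incidence row {g}"))?;
        if row.chars().count() != n_attributes {
            return Err(format!("row {g} has the wrong width"));
        }
        // X = present, . = absent; anything else is rejected.
        let cells = row
            .chars()
            .map(|c| match c {
                'X' => Ok(true),
                '.' => Ok(false),
                other => Err(format!("unexpected cell {other:?} in row {g}")),
            })
            .collect::<Result<Vec<bool>, String>>()?;
        incidence.push(cells);
    }
    Ok(ParsedContext { objects, attributes, incidence })
}
```

Returning `Result<_, String>` keeps the sketch dependency-free; a real reader would more likely use a dedicated error type.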
CSV / TSV
A header row of attribute names; one row per object. The first column
is the object name; remaining columns are 0/1 (or false/true, or
X/., or any of several truthy values).
,can-fly,swims,warm-blooded
bird,1,0,1
fish,0,1,0
dog,0,0,1
Mondher auto-detects the delimiter (comma vs tab) from the first line.
FIMI .dat
The transactional format used by the FIMI itemset-mining benchmark collection. One line per object; each line is the space-separated IDs of the attributes that object has.
0 2
1
2
No names. Mondher synthesizes g0, g1, ..., m0, m1, ... when it
reads a FIMI file. Use CSV or .cxt if you need named objects.
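As an illustration of the name synthesis described above, here is a small sketch of a FIMI reader. `FimiContext` and `parse_fimi` are hypothetical names, and inferring the attribute count from the largest ID seen is an assumption of this sketch rather than documented Mondher behavior.

```rust
/// Hypothetical result type: synthesized names plus the incidence matrix.
#[derive(Debug, PartialEq)]
pub struct FimiContext {
    pub objects: Vec<String>,
    pub attributes: Vec<String>,
    pub incidence: Vec<Vec<bool>>,
}

pub fn parse_fimi(text: &str) -> Result<FimiContext, String> {
    // One object per line: whitespace-separated attribute IDs.
    let mut transactions: Vec<Vec<usize>> = Vec::new();
    for (line_no, line) in text.lines().enumerate() {
        let ids = line
            .split_whitespace()
            .map(|tok| {
                tok.parse::<usize>()
                    .map_err(|e| format!("line {}: bad ID {tok:?}: {e}", line_no + 1))
            })
            .collect::<Result<Vec<usize>, String>>()?;
        transactions.push(ids);
    }

    // Assumption: the attribute count is one more than the largest ID seen.
    let n_attributes = transactions
        .iter()
        .flatten()
        .max()
        .map_or(0, |&max_id| max_id + 1);

    let incidence: Vec<Vec<bool>> = transactions
        .iter()
        .map(|ids| {
            let mut row = vec![false; n_attributes];
            for &id in ids {
                row[id] = true;
            }
            row
        })
        .collect();

    Ok(FimiContext {
        // Synthesized names: g0, g1, ... for objects and m0, m1, ... for attributes.
        objects: (0..transactions.len()).map(|g| format!("g{g}")).collect(),
        attributes: (0..n_attributes).map(|m| format!("m{m}")).collect(),
        incidence,
    })
}
```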
FCA-XML
Legacy XML used by ConExp and other tools from the 2000s:
<ConceptualSystem>
<Context>
<Attributes>
<Attribute Name="can-fly"/>
...
</Attributes>
<Objects>
<Object>
<Name>bird</Name>
<Intent>
<HasAttribute AttributeIdentifier="0"/>
...
</Intent>
</Object>
...
</Objects>
</Context>
</ConceptualSystem>
Verbose, but useful for interop with older datasets.
Native JSON
Mondher's own format, version-tagged and forward-compatible. Best for storing contexts as Git-tracked data.
{
"mondher_version": "0.1.0",
"context": {
"objects": ["bird", "fish", "dog"],
"attributes": ["can-fly", "swims", "warm-blooded"],
"incidence": [[true, false, true], [false, true, false], [false, false, true]]
}
}
Auto-detection
You never have to tell Mondher which format you're using — every subcommand calls the auto-detector. Extension first; content sniffing as fallback.
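The "extension first, sniffing as fallback" order can be sketched as follows. The extension table comes from the formats table above; the `Format` enum, the function names, and every sniffing heuristic are assumptions made for illustration, since the actual rules are not documented on this page.

```rust
use std::path::Path;

/// Hypothetical enum naming the five formats (CSV and TSV kept apart).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Burmeister,
    Csv,
    Tsv,
    Fimi,
    FcaXml,
    Json,
}

/// Extension first; content sniffing only when the extension is unknown.
pub fn detect_format(path: &Path, head: &[u8]) -> Option<Format> {
    by_extension(path).or_else(|| sniff(head))
}

fn by_extension(path: &Path) -> Option<Format> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "cxt" => Some(Format::Burmeister),
        "csv" => Some(Format::Csv),
        "tsv" => Some(Format::Tsv),
        "dat" => Some(Format::Fimi),
        "xml" => Some(Format::FcaXml),
        "json" => Some(Format::Json),
        _ => None,
    }
}

/// Illustrative heuristics only: look at the first line of the file.
fn sniff(head: &[u8]) -> Option<Format> {
    let text = std::str::from_utf8(head).ok()?;
    let first = text.lines().next()?.trim();
    if first == "B" {
        Some(Format::Burmeister)
    } else if first.starts_with('<') {
        Some(Format::FcaXml)
    } else if first.starts_with('{') {
        Some(Format::Json)
    } else if first.contains('\t') {
        Some(Format::Tsv)
    } else if first.contains(',') {
        Some(Format::Csv)
    } else if !first.is_empty() && first.split_whitespace().all(|t| t.parse::<u64>().is_ok()) {
        // Only whitespace-separated integers: treat as FIMI transactions.
        Some(Format::Fimi)
    } else {
        None
    }
}
```

Because the extension is consulted first, a file named ctx.json is treated as JSON even if its content looks like something else; sniffing only matters for unknown or missing extensions.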
CLI reference
The mondher binary has seven subcommands. All accept any of the five
input formats — no --format flag needed for input
(auto-detection handles it).
mondher read
mondher read <PATH>
Reads the file and prints its dimensions. Used to sanity-check that Mondher recognizes the format.
mondher compute
mondher compute <PATH>
Computes every formal concept of the context using NextClosure (Ganter 1984). Prints each concept with its extent, intent, and reduced labeling, then the covering relation as a list of edges.
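For readers unfamiliar with NextClosure, the sketch below shows the core loop on a bitmask representation. It is a teaching version under stated assumptions, not Mondher's engine code: `BitContext` is a hypothetical type, attribute sets are `u64` masks (so at most 63 attributes), and attribute 0 is taken as the most significant position in lectic order.

```rust
/// Each object row is a bitmask: bit `m` is set when the object has attribute `m`.
/// Assumes at most 63 attributes so every set fits in a `u64`.
pub struct BitContext {
    pub n_attributes: u32,
    pub rows: Vec<u64>,
}

impl BitContext {
    fn full(&self) -> u64 {
        (1u64 << self.n_attributes) - 1
    }

    /// A'': the attributes shared by every object that has all of `set`.
    pub fn closure(&self, set: u64) -> u64 {
        self.rows
            .iter()
            .filter(|&&row| (row & set) == set)
            .fold(self.full(), |acc, &row| acc & row)
    }

    /// Next closed set after `a` in lectic order, or `None` after the last one.
    pub fn next_closure(&self, mut a: u64) -> Option<u64> {
        for i in (0..self.n_attributes).rev() {
            let bit = 1u64 << i;
            if (a & bit) != 0 {
                // Drop attribute `i` and keep scanning towards smaller indices.
                a &= !bit;
            } else {
                let b = self.closure(a | bit);
                // Accept `b` only if closing added no attribute smaller than `i`.
                if ((b & !a) & (bit - 1)) == 0 {
                    return Some(b);
                }
            }
        }
        None
    }

    /// All intents in lectic order, starting from the closure of the empty set.
    pub fn intents(&self) -> Vec<u64> {
        let mut out = Vec::new();
        let mut current = Some(self.closure(0));
        while let Some(a) = current {
            out.push(a);
            current = self.next_closure(a);
        }
        out
    }
}
```

On the quickstart context (can-fly = bit 0, swims = bit 1, warm-blooded = bit 2) this enumerates five intents, matching the five concepts reported by mondher compute. Each extent can be recovered by collecting the objects whose row contains the intent.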
mondher implications
mondher implications <PATH> [--include-vacuous]
Computes the Duquenne-Guigues canonical implication base. By default
hides vacuous implications (those with support = 0); pass
--include-vacuous to show them.
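To make the distinction between informative and vacuous implications concrete, here is a compact sketch of the classic way to compute the canonical base: walk, in lectic order, the sets that are closed under the implications found so far, and emit one implication for each set that is not an intent. The `BitContext` and `Implication` types are hypothetical, sets are `u64` bitmasks (at most 63 attributes), and nothing here is claimed to match Mondher's internal implementation.

```rust
/// Bit `m` of each row is set when that object has attribute `m`.
pub struct BitContext {
    pub n_attributes: u32,
    pub rows: Vec<u64>,
}

/// An implication `premise -> conclusion` over attribute bitmasks.
#[derive(Debug, PartialEq)]
pub struct Implication {
    pub premise: u64,
    pub conclusion: u64,
    /// Number of objects having every attribute of the premise.
    pub support: usize,
}

/// Closes `set` under every implication whose premise is a proper subset of it.
fn saturate(base: &[Implication], mut set: u64) -> u64 {
    loop {
        let mut grown = set;
        for imp in base {
            if (imp.premise & set) == imp.premise && imp.premise != set {
                grown |= imp.conclusion;
            }
        }
        if grown == set {
            return set;
        }
        set = grown;
    }
}

impl BitContext {
    fn support(&self, set: u64) -> usize {
        self.rows.iter().filter(|&&row| (row & set) == set).count()
    }

    /// A'': attributes shared by every object that has all of `set`.
    fn closure(&self, set: u64) -> u64 {
        let full = (1u64 << self.n_attributes) - 1;
        self.rows
            .iter()
            .filter(|&&row| (row & set) == set)
            .fold(full, |acc, &row| acc & row)
    }

    /// Lectic successor of `a` among the sets saturated under `base`.
    fn next_saturated(&self, base: &[Implication], mut a: u64) -> Option<u64> {
        for i in (0..self.n_attributes).rev() {
            let bit = 1u64 << i;
            if (a & bit) != 0 {
                a &= !bit;
            } else {
                let b = saturate(base, a | bit);
                if ((b & !a) & (bit - 1)) == 0 {
                    return Some(b);
                }
            }
        }
        None
    }

    /// Each saturated set that is not an intent is a pseudo-intent.
    pub fn canonical_base(&self) -> Vec<Implication> {
        let mut base = Vec::new();
        let mut current = Some(0u64);
        while let Some(a) = current {
            let closed = self.closure(a);
            if closed != a {
                base.push(Implication { premise: a, conclusion: closed, support: self.support(a) });
            }
            current = self.next_saturated(&base, a);
        }
        base
    }
}
```

On the quickstart context (can-fly = bit 0, swims = bit 1, warm-blooded = bit 2) the sketch yields two implications: one with support 1 (can-fly implies warm-blooded) and one with support 0, whose premise {swims, warm-blooded} matches no object. The second is the kind of rule the default output hides.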
mondher rules
mondher rules <PATH> [--min-support N] [--min-confidence F] [--max-rules N]
Mines association rules above given thresholds.
- --min-support (default 1): minimum number of objects whose intent contains the premise.
- --min-confidence (default 0.6): minimum value of |extent(premise ∪ conclusion)| / |extent(premise)|, in [0.0, 1.0].
- --max-rules: cap on the number of rules emitted. Useful for very permissive thresholds.
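The two thresholds can be illustrated with a few lines of arithmetic. This sketch reads confidence as the standard ratio of objects satisfying premise and conclusion together to objects satisfying the premise. The function names and the bitmask row representation are assumptions for illustration, not Mondher's API.

```rust
/// Number of objects whose row contains every attribute in `set`.
/// Rows are attribute bitmasks, one per object (bit `m` = attribute `m`).
pub fn support(rows: &[u64], set: u64) -> usize {
    rows.iter().filter(|&&row| (row & set) == set).count()
}

/// Confidence of `premise -> conclusion`, or `None` when no object
/// satisfies the premise (the ratio is undefined in that case).
pub fn confidence(rows: &[u64], premise: u64, conclusion: u64) -> Option<f64> {
    let premise_support = support(rows, premise);
    if premise_support == 0 {
        return None;
    }
    let both = support(rows, premise | conclusion);
    Some(both as f64 / premise_support as f64)
}

/// Applies both thresholds in the way the flag descriptions suggest.
pub fn passes(
    rows: &[u64],
    premise: u64,
    conclusion: u64,
    min_support: usize,
    min_confidence: f64,
) -> bool {
    support(rows, premise) >= min_support
        && confidence(rows, premise, conclusion).map_or(false, |c| c >= min_confidence)
}
```

On the quickstart context, warm-blooded → can-fly has confidence 1/2 (two warm-blooded objects, one of which can fly), so it falls below the default threshold of 0.6, while can-fly → warm-blooded has confidence 1.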
mondher verify
mondher verify <PATH>
Prints a structured report of the context's outputs: concept count, informative implications, vacuous implications, total implications. The format is grep-friendly so it can be diffed against reports from other FCA tools (ConExp, fcaR) for cross-validation. See validation for the methodology.
mondher export
mondher export --format svg|tikz <PATH>
Renders the lattice diagram. svg for browsers and Inkscape; tikz
for direct inclusion in LaTeX. Output goes to stdout; pipe to a file:
mondher export --format svg ctx.cxt > diagram.svg
mondher export --format tikz ctx.cxt > diagram.tex
mondher completions
mondher completions bash|zsh|fish
Prints a shell completion script. Save it to the appropriate location for your shell — see the installation page.
Validating Mondher's correctness
Mondher's correctness rests on three layers, each catching a different class of bug:
Layer 1 — Internal invariants
Every algorithm in Mondher carries its mathematical contract as a test. The Galois connection laws (extensivity, antitonicity, idempotence) are verified on hundreds of randomly generated contexts per CI run. The canonical base is verified to be both sound (every implication holds) and complete (every valid implication follows from it). The covering relation is verified to be a true Hasse diagram (no redundant edges, no missing edges).
These property tests catch bugs in our own algorithms without reference to any external tool.
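A property check of this kind might look like the sketch below. It is not the project's test suite: the helper names are hypothetical, contexts are small bitmask matrices, the laws are checked exhaustively over all attribute subsets, and a tiny xorshift generator stands in for a real property-testing library so the example has no dependencies.

```rust
/// Objects, as a bitmask over row indices, that have every attribute in `attrs`.
fn extent(rows: &[u64], attrs: u64) -> u64 {
    let mut objs = 0u64;
    for (g, &row) in rows.iter().enumerate() {
        if (row & attrs) == attrs {
            objs |= 1u64 << g;
        }
    }
    objs
}

/// Attributes shared by every object in `objs`.
fn intent(rows: &[u64], n_attributes: u32, objs: u64) -> u64 {
    let mut attrs = (1u64 << n_attributes) - 1;
    for (g, &row) in rows.iter().enumerate() {
        if (objs & (1u64 << g)) != 0 {
            attrs &= row;
        }
    }
    attrs
}

/// Checks the three laws for every attribute subset (small contexts only).
pub fn galois_laws_hold(rows: &[u64], n_attributes: u32) -> bool {
    let n_subsets = 1u64 << n_attributes;
    for x in 0..n_subsets {
        let closed = intent(rows, n_attributes, extent(rows, x));
        // Extensivity: X is contained in X''.
        if (x & !closed) != 0 {
            return false;
        }
        // Idempotence: closing a closed set changes nothing.
        if intent(rows, n_attributes, extent(rows, closed)) != closed {
            return false;
        }
        for y in 0..n_subsets {
            // Antitonicity: X ⊆ Y implies Y' ⊆ X'.
            if (x & !y) == 0 && (extent(rows, y) & !extent(rows, x)) != 0 {
                return false;
            }
        }
    }
    true
}

/// Minimal xorshift step, used only to vary the test contexts.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Pseudo-random context with `n_objects` rows over `n_attributes` attributes.
pub fn random_context(seed: u64, n_objects: usize, n_attributes: u32) -> Vec<u64> {
    let mut state = seed.max(1); // xorshift must not start at zero
    let mask = (1u64 << n_attributes) - 1;
    (0..n_objects).map(|_| xorshift(&mut state) & mask).collect()
}
```

A test would loop over a few hundred seeds and assert `galois_laws_hold` on each generated context.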
Layer 2 — Canonical corpus
A hand-picked set of small contexts (tests/corpus/) gives us
hand-verified concept counts and base sizes. Failures here usually mean
a structural change in an algorithm — and the small size makes
diagnosis fast.
Layer 3 — External cross-validation
A frozen reference corpus (tests/reference/) records the outputs
Mondher must produce for ~10 canonical FCA contexts. Each reference
value is independently verified against at least one of:
- Published literature: e.g., the Living Beings context's concept count of 19 (Ganter and Wille 1999, p. 21).
- ConExp: the de facto FCA reference tool from 2003 onward.
- fcaR: the R package, actively maintained as of 2026.
- conexp-clj: the Clojure rewrite, also actively maintained.
When two of these tools agree on a value, we lock it as the reference. When they disagree, we treat the discrepancy as a research question and document it — disagreement between mature tools is rare and almost always traceable to a known historical bug.
This layer answers the question "is Mondher saying the same thing as the rest of the FCA world?" — not just internal consistency, but objective agreement.
Adding a new reference context
See crates/mondher-engine/tests/reference/README.md for the
mechanical procedure. The short version: drop a .cxt file, run it
through two reference tools, record the values in corpus.json, run
the cross-validation test. The commit message should cite the tools
that confirmed the values.
Provisionally-locked entries
Entries whose notes field contains "pending fcaR cross-check" or
"pending conexp-clj cross-check" are locked from Mondher's own output
because installing the external tool was infeasible at curation time.
These entries function as regression checks — they catch changes
to Mondher's behavior — but they are not yet true cross-validations.
They are upgraded to verified status when the external check is done.
What this gives us
Researchers using Mondher get three guarantees:
- Every operation produces output that is internally consistent with FCA theory.
- Every operation's output matches the FCA literature on canonical examples.
- Mondher disagrees with no other mature FCA tool on any context in the reference corpus.
Format coverage
Mondher reads five context formats interchangeably:
- Burmeister .cxt: the de facto FCA standard.
- CSV / TSV: what spreadsheets export.
- FIMI .dat: the standard transactional format used in itemset-mining benchmarks (Mushroom, Adult, Chess, BMS-*).
- FCA-XML: the legacy ConExp-era XML format.
- Native JSON: Mondher's own diff-friendly interchange format.
The binary auto-detects format from file extension or content; no
--format flag is needed. Cross-format consistency is verified by
the test suite: a context written in any one of these formats and
read back produces the same incidence matrix.
This matters for adoption: researchers can throw Mondher at whatever files they already have — no manual conversion step — and get the same answers as their existing toolchain.
The cost of building this is real — adding a reference context takes 30–60 minutes of cross-tool comparison. The payoff is the credibility needed for Mondher to be cited in FCA research.
Architecture
Mondher is a modular monolith. A single Rust binary contains all the algorithms; surfaces (CLI, HTTP API, Python bindings, web app) adapt that engine to different entry points.
The current Phase 1 surface set is the CLI; Phase 2 (Days 99-182) adds the API server, Python bindings, and web frontend, all driven by the same engine.
Crate layout
crates/
├─ mondher-core      Algebra: Context, Concept, Lattice, Implication
├─ mondher-engine    Algorithms: NextClosure, canonical_base, association_rules
├─ mondher-formats   File I/O: Burmeister, CSV, FIMI, FCA-XML, JSON, auto-detect
├─ mondher-layout    Visualization: Sugiyama layout, SVG renderer, TikZ renderer
├─ mondher-cli       Clap subcommands and handlers
├─ mondher-bin       Tiny entry point delegating to mondher-cli
└─ mondher-storage   In-memory + future PostgreSQL storage
The dependency graph is strictly acyclic. mondher-core has no
internal dependencies; mondher-engine depends only on mondher-core;
mondher-formats depends on mondher-core. Each crate is fully tested
in isolation.
Design principles
Strong types over stringly-typed data
ObjectId(u32) and AttributeId(u32) are distinct newtypes that can't
be mixed up. The compiler catches "I passed an attribute index where
an object index was expected" before such code ever runs.
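The newtype pattern described above can be sketched in a few lines. Only `ObjectId(u32)` and `AttributeId(u32)` come from the text; the `Incidence` struct and its methods are hypothetical and exist just to show a call site.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ObjectId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct AttributeId(pub u32);

/// Hypothetical dense incidence table, for illustration only.
pub struct Incidence {
    n_attributes: u32,
    cells: Vec<bool>,
}

impl Incidence {
    pub fn new(n_objects: u32, n_attributes: u32) -> Self {
        let size = n_objects as usize * n_attributes as usize;
        Incidence { n_attributes, cells: vec![false; size] }
    }

    fn index(&self, g: ObjectId, m: AttributeId) -> usize {
        g.0 as usize * self.n_attributes as usize + m.0 as usize
    }

    pub fn set(&mut self, g: ObjectId, m: AttributeId) {
        let i = self.index(g, m);
        self.cells[i] = true;
    }

    pub fn has(&self, g: ObjectId, m: AttributeId) -> bool {
        self.cells[self.index(g, m)]
    }
}

// Swapping the arguments does not compile:
//     incidence.has(AttributeId(2), ObjectId(0));
// fails with "mismatched types", which is the point of the newtypes.
```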
Deterministic output
Every algorithm produces output in a canonical order (lectic order for concepts and implications; sorted by source/target for covering edges). Running NextClosure twice on the same context produces byte-identical results, which makes content-hashing analyses meaningful.
Property-based correctness
For each algorithm, we encode its mathematical contract as a property test that runs on hundreds of random contexts per CI push. The Galois laws, lattice soundness/completeness, and Hasse-diagram correctness are all verified continuously.
External validation
Beyond internal property tests, a frozen reference corpus records expected outputs from fcaR for a set of canonical contexts. Mondher's CI checks agreement on every push. Failures point either to a Mondher regression or a corpus error — either way, useful signal.
Modular monolith
All algorithms live in mondher-engine. Surfaces (CLI, future API,
future Python) are thin adapters. Adding a new algorithm makes it
available from every surface simultaneously, with minimal adapter code.
Algorithm references
| Component | Algorithm | Reference |
|---|---|---|
| Concept generation | NextClosure | Ganter 1984 |
| Implication base | Duquenne-Guigues via NextClosure on pseudo-intents | Stumme 1996 |
| Association rules | Exhaustive subset enumeration | Agrawal et al. 1993 (background) |
| Layout | Sugiyama layered method | Sugiyama, Tagawa, Toda 1981 |
| Covering relation | O(\|L\|²) naive check | Standard |
| Reduced labeling | Smallest-extent / largest-intent search | Ganter and Wille 1999 |
Phase 2 will add faster algorithms (In-Close5, FCbO, AddIntent) for
large contexts. Phase 3 will add LinCbO and parallel CbO via Rayon.
The ConceptGenerator trait abstracts over the algorithm choice, so
existing code does not change.
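The real signature of `ConceptGenerator` is not shown in this document, so the following is only a guess at its general shape, meant to show why callers are insulated from the algorithm choice. The method name, the bitmask representation, and the `NaiveClosure` implementation are all assumptions.

```rust
/// Hypothetical shape of the trait; the actual definition may differ.
pub trait ConceptGenerator {
    /// Returns every intent of the context as an attribute bitmask.
    /// Rows are per-object attribute bitmasks (at most 63 attributes).
    fn intents(&self, rows: &[u64], n_attributes: u32) -> Vec<u64>;
}

/// Brute force stand-in: close every attribute subset, keep distinct results.
pub struct NaiveClosure;

impl ConceptGenerator for NaiveClosure {
    fn intents(&self, rows: &[u64], n_attributes: u32) -> Vec<u64> {
        let full = (1u64 << n_attributes) - 1;
        let mut out: Vec<u64> = (0..=full)
            .map(|set| {
                rows.iter()
                    .filter(|&&row| (row & set) == set)
                    .fold(full, |acc, &row| acc & row)
            })
            .collect();
        out.sort_unstable();
        out.dedup();
        out
    }
}

/// Callers depend only on the trait, so a faster generator can be swapped in
/// without touching this function.
pub fn concept_count(generator: &dyn ConceptGenerator, rows: &[u64], n_attributes: u32) -> usize {
    generator.intents(rows, n_attributes).len()
}
```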
Storage layer
Mondher's persistence layer is a small set of async traits — one per domain slice — plus production implementations against PostgreSQL, S3-compatible blob storage, and Redis. The same traits are implemented by in-memory backends for tests.
This page documents the storage layer's shape, conventions, and
operational characteristics. Source lives in crates/mondher-storage/.
Traits
Five traits cover everything persistent in Mondher today:
| Trait | What it persists | Production backend |
|---|---|---|
| UserStore | User accounts | PostgreSQL users table |
| WorkspaceStore | Workspaces and analyses | PostgreSQL workspaces + analyses |
| AnnotationStore | Annotations and threaded comments | PostgreSQL annotations + comments |
| ContextBlob | Raw context bytes keyed by hash | S3 / MinIO bucket |
| LatticeCache | Computed lattices keyed by hash | Redis with TTL |
Every method is async fn via the async_trait macro so the traits
remain object-safe and can be used as Arc<dyn TraitName>. Errors are
returned as StorageResult<T> — a Result<T, StorageError> alias.
Errors
StorageError is deliberately small: four tuple variants covering the
operations callers care about distinguishing.
pub enum StorageError {
    NotFound(String),
    Conflict(String),
    Backend(String),
    Unavailable(String),
}
NotFound and Conflict are expected — callers should match on them
and respond accordingly (404 vs 409 in an API). Backend and
Unavailable are unexpected — they propagate as 500-class errors.
Driver-specific errors (sqlx, AWS SDK, redis) are mapped to these four
variants at the backend boundary. A consumer never sees a sqlx::Error
or an aws_sdk_s3::Error; they see only StorageError.
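A caller-side mapping from the four variants to HTTP status codes could look like this sketch. The enum mirrors the definition above; the `http_status` helper is hypothetical, and the choice of 500 for `Backend` and 503 for `Unavailable` is an assumption within the "500-class" behavior described here.

```rust
#[derive(Debug)]
pub enum StorageError {
    NotFound(String),
    Conflict(String),
    Backend(String),
    Unavailable(String),
}

pub type StorageResult<T> = Result<T, StorageError>;

/// Hypothetical helper: picks an HTTP status for each variant.
pub fn http_status(err: &StorageError) -> u16 {
    match err {
        // Expected outcomes that callers should handle explicitly.
        StorageError::NotFound(_) => 404,
        StorageError::Conflict(_) => 409,
        // Unexpected failures; the exact 5xx codes are an assumption.
        StorageError::Backend(_) => 500,
        StorageError::Unavailable(_) => 503,
    }
}

/// Example use: turn a lookup result into a status code.
pub fn status_of<T>(result: &StorageResult<T>) -> u16 {
    match result {
        Ok(_) => 200,
        Err(err) => http_status(err),
    }
}
```

Because the match is exhaustive, adding a fifth variant later would make this function fail to compile until the new case is handled.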
Composition
Storage is the composition root: one struct holding Arc<dyn ...>
for each trait. Two constructors:
- Storage::from_env(): reads DATABASE_URL, REDIS_URL, MINIO_URL etc. and builds production backends. Uses StorageConfig::from_env() for pool sizing and timeouts.
- Storage::in_memory(): builds an all-in-memory stack with no external services. Use for unit tests and CI.
// Production binary
let storage = Storage::from_env().await?;

// Tests
let storage = Storage::in_memory();
The Arc<dyn ...> indirection means consumers can be tested by handing
them mock implementations of each trait. The HTTP API (Days 113+) holds
Arc<Storage> in its app state and clones it into each handler.
Configuration
StorageConfig exposes seven tunables, all settable via environment
variables. Defaults are sized for the bundled docker compose stack on a
developer laptop.
| Setting | Default | Env var |
|---|---|---|
| Postgres max connections | 10 | MONDHER_PG_MAX_CONNECTIONS |
| Postgres acquire timeout | 5 s (5000 ms) | MONDHER_PG_ACQUIRE_TIMEOUT_MS |
| Postgres connection max lifetime | 30 min (1800 s) | MONDHER_PG_MAX_LIFETIME_SECS |
| Postgres idle timeout | 10 min (600 s) | MONDHER_PG_IDLE_TIMEOUT_SECS |
| S3 call timeout | 10 s (10000 ms) | MONDHER_S3_CALL_TIMEOUT_MS |
| Redis call timeout | 2 s (2000 ms) | MONDHER_REDIS_CALL_TIMEOUT_MS |
| Postgres max retries | 3 | MONDHER_PG_MAX_RETRIES |
The defaults are conservative — production tuning depends on workload
and infrastructure. For example, a deployment with N application
instances behind a load balancer and a Postgres budget of M total
connections should set MONDHER_PG_MAX_CONNECTIONS on each instance
somewhat below M/N, so that headroom remains for migrations and
one-off scripts.
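As a worked example of that sizing rule, the helper below reserves some connections for headroom and divides the rest evenly. The function name and the `reserved` parameter are hypothetical; how much headroom to keep is a deployment decision this page does not prescribe.

```rust
/// Per-instance pool size: (M - reserved) / N, rounded down.
/// Returns 0 when there are no instances or nothing is left after the reserve.
pub fn per_instance_pool(total_connections: u32, instances: u32, reserved: u32) -> u32 {
    if instances == 0 {
        return 0;
    }
    total_connections.saturating_sub(reserved) / instances
}
```

With M = 100, N = 4, and 8 connections held back, each instance would get a pool of 23, which is the value to put in MONDHER_PG_MAX_CONNECTIONS under these assumptions.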
Retries and timeouts
Two helpers wrap storage operations with operational guarantees:
with_retry
Retries transient PostgreSQL errors with bounded exponential backoff. Transient means:
- StorageError::Unavailable: any.
- StorageError::Backend containing one of these SQLSTATEs:
  - 40001 serialization failure
  - 40P01 deadlock detected
  - 08006 connection broken
  - 08000 connection exception (generic)
  - 57P03 cannot connect now (server starting)
Non-transient errors (NotFound, Conflict, malformed input) are
returned immediately. The default budget is 3 attempts (initial + 2
retries), capped at 1 second of backoff per attempt.
let user = with_retry(3, || storage.users.get(id)).await?;
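The retry policy described above can be sketched without an async runtime. This is a synchronous illustration, not the real `with_retry`: the names `is_transient`, `backoff_delay`, and `with_retry_sync` are hypothetical, the 1 ms base delay is an assumption (the text only states the 1 second cap), and detecting SQLSTATEs by substring match is a simplification.

```rust
use std::time::Duration;

#[derive(Debug)]
pub enum StorageError {
    NotFound(String),
    Conflict(String),
    Backend(String),
    Unavailable(String),
}

/// SQLSTATEs listed as transient in the text above.
const TRANSIENT_SQLSTATES: [&str; 5] = ["40001", "40P01", "08006", "08000", "57P03"];

pub fn is_transient(err: &StorageError) -> bool {
    match err {
        StorageError::Unavailable(_) => true,
        // Simplification: look for the code anywhere in the message.
        StorageError::Backend(msg) => TRANSIENT_SQLSTATES.iter().any(|code| msg.contains(code)),
        StorageError::NotFound(_) | StorageError::Conflict(_) => false,
    }
}

/// Delay before retry number `retry` (1-based): base * 2^(retry - 1), capped.
pub fn backoff_delay(retry: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(retry.saturating_sub(1));
    base.saturating_mul(factor).min(cap)
}

/// Runs `op` up to `max_attempts` times, sleeping between transient failures.
pub fn with_retry_sync<T>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, StorageError>,
) -> Result<T, StorageError> {
    let base = Duration::from_millis(1); // assumed base delay
    let cap = Duration::from_secs(1); // cap stated in the text
    let mut attempt = 1;
    loop {
        match op() {
            Err(err) if is_transient(&err) && attempt < max_attempts => {
                std::thread::sleep(backoff_delay(attempt, base, cap));
                attempt += 1;
            }
            // Success, a non-transient error, or an exhausted budget.
            other => return other,
        }
    }
}
```

With a budget of 3, an operation that fails transiently twice and then succeeds returns `Ok` on the third attempt, while a `NotFound` on the first attempt is returned immediately without sleeping.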
with_timeout
Wraps any async operation with a hard deadline. Operations that don't
complete within the budget return StorageError::Unavailable.
let user = with_timeout(Duration::from_secs(5), storage.users.get(id))
    .await?;
Typical HTTP handlers allocate part of their request budget to storage: "I have 30 seconds total; storage gets 5 seconds before I 503."
The retry and timeout helpers compose; wrap a retry inside a timeout so retries can't cumulatively exceed the deadline.
Swap-in patterns
For unit tests
Use Storage::in_memory(). Every trait method has an in-memory
implementation that matches the production semantics (idempotency,
not-found behavior, conflict detection, FK-like cascades).
For integration tests with one specific backend
Construct each store directly and inject:
let pool = sqlx::PgPool::connect(&database_url).await?;
let users = Arc::new(PostgresUserStore::new(pool));
let storage = Storage {
    users,
    workspaces: Arc::new(InMemoryWorkspaceStore::default()),
    // ...
};
This lets you exercise the real Postgres path while keeping other stores in-memory.
For mocking specific behavior
Implement a trait directly:
struct AlwaysConflictUserStore;

#[async_trait]
impl UserStore for AlwaysConflictUserStore {
    async fn create(&self, _: User) -> StorageResult<()> {
        Err(StorageError::Conflict("forced conflict".into()))
    }
    // ... other methods
}
Arc<dyn UserStore> accepts any type implementing the trait. The HTTP
handlers don't care whether they're talking to Postgres, an in-memory
HashMap, or a deliberately-broken mock.
Migrations
Migrations live in crates/mondher-storage/migrations/ and are managed
by sqlx-cli. Filenames are timestamp-prefixed; SQLx applies them in
order.
Common operations:
# Apply pending migrations
make db-migrate
# Drop everything and re-migrate from scratch
make db-reset
# Add a new migration
sqlx migrate add --source crates/mondher-storage/migrations my_new_table
After writing any new sqlx::query! macro:
make sqlx-prepare
This regenerates .sqlx/, the offline query cache that CI uses. Commit
the changes; CI requires it.
Fixtures
make fixtures-load populates the dev database with one researcher
(Alice), one workspace, one analysis, two annotations, and one
two-message comment thread. All IDs are deterministic
(00000000-0000-0000-0000-000000000001 for Alice, etc.) so manual
exploration and tests can hard-code references.
The fixture data does not run in CI and is not part of any production deployment. It exists strictly for local development.
Benchmarks
cargo bench -p mondher-storage runs a criterion suite measuring the
in-memory backends. Results land in target/criterion/; the HTML
report at target/criterion/report/index.html includes plots and
statistical analysis.
We do not benchmark PostgreSQL, Redis, or S3/MinIO. Those numbers depend on external service performance and have no stable meaning across hardware. Production performance is measured through application-level observability, not microbenchmarks.
Typical in-memory numbers (on a 2024-class developer laptop):
- UserStore::create: ~500 ns to 1.5 µs
- UserStore::get (hit): ~80 ns
- WorkspaceStore::get_workspace: ~80 ns
- ContextBlob::put (1 KB): ~1 µs
- LatticeCache::get (hit): ~150 ns
Sub-microsecond reads, low-microsecond writes. The trait dispatch and lock acquisition dominate; the data structures themselves cost almost nothing.
When something goes wrong
| Symptom | Likely cause | Fix |
|---|---|---|
| Tests pass locally, fail in CI | .sqlx/ cache outdated | make sqlx-prepare, commit |
| cargo build fails with "DATABASE_URL not set" | sqlx macro can't find live DB and no offline cache | export DATABASE_URL or set SQLX_OFFLINE=true |
| Tests pass, then suddenly start failing | Stale migration state in dev DB | make db-reset |
| make fixtures-load errors on duplicate keys | Fixtures already loaded | Run make db-reset first |
| Storage tests time out in CI | Service slow to start (rare) | Rerun the job; if persistent, paste the log |
Phase 3 considerations
The current layer is sized for v0.2. Things we know we'll want before production but haven't implemented:
- Read replicas: PgPoolOptions only takes one URL. Phase 3 may introduce a PgRouter that picks read vs write pools.
- Distributed tracing: SQL queries should report span context for observability. Tracing is added in Phase 3 alongside the API server.
- Connection draining: graceful shutdown should let in-flight queries complete before closing pools. Currently we rely on process exit cleanup.
- Auditing: a who-changed-what-when log table. Phase 4 territory.
These are intentional gaps. Shipping v0.2 with the current shape is the priority; the layer is built to extend without breaking.