Mondher

Mondher is a research-grade Formal Concept Analysis platform. It reads formal contexts in five formats, computes formal concepts, mines implication bases and association rules, and renders lattice diagrams as SVG and TikZ.

Mondher is built in Rust as a modular monolith. Today its engine powers the command-line tool. The HTTP API, Python bindings, and web frontend planned for Phase 2 will run on the same engine.

Project status

Phase 1 of the 12-month build plan is in progress. As of Day 91, Mondher ships:

  • Five input/output formats: Burmeister .cxt, CSV/TSV, FIMI .dat, FCA-XML, native JSON
  • NextClosure concept enumeration
  • Duquenne-Guigues canonical implication base
  • Association rule mining
  • Sugiyama-style lattice layout
  • SVG and TikZ rendering
  • Cross-validated correctness against fcaR

Where to next

Quickstart

This is the five-minute tour. By the end you'll have computed a concept lattice, mined some implications, and exported an SVG diagram.

Install

If you have Rust installed:

git clone https://github.com/Feudjio-Anthony/mondher
cd mondher
cargo install --path crates/mondher-bin

This installs mondher to ~/.cargo/bin/. For other ways to install, see the installation page.

Your first context

Mondher reads formal contexts. Create a file birds-fish-dogs.cxt:

B

3
3

bird
fish
dog
can-fly
swims
warm-blooded
X.X
.X.
..X

Each X means "this object has this attribute"; each . means it doesn't. So bird has can-fly and warm-blooded, fish has only swims, and dog has only warm-blooded.
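The same context can also be written as data. The sketch below shows the two FCA derivation operators on it. It assumes each object's attributes are packed into a u32 bitmask (bit 0 = can-fly, bit 1 = swims, bit 2 = warm-blooded). That encoding and the function names are illustrative only; they are not Mondher's internal API.

```rust
// Quickstart context, one attribute bitmask per object (encoding assumed for
// this sketch): bit 0 = can-fly, bit 1 = swims, bit 2 = warm-blooded.
const ROWS: [u32; 3] = [0b101, 0b010, 0b100]; // bird, fish, dog
const ALL_ATTRS: u32 = 0b111;

/// A' : the attributes shared by every object in `objects`.
/// The empty object set derives to the full attribute set.
fn intent_of(objects: &[usize]) -> u32 {
    objects.iter().fold(ALL_ATTRS, |acc, &g| acc & ROWS[g])
}

/// B' : the objects (by index) that have every attribute in `attrs`.
fn extent_of(attrs: u32) -> Vec<usize> {
    (0..ROWS.len()).filter(|&g| (ROWS[g] & attrs) == attrs).collect()
}
```

Applying both operators in turn gives a closure. For example, intent_of(&extent_of(0b001)) is 0b101: every object that can fly is also warm-blooded.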

Compute the lattice

mondher compute birds-fish-dogs.cxt

You'll see five formal concepts printed, each with its extent and intent, plus the Hasse diagram's covering edges.

Mine the implication base

mondher implications birds-fish-dogs.cxt

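You'll see the canonical Duquenne-Guigues base: the smallest set of implications from which every implication valid in the context follows. For this tiny context the output is a single rule, can-fly → warm-blooded. The base also contains swims, warm-blooded → can-fly. That rule holds vacuously, because no object both swims and is warm-blooded. Vacuous implications are hidden by default; pass --include-vacuous to show them.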
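What "holds" and "support" mean for an implication fits in a few lines. This sketch again assumes one bitmask per object (bit 0 = can-fly, bit 1 = swims, bit 2 = warm-blooded) rather than Mondher's own types. The premise {swims, warm-blooded} matches no object, so any implication from it holds vacuously with support 0.

```rust
// Quickstart context as bitmasks (assumed encoding):
// bit 0 = can-fly, bit 1 = swims, bit 2 = warm-blooded.
const ROWS: [u32; 3] = [0b101, 0b010, 0b100]; // bird, fish, dog

/// Support: how many objects have every attribute of the premise.
fn support(premise: u32) -> usize {
    ROWS.iter().filter(|&&row| (row & premise) == premise).count()
}

/// An implication holds when every object with the premise also has the
/// conclusion. With zero supporting objects it holds vacuously.
fn holds(premise: u32, conclusion: u32) -> bool {
    ROWS.iter()
        .filter(|&&row| (row & premise) == premise)
        .all(|&row| (row & conclusion) == conclusion)
}
```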

Export a lattice diagram

mondher export --format svg birds-fish-dogs.cxt > lattice.svg

Open lattice.svg in a browser. Five circles connected by five lines — the Hasse diagram of the lattice.
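The five lines are the covering pairs of the concept order. The sketch below computes them from concept extents, assuming extents are bitmasks over objects (bit 0 = bird, bit 1 = fish, bit 2 = dog). It is a naive cubic check written for readability, not necessarily the method Mondher uses.

```rust
/// Is `a` a proper subset of `b` (both object bitmasks)?
fn proper_subset(a: u32, b: u32) -> bool {
    (a & b) == a && a != b
}

/// Covering pairs (lower, upper) as indices into `extents`, in (source, target) order.
fn covering_edges(extents: &[u32]) -> Vec<(usize, usize)> {
    let mut edges = Vec::new();
    for (i, &lo) in extents.iter().enumerate() {
        for (j, &hi) in extents.iter().enumerate() {
            if !proper_subset(lo, hi) {
                continue;
            }
            // `hi` covers `lo` only if no extent sits strictly between them.
            let between = extents
                .iter()
                .any(|&mid| proper_subset(lo, mid) && proper_subset(mid, hi));
            if !between {
                edges.push((i, j));
            }
        }
    }
    edges
}
```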

For LaTeX papers, use TikZ instead:

mondher export --format tikz birds-fish-dogs.cxt > lattice.tex

The output goes directly into a \begin{tikzpicture} block ready for \input{lattice.tex} in your .tex source.

What next

  • Other input formats: see formats.
  • Every command and option: see the CLI reference.
  • Mine partial rules (with confidence < 1): mondher rules --help.

Installation

Mondher is in active development; Phase 1 reaches v0.1 around Day 98. Until then, the recommended install is from source.

From source

Requires Rust 1.75 or later. Install Rust via rustup if you don't have it.

git clone https://github.com/Feudjio-Anthony/mondher
cd mondher
cargo install --path crates/mondher-bin

This compiles Mondher in release mode and installs the mondher binary to ~/.cargo/bin/. Make sure that directory is in your PATH:

echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Verify:

mondher --version

Shell completions

Bash:

mondher completions bash > ~/.local/share/bash-completion/completions/mondher

Zsh:

mondher completions zsh > ~/.zsh/completions/_mondher

Fish:

mondher completions fish > ~/.config/fish/completions/mondher.fish

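Create the target directory first if it doesn't exist yet (for example, mkdir -p ~/.zsh/completions). For zsh, that directory must also be on your fpath. Add fpath=(~/.zsh/completions $fpath) to ~/.zshrc, before the line that runs compinit. Then start a new shell session for the completions to take effect.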

Coming in v0.1 (Day 98)

  • Prebuilt binaries for Linux, macOS, and Windows
  • Docker image: docker run feudjio-anthony/mondher
  • Homebrew tap (macOS / Linux)

Context formats

Mondher reads and writes five formats. The CLI detects the format from the file extension, falling back to sniffing the first bytes of the content.

| Format | Extension | Notes |
|---|---|---|
| Burmeister | .cxt | The FCA standard. Plain text, named objects and attributes. |
| CSV / TSV | .csv, .tsv | Header row, one object per data row. |
| FIMI | .dat | Transactional format from the FIMI benchmark repository. |
| FCA-XML | .xml | Legacy XML used by ConExp and related tools. |
| Native JSON | .json | Mondher's diff-friendly interchange format. |

Burmeister .cxt

The de facto FCA standard. Plain text, one entry per line:

B
<context name, usually empty>
<n_objects>
<n_attributes>

<object names, one per line>
<attribute names, one per line>
<incidence rows, one per object: X = present, . = absent>

A 3×3 example:

B

3
3

bird
fish
dog
can-fly
swims
warm-blooded
X.X
.X.
..X
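For illustration, here is a minimal std-only reader for this layout. It assumes a well-formed file: one name per line, rows made only of X and ., and trailing whitespace ignored. It sketches the format, not Mondher's parser, and the Context struct is hypothetical.

```rust
/// Hypothetical in-memory form of a parsed context.
#[derive(Debug)]
struct Context {
    objects: Vec<String>,
    attributes: Vec<String>,
    incidence: Vec<Vec<bool>>, // incidence[object][attribute]
}

/// Reads: "B", a name line, the object and attribute counts, a blank line,
/// object names, attribute names, then one X/. row per object.
fn parse_cxt(text: &str) -> Result<Context, String> {
    let lines: Vec<&str> = text.lines().map(str::trim_end).collect();
    if lines.first() != Some(&"B") {
        return Err("first line must be B".to_string());
    }
    // Lines 3 and 4 (indices 2 and 3) hold the two counts.
    let count = |idx: usize| -> Result<usize, String> {
        let line = lines.get(idx).ok_or(format!("missing line {}", idx + 1))?;
        line.trim().parse::<usize>().map_err(|e| format!("line {}: {}", idx + 1, e))
    };
    let (n_obj, n_attr) = (count(2)?, count(3)?);
    // Names start after the blank fifth line (index 4).
    let body = lines.get(5..).unwrap_or(&[]);
    if body.len() < 2 * n_obj + n_attr {
        return Err("file ends before all names and rows are read".to_string());
    }
    let objects = body[..n_obj].iter().map(|s| s.to_string()).collect();
    let attributes = body[n_obj..n_obj + n_attr].iter().map(|s| s.to_string()).collect();
    let mut incidence: Vec<Vec<bool>> = Vec::with_capacity(n_obj);
    for row in &body[n_obj + n_attr..2 * n_obj + n_attr] {
        if row.chars().count() != n_attr {
            return Err(format!("row {:?} should have {} cells", row, n_attr));
        }
        incidence.push(row.chars().map(|c| c == 'X').collect());
    }
    Ok(Context { objects, attributes, incidence })
}
```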

CSV / TSV

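Start with a header row of attribute names. Its first cell is left empty, because it sits above the object-name column. Then write one row per object. The first column is the object name; the remaining columns are 0/1 (false/true, X/., and several other truthy spellings are also accepted).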

,can-fly,swims,warm-blooded
bird,1,0,1
fish,0,1,0
dog,0,0,1

Mondher auto-detects the delimiter (comma vs tab) from the first line.
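Here is a minimal sketch of the comma-vs-tab detection and the row layout described above. The truthy set (1, true, X, case-insensitive) is an assumption standing in for Mondher's full list. Quoted cells and escaped delimiters are not handled.

```rust
/// Chooses the delimiter from the header line: tab if present, else comma.
fn detect_delimiter(header: &str) -> char {
    if header.contains('\t') { '\t' } else { ',' }
}

/// Assumed truthy spellings (case-insensitive); Mondher's full list may be longer.
fn is_truthy(cell: &str) -> bool {
    matches!(cell.trim().to_ascii_lowercase().as_str(), "1" | "true" | "x")
}

/// Returns (attribute names, object names, incidence rows).
fn parse_table(text: &str) -> (Vec<String>, Vec<String>, Vec<Vec<bool>>) {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header = lines.next().unwrap_or("");
    let delim = detect_delimiter(header);
    // The first header cell sits above the object-name column, so skip it.
    let attributes: Vec<String> =
        header.split(delim).skip(1).map(|s| s.trim().to_string()).collect();
    let mut objects: Vec<String> = Vec::new();
    let mut incidence: Vec<Vec<bool>> = Vec::new();
    for line in lines {
        let mut cells = line.split(delim);
        objects.push(cells.next().unwrap_or("").trim().to_string());
        incidence.push(cells.map(is_truthy).collect());
    }
    (attributes, objects, incidence)
}
```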

FIMI .dat

The transactional format used by the FIMI itemset-mining benchmark collection. One line per object; each line is the space-separated IDs of the attributes that object has.

0 2
1
2

FIMI files carry no names, so Mondher synthesizes them when it reads one: g0, g1, ... for objects and m0, m1, ... for attributes. Use CSV or .cxt if you need named objects.
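The sketch below reads FIMI data and synthesizes names. It assumes the attribute count is the largest ID seen plus one, and that a blank line is an object with no attributes. Mondher's reader may size the attribute set differently.

```rust
/// Parsed FIMI data: (object names, attribute names, incidence rows).
type Parsed = (Vec<String>, Vec<String>, Vec<Vec<bool>>);

/// One transaction per line; synthesizes g0.. for objects and m0.. for attributes.
fn parse_fimi(text: &str) -> Result<Parsed, String> {
    let mut rows: Vec<Vec<usize>> = Vec::new();
    for (n, line) in text.lines().enumerate() {
        let ids = line
            .split_whitespace()
            .map(|tok| tok.parse::<usize>().map_err(|e| format!("line {}: {}", n + 1, e)))
            .collect::<Result<Vec<usize>, String>>()?;
        rows.push(ids);
    }
    // Assumption: attribute count = largest ID seen + 1.
    let n_attr = rows.iter().flatten().max().map_or(0, |&m| m + 1);
    let objects: Vec<String> = (0..rows.len()).map(|g| format!("g{}", g)).collect();
    let attributes: Vec<String> = (0..n_attr).map(|m| format!("m{}", m)).collect();
    let incidence: Vec<Vec<bool>> = rows
        .iter()
        .map(|ids| {
            let mut row = vec![false; n_attr];
            for &id in ids {
                row[id] = true;
            }
            row
        })
        .collect();
    Ok((objects, attributes, incidence))
}
```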

FCA-XML

Legacy XML used by ConExp and other tools from the 2000s:

<ConceptualSystem>
  <Context>
    <Attributes>
      <Attribute Name="can-fly"/>
      ...
    </Attributes>
    <Objects>
      <Object>
        <Name>bird</Name>
        <Intent>
          <HasAttribute AttributeIdentifier="0"/>
          ...
        </Intent>
      </Object>
      ...
    </Objects>
  </Context>
</ConceptualSystem>

Verbose, but useful for interop with older datasets.

Native JSON

Mondher's own format, version-tagged and forward-compatible. Best for storing contexts as Git-tracked data.

{
  "mondher_version": "0.1.0",
  "context": {
    "objects": ["bird", "fish", "dog"],
    "attributes": ["can-fly", "swims", "warm-blooded"],
    "incidence": [[true, false, true], [false, true, false], [false, false, true]]
  }
}

Auto-detection

You never have to tell Mondher which format you're using. Every subcommand runs the auto-detector, which checks the file extension first and falls back to content sniffing.
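Here is a minimal sketch of the extension-first, sniff-second order. Format is a hypothetical enum. The sniffing rules are illustrative guesses, not Mondher's actual heuristics.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Format { Burmeister, Csv, Fimi, FcaXml, Json }

/// Extension first; sniff the content only when the extension is missing or unknown.
fn detect(path: &str, head: &str) -> Option<Format> {
    let ext = path.rsplit_once('.').map(|(_, e)| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("cxt") => Some(Format::Burmeister),
        Some("csv") | Some("tsv") => Some(Format::Csv),
        Some("dat") => Some(Format::Fimi),
        Some("xml") => Some(Format::FcaXml),
        Some("json") => Some(Format::Json),
        _ => sniff(head),
    }
}

/// Illustrative heuristics on the first bytes of the file.
fn sniff(head: &str) -> Option<Format> {
    let first_line = head.lines().next().unwrap_or("");
    let body = head.trim_start();
    if first_line.trim() == "B" {
        Some(Format::Burmeister)
    } else if body.starts_with('<') {
        Some(Format::FcaXml)
    } else if body.starts_with('{') {
        Some(Format::Json)
    } else if !first_line.trim().is_empty()
        && first_line.split_whitespace().all(|tok| tok.parse::<u64>().is_ok())
    {
        // A line of bare integers looks like a FIMI transaction.
        Some(Format::Fimi)
    } else if first_line.contains(',') || first_line.contains('\t') {
        Some(Format::Csv)
    } else {
        None
    }
}
```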

CLI reference

The mondher binary has seven subcommands. Every subcommand that takes a <PATH> accepts any of the five input formats. No --format flag is needed for input, because auto-detection handles it.

mondher read

mondher read <PATH>

Reads the file and prints its dimensions. Used to sanity-check that Mondher recognizes the format.

mondher compute

mondher compute <PATH>

Computes every formal concept of the context using NextClosure (Ganter 1984). Prints each concept with its extent, intent, and reduced labeling, then the covering relation as a list of edges.
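Here is a compact sketch of NextClosure over bitmask-encoded contexts (at most 32 attributes, an assumption of this encoding). It illustrates the algorithm Mondher names; it is not Mondher's code. It enumerates intents only, since each extent follows by applying the other derivation operator.

```rust
/// X'' for a context given as one attribute bitmask per object: the
/// intersection of the rows of all objects that have every attribute in X.
fn closure(rows: &[u32], full: u32, x: u32) -> u32 {
    rows.iter().filter(|&&r| (r & x) == x).fold(full, |acc, &r| acc & r)
}

/// One NextClosure step: the lectically next closed set after `a`, or None if
/// `a` is the last one. Attribute i is bit i; the largest index varies fastest.
fn next_closure(a: u32, m: usize, cl: &dyn Fn(u32) -> u32) -> Option<u32> {
    let mut a = a;
    for i in (0..m).rev() {
        let bit = 1u32 << i;
        if (a & bit) != 0 {
            a &= !bit; // drop i; `a` is now the part of the old set below i
        } else {
            let b = cl(a | bit);
            // Accept b only if closing added no attribute smaller than i.
            if ((b & !a) & (bit - 1)) == 0 {
                return Some(b);
            }
        }
    }
    None
}

/// Every concept intent, in lectic order.
fn all_intents(rows: &[u32], m: usize) -> Vec<u32> {
    let full = if m == 32 { u32::MAX } else { (1u32 << m) - 1 };
    let cl = |x: u32| closure(rows, full, x);
    let mut current = cl(0);
    let mut out = vec![current];
    while let Some(next) = next_closure(current, m, &cl) {
        out.push(next);
        current = next;
    }
    out
}
```

On the quickstart context (rows 0b101, 0b010, 0b100) it yields five intents in this order: the empty set, {warm-blooded}, {swims}, {can-fly, warm-blooded}, and the full attribute set.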

mondher implications

mondher implications <PATH> [--include-vacuous]

Computes the Duquenne-Guigues canonical implication base. By default it hides vacuous implications, whose premise no object has (support = 0); pass --include-vacuous to show them.
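The textbook way to compute this base is Ganter's algorithm: run NextClosure with the L* closure built from the implications found so far. Each closed set that is not a concept intent is a pseudo-intent and contributes one implication. The sketch below follows that approach. It assumes a bitmask encoding with at most 32 attributes and is not Mondher's implementation.

```rust
/// X'' in a context given as one attribute bitmask per object.
fn closure(rows: &[u32], full: u32, x: u32) -> u32 {
    rows.iter().filter(|&&r| (r & x) == x).fold(full, |acc, &r| acc & r)
}

/// L*-closure: repeatedly add the conclusion of every implication whose premise
/// is a proper subset of the current set.
fn l_star(base: &[(u32, u32)], mut x: u32) -> u32 {
    loop {
        let before = x;
        for &(premise, conclusion) in base {
            if (premise & x) == premise && premise != x {
                x |= conclusion;
            }
        }
        if x == before {
            return x;
        }
    }
}

/// NextClosure step for an arbitrary closure operator (attribute i = bit i).
fn next_closure(mut a: u32, m: usize, cl: &dyn Fn(u32) -> u32) -> Option<u32> {
    for i in (0..m).rev() {
        let bit = 1u32 << i;
        if (a & bit) != 0 {
            a &= !bit;
        } else {
            let b = cl(a | bit);
            if ((b & !a) & (bit - 1)) == 0 {
                return Some(b);
            }
        }
    }
    None
}

/// Duquenne-Guigues base as (pseudo-intent, its closure minus itself) pairs,
/// in lectic order of the premises.
fn canonical_base(rows: &[u32], m: usize) -> Vec<(u32, u32)> {
    let full = if m == 32 { u32::MAX } else { (1u32 << m) - 1 };
    let mut base: Vec<(u32, u32)> = Vec::new();
    let mut a = 0; // the empty set is L*-closed while the base is empty
    loop {
        let closed = closure(rows, full, a);
        if closed != a {
            base.push((a, closed & !a)); // `a` is a pseudo-intent
        }
        // Sets closed under L* (for the base so far) are intents or pseudo-intents.
        let next = next_closure(a, m, &|x: u32| l_star(&base, x));
        match next {
            Some(n) => a = n,
            None => return base,
        }
    }
}
```

On the quickstart context it returns two implications: {swims, warm-blooded} → can-fly, with support 0, and can-fly → warm-blooded, with support 1.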

mondher rules

mondher rules <PATH> [--min-support N] [--min-confidence F] [--max-rules N]

Mines association rules above given thresholds.

  • --min-support (default 1): minimum number of objects whose intent contains the premise.
  • --min-confidence (default 0.6): minimum value of |extent(premise ∪ conclusion)| / |extent(premise)|, that is, the fraction of objects with the premise that also have the conclusion; in [0.0, 1.0].
  • --max-rules: cap the number of rules emitted. Useful for very permissive thresholds.
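A minimal sketch of the two measures and the threshold check follows. It uses the quickstart context with an assumed bitmask encoding (bit 0 = can-fly, bit 1 = swims, bit 2 = warm-blooded). Support is counted on the premise, as --min-support describes. The function names are illustrative.

```rust
// Quickstart context: bird, fish, dog.
const ROWS: [u32; 3] = [0b101, 0b010, 0b100];

/// Number of objects whose intent contains every attribute in `attrs`.
fn extent_size(attrs: u32) -> usize {
    ROWS.iter().filter(|&&r| (r & attrs) == attrs).count()
}

/// (support, confidence) of premise -> conclusion. Confidence is
/// |extent(premise ∪ conclusion)| / |extent(premise)|, or None for an unsupported premise.
fn measure(premise: u32, conclusion: u32) -> (usize, Option<f64>) {
    let support = extent_size(premise);
    let both = extent_size(premise | conclusion);
    let confidence = if support == 0 { None } else { Some(both as f64 / support as f64) };
    (support, confidence)
}

/// Threshold filter in the spirit of --min-support and --min-confidence.
fn passes(premise: u32, conclusion: u32, min_support: usize, min_confidence: f64) -> bool {
    match measure(premise, conclusion) {
        (s, Some(c)) => s >= min_support && c >= min_confidence,
        _ => false,
    }
}
```

Take warm-blooded → can-fly as an example. Two objects have the premise, and only one of them can fly, so the confidence is 0.5 and the rule is rejected at the default threshold of 0.6.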

mondher verify

mondher verify <PATH>

Prints a structured report of what Mondher computes for the context: concept count, informative implications, vacuous implications, and total implications. The format is grep-friendly, so it can be diffed against reports from other FCA tools (ConExp, fcaR) for cross-validation. See validation for the methodology.

mondher export

mondher export --format svg|tikz <PATH>

Renders the lattice diagram. svg for browsers and Inkscape; tikz for direct inclusion in LaTeX. Output goes to stdout; pipe to a file:

mondher export --format svg ctx.cxt > diagram.svg
mondher export --format tikz ctx.cxt > diagram.tex

mondher completions

mondher completions bash|zsh|fish

Prints a shell completion script. Save it to the appropriate location for your shell — see the installation page.

Validating Mondher's correctness

Mondher's correctness rests on three layers, each catching a different class of bug:

Layer 1 — Internal invariants

Every algorithm in Mondher carries its mathematical contract as a test. The Galois connection laws (extensivity, antitonicity, idempotence) are verified on hundreds of randomly generated contexts per CI run. The canonical base is verified to be both sound (every implication holds) and complete (every valid implication follows from it). The covering relation is verified to be a true Hasse diagram (no redundant edges, no missing edges).

These property tests catch bugs in our own algorithms without reference to any external tool.
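Here is a minimal, dependency-free sketch of such a property check. It assumes small bitmask-encoded contexts and uses a hand-rolled xorshift generator in place of whatever property-testing library Mondher actually uses. For each random context it checks the laws exhaustively over all attribute and object subsets.

```rust
/// Minimal xorshift64 generator so the sketch needs no crates (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

fn subset(x: u32, y: u32) -> bool {
    (x & y) == x
}

/// B' : bitmask of the objects that have every attribute in `b`.
fn extent(rows: &[u32], b: u32) -> u32 {
    let mut out = 0u32;
    for (g, &r) in rows.iter().enumerate() {
        if (r & b) == b {
            out |= 1u32 << g;
        }
    }
    out
}

/// A' : attributes shared by every object in the object bitmask `a`.
fn intent(rows: &[u32], full: u32, a: u32) -> u32 {
    let mut out = full;
    for (g, &r) in rows.iter().enumerate() {
        if ((a >> g) & 1) == 1 {
            out &= r;
        }
    }
    out
}

/// Checks the laws on one random context with small n_obj and m (both < 32).
fn galois_laws_hold(rng: &mut XorShift, n_obj: usize, m: usize) -> bool {
    let full = (1u32 << m) - 1;
    let all_objs = (1u32 << n_obj) - 1;
    let rows: Vec<u32> = (0..n_obj).map(|_| (rng.next_u64() as u32) & full).collect();
    for b1 in 0..=full {
        let closed = intent(&rows, full, extent(&rows, b1));
        // Extensivity and idempotence of B -> B''.
        if !subset(b1, closed) || intent(&rows, full, extent(&rows, closed)) != closed {
            return false;
        }
        // Antitonicity of B -> B'.
        for b2 in 0..=full {
            if subset(b1, b2) && !subset(extent(&rows, b2), extent(&rows, b1)) {
                return false;
            }
        }
    }
    // Extensivity on the object side: A is contained in A''.
    (0..=all_objs).all(|a| subset(a, extent(&rows, intent(&rows, full, a))))
}
```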

Layer 2 — Canonical corpus

A hand-picked set of small contexts (tests/corpus/) gives us hand-verified concept counts and base sizes. Failures here usually mean a structural change in an algorithm — and the small size makes diagnosis fast.

Layer 3 — External cross-validation

A frozen reference corpus (tests/reference/) records the outputs Mondher must produce for ~10 canonical FCA contexts. Each reference value is independently verified against at least one of:

  • Published literature: e.g., the Living Beings context's concept count of 19 (Ganter and Wille 1999, p. 21).
  • ConExp: the de facto FCA reference tool from 2003 onward.
  • fcaR: the R package, actively maintained as of 2026.
  • conexp-clj: the Clojure rewrite, also actively maintained.

When two of these tools agree on a value, we lock it as the reference. When they disagree, we treat the discrepancy as a research question and document it — disagreement between mature tools is rare and almost always traceable to a known historical bug.

This layer answers the question "is Mondher saying the same thing as the rest of the FCA world?" — not just internal consistency, but objective agreement.

Adding a new reference context

See crates/mondher-engine/tests/reference/README.md for the mechanical procedure. The short version: drop a .cxt file, run it through two reference tools, record the values in corpus.json, run the cross-validation test. The commit message should cite the tools that confirmed the values.

Provisionally-locked entries

Entries whose notes field contains "pending fcaR cross-check" or "pending conexp-clj cross-check" are locked from Mondher's own output because installing the external tool was infeasible at curation time. These entries function as regression checks — they catch changes to Mondher's behavior — but they are not yet true cross-validations. They are upgraded to verified status when the external check is done.

What this gives us

Researchers using Mondher get three guarantees:

  1. Every operation produces output that is internally consistent with FCA theory.
  2. Every operation's output matches the FCA literature on canonical examples.
  3. On every verified (not provisionally locked) context in the reference corpus, Mondher's output agrees with the external sources it was checked against.

Format coverage

Mondher reads five context formats interchangeably:

  • Burmeister .cxt — the de facto FCA standard.
  • CSV / TSV — what spreadsheets export.
  • FIMI .dat — the standard transactional format used in itemset-mining benchmarks (Mushroom, Adult, Chess, BMS-*).
  • FCA-XML — the legacy ConExp-era XML format.
  • Native JSON — Mondher's own diff-friendly interchange format.

The binary auto-detects format from file extension or content; no --format flag is needed. Cross-format consistency is verified by the test suite: a context written in any one of these formats and read back produces the same incidence matrix.

This matters for adoption: researchers can throw Mondher at whatever files they already have — no manual conversion step — and get the same answers as their existing toolchain.

Building the reference corpus has a real cost: adding a reference context takes 30 to 60 minutes of cross-tool comparison. The payoff is the credibility needed for Mondher to be cited in FCA research.

Architecture

Mondher is a modular monolith. All algorithms live in one Rust engine, and surfaces (CLI, HTTP API, Python bindings, web app) adapt that engine to different entry points.

The current Phase 1 surface set is the CLI; Phase 2 (Days 99-182) adds the API server, Python bindings, and web frontend, all driven by the same engine.

Crate layout

crates/
├─ mondher-core      Algebra: Context, Concept, Lattice, Implication
├─ mondher-engine    Algorithms: NextClosure, canonical_base, association_rules
├─ mondher-formats   File I/O: Burmeister, CSV, FIMI, FCA-XML, JSON, auto-detect
├─ mondher-layout    Visualization: Sugiyama layout, SVG renderer, TikZ renderer
├─ mondher-cli       Clap subcommands and handlers
├─ mondher-bin       Tiny entry point delegating to mondher-cli
└─ mondher-storage   Storage traits with in-memory, PostgreSQL, S3/MinIO, and Redis backends

The dependency graph is strictly acyclic. mondher-core has no internal dependencies; mondher-engine depends only on mondher-core; mondher-formats depends on mondher-core. Each crate is fully tested in isolation.

Design principles

Strong types over stringly-typed data

ObjectId(u32) and AttributeId(u32) are distinct newtypes that can't be mixed up. The compiler catches "I passed an attribute index where an object index was expected" before such code ever runs.
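A sketch of the pattern follows. The ObjectId and AttributeId names come from the text. The derives, the public field, and the Context type with its has method are assumptions made for illustration.

```rust
/// Distinct newtypes: both wrap a u32, but neither converts to the other implicitly.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ObjectId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct AttributeId(pub u32);

/// Hypothetical context: one attribute bitmask per object (up to 64 attributes).
pub struct Context {
    rows: Vec<u64>,
}

impl Context {
    /// Does object `g` have attribute `m`? The argument order is enforced by type.
    pub fn has(&self, g: ObjectId, m: AttributeId) -> bool {
        ((self.rows[g.0 as usize] >> m.0) & 1) == 1
    }
}

// Swapping the arguments, ctx.has(AttributeId(2), ObjectId(0)), does not
// compile: rustc reports mismatched types (E0308).
```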

Deterministic output

Every algorithm produces output in a canonical order (lectic order for concepts and implications; sorted by source/target for covering edges). Running NextClosure twice on the same context produces byte-identical results, which makes content-hashing analyses meaningful.
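The sketch below shows why canonical ordering makes content hashing meaningful. Two runs that discover the same covering edges in different orders serialize to the same bytes once the edges are sorted. FNV-1a is used here only as a stable, dependency-free example hash; the text does not say which hash, if any, Mondher uses.

```rust
/// 64-bit FNV-1a: its output is stable across runs and platforms, unlike
/// std's DefaultHasher, whose algorithm may change between Rust releases.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Canonical text for covering edges: sorted by (source, target), one per line.
fn canonical_edges(mut edges: Vec<(u32, u32)>) -> String {
    edges.sort_unstable();
    edges.iter().map(|(s, t)| format!("{s} -> {t}\n")).collect()
}
```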

Property-based correctness

For each algorithm, we encode its mathematical contract as a property test that runs on hundreds of random contexts per CI push. The Galois laws, lattice soundness/completeness, and Hasse-diagram correctness are all verified continuously.

External validation

Beyond internal property tests, a frozen reference corpus records expected outputs for a set of canonical contexts, cross-checked against external tools such as fcaR. Mondher's CI checks agreement on every push. A failure points either to a Mondher regression or to a corpus error; either way, it is a useful signal.

Modular monolith

All algorithms live in mondher-engine. Surfaces (CLI, future API, future Python) are thin adapters. Adding a new algorithm makes it available from every surface simultaneously, with minimal adapter code.

Algorithm references

| Component | Algorithm | Reference |
|---|---|---|
| Concept generation | NextClosure | Ganter 1984 |
| Implication base | Duquenne-Guigues via NextClosure on pseudo-intents | Stumme 1996 |
| Association rules | Exhaustive subset enumeration | Agrawal et al. 1993 (background) |
| Layout | Sugiyama layered method | Sugiyama, Tagawa, Toda 1981 |
| Covering relation | O(n²) naive check, n = number of concepts | Standard |
| Reduced labeling | Smallest-extent / largest-intent search | Ganter and Wille 1999 |

Phase 2 will add faster algorithms (In-Close5, FCbO, AddIntent) for large contexts. Phase 3 will add LinCbO and parallel CbO via Rayon. The ConceptGenerator trait abstracts over the algorithm choice, so existing code does not change.
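The sketch below shows the idea with a hypothetical trait signature; the text names ConceptGenerator but does not show it. Two deliberately simple generators stand in for NextClosure, In-Close5, and the rest. The caller depends only on the trait.

```rust
use std::collections::BTreeSet;

/// Hypothetical ConceptGenerator shape: context in (one attribute bitmask per
/// object, m <= 32 attributes), concept intents out.
trait ConceptGenerator {
    fn name(&self) -> &'static str;
    /// Intents in ascending numeric order of their bitmasks.
    fn intents(&self, rows: &[u32], m: usize) -> Vec<u32>;
}

fn full_mask(m: usize) -> u32 {
    if m == 32 { u32::MAX } else { (1u32 << m) - 1 }
}

/// X'' : intersection of the rows of every object that has all of X.
fn closure(rows: &[u32], full: u32, x: u32) -> u32 {
    rows.iter().filter(|&&r| (r & x) == x).fold(full, |acc, &r| acc & r)
}

/// Tests every attribute subset for closedness. Exponential in m.
struct BruteForce;

impl ConceptGenerator for BruteForce {
    fn name(&self) -> &'static str { "brute-force" }
    fn intents(&self, rows: &[u32], m: usize) -> Vec<u32> {
        let full = full_mask(m);
        (0..=full).filter(|&x| closure(rows, full, x) == x).collect()
    }
}

/// Every intent is the full set or an intersection of object rows.
struct Intersections;

impl ConceptGenerator for Intersections {
    fn name(&self) -> &'static str { "intersections" }
    fn intents(&self, rows: &[u32], m: usize) -> Vec<u32> {
        let mut found = BTreeSet::from([full_mask(m)]);
        for &r in rows {
            let met: Vec<u32> = found.iter().map(|&i| i & r).collect();
            found.extend(met);
        }
        found.into_iter().collect() // BTreeSet iterates in ascending order
    }
}

/// Callers see only the trait, so swapping algorithms needs no other change.
fn concept_count(generator: &dyn ConceptGenerator, rows: &[u32], m: usize) -> usize {
    generator.intents(rows, m).len()
}
```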

Storage layer

Mondher's persistence layer is a small set of async traits — one per domain slice — plus production implementations against PostgreSQL, S3-compatible blob storage, and Redis. The same traits are implemented by in-memory backends for tests.

This page documents the storage layer's shape, conventions, and operational characteristics. Source lives in crates/mondher-storage/.

Traits

Five traits cover everything persistent in Mondher today:

| Trait | What it persists | Production backend |
|---|---|---|
| UserStore | User accounts | PostgreSQL users table |
| WorkspaceStore | Workspaces and analyses | PostgreSQL workspaces + analyses |
| AnnotationStore | Annotations and threaded comments | PostgreSQL annotations + comments |
| ContextBlob | Raw context bytes keyed by hash | S3 / MinIO bucket |
| LatticeCache | Computed lattices keyed by hash | Redis with TTL |

Every method is async fn via the async_trait macro so the traits remain object-safe and can be used as Arc<dyn TraitName>. Errors are returned as StorageResult<T> — a Result<T, StorageError> alias.

Errors

StorageError is deliberately small: four tuple variants covering the operations callers care about distinguishing.

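pub enum StorageError {
    /// The requested record does not exist (expected; 404 in an API).
    NotFound(String),
    /// The write collides with existing state (expected; 409 in an API).
    Conflict(String),
    /// The backend failed unexpectedly (500-class).
    Backend(String),
    /// The backend could not be reached or timed out (500-class).
    Unavailable(String),
}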

NotFound and Conflict are expected — callers should match on them and respond accordingly (404 vs 409 in an API). Backend and Unavailable are unexpected — they propagate as 500-class errors.

Driver-specific errors (sqlx, AWS SDK, redis) are mapped to these four variants at the backend boundary. A consumer never sees a sqlx::Error or an aws_sdk_s3::Error; they see only StorageError.
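The sketch below shows the boundary mapping and how an API layer might pick a status code. StorageError is mirrored locally so the block compiles on its own. DriverError is a hypothetical stand-in for a real driver's error type; its variant names loosely echo sqlx's, but this is not sqlx's API. The text specifies 404 and 409; choosing 503 for Unavailable and 500 for Backend is an assumption within "500-class".

```rust
/// Local mirror of StorageError so the sketch compiles on its own.
#[derive(Debug, PartialEq)]
enum StorageError {
    NotFound(String),
    Conflict(String),
    Backend(String),
    Unavailable(String),
}

/// Hypothetical driver error type standing in for sqlx, AWS SDK, or redis errors.
#[derive(Debug)]
enum DriverError {
    RowNotFound,
    UniqueViolation { constraint: String },
    PoolTimedOut,
    Other(String),
}

/// The backend boundary: every driver error becomes one of the four variants.
impl From<DriverError> for StorageError {
    fn from(e: DriverError) -> Self {
        match e {
            DriverError::RowNotFound => StorageError::NotFound("row not found".into()),
            DriverError::UniqueViolation { constraint } => {
                StorageError::Conflict(format!("unique violation on {constraint}"))
            }
            DriverError::PoolTimedOut => StorageError::Unavailable("pool timed out".into()),
            DriverError::Other(msg) => StorageError::Backend(msg),
        }
    }
}

/// Status code an API handler might return for each variant.
fn http_status(e: &StorageError) -> u16 {
    match e {
        StorageError::NotFound(_) => 404,
        StorageError::Conflict(_) => 409,
        StorageError::Unavailable(_) => 503,
        StorageError::Backend(_) => 500,
    }
}

/// A backend method written against the driver; `?` applies the From mapping.
fn load_user(result: Result<String, DriverError>) -> Result<String, StorageError> {
    let row = result?;
    Ok(row)
}
```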

Composition

Storage is the composition root: one struct holding Arc<dyn ...> for each trait. Two constructors:

  • Storage::from_env() — reads DATABASE_URL, REDIS_URL, MINIO_URL etc. and builds production backends. Uses StorageConfig::from_env() for pool sizing and timeouts.
  • Storage::in_memory() — builds an all-in-memory stack with no external services. Use for unit tests and CI.

// Production binary
let storage = Storage::from_env().await?;

// Tests
let storage = Storage::in_memory();

The Arc<dyn ...> indirection means consumers can be tested by handing them mock implementations of each trait. The HTTP API (Days 113+) holds Arc<Storage> in its app state and clones it into each handler.
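Here is a synchronous, std-only sketch of the composition-root pattern. The real traits are async via async_trait, which needs external crates. The User type, its fields, and the method names are hypothetical, and only the users slot of Storage is shown.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Debug, PartialEq)]
enum StorageError { NotFound(String), Conflict(String) }
type StorageResult<T> = Result<T, StorageError>;

/// Hypothetical user record.
#[derive(Clone, Debug, PartialEq)]
struct User { id: u64, name: String }

/// Synchronous stand-in for the async UserStore trait.
trait UserStore: Send + Sync {
    fn create(&self, user: User) -> StorageResult<()>;
    fn get(&self, id: u64) -> StorageResult<User>;
}

#[derive(Default)]
struct InMemoryUserStore { users: Mutex<HashMap<u64, User>> }

impl UserStore for InMemoryUserStore {
    fn create(&self, user: User) -> StorageResult<()> {
        let mut map = self.users.lock().unwrap();
        if map.contains_key(&user.id) {
            return Err(StorageError::Conflict(format!("user {} exists", user.id)));
        }
        map.insert(user.id, user);
        Ok(())
    }
    fn get(&self, id: u64) -> StorageResult<User> {
        self.users.lock().unwrap().get(&id).cloned()
            .ok_or_else(|| StorageError::NotFound(format!("user {}", id)))
    }
}

/// Composition root: one Arc<dyn ...> per trait. Cloning shares the backends.
#[derive(Clone)]
struct Storage { users: Arc<dyn UserStore> }

impl Storage {
    fn in_memory() -> Self {
        Storage { users: Arc::new(InMemoryUserStore::default()) }
    }
}
```

Because each field is an Arc<dyn ...>, cloning Storage into a handler copies pointers, not data. A test can swap any slot for a mock without touching the code that consumes it.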

Configuration

StorageConfig exposes seven tunables, all settable via environment variables. Defaults are sized for the bundled docker compose stack on a developer laptop.

| Setting | Default | Env var (unit) |
|---|---|---|
| Postgres max connections | 10 | MONDHER_PG_MAX_CONNECTIONS |
| Postgres acquire timeout | 5 s | MONDHER_PG_ACQUIRE_TIMEOUT_MS (milliseconds) |
| Postgres connection max lifetime | 30 min | MONDHER_PG_MAX_LIFETIME_SECS (seconds) |
| Postgres idle timeout | 10 min | MONDHER_PG_IDLE_TIMEOUT_SECS (seconds) |
| S3 call timeout | 10 s | MONDHER_S3_CALL_TIMEOUT_MS (milliseconds) |
| Redis call timeout | 2 s | MONDHER_REDIS_CALL_TIMEOUT_MS (milliseconds) |
| Postgres max retries | 3 | MONDHER_PG_MAX_RETRIES |

The defaults are conservative; production tuning depends on workload and infrastructure. For example, suppose N application instances share a Postgres server that allows M connections in total. Set MONDHER_PG_MAX_CONNECTIONS somewhat below M/N, to leave headroom for migrations and one-off scripts.
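The arithmetic as a small sketch. The function name and the explicit headroom parameter are illustrative; they are not part of StorageConfig.

```rust
/// Largest per-instance pool size such that `instances` pools together stay
/// within `max_total` server connections after reserving `headroom` for
/// migrations and one-off scripts. None if an instance would get no connection.
fn per_instance_pool(max_total: u32, instances: u32, headroom: u32) -> Option<u32> {
    if instances == 0 {
        return None;
    }
    match max_total.saturating_sub(headroom) / instances {
        0 => None,
        n => Some(n),
    }
}
```

For instance, with a 100-connection server, 4 instances, and 10 connections held back, each instance gets (100 - 10) / 4 = 22 connections, rounded down.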

Retries and timeouts

Two helpers wrap storage operations with operational guarantees:

with_retry

Retries transient PostgreSQL errors with bounded exponential backoff. Transient means:

  • StorageError::Unavailable — any.
  • StorageError::Backend containing one of these SQLSTATEs:
    • 40001 serialization failure
    • 40P01 deadlock detected
    • 08006 connection broken
    • 08000 connection exception (generic)
    • 57P03 cannot connect now (server starting)

Non-transient errors (NotFound, Conflict, malformed input) are returned immediately. The default budget is 3 attempts (initial + 2 retries), capped at 1 second of backoff per attempt.

let user = with_retry(3, || storage.users.get(id)).await?;
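Here is a synchronous, std-only sketch of the retry decision logic. Three details come from the text: the transient SQLSTATE list, the three-attempt budget, and the 1 s cap. The rest are assumptions: the 100 ms starting delay, doubling per retry, the name retry_sync, and the SQLSTATE code appearing in the Backend message.

```rust
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum StorageError { NotFound(String), Conflict(String), Backend(String), Unavailable(String) }

/// SQLSTATEs treated as transient.
const TRANSIENT_SQLSTATES: [&str; 5] = ["40001", "40P01", "08006", "08000", "57P03"];

fn is_transient(err: &StorageError) -> bool {
    match err {
        StorageError::Unavailable(_) => true,
        // Assumes the SQLSTATE code appears in the Backend message text.
        StorageError::Backend(msg) => TRANSIENT_SQLSTATES.iter().any(|&code| msg.contains(code)),
        _ => false,
    }
}

/// Delay before retry number `retry` (1-based): 100 ms doubled each time, capped at 1 s.
fn backoff(retry: u32) -> Duration {
    let ms = 100u64.saturating_mul(1u64 << retry.saturating_sub(1).min(20));
    Duration::from_millis(ms.min(1_000))
}

/// `attempts` counts the initial call, so 3 means one call plus two retries.
fn retry_sync<T>(
    attempts: u32,
    mut op: impl FnMut() -> Result<T, StorageError>,
) -> Result<T, StorageError> {
    let mut tried = 0;
    loop {
        tried += 1;
        match op() {
            Err(e) if is_transient(&e) && tried < attempts => thread::sleep(backoff(tried)),
            other => return other,
        }
    }
}
```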

with_timeout

Wraps any async operation with a hard deadline. Operations that don't complete within the budget return StorageError::Unavailable.

use std::time::Duration;

let user = with_timeout(Duration::from_secs(5), storage.users.get(id))
    .await?;

Typical HTTP handlers allocate part of their request budget to storage: "I have 30 seconds total; storage gets 5 seconds before I 503."

The retry and timeout helpers compose; wrap a retry inside a timeout so retries can't cumulatively exceed the deadline.
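The synchronous sketch below shows why the nesting order matters. The deadline covers the whole retry loop, so backoff sleeps cannot push the total past the budget. The names and the reduced error enum are hypothetical. A real async with_timeout also cancels an attempt still in flight, which this std-only version cannot do.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Reduced local mirror of StorageError.
#[derive(Debug, PartialEq)]
enum StorageError { Backend(String), Unavailable(String) }

/// Retries Unavailable errors up to `attempts` times, but gives up with
/// Unavailable rather than sleep past `deadline` (measured from the first call).
fn retry_within<T>(
    deadline: Duration,
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, StorageError>,
) -> Result<T, StorageError> {
    let start = Instant::now();
    let mut tried = 0;
    loop {
        tried += 1;
        let result = op();
        // Only Unavailable counts as transient in this reduced sketch.
        let transient = matches!(result, Err(StorageError::Unavailable(_)));
        if !transient || tried >= attempts {
            return result;
        }
        if start.elapsed() + delay >= deadline {
            return Err(StorageError::Unavailable("deadline exceeded".into()));
        }
        thread::sleep(delay);
    }
}
```

With a 120 ms budget and 50 ms between attempts, an operation that always fails stops after about three calls. The budget ends the loop, not the attempt limit.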

Swap-in patterns

For unit tests

Use Storage::in_memory(). Every trait method has an in-memory implementation that matches the production semantics (idempotency, not-found behavior, conflict detection, FK-like cascades).

For integration tests with one specific backend

Construct each store directly and inject:

use std::sync::Arc;

let pool = sqlx::PgPool::connect(&database_url).await?;
let users = Arc::new(PostgresUserStore::new(pool));
let storage = Storage {
    users,
    workspaces: Arc::new(InMemoryWorkspaceStore::default()),
    // ...
};

This lets you exercise the real Postgres path while keeping other stores in-memory.

For mocking specific behavior

Implement a trait directly:

use async_trait::async_trait;

struct AlwaysConflictUserStore;

#[async_trait]
impl UserStore for AlwaysConflictUserStore {
    // Every create fails as if the record already existed.
    async fn create(&self, _: User) -> StorageResult<()> {
        Err(StorageError::Conflict("forced conflict".into()))
    }
    // ... other methods
}

Arc<dyn UserStore> accepts any type implementing the trait. The HTTP handlers don't care whether they're talking to Postgres, an in-memory HashMap, or a deliberately-broken mock.

Migrations

Migrations live in crates/mondher-storage/migrations/ and are managed by sqlx-cli. Filenames are timestamp-prefixed; SQLx applies them in order.

Common operations:

# Apply pending migrations
make db-migrate

# Drop everything and re-migrate from scratch
make db-reset

# Add a new migration
sqlx migrate add --source crates/mondher-storage/migrations my_new_table

After adding or changing any sqlx::query! macro invocation, run:

make sqlx-prepare

This regenerates .sqlx/, the offline query cache that CI uses. Commit the changes; CI requires it.

Fixtures

make fixtures-load populates the dev database with one researcher (Alice), one workspace, one analysis, two annotations, and one two-message comment thread. All IDs are deterministic (00000000-0000-0000-0000-000000000001 for Alice, etc.) so manual exploration and tests can hard-code references.

The fixture data does not run in CI and is not part of any production deployment. It exists strictly for local development.

Benchmarks

cargo bench -p mondher-storage runs a criterion suite measuring the in-memory backends. Results land in target/criterion/; the HTML report at target/criterion/report/index.html includes plots and statistical analysis.

We do not benchmark PostgreSQL, Redis, or S3/MinIO. Those numbers depend on external service performance and have no stable meaning across hardware. Production performance is measured through application-level observability, not microbenchmarks.

Typical in-memory numbers (on a 2024-class developer laptop):

  • UserStore::create: ~500 ns to 1.5 µs
  • UserStore::get (hit): ~80 ns
  • WorkspaceStore::get_workspace: ~80 ns
  • ContextBlob::put (1 KB): ~1 µs
  • LatticeCache::get (hit): ~150 ns

Sub-microsecond reads, low-microsecond writes. The trait dispatch and lock acquisition dominate; the data structures themselves cost almost nothing.

When something goes wrong

| Symptom | Likely cause | Fix |
|---|---|---|
| Tests pass locally, fail in CI. | .sqlx/ cache outdated | make sqlx-prepare, then commit |
| cargo build fails with "DATABASE_URL not set" | sqlx macros can't find a live DB and there is no offline cache | export DATABASE_URL or set SQLX_OFFLINE=true |
| Tests pass, then suddenly start failing | Stale migration state in dev DB | make db-reset |
| make fixtures-load errors on duplicate keys | Fixtures already loaded | Run make db-reset first |
| Storage tests time out in CI | Service slow to start (rare) | Rerun the job; if persistent, paste the log |

Phase 3 considerations

The current layer is sized for v0.2. Things we know we'll want before production but haven't implemented:

  • Read replicas: PgPoolOptions only takes one URL. Phase 3 may introduce a PgRouter that picks read vs write pools.
  • Distributed tracing: SQL queries should report span context for observability. Tracing is planned for Phase 3, once the API server is in place.
  • Connection draining: graceful shutdown should let in-flight queries complete before closing pools. Currently we rely on process exit cleanup.
  • Auditing: a who-changed-what-when log table. Phase 4 territory.

These are intentional gaps. Shipping v0.2 with the current shape is the priority; the layer is built to extend without breaking.