Git-First Architecture

Status note

The git-first storage layer is functional and used in production for artifact ingest, requirement persistence, workflow entities, and decision logging. The migration from Supabase-first persistence to git as the sole persistence layer is ongoing.

Principle

Git is the authoritative record for all engineering context in M45. Every entity — requirement, artifact, workflow state, decision — is a versioned file in a git repository. The database (Postgres in cloud, SQLite on-prem) is a derived index that can be dropped and rebuilt from the git history without data loss.

This separation enables cloud, on-prem, and air-gapped deployments to share the same canonical storage contract.


Workspaces

A workspace is the storage-layer representation of a project. Each project in M45 has exactly one workspace, and each workspace is a self-contained git repository. The workspace holds everything that belongs to the project: ingested artifacts, extracted requirements, workflow state (iterations, review units, hypotheses, exceptions, recommendations), and decision records.

Workspaces are identified by the project ID. When a project is created or first accessed, M45 initializes a git repository for it with a manifest describing the project name, owner, and, where relevant, the governing standard and assurance level (e.g. DO-178C, DAL-A).

Workspaces are isolated from each other at the git level. Cross-project queries — such as searching requirements across multiple projects — are handled by the database index layer, not by linking repositories. This keeps each workspace self-contained and independently portable.


Why git

Auditability. Every change is an immutable, content-addressed commit. Modifying any file changes the commit hash. The full inference and decision chain is preserved.

Portability. A workspace repository is self-contained. It can be cloned, shipped on USB, or deployed air-gapped. No database connection is required to read or inspect the data.

Compliance. Files are human-readable YAML and Markdown. A DER can open them in a text editor. No proprietary tooling is required to review what M45 has produced.

Rubberstamp resistance. The commit history records every review decision — who accepted, rejected, or edited each hypothesis, exception, and recommendation, and when. Patterns like bulk approval without meaningful review become visible in the log. This does not prevent rubberstamping, but it makes it auditable and harder to do quietly.

AI-native. The file structure is self-documenting. LLMs can read and reason over workspace contents natively.


Workspace repository layout

Each workspace is a git repository stored under the configured workspaces root directory. The layout is:

workspace-repo/
├── .m45/
│   ├── manifest.yaml           # Workspace metadata
│   └── schema-version          # Current schema version
├── artifacts/
│   ├── specs/reqif/
│   │   ├── {filename}.reqif        # Original artifact
│   │   ├── {filename}.meta.yaml    # Artifact metadata sidecar
│   │   └── {filename}.extracted.md # AI-readable extraction
│   └── _events/
│       └── {timestamp}--{artifactId}.yaml  # Ingest event records
├── requirements/
│   ├── {requirementId}.yaml    # Individual requirement files
│   └── edges.yaml              # Traceability graph
├── workflows/
│   ├── {workflowId}/
│   │   ├── config.yaml         # Iteration metadata and status
│   │   ├── review-units/
│   │   │   └── {reviewUnitId}.yaml
│   │   ├── hypotheses/
│   │   │   ├── {hypothesisId}.yaml
│   │   │   └── {hypothesisId}.rationale.md
│   │   ├── exceptions/
│   │   │   └── {exceptionId}.yaml
│   │   ├── recommendations/
│   │   │   ├── {recommendationId}.yaml
│   │   │   └── bundles/
│   │   │       └── {bundleId}.yaml
│   │   └── decisions/
│   │       └── run-{seq}-{runType}/
│   │           ├── run.yaml    # Decision run metadata
│   │           └── steps.jsonl # Append-only decision steps
│   └── regulatory-intelligence/
│       ├── answers/
│       │   └── {answerId}.yaml
│       └── chat-snapshots/
│           └── {snapshotId}.yaml
├── decisions/                  # Top-level decision records
│   └── {decisionBucket}/
│       └── run-{seq}-{runType}/
│           ├── run.yaml
│           └── steps.jsonl
└── .index/ (gitignored)
    └── cache.sqlite            # Local rebuildable index
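
The layout above maps each entity to a predictable relative path. A minimal sketch of that mapping follows. The function names and the `kind` labels are hypothetical, and the sequence number is formatted as given because the tree does not say whether `{seq}` is zero-padded.

```python
from pathlib import PurePosixPath
from typing import Optional

# Directory names come from the layout above; the "kind" keys are hypothetical.
WORKFLOW_ENTITY_DIRS = {
    "review-unit": "review-units",
    "hypothesis": "hypotheses",
    "exception": "exceptions",
    "recommendation": "recommendations",
}


def requirement_path(requirement_id: str) -> PurePosixPath:
    """Relative path of a single requirement file."""
    return PurePosixPath("requirements") / f"{requirement_id}.yaml"


def workflow_entity_path(workflow_id: str, kind: str, entity_id: str) -> PurePosixPath:
    """Relative path of a workflow entity such as a hypothesis or exception."""
    if kind not in WORKFLOW_ENTITY_DIRS:
        raise ValueError(f"unknown workflow entity kind: {kind}")
    return (
        PurePosixPath("workflows")
        / workflow_id
        / WORKFLOW_ENTITY_DIRS[kind]
        / f"{entity_id}.yaml"
    )


def decision_run_dir(
    seq: int,
    run_type: str,
    workflow_id: Optional[str] = None,
    bucket: Optional[str] = None,
) -> PurePosixPath:
    """Directory of a decision run, under a workflow or a top-level bucket."""
    if (workflow_id is None) == (bucket is None):
        raise ValueError("pass exactly one of workflow_id or bucket")
    run_name = f"run-{seq}-{run_type}"  # assumption: seq is not zero-padded
    if workflow_id is not None:
        return PurePosixPath("workflows") / workflow_id / "decisions" / run_name
    return PurePosixPath("decisions") / bucket / run_name
```

Keeping path construction in one place means a layout change touches a single module instead of every writer.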

Workspace manifest

The manifest at .m45/manifest.yaml identifies the workspace:

version: 1
workspace:
  id: string
  name: string
  description: string
  createdAt: ISO8601
  createdBy: string
  system:
    standard: "DO-178C"
    level: "DAL-A"
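
A reader of the manifest may want to check its shape before trusting it. The sketch below validates an already-parsed manifest (a plain dict, however it was loaded). Which fields are mandatory is an assumption here: the schema above lists the fields but does not mark any as required, and `system` is treated as optional because the standard and level only apply to some projects.

```python
from datetime import datetime

# Assumption: these workspace fields are required, non-empty strings.
REQUIRED_STRING_FIELDS = ("id", "name", "description", "createdAt", "createdBy")


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if manifest.get("version") != 1:
        errors.append("version: expected 1")

    workspace = manifest.get("workspace")
    if not isinstance(workspace, dict):
        errors.append("workspace: expected a mapping")
        return errors

    for name in REQUIRED_STRING_FIELDS:
        value = workspace.get(name)
        if not isinstance(value, str) or not value:
            errors.append(f"workspace.{name}: expected a non-empty string")

    created_at = workspace.get("createdAt")
    if isinstance(created_at, str) and created_at:
        try:
            # Accept a trailing "Z" on Python versions that do not parse it.
            datetime.fromisoformat(created_at.replace("Z", "+00:00"))
        except ValueError:
            errors.append("workspace.createdAt: expected an ISO 8601 timestamp")

    system = workspace.get("system")  # optional block
    if system is not None:
        if not isinstance(system, dict):
            errors.append("workspace.system: expected a mapping")
        else:
            for name in ("standard", "level"):
                if not isinstance(system.get(name), str):
                    errors.append(f"workspace.system.{name}: expected a string")
    return errors
```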

Entity storage

All entities are stored as YAML files. Each entity type has its own directory and schema.

Requirements

Each requirement is a standalone YAML file in requirements/. Fields include the source artifact reference, source identifier, title, text, object type, attributes, and import metadata. Requirements are keyed by a sanitized source key derived from the original artifact.

The edges.yaml file captures the traceability graph — relationships like trace links and decomposition — imported from the source artifact.

Workflow entities

Iterations, review units, hypotheses, exceptions, recommendations, and recommendation bundles are stored as YAML files under workflows/{workflowId}/. The config.yaml at the workflow root holds iteration-level metadata and status.

Decision records

Each decision run is a directory containing run.yaml (metadata: type, status, triggering actor, timestamps) and steps.jsonl (an append-only stream of decision steps). Each step records the actor type (AI or human), model ID, input and output snapshots, duration, and status. Decision records can live under a specific workflow or at the top-level decisions/ directory.
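
The append-only behaviour of steps.jsonl can be illustrated with a short sketch. The helper names and the step field names used in the usage note (`actorType`, `status`) are hypothetical; only the one-JSON-object-per-line format and the append-only rule come from the description above.

```python
import json
from pathlib import Path


def append_step(run_dir: Path, step: dict) -> None:
    """Append one decision step as a single JSON line; never rewrite earlier lines."""
    run_dir.mkdir(parents=True, exist_ok=True)
    with (run_dir / "steps.jsonl").open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(step, sort_keys=True) + "\n")


def read_steps(run_dir: Path) -> list:
    """Read all steps in the order they were appended."""
    path = run_dir / "steps.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Calling `append_step(run_dir, {"actorType": "ai", "status": "ok"})` twice and then `read_steps(run_dir)` returns both steps in write order. Because each step is a whole line, a git diff of the file shows exactly which steps a commit added.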

Artifacts

Original artifacts are stored alongside a metadata sidecar (.meta.yaml) and an AI-readable extraction (.extracted.md). Ingest events are recorded in artifacts/_events/ as timestamped YAML files.
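
As an illustration of the sidecar naming, the sketch below derives the three artifact paths and the ingest event path from a filename. The function names are hypothetical, the default directory is the ReqIF location shown in the layout, and the timestamp is taken as an opaque string because its format is not specified here.

```python
from pathlib import PurePosixPath


def artifact_paths(artifact_filename: str, directory: str = "artifacts/specs/reqif") -> dict:
    """Paths of the original artifact, its metadata sidecar, and its extraction."""
    original = PurePosixPath(directory) / artifact_filename
    stem = original.stem  # "system-spec" for "system-spec.reqif"
    return {
        "original": str(original),
        "meta": str(original.with_name(f"{stem}.meta.yaml")),
        "extracted": str(original.with_name(f"{stem}.extracted.md")),
    }


def ingest_event_path(timestamp: str, artifact_id: str) -> str:
    """Path of an ingest event record: {timestamp}--{artifactId}.yaml."""
    return f"artifacts/_events/{timestamp}--{artifact_id}.yaml"
```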


Versioning and history

Every write-and-commit cycle creates an immutable git commit. The commit hash is returned to the caller. Point-in-time recovery is supported — any file can be read at any past commit.
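
A write-and-commit cycle and a point-in-time read can be sketched with the git command line. This is not the M45 adapter. The helper names and the placeholder committer identity are assumptions, and the sketch expects a `git` executable on the PATH and an already-initialized repository.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside the repository and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def write_and_commit(repo: Path, rel_path: str, content: str, message: str) -> str:
    """Write one file, commit only that path, and return the new commit hash."""
    target = repo / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    git(repo, "add", "--", rel_path)
    git(
        repo,
        "-c", "user.name=m45-sketch",              # placeholder identity
        "-c", "user.email=sketch@example.invalid",  # placeholder identity
        "commit", "-q", "-m", message,
    )
    return git(repo, "rev-parse", "HEAD").strip()


def read_at(repo: Path, commit: str, rel_path: str) -> str:
    """Read a file as it existed at a given commit (git show <commit>:<path>)."""
    return git(repo, "show", f"{commit}:{rel_path}")
```

Holding on to the hash returned by `write_and_commit` is what makes the later `read_at` call possible: the hash names the exact state that was written.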

Commit messages follow a semantic pattern:

  • Import ReqIF: {filename} ({n} created, {n} updated, {n} deleted, {n} edges)
  • Create iteration workflow config: {id}
  • Update hypothesis: {id}
  • Decision run completed: {runType}/run-{seq}
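
The patterns above can be produced by small formatting helpers, which keeps messages consistent enough to parse back out of `git log`. The function names are hypothetical; the message templates are the ones listed.

```python
def import_reqif_message(filename: str, created: int, updated: int, deleted: int, edges: int) -> str:
    """Commit message for a ReqIF import."""
    return (
        f"Import ReqIF: {filename} "
        f"({created} created, {updated} updated, {deleted} deleted, {edges} edges)"
    )


def create_iteration_message(workflow_id: str) -> str:
    """Commit message for a new iteration workflow config."""
    return f"Create iteration workflow config: {workflow_id}"


def update_hypothesis_message(hypothesis_id: str) -> str:
    """Commit message for a hypothesis update."""
    return f"Update hypothesis: {hypothesis_id}"


def decision_run_completed_message(run_type: str, seq: int) -> str:
    """Commit message for a completed decision run."""
    return f"Decision run completed: {run_type}/run-{seq}"
```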

Deterministic identity

Artifact IDs are derived from sha256(projectId + idempotencyKey) rather than random UUIDs, so retried ingests converge on the same canonical ID. A unique partial index in the database prevents duplicate rows.
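
A minimal sketch of the derivation follows. The UTF-8 encoding and the lowercase hex output are assumptions; the text above specifies only the hash function and its two inputs.

```python
import hashlib


def artifact_id(project_id: str, idempotency_key: str) -> str:
    """Deterministic artifact ID: sha256 over projectId + idempotencyKey."""
    # Plain concatenation follows the formula as written. A real implementation
    # may add a separator so that ("ab", "c") and ("a", "bc") cannot collide.
    payload = (project_id + idempotency_key).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Because the function has no random input, a client that retries an ingest with the same idempotency key gets the same ID back, and the second write can be recognized as a duplicate.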


Concurrency handling

The adapter uses optimistic concurrency control and tracks dirty paths per workspace. After each file write, the path is marked dirty. On commit, only the tracked dirty paths are staged. Before committing, the adapter checks whether HEAD (the intended parent commit) has changed since the operation started. If the remote has diverged, the adapter attempts a fast-forward merge; if that is not possible, it aborts and reports the conflict.
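
The dirty-path tracking and HEAD check can be modelled in memory without touching git. The class and method names below are hypothetical, and the sketch is deliberately simplified: when HEAD has moved it only raises, whereas the adapter described above first attempts a fast-forward merge.

```python
class ConflictError(Exception):
    """Raised when HEAD moved between the start of an operation and its commit."""


class WorkspaceSession:
    """Tracks dirty paths for one operation against a remembered HEAD."""

    def __init__(self, head_at_start: str) -> None:
        self.head_at_start = head_at_start
        self.dirty = set()

    def mark_dirty(self, path: str) -> None:
        """Record a path after it has been written."""
        self.dirty.add(path)

    def prepare_commit(self, current_head: str) -> list:
        """Return the paths to stage, or raise if HEAD changed underneath us."""
        if current_head != self.head_at_start:
            raise ConflictError(
                f"HEAD moved from {self.head_at_start} to {current_head}"
            )
        staged = sorted(self.dirty)  # stage only what this operation wrote
        self.dirty.clear()
        return staged
```

Staging only the tracked paths keeps unrelated working-tree changes out of the commit, and the HEAD comparison is the optimistic part: no lock is held while files are written.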


Remote synchronization

Workspaces can be configured with a remote git repository. When a remote is present, the sync cycle is:

  1. Fetch main from origin
  2. Compare local HEAD to remote HEAD
  3. Fast-forward if possible; merge with conflict detection if diverged
  4. Push local commits to remote
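
The comparison in steps 2 and 3 reduces to three commit hashes: local HEAD, remote HEAD, and their merge base (what `git merge-base` reports). The sketch below is one way to express that decision. The function name and the returned action labels are hypothetical.

```python
def plan_sync(local_head: str, remote_head: str, merge_base: str) -> str:
    """Decide what a sync cycle has to do after fetching."""
    if local_head == remote_head:
        return "up-to-date"        # nothing to pull or push
    if merge_base == local_head:
        return "fast-forward"      # local is strictly behind the remote
    if merge_base == remote_head:
        return "push"              # local is strictly ahead of the remote
    return "merge-then-push"       # histories diverged; merge, then push
```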

Push can be configured as auto-push-on-commit or as required sync on critical commits (with retry and exponential backoff).
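
Retry with exponential backoff can be sketched as follows. The function name, the default attempt count, the base delay, and the choice of `OSError` as the retriable error are all assumptions; the `push` argument stands in for whatever performs the actual push.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def push_with_retry(
    push: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call push(), doubling the wait after each failure, up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return push()
        except OSError:  # assumption: network failures surface as OSError
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the failure
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter lets tests record the delays instead of waiting for them.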


Deployment models

M45 uses standard git protocols for remote synchronization and is not locked to any specific hosting provider.

Cloud. The default integration is GitLab, but any git remote that supports HTTP transport can serve as the remote backend — including GitHub, Bitbucket, Azure DevOps, or a self-hosted git server. The database index runs on Postgres (via Supabase in the current cloud deployment).

On-premises and air-gapped. Workspaces can operate without any remote at all. The local git repository remains fully functional as the source of truth. The database index runs on SQLite, stored alongside the workspace. A remote can be added later when connectivity or policy permits.


Relationship to the database

The database is a derived index, not the source of truth. Both Postgres (cloud via Supabase) and SQLite (on-prem) are supported as index targets. The index can be rebuilt deterministically by replaying the git workspace at HEAD.
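
To make the drop-and-rebuild property concrete, the sketch below rebuilds a small SQLite table from the requirement files in a workspace. The table name and columns are hypothetical and far simpler than a real index. The sketch reads the working tree, which it assumes is a clean checkout of HEAD, and it stores a content hash instead of parsed fields so that it needs no YAML library.

```python
import hashlib
import sqlite3
from pathlib import Path


def rebuild_requirement_index(workspace: Path, db: sqlite3.Connection) -> int:
    """Drop and recreate the requirements table from files on disk."""
    db.execute("DROP TABLE IF EXISTS requirements")
    db.execute(
        "CREATE TABLE requirements ("
        "id TEXT PRIMARY KEY, path TEXT NOT NULL, sha256 TEXT NOT NULL)"
    )
    rows = []
    for path in sorted((workspace / "requirements").glob("*.yaml")):
        if path.name == "edges.yaml":
            continue  # the traceability graph is not a requirement
        rows.append(
            (
                path.stem,
                path.relative_to(workspace).as_posix(),
                hashlib.sha256(path.read_bytes()).hexdigest(),
            )
        )
    db.executemany("INSERT INTO requirements VALUES (?, ?, ?)", rows)
    db.commit()
    return len(rows)
```

Running the function twice against the same checkout yields identical rows, which is the sense in which the index is deterministic and disposable.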

In the current transitional state, some write paths still go through Supabase first. The migration is structured as capability gates (A through H), most of which are complete. The remaining work focuses on retiring legacy bypass writes and decoupling auth from Supabase.


Security properties

Immutability. Git objects are content-addressed and form a Merkle tree, so modifying any file changes the resulting commit hash. Protected refs on the remote prevent force pushes.
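
Content addressing can be checked by hand. For repositories using git's default SHA-1 object format, a file's object ID is the SHA-1 of a short header followed by the file bytes. The sketch below reproduces what `git hash-object` computes for a blob; the function name is hypothetical.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Object ID git assigns to a blob with this content (SHA-1 object format)."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in every SHA-1 git repository.
assert git_blob_sha1(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```

Changing a single byte of a file changes its blob ID, which changes the ID of every tree that contains it and therefore the commit hash.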

Auditability. Decision steps are append-only. Each step records actor, timestamp, inputs, outputs, and status. Git log provides the complete change history.

Portability. Repositories are self-contained and can be cloned, transferred, or air-gapped without external dependencies.

Access control. Workspace-level access is gated by the remote provider's group or organization membership. Entity-level access uses attribute-based policies (clearance and compartments), enforced on all reads.
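
An attribute-based read check over clearance and compartments might look like the sketch below. The policy semantics are an assumption, not a description of M45's policy engine: clearance is modelled as an ordered integer, and a read is allowed only when the subject's clearance is at least the entity's and the subject holds every compartment the entity is tagged with. All type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Attributes of the reader."""
    clearance: int
    compartments: frozenset = frozenset()


@dataclass(frozen=True)
class EntityLabel:
    """Attributes attached to an entity."""
    clearance: int
    compartments: frozenset = frozenset()


def can_read(subject: Subject, label: EntityLabel) -> bool:
    """Allow the read only if clearance dominates and compartments are covered."""
    return (
        subject.clearance >= label.clearance
        and label.compartments <= subject.compartments  # subset test
    )
```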