MG Format
What is the .mg binary format?
The .mg format is the on-disk serialization format that the Areev context database uses to persist every AI memory grain as a compact, encrypted binary blob.
Each .mg blob starts with a 9-byte header followed by a canonical MessagePack payload. The header encodes five fields in a fixed layout: version (u8, currently 1), flags (u8, a bitfield for signing, encryption, compression, and metadata), grain_type (u8, mapping to one of the 10 OMS grain types), ns_hash (u16 big-endian, a hash of the namespace for fast partition routing), and created_at_sec (u32 big-endian, Unix timestamp in seconds). The header is never encrypted, so the autonomous memory engine can route and filter grains without decrypting the payload.
The MessagePack payload contains the grain’s content, metadata key-value pairs, subject identifier, and any embedding vector. When encryption is enabled, the payload is encrypted with AES-256-GCM using a per-memory data encryption key (DEK) that is wrapped by the platform’s managed key service. A single wrapped-DEK record is stored once per memory; per-user keys are HKDF-derived at runtime and never stored. The AI agent memory engine uses SHA-256 content addressing — the hash of the complete .mg blob (header + payload) serves as its content-addressed storage key, enabling deduplication and integrity verification.
How does the 9-byte header work?
The header packs five fields into exactly 9 bytes with no padding, using big-endian encoding for multi-byte fields.
The version byte (offset 0) identifies the .mg format version, allowing the context database to handle format migrations. The flags byte (offset 1) is an 8-bit field with the following layout:
| Bit | Mask | Meaning |
|---|---|---|
| 0 | 0x01 | COSE Sign1 signing enabled |
| 1 | 0x02 | AES-256-GCM encryption enabled |
| 2 | 0x04 | zstd compression enabled |
| 3 | 0x08 | content_refs present in payload |
| 4 | 0x10 | embedding_refs present in payload |
| 5 | 0x20 | AI-generated content flag |
| 6-7 | 0xC0 | Sensitivity level (0-3) |
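The table maps directly onto bit masks. As a minimal Python sketch (the function names are hypothetical, not part of any Areev API), decoding and encoding a flags byte could look like this:

```python
# Masks from the flags table; the dict keys double as encode_flags() keyword names.
FLAG_BITS = {
    "signed": 0x01,          # COSE Sign1 signing enabled
    "encrypted": 0x02,       # AES-256-GCM encryption enabled
    "compressed": 0x04,      # zstd compression enabled
    "content_refs": 0x08,    # content_refs present in payload
    "embedding_refs": 0x10,  # embedding_refs present in payload
    "ai_generated": 0x20,    # AI-generated content flag
}
SENSITIVITY_MASK = 0xC0  # bits 6-7
SENSITIVITY_SHIFT = 6


def decode_flags(flags: int) -> dict:
    """Split a flags byte into named booleans plus the 2-bit sensitivity level."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in a single byte")
    decoded = {name: bool(flags & mask) for name, mask in FLAG_BITS.items()}
    decoded["sensitivity"] = (flags & SENSITIVITY_MASK) >> SENSITIVITY_SHIFT
    return decoded


def encode_flags(sensitivity: int = 0, **enabled: bool) -> int:
    """Build a flags byte from keyword booleans (e.g. encrypted=True) and a 0-3 sensitivity."""
    if not 0 <= sensitivity <= 3:
        raise ValueError("sensitivity must be between 0 and 3")
    flags = sensitivity << SENSITIVITY_SHIFT
    for name, on in enabled.items():
        if name not in FLAG_BITS:
            raise ValueError(f"unknown flag: {name}")
        if on:
            flags |= FLAG_BITS[name]
    return flags
```

For example, `encode_flags(encrypted=True, sensitivity=2)` evaluates to `0x82`: the encrypt bit (`0x02`) plus sensitivity 2 shifted into bits 6-7 (`0x80`).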
The grain_type byte (offset 2) maps to one of the 10 OMS grain types: 0x01=belief, 0x02=event, 0x03=state, 0x04=workflow, 0x05=action, 0x06=observation, 0x07=goal, 0x08=reasoning, 0x09=consensus, 0x0A=consent. The ns_hash field (offsets 3-4) stores a 16-bit hash of the namespace string, used for fast namespace routing without parsing the full payload.
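A sketch of the grain_type mapping as a Python `IntEnum` follows. The byte values come from the list above. The `ns_hash` function is purely an assumption: the document does not say which hash produces the 16-bit value, so this version truncates SHA-256 only to illustrate reducing a namespace string to a u16.

```python
import hashlib
from enum import IntEnum


class GrainType(IntEnum):
    """grain_type byte values (header offset 2) for the 10 OMS grain types."""
    BELIEF = 0x01
    EVENT = 0x02
    STATE = 0x03
    WORKFLOW = 0x04
    ACTION = 0x05
    OBSERVATION = 0x06
    GOAL = 0x07
    REASONING = 0x08
    CONSENSUS = 0x09
    CONSENT = 0x0A


def grain_type_name(byte: int) -> str:
    """Map a raw grain_type byte to its OMS name, rejecting unknown values."""
    try:
        return GrainType(byte).name.lower()
    except ValueError:
        raise ValueError(f"unknown grain_type byte: {byte:#04x}") from None


def ns_hash(namespace: str) -> int:
    """HYPOTHETICAL 16-bit namespace hash: the first two bytes of SHA-256.

    The real hash function is not specified; this stand-in only shows the
    namespace string being reduced to a value that fits the u16 header field.
    """
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")
```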
The created_at_sec field (offsets 5-8) stores the creation timestamp as a 32-bit big-endian Unix epoch in seconds. This provides coarse time ordering at the header level — the full millisecond-precision timestamp lives in the MessagePack payload as created_at (compacted to ca). The header’s fixed 9-byte size means the autonomous memory engine can read it with a single small I/O operation, making header-only scans (for filtering by grain type or namespace hash) efficient at scale.
| Offset | Size (bytes) | Field | Encoding |
|---|---|---|---|
| 0 | 1 | version | u8 (currently 1) |
| 1 | 1 | flags | u8 bitfield (sign, encrypt, compress, refs, ai, sensitivity) |
| 2 | 1 | grain_type | u8 (0x01-0x0A, 10 OMS types) |
| 3 | 2 | ns_hash | u16 big-endian |
| 5 | 4 | created_at_sec | u32 big-endian |
| 9 | variable | payload | canonical MessagePack (optionally encrypted) |
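The layout table corresponds to the Python `struct` format string `>BBBHI` (big-endian, standard sizes), which packs to exactly 9 bytes. The sketch below uses hypothetical helper names: it builds a blob, decodes the header without touching the payload, and runs a header-only scan by grain type and namespace hash, ordered by created_at_sec.

```python
from typing import NamedTuple
import struct

# u8 version, u8 flags, u8 grain_type, u16 ns_hash, u32 created_at_sec (big-endian).
HEADER = struct.Struct(">BBBHI")
assert HEADER.size == 9  # '>' uses standard sizes with no alignment padding


class MgHeader(NamedTuple):
    version: int
    flags: int
    grain_type: int
    ns_hash: int
    created_at_sec: int


def pack_blob(header: MgHeader, payload: bytes) -> bytes:
    """Concatenate the 9-byte header and the (possibly encrypted) payload."""
    return HEADER.pack(*header) + payload


def read_header(blob: bytes) -> MgHeader:
    """Decode only the first 9 bytes; the payload is never parsed or decrypted."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is shorter than the 9-byte .mg header")
    return MgHeader(*HEADER.unpack_from(blob, 0))


def header_scan(blobs, grain_type: int, ns_hash: int) -> list:
    """Header-only filter: keep matching grains, oldest first by created_at_sec."""
    hits = []
    for blob in blobs:
        h = read_header(blob)
        if h.grain_type == grain_type and h.ns_hash == ns_hash:
            hits.append((h.created_at_sec, blob))
    hits.sort(key=lambda pair: pair[0])
    return [blob for _, blob in hits]
```

Because `struct` raises an error when a value does not fit its field, packing also rejects out-of-range values such as an ns_hash above `0xFFFF`.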
How does content addressing work?
Every grain is keyed by the SHA-256 hash of its content, enabling deduplication, integrity verification, and deterministic storage addressing.
When the AI agent memory engine receives a grain for storage, it assembles the complete .mg blob (9-byte header + payload) and computes the SHA-256 hash of the entire blob. This hash becomes the grain's content-addressed storage key. If a grain with the same content hash already exists, the context database detects the duplicate and returns the existing grain's ID instead of creating a new entry. Because the key covers the whole blob, including the header's created_at_sec field, deduplication applies to byte-identical blobs: each one is stored exactly once, however many times it is written, while the same content written in a different second produces a different header and therefore a different key.
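A minimal in-memory sketch of this put/get path, assuming a plain dict stands in for the storage engine (the `ContentStore` class and its methods are hypothetical):

```python
from __future__ import annotations

import hashlib


class ContentStore:
    """Hypothetical in-memory stand-in for a content-addressed grain store."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, blob: bytes) -> tuple[str, bool]:
        """Store a complete .mg blob; return (key, created). Duplicates are not re-stored."""
        key = hashlib.sha256(blob).hexdigest()
        if key in self._blobs:
            return key, False  # duplicate: hand back the existing key
        self._blobs[key] = blob
        return key, True

    def get(self, key: str) -> bytes:
        """Fetch a blob and recompute its SHA-256 to detect corruption or tampering."""
        blob = self._blobs[key]
        if hashlib.sha256(blob).hexdigest() != key:
            raise ValueError(f"integrity check failed for {key}")
        return blob
```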
A fast in-memory check over the set of superseded grains answers the common case quickly: when it reports that a grain is not superseded, the engine skips the supersession-status lookup entirely. Integrity verification happens on every read: the engine recomputes the SHA-256 hash of the complete .mg blob (header + payload) and compares it with the storage key, detecting any corruption or tampering.
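The document does not name the data structure behind the fast superseded-set check. The phrase "probable-not-superseded" suggests a probabilistic filter, so the sketch below assumes a Bloom filter: it never reports an inserted grain as absent, so a negative answer is safe to trust, while a positive answer still goes to the authoritative lookup. `BloomFilter`, `is_superseded`, and the `lookup_status` callback are hypothetical names, and the bit and hash counts are illustrative defaults.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from SHA-256 digests salted with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(1, "big") + item.encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def is_superseded(key: str, bloom: BloomFilter, lookup_status) -> bool:
    """Skip the authoritative lookup whenever the filter rules the grain out."""
    if not bloom.might_contain(key):
        return False  # definite negative: no lookup needed
    return lookup_status(key)  # possible positive: confirm with the real status
```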
```bash
# The content hash appears in API responses
curl -s https://acme.areev.ai/api/memories/abc123 | python3 -c "
import sys, json
grain = json.load(sys.stdin)
print(f'Content hash: {grain[\"content_hash\"]}')
print(f'Grain type: {grain[\"grain_type\"]}')
"
```
How does encryption integrate with the .mg format?
The flags byte in the header indicates whether the MessagePack payload is encrypted, and the per-memory DEK is stored separately in the keys partition.
When the encrypt flag (bit 1, mask 0x02) is set, the context database encrypts the entire MessagePack payload using AES-256-GCM with the memory’s random 256-bit DEK and a 96-bit nonce. The encrypted ciphertext and 128-bit authentication tag replace the plaintext payload in the .mg blob. The per-memory DEK is wrapped (encrypted) by the platform’s managed key service and stored once per memory as a single wrapped-DEK record; per-user keys are HKDF-derived at runtime and never stored.
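Per-user keys are described as HKDF-derived at runtime. The sketch below implements HKDF-SHA256 as specified in RFC 5869 using only the Python standard library. This document does not specify which secret serves as input keying material or what salt and info values are used, so `derive_user_key` takes the memory's DEK, the memory ID, and the user ID purely as hypothetical inputs.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it to `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("HKDF-SHA256 output is limited to 8160 bytes")
    if not salt:
        salt = bytes(hash_len)  # RFC 5869 default: HashLen zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(n) = HMAC(PRK, T(n-1) | info | n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(memory_dek: bytes, memory_id: str, user_id: str) -> bytes:
    """HYPOTHETICAL per-user derivation; the choice of IKM, salt, and info is assumed."""
    return hkdf_sha256(
        ikm=memory_dek,
        salt=memory_id.encode("utf-8"),
        info=b"areev-user-key:" + user_id.encode("utf-8"),
    )
```

Because derivation is deterministic, the same inputs always reproduce the same key, which is what allows per-user keys to be recomputed on demand instead of stored.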
This envelope encryption design lets the AI memory engine rotate the wrapping key without re-encrypting grains: only the DEK's wrapping changes. Crypto-erasure, which supports GDPR Art. 17 (right to erasure) compliance, destroys the memory's wrapped DEK and renders that memory's grain ciphertext permanently unrecoverable without touching the grain blobs themselves. The autonomous memory engine verifies the GCM authentication tag on every read, detecting any tampering or corruption in the encrypted payload.
Related
- Architecture: System architecture overview
- Storage: Storage engine and index subsystems
- Encryption: Envelope encryption details
- Crypto-Erasure: DEK destruction for GDPR compliance