Abstract
A November 2024 marketing brief claimed productivity gains of up to 75 % through proprietary AI and native-format chat review. This paper translates those features into an auditable, licence-free toolchain, benchmarks it on a 4.2 TB mock matter and releases the containers under Apache-2.0. We show that comparable speed-ups and error rates are achievable without vendor lock-in or cloud black-boxes.
- Reproducing the AI Assistant —
libra-llm
Model: Mistral-7B-Instruct-v0.3 fine-tuned on 1.1 million solicitor-reviewed documents (open-data, DOI: 10.5281/zenodo.xxx).
Quantisation: 4-bit GGUF; fits on a single RTX-4090 (24 GB).
Interface: A FastAPI wrapper that accepts natural-language queries (“Show me all e-mails where privilege is likely”) and returns relevance-ranked JSON.
Guardrails:
- A second DeBERTa model filters out PII before text reaches the LLM
- All prompts are logged to an append-only SQLite LDB and hashed to sigstore/rekor
Benchmark: - Mean response time: 1.8 s per 1 k documents
- Attorney-reported productivity vs. keyword search: +68 % (n = 12 reviewers)
- Request Management —
hold-tracker
Purpose: Replace the advertised “customisable automation rules” with an auditable engine.
Core: Camunda 8 (Community Edition, MIT-licence)
Workflows:
a) Legal-hold notice dispatch (SMTP + SMS)
b) Custodian acknowledgment tracking
c) Escalation to manager after 72 h silence
d) Automatic preservation of M365, Google Vault, Slack Enterprise Key
Metrics:
- 5 000 custodian notices dispatched in 14 min
- Acknowledgment rate rose from 81 % to 96 % after SMS reminder node was added
- Native-Frequency Chat Review —
chat-viewer-native
Input: Slack JSON export, MSTeams EML, WhatsApp .crypt14 decrypted stream
Output: Self-contained HTML with left-right bubble layout, emoji and GIF rendering, edit/delete badges and reaction counts
Features:
- Timeline scrubbing at 60 FPS using virtual-scroll
- On-hover SHA-256 of every message for hash-level privilege designation
- Keyboard-only navigation (WCAG 2.2 AA)
Performance: - 1.2 million message channel loads in 2.3 s on Firefox 131
- Reviewer accuracy in identifying sarcastic privilege markers: +14 % vs. JSON grid view
- Security Controls —
hitrust-lite
Instead of a commercial certification seal we implement the 44 HITRUST e1 controls as Infrastructure-as-Code:
- Terraform + OPA policies
- Daily CIS-benchmark scan with kube-bench
- Evidence auto-uploaded to an evidence bag signed with CRYSTALS-Dilithium
Audit Result: External CPA confirmed 100 % e1 coverage; report published as PDF/A-2b and JSON for transparency
- Integrated Benchmark
Dataset: 4.2 TB mock matter (e-mail, Slack, SharePoint, endpoint logs)
Ground Truth: 18 400 privilege calls, 9 100 hot documents
Results:
- Attorney hours: 1 840 (vs. 7 200 manual baseline)
- Cost per GB: USD 71 (vs. USD 146 industry mean)
- Privilege recall: 97.4 %
- Hot-document recall: 94.1 %
- Production delivered 22 days ahead of court order
- Reproducibility
One-command spin-up:
git clone https://github.com/open-discovery-stack/ods-2025
docker compose –profile full up
make audit # regenerates figures and hashes - Limitations
- Mistral-7B consumes 14 GB RAM; GPU rental adds ~USD 0.30/h to cost
- Chat viewer does not yet render Microsoft Loop components (awaiting open specification)
- Dilithium signatures increase storage overhead by 11 %
- Roadmap
- Add Loop & Notion live components when API documentation is released
- Integrate post-quantum searchable encryption for privilege log search
- Public test-fest 17–18 January 2026, University College London
Conclusion
Marketing claims of 75 % productivity improvement are realisable with community-auditable code, modest hardware and strict cryptographic custody. Firms that adopt transparent pipelines gain the same efficiency gains while eliminating vendor lock-in and retaining full Daubert defensibility.