Quantra Documentation

Architecture Overview

Quantra is built as a layered, modular platform that separates concerns across distinct tiers. The browser-based frontend communicates with a Django application server, which delegates pipeline orchestration to the CNO (Central Node Orchestrator). The CNO coordinates work across a fleet of gRPC-based microservices, each responsible for a specific processing capability. All processed data flows into and out of a configurable data layer.

System Architecture Diagram

 +--------------------------------------------------------------+
 |                        BROWSER LAYER                         |
 |  HTML5 Canvas  |  JavaScript UI  |  WebSocket / Fetch API    |
 +-------------------------------+------------------------------+
                                 |
                         HTTPS (port 4443)
                                 |
 +-------------------------------v------------------------------+
 |                      DJANGO APPLICATION                      |
 |                                                              |
 |  REST API (111 routes)  |  Auth / MFA    |  Project Manager  |
 |  Plugin Registry        |  Graph Engine  |  Audit Logger     |
 |  Static Files           |  Admin Views   |  User Management  |
 +-------------------------------+------------------------------+
                                 |
                 HTTP/NDJSON Streaming (port 8443)
                                 |
 +-------------------------------v------------------------------+
 |               CNO (Central Node Orchestrator)                |
 |                                                              |
 |  Topological Sort   |  Node Scheduler   |  Progress Streamer |
 |  Edge Resolver      |  Error Handler    |  Result Aggregator |
 +-------+------------------+----------------------+------------+
         |                  |                      |
    gRPC + mTLS        gRPC + mTLS            gRPC + mTLS
         |                  |                      |
 +-------v-------+  +-------v-------+  +-----------v------------+
 | Datasource    |  | Tool          |  | Workbench              |
 | Microservices |  | Microservices |  | Microservices          |
 |               |  |               |  |                        |
 | Windows Share |  | OCR Tool      |  | SAR Workbench          |
 | MSSQL         |  | PII Tool      |  | Review Workbench       |
 | Snowflake     |  | Summary Tool  |  |                        |
 | SharePoint    |  | Hash Tool     |  |                        |
 | Outlook       |  | Redactor      |  |                        |
 | Box.com       |  | ...           |  |                        |
 +-------+-------+  +-------+-------+  +-----------+------------+
         |                  |                      |
         +------------------+----------------------+
                            |
            +---------------v---------------+
            |          DATA LAYER           |
            |                               |
            |  SQLite / PostgreSQL / MSSQL  |
            |  File System Storage          |
            |  Snowflake Warehouse          |
            |  External APIs                |
            +-------------------------------+

Component Overview

The following table summarises every major component in the Quantra architecture, its responsibility, and where it runs.

| Component | Responsibility | Technology | Port / Location |
|---|---|---|---|
| Django Application | REST API, authentication, project management, plugin registry, audit logging, admin interface | Python / Django | 4443 (HTTPS) |
| CNO (Central Node Orchestrator) | Pipeline execution, topological sorting of nodes, progress streaming, node scheduling | Python | 8443 (HTTP/NDJSON) |
| Datasource Microservices | Ingesting data from external systems (file shares, databases, cloud storage, email) | Python / gRPC | 50051-50059 |
| Tool Microservices | Processing operations (OCR, PII detection, hashing, summarisation, redaction) | Python / gRPC | 50064-50090 |
| Workbench Microservices | Interactive human-in-the-loop review, annotation, approval workflows | Python / gRPC | Varies |
| Browser Frontend | Visual pipeline designer (HTML5 Canvas), project UI, workbench interfaces | JavaScript / HTML5 | Client-side |
| Data Layer | Persistent storage for projects, graphs, results, user data, audit logs | SQLite / PostgreSQL / MSSQL | Configurable |

Communication Protocols

gRPC with Protocol Buffers

All communication between the CNO and individual microservices uses gRPC, Google's high-performance remote procedure call framework. Each microservice defines its interface using Protocol Buffer (.proto) files located in the shared grpc/ directory. This approach provides several advantages:

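  • Strong typing — Protocol Buffers enforce a strict schema for all messages. Because Python has no compile step, this checking happens when a message is constructed: an unknown field name or a value of the wrong type raises an error immediately, instead of malformed data travelling on to the receiving service.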
  • Streaming support — gRPC natively supports server-side, client-side, and bidirectional streaming, enabling microservices to send progress updates and large result sets incrementally.
  • Mutual TLS (mTLS) — Every gRPC channel is secured with mutual TLS authentication. Both the client (CNO) and server (microservice) present certificates and verify each other's identity. Certificates are stored in the /ms/certs/ directory and referenced in settings.py.
  • Efficient serialisation — Protocol Buffers use a compact binary format that is significantly smaller and faster to parse than JSON or XML.

HTTP/NDJSON Streaming

The Django application communicates with the CNO over HTTP using Newline-Delimited JSON (NDJSON) streaming on port 8443. When a user executes a pipeline, Django opens a streaming HTTP connection to the CNO and receives a continuous stream of progress events. Each line in the stream is a complete JSON object representing an event such as:

  • Node execution started
  • Node progress percentage update
  • Node execution completed with results
  • Node execution failed with error details
  • Overall pipeline completion status

These events are forwarded to the browser in real time, allowing users to see live progress as their pipeline executes.
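The NDJSON framing is simple enough to sketch with the standard library alone. The example below is illustrative rather than Quantra's actual code, and the event fields (`type`, `node`, `progress`, `status`) are hypothetical, since the real event schema is not documented here. The producer writes one compact JSON object per line; the consumer decodes one event per line as bytes arrive, so nothing has to wait for the whole response.

```python
import io
import json
from typing import IO, Iterator


def encode_event(event: dict) -> bytes:
    """Serialise one event as a single NDJSON line (compact JSON + newline)."""
    # json.dumps escapes newlines inside strings as \n, so one object
    # always occupies exactly one physical line.
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def iter_events(stream: IO[bytes]) -> Iterator[dict]:
    """Yield decoded events from a binary NDJSON stream, one per line."""
    for raw_line in stream:  # binary file objects iterate line by line
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines defensively
        yield json.loads(line)


# Usage: simulate a CNO response body carrying three hypothetical events.
body = io.BytesIO(
    encode_event({"type": "node_started", "node": "ocr_1"})
    + encode_event({"type": "node_progress", "node": "ocr_1", "progress": 50})
    + encode_event({"type": "pipeline_completed", "status": "success"})
)
for event in iter_events(body):
    print(event["type"])
```

Because each line is self-contained, a consumer that falls behind or reconnects can still parse every complete line it receives, which suits a relay from Django to the browser.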

REST API

The Django application exposes 111 REST API routes that power all browser interactions. These routes handle project CRUD operations, user authentication, graph management, plugin queries, service endpoint administration, audit log retrieval, and more. The API follows standard REST conventions with JSON request and response bodies, and all endpoints require authentication via session cookies or tokens. Detailed route documentation is available in the API Reference appendix.

Data Flow Through Pipelines

Understanding how data moves through Quantra is essential for both users and developers. The following describes the complete lifecycle of a pipeline execution from design to result delivery.

Step 1: Pipeline Design

The user opens the visual canvas editor in the browser and constructs a pipeline by dragging datasource, tool, and workbench nodes onto the canvas. They connect nodes with edges to define data flow. Each node is configured with parameters specific to its type (e.g., a Windows Share datasource node is configured with a server address, share name, and credentials). The entire graph structure — nodes, edges, positions, and configurations — is stored as a JSON document.
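A graph document of this kind could be modelled as in the sketch below. The field names (`id`, `plugin`, `position`, `config`, `source`, `target`), the `windows_share` plugin name and its `server`/`share` keys are hypothetical illustrations; the canvas editor's real schema may differ. Credentials are deliberately left out of the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    id: str                         # unique within the graph
    plugin: str                     # plugin name from its manifest, e.g. "ocr_tool"
    position: tuple[float, float]   # x/y coordinates on the canvas
    config: dict = field(default_factory=dict)  # type-specific parameters


@dataclass
class Edge:
    source: str  # upstream node id
    target: str  # downstream node id


@dataclass
class Graph:
    nodes: list[Node]
    edges: list[Edge]

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses; tuples become JSON arrays.
        return json.dumps(asdict(self), indent=2)


graph = Graph(
    nodes=[
        # Hypothetical datasource config; credentials omitted on purpose.
        Node("share_1", "windows_share", (80, 120),
             {"server": "fileserver01", "share": "cases"}),
        Node("ocr_1", "ocr_tool", (320, 120),
             {"backend": "paddleocr", "language": "en"}),
    ],
    edges=[Edge("share_1", "ocr_1")],
)
print(graph.to_json())
```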

Step 2: Graph Serialisation

When the user clicks "Run", the browser serialises the complete graph (nodes, edges, and all configuration) and submits it to the Django REST API via an HTTP POST request. Django validates the graph structure, checks user permissions, and creates a new execution record in the database.
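The exact structural checks Django performs are not spelled out here. The sketch below shows checks that are typical for this step, assuming a payload shaped as `{"nodes": [{"id": ...}, ...], "edges": [{"source": ..., "target": ...}, ...]}` (a hypothetical layout): every node has a unique string id, every edge references existing nodes, and no edge loops back onto its own node. Longer cycles are left to the topological sort in Step 4.

```python
from collections import Counter


class GraphValidationError(ValueError):
    """Raised when a submitted pipeline graph is structurally invalid."""


def validate_graph(graph: dict) -> None:
    """Check structural invariants of a graph payload; raise on the first problem."""
    nodes, edges = graph.get("nodes"), graph.get("edges")
    if not isinstance(nodes, list) or not isinstance(edges, list):
        raise GraphValidationError("graph must contain 'nodes' and 'edges' lists")

    ids = [node.get("id") for node in nodes]
    if any(not isinstance(node_id, str) or not node_id for node_id in ids):
        raise GraphValidationError("every node needs a non-empty string id")
    duplicates = sorted(node_id for node_id, count in Counter(ids).items() if count > 1)
    if duplicates:
        raise GraphValidationError(f"duplicate node ids: {duplicates}")

    known = set(ids)
    for edge in edges:
        src, dst = edge.get("source"), edge.get("target")
        if src not in known or dst not in known:
            raise GraphValidationError(f"edge {src!r} -> {dst!r} references an unknown node")
        if src == dst:
            raise GraphValidationError(f"self-loop on node {src!r}")


# Usage: a valid two-node graph passes silently.
validate_graph({
    "nodes": [{"id": "share_1"}, {"id": "ocr_1"}],
    "edges": [{"source": "share_1", "target": "ocr_1"}],
})
```

Permission checks and creation of the execution record would follow once this returns; they depend on Quantra's own models and are not shown.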

Step 3: CNO Submission

Django forwards the validated graph to the CNO over the HTTP/NDJSON streaming connection on port 8443. The CNO receives the full graph definition and begins orchestration.
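Django's side of this exchange could look roughly like the following standard-library sketch. The `/execute` path is a hypothetical placeholder (the CNO's actual route is not documented here), and TLS settings are omitted. The function POSTs the graph as JSON and yields each NDJSON event as soon as its line arrives.

```python
import json
import urllib.request
from typing import Iterator


def submit_graph(cno_url: str, graph: dict, timeout: float = 300.0) -> Iterator[dict]:
    """POST a validated graph to the CNO and stream back decoded events."""
    request = urllib.request.Request(
        cno_url,  # e.g. "http://cno-host:8443/execute" (hypothetical route)
        data=json.dumps(graph).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The response object iterates line by line, so each event is
        # decoded as soon as its newline arrives.
        for raw_line in response:
            line = raw_line.strip()
            if line:
                yield json.loads(line)
```

Returning a generator keeps the connection open only as long as the caller keeps consuming events, which maps naturally onto forwarding them to the browser one at a time.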

Step 4: Topological Sort

The CNO performs a topological sort on the directed acyclic graph (DAG) to determine the correct execution order. Nodes with no incoming edges (typically datasources) execute first. Downstream nodes execute only after all their upstream dependencies have completed. This ensures data flows correctly through the pipeline without race conditions.
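The ordering rule above is a standard topological sort. The sketch below uses Python's `graphlib.TopologicalSorter` (standard library since 3.9) as a stand-in for whatever the CNO implements internally, and the node-id/edge-list input format is an assumption. It groups nodes into "waves" whose upstream dependencies have all completed, and `prepare()` raises `graphlib.CycleError` if the submitted graph is not actually acyclic.

```python
from graphlib import CycleError, TopologicalSorter


def execution_waves(node_ids: list[str], edges: list[tuple[str, str]]) -> list[list[str]]:
    """Group nodes into waves; each wave depends only on nodes in earlier waves."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    # Assumes every edge references a known node id.
    predecessors: dict[str, set[str]] = {node: set() for node in node_ids}
    for source, target in edges:
        predecessors[target].add(source)

    sorter = TopologicalSorter(predecessors)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose upstreams are all done
        waves.append(ready)
        sorter.done(*ready)  # a scheduler would call this as each node finishes
    return waves


print(execution_waves(
    ["share", "ocr", "pii", "review"],
    [("share", "ocr"), ("share", "pii"), ("ocr", "review"), ("pii", "review")],
))  # [['share'], ['ocr', 'pii'], ['review']]

try:
    execution_waves(["a", "b"], [("a", "b"), ("b", "a")])
except CycleError:
    print("cycle detected")
```

Nodes within one wave have no dependencies on each other, so a scheduler could run them concurrently; whether the CNO does so is not stated here.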

Step 5: Node Execution

For each node in topological order, the CNO resolves the corresponding microservice endpoint from the service registry, establishes a gRPC connection with mTLS, and invokes the appropriate RPC method with the node's configuration and any upstream data. Each microservice processes its input and returns results. The CNO streams progress events back to Django throughout this process.
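The per-node loop can be sketched as below. Everything here is a hypothetical stand-in: the registry layout, the `invoke` callable that takes the place of the mTLS gRPC stub call, the event fields, and the policy of stopping at the first failure. The ports reuse 50051 (start of the datasource range) and 50064 (the OCR manifest's port). Each node receives the outputs of its upstream nodes, and progress events are yielded as the loop advances.

```python
from typing import Callable, Iterator

# Hypothetical registry: plugin name -> "host:port" of the microservice serving it.
SERVICE_REGISTRY = {"windows_share": "localhost:50051", "ocr_tool": "localhost:50064"}

# Hypothetical stand-in for a gRPC stub call:
# invoke(endpoint, node_config, upstream_results) -> result
Invoke = Callable[[str, dict, dict], object]


def run_pipeline(order: list[str], nodes: dict[str, dict],
                 edges: list[tuple[str, str]], invoke: Invoke) -> Iterator[dict]:
    """Execute nodes in topological order, yielding progress events."""
    results: dict[str, object] = {}  # outputs of finished nodes, by node id
    for node_id in order:
        node = nodes[node_id]
        endpoint = SERVICE_REGISTRY[node["plugin"]]
        # Topological order guarantees every upstream node has already finished.
        upstream = {src: results[src] for src, dst in edges if dst == node_id}
        yield {"type": "node_started", "node": node_id}
        try:
            results[node_id] = invoke(endpoint, node["config"], upstream)
        except Exception as exc:  # assumption: stop the pipeline on first failure
            yield {"type": "node_failed", "node": node_id, "error": str(exc)}
            yield {"type": "pipeline_completed", "status": "failed"}
            return
        yield {"type": "node_completed", "node": node_id, "result": results[node_id]}
    yield {"type": "pipeline_completed", "status": "success"}


def fake_invoke(endpoint: str, config: dict, upstream: dict) -> dict:
    """Stub service: reports which endpoint ran and which inputs it received."""
    return {"endpoint": endpoint, "inputs": sorted(upstream)}


nodes = {
    "share_1": {"plugin": "windows_share", "config": {}},
    "ocr_1": {"plugin": "ocr_tool", "config": {"backend": "paddleocr"}},
}
for event in run_pipeline(["share_1", "ocr_1"], nodes, [("share_1", "ocr_1")], fake_invoke):
    print(event)
```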

Step 6: Result Delivery

As each node completes, its results are stored and made available to downstream nodes. When the entire pipeline finishes, the CNO sends a final completion event. Django updates the execution record in the database, and the browser displays the final results to the user. Results can include processed documents, extracted data, generated reports, or flagged items requiring human review in a workbench.

Plugin Architecture

Quantra's functionality is extended through a plugin system. Plugins are self-contained packages that add new capabilities to the platform without modifying core code. There are three types of plugins:

| Plugin Type | Purpose | Examples |
|---|---|---|
| Datasources | Ingest data from external systems into the pipeline | Windows Share, MSSQL, Snowflake, SharePoint, Outlook, Box.com |
| Tools | Process, transform, analyse, or enrich documents and data | OCR, PII Detection, Summarisation, Hashing, Redaction |
| Workbenches | Provide interactive human-in-the-loop review and decision interfaces | SAR Workbench, Document Review Workbench |

Plugin Manifest (plugin.json)

Every plugin includes a plugin.json manifest file that describes the plugin to the platform. This file contains metadata such as the plugin's name, version, type, description, author, configuration schema, and the microservice endpoint it communicates with. The platform reads these manifests at startup to populate the node palette in the canvas editor.

{
    "name": "ocr_tool",
    "display_name": "OCR Tool",
    "type": "tool",
    "version": "1.4.2",
    "description": "Optical Character Recognition with multiple backend engines",
    "author": "Quantra",
    "endpoint": "ocr_service",
    "port": 50064,
    "config_schema": {
        "backend": {
            "type": "select",
            "options": ["textract", "paddleocr", "trocr"],
            "default": "paddleocr"
        },
        "language": {
            "type": "string",
            "default": "en"
        }
    }
}
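A node's parameters have to be checked against the manifest's `config_schema` before the node can run. The helper below is a hypothetical sketch (the platform's real validator is not described here) that handles only the two field types in the OCR example, `select` and `string`. It fills in defaults, rejects unknown keys, and rejects values outside a `select` field's options.

```python
def resolve_config(schema: dict, user_config: dict) -> dict:
    """Merge user-supplied values with schema defaults and validate them."""
    unknown = set(user_config) - set(schema)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")

    resolved = {}
    for key, spec in schema.items():
        value = user_config.get(key, spec.get("default"))
        if spec["type"] == "select" and value not in spec["options"]:
            raise ValueError(f"{key!r} must be one of {spec['options']}, got {value!r}")
        if spec["type"] == "string" and not isinstance(value, str):
            raise ValueError(f"{key!r} must be a string, got {value!r}")
        resolved[key] = value
    return resolved


# The config_schema from the OCR manifest above, as a Python dict.
OCR_SCHEMA = {
    "backend": {"type": "select", "options": ["textract", "paddleocr", "trocr"],
                "default": "paddleocr"},
    "language": {"type": "string", "default": "en"},
}
print(resolve_config(OCR_SCHEMA, {"backend": "trocr"}))
# {'backend': 'trocr', 'language': 'en'}
```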

Dynamic Discovery

Plugins are discovered dynamically from the /plugins/ directory at application startup. The platform scans subdirectories, reads each plugin.json manifest, validates the schema, and registers the plugin in the internal registry. This means new plugins can be added simply by placing them in the plugins directory and restarting the platform — no code changes to the core application are required.
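The discovery step can be sketched as follows. This is an illustrative assumption, not Quantra's loader. It treats each immediate subdirectory of the plugins directory as one plugin and requires a handful of keys taken from the `plugin.json` example above. It assumes the type values `datasource` and `workbench` alongside the documented `tool`, and it logs and skips invalid manifests rather than aborting startup.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("plugins")

REQUIRED_KEYS = {"name", "type", "version", "endpoint", "port"}
PLUGIN_TYPES = {"datasource", "tool", "workbench"}  # assumed type values


def discover_plugins(plugins_dir: Path) -> dict[str, dict]:
    """Scan plugins_dir/*/plugin.json and return valid manifests keyed by name."""
    registry: dict[str, dict] = {}
    for manifest_path in sorted(plugins_dir.glob("*/plugin.json")):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            log.warning("skipping %s: %s", manifest_path, exc)
            continue
        if not isinstance(manifest, dict):
            log.warning("skipping %s: manifest is not a JSON object", manifest_path)
            continue
        missing = REQUIRED_KEYS - set(manifest)
        if missing or manifest["type"] not in PLUGIN_TYPES:
            log.warning("skipping %s: missing %s or bad type", manifest_path, sorted(missing))
            continue
        registry[manifest["name"]] = manifest
    return registry
```

Because the scan runs once at startup, a newly copied plugin directory only becomes visible after a restart, which matches the behaviour described above.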

Technology Stack

| Layer | Technology | Purpose |
|---|---|---|
| Web Framework | Django (Python) | REST API, authentication, ORM, admin interface, template rendering |
| Language | Python 3.10+ | All server-side code, including microservices |
| RPC Framework | gRPC | High-performance communication between the CNO and microservices |
| Serialisation | Protocol Buffers | Strongly typed message definitions for gRPC interfaces |
| Security | Mutual TLS (mTLS) | Certificate-based mutual authentication for all gRPC channels |
| Database (Dev) | SQLite | Lightweight development and testing database |
| Database (Prod) | PostgreSQL / MSSQL | Production-grade relational database |
| Data Warehouse | Snowflake | Cloud data warehouse integration for analytics workloads |
| Frontend | JavaScript / HTML5 Canvas | Visual pipeline designer, interactive workbench UIs |
| Streaming | HTTP/NDJSON | Real-time progress streaming from the CNO to Django |

Note: The architecture is designed to be horizontally scalable. Each microservice runs as an independent process and can be deployed on separate machines or containers. The CNO manages service discovery through the endpoint registry stored in the Django database.