Quantra Documentation

Tools

Tools are the processing nodes that sit in the middle of a pipeline. Each tool does one job — extracting text, hashing files, summarising content, detecting biometrics, and so on — and produces results that downstream tools or workbenches can use. This section describes the tools available in Quantra, what each one does, and what to fill in when configuring it.

About Artificial Intelligence (AI) helpers. Several tools in this section can use an AI helper to add capabilities such as document classification or natural-language summarisation. An AI helper is a small node that you connect to a tool with a special edge type called a helper edge. The AI helpers themselves (for example, an OpenAI helper) are registered and credentialed by an administrator; once that is done, end users only need to draw the helper edge from the helper to the tool that needs it.

Archive

Packages a set of files into a single archive (or collection of archives) that you can hand off to another system or download. Useful at the end of a pipeline when you want to deliver a tidy bundle of processed documents.

What you'll need

  • An upstream node that produces files for the archive (a datasource, a redaction tool, and so on).

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Output mode | Yes | Either Archive (one packaged file) or Folder (the files left as separate items).
Archive format | When output mode is Archive | The format of the archive: ZIP, TAR, TAR.GZ, or 7-Zip.
Naming pattern | No | The name of the archive file. You can include placeholders such as {release_code} and {timestamp} in the pattern. Defaults to SAR_{release_code}_{timestamp}.
Include manifest file | No | Adds a small manifest file to the archive listing every item it contains together with file sizes and hashes. Defaults to on.
Note | No | A free-text description for your own reference.
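The naming pattern works like simple placeholder substitution. A rough sketch of how the two placeholders might resolve (the timestamp format below is an assumption, not Quantra's documented format):

```python
from datetime import datetime, timezone

def resolve_name(pattern: str, release_code: str) -> str:
    # The {timestamp} format is an assumption; the platform's real format may differ.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return pattern.format(release_code=release_code, timestamp=timestamp)

print(resolve_name("SAR_{release_code}_{timestamp}", "R2024-0042"))
```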

Common uses

  • Bundling redacted documents into a single download for delivery.
  • Producing a deliverable archive at the end of a Subject Access Request (SAR) pipeline.
  • Packaging a folder's worth of pipeline output for hand-off to another system.

Hash

Calculates a similarity hash for each document and stores it alongside the document's metadata. Two documents that have very similar content end up with very similar hashes, so the hash makes it easy to spot near-duplicates — for example, two slightly different copies of the same letter.
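Quantra does not say which similarity-hash algorithm it uses; the toy SimHash below just illustrates the property the tool relies on: near-identical texts produce hashes that differ in only a few bits, so Hamming distance between hashes approximates content similarity.

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    # Toy SimHash over whitespace tokens (illustrative, not Quantra's algorithm).
    weights = [0] * bits
    for word in text.lower().split():
        h = int.from_bytes(hashlib.md5(word.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a: int, b: int) -> int:
    # Number of differing bits between two hashes.
    return bin(a ^ b).count("1")

a = simhash("Dear Mr Smith, your appointment is on Monday at nine.")
b = simhash("Dear Mr Smith, your appointment is on Tuesday at nine.")
c = simhash("Quarterly financial results for the fiscal year ended March.")
print(hamming(a, b), hamming(a, c))  # the near-duplicate pair lands closer
```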

What you'll need

  • An upstream datasource or tool that supplies the documents to hash.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Minimum file size (megabytes) | No | Skip files smaller than this size. Defaults to 0 (hash every file).
Note | No | A free-text description for your own reference.

Common uses

  • Spotting duplicate or near-duplicate documents in a large collection.
  • Tagging documents so a downstream review process can group similar items together.

Summary

Reads the content of documents, images, and videos, extracts the text and basic technical information, and — if an AI helper is connected — produces a short, plain-language summary of each item.

What you'll need

  • An upstream datasource or tool that supplies the items to summarise.
  • (Optional) An AI helper connected by a helper edge if you want plain-language summaries or richer image descriptions.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Only process recognised document types | No | Skip files in formats Summary does not understand. Defaults to on.
Extract Exchangeable Image File Format (EXIF) metadata | No | Pull camera, phone, and image metadata such as date taken and location from photographs and video files. Defaults to on.
Video sample rate (frames) | No | For video files, sample one frame for every N frames. Lower values mean more frames are inspected. Defaults to 60.
Enable Artificial Intelligence (AI) Vision | No | Send images to the connected AI helper to produce richer descriptions of their content. Defaults to off.
Note | No | A free-text description for your own reference.

Administrator-only setup. The optional AI helper used for summaries and AI Vision (for example, an OpenAI helper) is registered and credentialed by an administrator. Once the helper exists on the canvas, any user can draw a helper edge from it to a Summary node.
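To get a feel for the video sample rate, here is what sampling one frame in every N amounts to (an illustrative sketch; the real tool decodes frames from the video stream):

```python
def sampled_frames(total_frames: int, sample_rate: int = 60) -> list:
    # Indices of the frames that would be inspected at a given sample rate.
    return list(range(0, total_frames, sample_rate))

# A 30 fps clip lasting 10 seconds has 300 frames; at the default rate of 60,
# one frame is inspected every two seconds.
print(sampled_frames(300, 60))  # [0, 60, 120, 180, 240]
```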

Common uses

  • Generating short summaries of long documents to help reviewers triage a backlog.
  • Pulling text and EXIF data out of mixed photographs, scans, and videos.

Biometrics

Biometrics covers two related tools that detect human biometric features in documents and images. They are most often used as part of a Personally Identifiable Information (PII) workflow, where a face or fingerprint visible in a document needs to be flagged so it can be redacted.

Face Biometrics

Counts the human faces visible in pictures and videos. It can also extract a small amount of facial information so that downstream tools and workbenches can flag the face for redaction.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Minimum confidence | No | How sure the detector has to be (between 0 and 1) before it counts something as a face. Defaults to 0.6.
Also analyse videos | No | Look for faces in video files as well as still images. Defaults to off.
Note | No | A free-text description for your own reference.
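The minimum confidence setting is a simple cut-off on the detector's scores. A sketch using hypothetical detection records (the tool's real output schema is not documented here):

```python
def faces_above_threshold(detections: list, min_confidence: float = 0.6) -> list:
    # Keep only detections the model is sufficiently sure about.
    return [d for d in detections if d["confidence"] >= min_confidence]

detections = [
    {"box": (120, 80, 64, 64), "confidence": 0.91},
    {"box": (300, 45, 40, 40), "confidence": 0.42},  # likely a false positive
]
print(faces_above_threshold(detections))  # only the 0.91 detection survives
```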

Fingerprint Detection

Detects images of fingerprints in scanned documents and flags them as PII so they can be reviewed and redacted.

Settings you provide

Setting | Required | Description
Enable fingerprint detection | No | Turn detection on or off. Defaults to on.
Confidence threshold | No | How sure the detector has to be (between 0.3 and 0.9) before it flags an image as containing a fingerprint. Defaults to 0.6.
Extract minutiae points | No | Advanced: pull out the small ridge endings and bifurcations of the fingerprint. Defaults to off.
Detection timeout (seconds) | No | The most time the detector will spend on a single image. Defaults to 30.
Maximum image size (pixels) | No | Skip images larger than this size to keep processing time predictable. Defaults to 40,000,000.

Common uses

  • Flagging photographs of people in documents that will be released externally.
  • Spotting fingerprint images in scanned identity documents so they can be redacted.

NHS Summary

Looks at documents drawn from National Health Service (NHS) records and pulls out the information that is most useful for clinical and administrative review: the type of document (for example, letter, lab result, discharge summary), the dates it covers, any NHS numbers and Medical Record Numbers (MRN) it contains, and a short plain-language summary of its content.

What you'll need

  • An upstream datasource or tool that supplies the NHS documents to read.
  • An AI helper connected by a helper edge. The NHS Summary tool relies on an AI helper to produce its output and will not run without one.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Classify document type | No | Identify whether each document is a letter, lab result, discharge summary, and so on. Defaults to on.
Generate summary | No | Produce a short plain-language summary of each document. Defaults to on.
Extract NHS numbers | No | Find and validate any NHS numbers in the document. Defaults to on.
Extract Medical Record Numbers (MRN) | No | Find any local hospital identifiers and patient record numbers. Defaults to on.

Administrator-only setup. The AI helper required by NHS Summary (for example, an OpenAI helper) must be registered and credentialed by an administrator. The list of recognised document types and identifier formats is also configured at the platform level.
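NHS numbers are ten digits ending in a modulus-11 check digit, which is what validation can verify. The sketch below implements the standard check; whether Quantra's validator applies further rules is not documented here.

```python
def nhs_number_valid(number: str) -> bool:
    # Standard modulus-11 check for 10-digit NHS numbers.
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) != 10:
        return False
    # Weight the first nine digits by 10 down to 2 and sum.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is never valid.
    return check != 10 and check == digits[9]

print(nhs_number_valid("943 476 5919"))  # True: passes the check digit
print(nhs_number_valid("943 476 5918"))  # False: check digit mismatch
```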

Common uses

  • Triaging a backlog of clinical letters by type and topic.
  • Building a quick index of patient records by NHS number and MRN.

HWT-OCR

Handwritten Text Optical Character Recognition (HWT-OCR) reads the text on a page and turns it into machine-readable words, including the position of each word on the page. It can read both printed text and handwritten text. The position information is what later tools use to draw redaction boxes accurately on the original document.
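As an illustration of why word positions matter, here is a hypothetical word record and how a downstream tool might turn it into a redaction box. The schema and the fractional coordinate convention are assumptions for the sketch, not Quantra's documented output.

```python
from dataclasses import dataclass

@dataclass
class Word:
    # One recognised word; coordinates are fractions of page width/height
    # (hypothetical shape, for illustration only).
    text: str
    left: float
    top: float
    width: float
    height: float

def redaction_box(word: Word, padding: float = 0.002) -> tuple:
    # Grow the word's bounding box slightly so a redaction fully covers it.
    return (word.left - padding, word.top - padding,
            word.width + 2 * padding, word.height + 2 * padding)

w = Word("Smith", left=0.42, top=0.31, width=0.08, height=0.015)
print(redaction_box(w))
```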

What you'll need

  • An upstream datasource or tool that supplies the documents and images to read.
  • For the cloud reading mode (Amazon Textract), the Amazon Web Services (AWS) credentials must be configured by an administrator — see the administrator note below.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Reading mode | Yes | Pick the reading engine: Textract (cloud, best for printed text and tables), PaddleOCR (local, best for printed text in many languages), or TrOCR (local, best for handwriting).
Detect tables | Only with Textract | Also extract the contents of tables, with cell positions. Defaults to off.
Detect forms | Only with Textract | Also extract form fields as label-and-value pairs. Defaults to off.
Primary language | Only with PaddleOCR | The main language of the documents you are reading. Pick from the supported language list. Defaults to English.

Administrator-only setup. The Textract reading mode requires AWS credentials configured by an administrator. The local reading modes (PaddleOCR and TrOCR) need no credentials, but a Graphics Processing Unit (GPU) is recommended for reasonable speed and is provisioned by the administrator who deploys the platform.

Common uses

  • Turning scanned PDFs into searchable text for downstream summarisation, classification, or PII detection.
  • Reading handwritten notes on forms or annotated documents.
  • Producing word-level positions that feed Q-DACT and other redaction workbenches.

Meta to DB

Takes basic information about each document — filename, size, dates, and so on — and writes it into a database table. This makes a pipeline's worth of documents searchable and reportable from any other tool that can read the database.

What you'll need

  • An upstream datasource or tool that supplies the documents whose metadata you want to record.
  • A downstream database datasource (for example, Microsoft SQL Server, PostgreSQL, or Snowflake) connected by an edge. The metadata is written to that database.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Note | No | A free-text description for your own reference.
Metadata mappings | Yes | One row per piece of metadata you want to record. For each row, pick the metadata field on the left (for example, filename, size, created date) and type the column name to write it to on the right. Click Add mapping to add another row.

Administrator-only setup. The database itself is configured separately as a datasource, which usually relies on administrator setup — pre-registered approved servers and shared credentials. See the Datasources section for details.
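Conceptually, the metadata mappings describe the columns of a single INSERT per document. A hypothetical sketch (the table name, column names, and parameter style are illustrative, not Quantra's actual SQL):

```python
# Each mapping row pairs a metadata field with a target column name.
mappings = [
    ("filename", "doc_name"),
    ("size", "size_bytes"),
    ("created date", "created_at"),
]

def insert_statement(table: str, mappings: list) -> str:
    # Build a parameterised INSERT covering every mapped column.
    cols = ", ".join(col for _, col in mappings)
    params = ", ".join("?" for _ in mappings)
    return f"INSERT INTO {table} ({cols}) VALUES ({params})"

print(insert_statement("documents", mappings))
# INSERT INTO documents (doc_name, size_bytes, created_at) VALUES (?, ?, ?)
```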

Common uses

  • Building a searchable inventory of every document a pipeline has processed.
  • Feeding a reporting dashboard with up-to-date document statistics.

Review

Opens an interactive viewer (the Q-DACT Redaction Viewer) where a person can look at every redaction the pipeline has proposed for a document, accept or reject each one, draw new ones by hand, and approve the document for release. Q-DACT works on Portable Document Format (PDF) files, images, and videos — including time-based redactions on video.

What you'll need

  • An upstream tool or workbench that supplies the documents and the proposed redactions to review (typically PII Detect, Biometrics, or HWT-OCR followed by a redaction-suggesting tool).

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.

The Review tool has no other configuration on the canvas. Everything else is done interactively when a reviewer opens the viewer.

Common uses

  • Letting a person check and adjust every redaction before a document is released.
  • Reviewing video footage to redact people, screens, or other sensitive frames.

SAR Release

Applies the approved redactions to each document, packages everything into a single Subject Access Request (SAR) release, and uploads it to a chosen destination. This is the tool that produces the final deliverable at the end of a SAR workflow.

What you'll need

  • An upstream Review (Q-DACT) node so that redactions have been reviewed and approved.
  • A downstream datasource node (for example, a SharePoint folder or a Network Drive) connected by an edge. The release is uploaded there.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Redaction style | No | The fill colour for the redaction boxes that get burned into the released documents: Black, White, or Grey. Defaults to Black.
Include manifest file | No | Add a small manifest file listing every document in the release together with redaction counts. Defaults to on.
Include audit log | No | Add an audit log file recording exactly which redactions were applied to which document. Defaults to on.

Administrator-only setup. The destination where the release is uploaded is configured as a datasource, which usually relies on administrator setup — pre-registered servers, shared credentials, and so on. See the Datasources section for details.
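A manifest entry typically pairs each released file with its size and a content hash so the release is tamper-evident. The exact manifest format Quantra produces is not documented here; this sketch only shows the general shape:

```python
import hashlib
import json

def manifest_entry(name: str, data: bytes) -> dict:
    # One manifest record per released document (illustrative field names).
    return {
        "file": name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

entry = manifest_entry("letter_0001_redacted.pdf", b"%PDF-1.7 ...")
print(json.dumps(entry))
```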

Common uses

  • Producing the final redacted package at the end of a Subject Access Request workflow.
  • Generating tamper-evident release packages with manifest and audit log for compliance.

PII Detect

Scans documents for Personally Identifiable Information (PII) — names, addresses, contact details, government identifiers, and so on — and records each piece of PII it finds together with where it appeared. The output feeds review and redaction tools further down the pipeline.

What you'll need

  • An upstream HWT-OCR node (or another text-extraction tool) so that PII Detect has text to scan.
  • (Optional) An AI helper connected by a helper edge if you want PII Detect to use AI for the harder, more ambiguous cases.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Minimum confidence | No | How sure the detector has to be (between 0 and 1) before it records something as PII. Defaults to 0.5.
Context window (characters) | No | How many characters of surrounding text to keep with each detection so that reviewers can see it in context. Defaults to 300.
Maximum context samples | No | The most context samples the AI will look at per piece of PII when classifying. Defaults to 5.
Enable AI classification | No | Send ambiguous detections to the connected AI helper for a second opinion. Defaults to on.
PII types to detect | Yes | Tick which categories of PII you want the tool to look for. Includes general categories (names, addresses, phone numbers, email addresses, payment card numbers) and country-specific identifiers (NHS numbers, National Insurance Numbers (NINO), passport numbers, tax identifiers, and others).
Note | No | A free-text description for your own reference.

Administrator-only setup. The optional AI helper used for harder PII cases (for example, an OpenAI or Anthropic helper) is registered and credentialed by an administrator. The local pattern-based detection always works without an AI helper.
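The context window simply keeps a slice of text around each detection so reviewers see the match in place. A minimal sketch using one illustrative pattern (email addresses); the real detector covers many more categories and patterns:

```python
import re

def detect_emails(text: str, context_chars: int = 300) -> list:
    # Record each match with its position plus surrounding context,
    # mirroring the "context window" setting (illustrative pattern only).
    results = []
    for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        half = context_chars // 2
        start, end = max(0, m.start() - half), m.end() + half
        results.append({
            "value": m.group(),
            "offset": m.start(),
            "context": text[start:end],
        })
    return results

sample = "Please reply to jane.doe@example.org by Friday."
print(detect_emails(sample, context_chars=20))
```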

Common uses

  • Finding and flagging personal data across a document set ahead of redaction.
  • Producing the input to a Review (Q-DACT) and SAR Release pipeline.
  • Building a register of which documents contain which kinds of PII.