Quantra Documentation

Tools

Tools are the processing nodes that sit in the middle of a pipeline. Each tool does one job — extracting text, hashing files, summarising content, detecting biometrics, and so on — and produces results that downstream tools or workbenches can use. This section describes the tools available in Quantra, what each one does, and what to fill in when configuring it.

About Artificial Intelligence (AI) helpers. Several tools in this section can use an AI helper to add capabilities such as document classification or natural-language summarisation. An AI helper is a small node that you connect to a tool with a special edge type called a helper edge. The AI helpers themselves (for example, an OpenAI helper) are registered and credentialed by an administrator; once that is done, end users only need to draw the helper edge from the helper to the tool that needs it.

Archive

Packages a set of files into a single archive (or collection of archives) that you can hand off to another system or download. Useful at the end of a pipeline when you want to deliver a tidy bundle of processed documents.

What you'll need

  • An upstream node that produces files for the archive (a datasource, a redaction tool, and so on).

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Output mode | Yes | Either Archive (one packaged file) or Folder (the files left as separate items).
Archive format | When output mode is Archive | The format of the archive: ZIP, TAR, TAR.GZ, or 7-Zip.
Naming pattern | No | The name of the archive file. You can include placeholders such as {release_code} and {timestamp} in the pattern. Defaults to SAR_{release_code}_{timestamp}.
Include manifest file | No | Adds a small manifest file to the archive listing every item it contains together with file sizes and hashes. Defaults to on.
Note | No | A free-text description for your own reference.
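The naming pattern works like simple placeholder substitution. A rough sketch of how the two placeholders might resolve (the timestamp format below is an assumption, not Quantra's documented format):

```python
from datetime import datetime, timezone

def resolve_name(pattern: str, release_code: str) -> str:
    # The {timestamp} format is an assumption; the platform's real format may differ.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return pattern.format(release_code=release_code, timestamp=timestamp)

print(resolve_name("SAR_{release_code}_{timestamp}", "R2024-0042"))
```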

Common uses

  • Bundling redacted documents into a single download for delivery.
  • Producing a deliverable archive at the end of a Subject Access Request (SAR) pipeline.
  • Packaging a folder's worth of pipeline output for hand-off to another system.

Hash

Calculates a similarity hash for each document and stores it alongside the document's metadata. Two documents that have very similar content end up with very similar hashes, so the hash makes it easy to spot near-duplicates — for example, two slightly different copies of the same letter.
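Quantra does not say which similarity-hash algorithm it uses; the toy SimHash below just illustrates the property the tool relies on: near-identical texts produce hashes that differ in only a few bits, so Hamming distance between hashes approximates content similarity.

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    # Toy SimHash over whitespace tokens (illustrative, not Quantra's algorithm).
    weights = [0] * bits
    for word in text.lower().split():
        h = int.from_bytes(hashlib.md5(word.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a: int, b: int) -> int:
    # Number of differing bits between two hashes.
    return bin(a ^ b).count("1")

a = simhash("Dear Mr Smith, your appointment is on Monday at nine.")
b = simhash("Dear Mr Smith, your appointment is on Tuesday at nine.")
c = simhash("Quarterly financial results for the fiscal year ended March.")
print(hamming(a, b), hamming(a, c))  # the near-duplicate pair lands closer
```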

What you'll need

  • An upstream datasource or tool that supplies the documents to hash.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Minimum file size (megabytes) | No | Skip files smaller than this size. Defaults to 0 (hash every file).
Note | No | A free-text description for your own reference.

Common uses

  • Spotting duplicate or near-duplicate documents in a large collection.
  • Tagging documents so a downstream review process can group similar items together.

Summary

Reads the content of documents, images, and videos, extracts the text and basic technical information, and — if an AI helper is connected — produces a short, plain-language summary of each item.

What you'll need

  • An upstream datasource or tool that supplies the items to summarise.
  • (Optional) An AI helper connected by a helper edge if you want plain-language summaries or richer image descriptions.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Only process recognised document types | No | Skip files in formats Summary does not understand. Defaults to on.
Extract Exchangeable Image File Format (EXIF) metadata | No | Pull camera, phone, and image metadata such as date taken and location from photographs and video files. Defaults to on.
Video sample rate (frames) | No | For video files, sample one frame for every N frames. Lower values mean more frames are inspected. Defaults to 60.
Enable Artificial Intelligence (AI) Vision | No | Send images to the connected AI helper to produce richer descriptions of their content. Defaults to off.
Note | No | A free-text description for your own reference.

Administrator-only setup. The optional AI helper used for summaries and AI Vision (for example, an OpenAI helper) is registered and credentialed by an administrator. Once the helper exists on the canvas, any user can draw a helper edge from it to a Summary node.
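To get a feel for the video sample rate, here is what sampling one frame in every N amounts to (an illustrative sketch; the real tool decodes frames from the video stream):

```python
def sampled_frames(total_frames: int, sample_rate: int = 60) -> list:
    # Indices of the frames that would be inspected at a given sample rate.
    return list(range(0, total_frames, sample_rate))

# A 30 fps clip lasting 10 seconds has 300 frames; at the default rate of 60,
# one frame is inspected every two seconds.
print(sampled_frames(300, 60))  # [0, 60, 120, 180, 240]
```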

Common uses

  • Generating short summaries of long documents to help reviewers triage a backlog.
  • Pulling text and EXIF data out of mixed photographs, scans, and videos.

Biometrics

Biometrics covers two related tools that detect human biometric features in documents and images. They are most often used as part of a Personally Identifiable Information (PII) workflow, where a face or fingerprint visible in a document needs to be flagged so it can be redacted.

Face Biometrics

Counts the human faces visible in pictures and videos. It can also extract a small amount of facial information so that downstream tools and workbenches can flag the face for redaction.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Minimum confidence | No | How sure the detector has to be (between 0 and 1) before it counts something as a face. Defaults to 0.6.
Also analyse videos | No | Look for faces in video files as well as still images. Defaults to off.
Note | No | A free-text description for your own reference.
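The minimum confidence setting is a simple cut-off on the detector's scores. A sketch using hypothetical detection records (the tool's real output schema is not documented here):

```python
def faces_above_threshold(detections: list, min_confidence: float = 0.6) -> list:
    # Keep only detections the model is sufficiently sure about.
    return [d for d in detections if d["confidence"] >= min_confidence]

detections = [
    {"box": (120, 80, 64, 64), "confidence": 0.91},
    {"box": (300, 45, 40, 40), "confidence": 0.42},  # likely a false positive
]
print(faces_above_threshold(detections))  # only the 0.91 detection survives
```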

Fingerprint Detection

Detects images of fingerprints in scanned documents and flags them as PII so they can be reviewed and redacted.

Settings you provide

Setting | Required | Description
Enable fingerprint detection | No | Turn detection on or off. Defaults to on.
Confidence threshold | No | How sure the detector has to be (between 0.3 and 0.9) before it flags an image as containing a fingerprint. Defaults to 0.6.
Extract minutiae points | No | Advanced: pull out the small ridge endings and bifurcations of the fingerprint. Defaults to off.
Detection timeout (seconds) | No | The most time the detector will spend on a single image. Defaults to 30.
Maximum image size (pixels) | No | Skip images larger than this size to keep processing time predictable. Defaults to 40,000,000.

Common uses

  • Flagging photographs of people in documents that will be released externally.
  • Spotting fingerprint images in scanned identity documents so they can be redacted.

NHS Summary

Looks at documents drawn from National Health Service (NHS) records and pulls out the information that is most useful for clinical and administrative review: the type of document (for example, letter, lab result, discharge summary), the dates it covers, any NHS numbers and Medical Record Numbers (MRN) it contains, and a short plain-language summary of its content.

What you'll need

  • An upstream datasource or tool that supplies the NHS documents to read.
  • An AI helper connected by a helper edge. The NHS Summary tool relies on an AI helper to produce its output and will not run without one.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Classify document type | No | Identify whether each document is a letter, lab result, discharge summary, and so on. Defaults to on.
Generate summary | No | Produce a short plain-language summary of each document. Defaults to on.
Extract NHS numbers | No | Find and validate any NHS numbers in the document. Defaults to on.
Extract Medical Record Numbers (MRN) | No | Find any local hospital identifiers and patient record numbers. Defaults to on.

Administrator-only setup. The AI helper required by NHS Summary (for example, an OpenAI helper) must be registered and credentialed by an administrator. The list of recognised document types and identifier formats is also configured at the platform level.
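NHS numbers are ten digits ending in a modulus-11 check digit, which is what validation can verify. The sketch below implements the standard check; whether Quantra's validator applies further rules is not documented here.

```python
def nhs_number_valid(number: str) -> bool:
    # Standard modulus-11 check for 10-digit NHS numbers.
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) != 10:
        return False
    # Weight the first nine digits by 10 down to 2 and sum.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is never valid.
    return check != 10 and check == digits[9]

print(nhs_number_valid("943 476 5919"))  # True: passes the check digit
print(nhs_number_valid("943 476 5918"))  # False: check digit mismatch
```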

Common uses

  • Triaging a backlog of clinical letters by type and topic.
  • Building a quick index of patient records by NHS number and MRN.

HWT-OCR

Handwritten Text Optical Character Recognition (HWT-OCR) reads the text on a page and turns it into machine-readable words, including the position of each word on the page. It can read both printed text and handwritten text. The position information is what later tools use to draw redaction boxes accurately on the original document.
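As an illustration of why word positions matter, here is a hypothetical word record and how a downstream tool might turn it into a redaction box. The schema and the fractional coordinate convention are assumptions for the sketch, not Quantra's documented output.

```python
from dataclasses import dataclass

@dataclass
class Word:
    # One recognised word; coordinates are fractions of page width/height
    # (hypothetical shape, for illustration only).
    text: str
    left: float
    top: float
    width: float
    height: float

def redaction_box(word: Word, padding: float = 0.002) -> tuple:
    # Grow the word's bounding box slightly so a redaction fully covers it.
    return (word.left - padding, word.top - padding,
            word.width + 2 * padding, word.height + 2 * padding)

w = Word("Smith", left=0.42, top=0.31, width=0.08, height=0.015)
print(redaction_box(w))
```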

What you'll need

  • An upstream datasource or tool that supplies the documents and images to read.
  • For the cloud reading mode (Amazon Textract), the Amazon Web Services (AWS) credentials must be configured by an administrator — see the administrator note below.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Reading mode | Yes | Pick the reading engine: Textract (cloud, best for printed text and tables), PaddleOCR (local, best for printed text in many languages), or TrOCR (local, best for handwriting).
Detect tables | Only with Textract | Also extract the contents of tables, with cell positions. Defaults to off.
Detect forms | Only with Textract | Also extract form fields as label-and-value pairs. Defaults to off.
Primary language | Only with PaddleOCR | The main language of the documents you are reading. Pick from the supported language list. Defaults to English.

Administrator-only setup. The Textract reading mode requires AWS credentials configured by an administrator. The local reading modes (PaddleOCR and TrOCR) need no credentials, but a Graphics Processing Unit (GPU) is recommended for reasonable speed and is provisioned by the administrator who deploys the platform.

Common uses

  • Turning scanned PDFs into searchable text for downstream summarisation, classification, or PII detection.
  • Reading handwritten notes on forms or annotated documents.
  • Producing word-level positions that feed Q-DACT and other redaction workbenches.

Meta to DB

Takes basic information about each document — filename, size, dates, and so on — and writes it into a database table. This makes a pipeline's worth of documents searchable and reportable from any other tool that can read the database.

What you'll need

  • An upstream datasource or tool that supplies the documents whose metadata you want to record.
  • A downstream database datasource (for example, Microsoft SQL Server, PostgreSQL, or Snowflake) connected by an edge. The metadata is written to that database.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Note | No | A free-text description for your own reference.
Metadata mappings | Yes | One row per piece of metadata you want to record. For each row, pick the metadata field on the left (for example, filename, size, created date) and type the column name to write it to on the right. Click Add mapping to add another row.

Administrator-only setup. The database itself is configured separately as a datasource, which usually relies on administrator setup — pre-registered approved servers and shared credentials. See the Datasources section for details.
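Conceptually, the metadata mappings describe the columns of a single INSERT per document. A hypothetical sketch (the table name, column names, and parameter style are illustrative, not Quantra's actual SQL):

```python
# Each mapping row pairs a metadata field with a target column name.
mappings = [
    ("filename", "doc_name"),
    ("size", "size_bytes"),
    ("created date", "created_at"),
]

def insert_statement(table: str, mappings: list) -> str:
    # Build a parameterised INSERT covering every mapped column.
    cols = ", ".join(col for _, col in mappings)
    params = ", ".join("?" for _ in mappings)
    return f"INSERT INTO {table} ({cols}) VALUES ({params})"

print(insert_statement("documents", mappings))
# INSERT INTO documents (doc_name, size_bytes, created_at) VALUES (?, ?, ?)
```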

Common uses

  • Building a searchable inventory of every document a pipeline has processed.
  • Feeding a reporting dashboard with up-to-date document statistics.

Review

Opens an interactive viewer (the Q-DACT Redaction Viewer) where a person can look at every redaction the pipeline has proposed for a document, accept or reject each one, draw new ones by hand, and approve the document for release. Q-DACT works on Portable Document Format (PDF) files, images, and videos — including time-based redactions on video.

What you'll need

  • An upstream tool or workbench that supplies the documents and the proposed redactions to review (typically PII Detect, Biometrics, or HWT-OCR followed by a redaction-suggesting tool).

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.

The Review tool has no other configuration on the canvas. Everything else is done interactively when a reviewer opens the viewer.

Common uses

  • Letting a person check and adjust every redaction before a document is released.
  • Reviewing video footage to redact people, screens, or other sensitive frames.

SAR Release

Applies the approved redactions to each document, packages everything into a single Subject Access Request (SAR) release, and uploads it to a chosen destination. This is the tool that produces the final deliverable at the end of a SAR workflow.

What you'll need

  • An upstream Review (Q-DACT) node so that redactions have been reviewed and approved.
  • A downstream datasource node (for example, a SharePoint folder or a Network Drive) connected by an edge. The release is uploaded there.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Redaction style | No | The fill colour for the redaction boxes that get burned into the released documents: Black, White, or Grey. Defaults to Black.
Include manifest file | No | Add a small manifest file listing every document in the release together with redaction counts. Defaults to on.
Include audit log | No | Add an audit log file recording exactly which redactions were applied to which document. Defaults to on.

Administrator-only setup. The destination where the release is uploaded is configured as a datasource, which usually relies on administrator setup — pre-registered servers, shared credentials, and so on. See the Datasources section for details.
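A manifest entry typically pairs each released file with its size and a content hash so the release is tamper-evident. The exact manifest format Quantra produces is not documented here; this sketch only shows the general shape:

```python
import hashlib
import json

def manifest_entry(name: str, data: bytes) -> dict:
    # One manifest record per released document (illustrative field names).
    return {
        "file": name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

entry = manifest_entry("letter_0001_redacted.pdf", b"%PDF-1.7 ...")
print(json.dumps(entry))
```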

Common uses

  • Producing the final redacted package at the end of a Subject Access Request workflow.
  • Generating tamper-evident release packages with manifest and audit log for compliance.

PII Detect

Scans documents for Personally Identifiable Information (PII) — names, addresses, contact details, government identifiers, and so on — and records each piece of PII it finds together with where it appeared. The output feeds review and redaction tools further down the pipeline.

What you'll need

  • An upstream HWT-OCR node (or another text-extraction tool) so that PII Detect has text to scan.
  • (Optional) An AI helper connected by a helper edge if you want PII Detect to use AI for the harder, more ambiguous cases.

Settings you provide

Setting | Required | Description
Display note | No | A label for this node on the canvas.
Minimum confidence | No | How sure the detector has to be (between 0 and 1) before it records something as PII. Defaults to 0.5.
Context window (characters) | No | How many characters of surrounding text to keep with each detection so that reviewers can see it in context. Defaults to 300.
Maximum context samples | No | The most context samples the AI will look at per piece of PII when classifying. Defaults to 5.
Enable AI classification | No | Send ambiguous detections to the connected AI helper for a second opinion. Defaults to on.
PII types to detect | Yes | Tick which categories of PII you want the tool to look for. Includes general categories (names, addresses, phone numbers, email addresses, payment card numbers) and country-specific identifiers (NHS numbers, National Insurance Numbers (NINO), passport numbers, tax identifiers, and others).
Note | No | A free-text description for your own reference.

Administrator-only setup. The optional AI helper used for harder PII cases (for example, an OpenAI or Anthropic helper) is registered and credentialed by an administrator. The local pattern-based detection always works without an AI helper.
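The context window simply keeps a slice of text around each detection so reviewers see the match in place. A minimal sketch using one illustrative pattern (email addresses); the real detector covers many more categories and patterns:

```python
import re

def detect_emails(text: str, context_chars: int = 300) -> list:
    # Record each match with its position plus surrounding context,
    # mirroring the "context window" setting (illustrative pattern only).
    results = []
    for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        half = context_chars // 2
        start, end = max(0, m.start() - half), m.end() + half
        results.append({
            "value": m.group(),
            "offset": m.start(),
            "context": text[start:end],
        })
    return results

sample = "Please reply to jane.doe@example.org by Friday."
print(detect_emails(sample, context_chars=20))
```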

Common uses

  • Finding and flagging personal data across a document set ahead of redaction.
  • Producing the input to a Review (Q-DACT) and SAR Release pipeline.
  • Building a register of which documents contain which kinds of PII.