Tools
Tools are the processing nodes that sit in the middle of a pipeline. Each tool does one job — extracting text, hashing files, summarising content, detecting biometrics, and so on — and produces results that downstream tools or workbenches can use. This section describes the tools available in Quantra, what each one does, and what to fill in when configuring each one.
Archive
Packages a set of files into a single archive (or collection of archives) that you can hand off to another system or download. Useful at the end of a pipeline when you want to deliver a tidy bundle of processed documents.
What you'll need
- An upstream node that produces files for the archive (a datasource, a redaction tool, and so on).
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Output mode | Yes | Either Archive (one packaged file) or Folder (the files left as separate items). |
| Archive format | Required when output mode is Archive | The format of the archive: ZIP, TAR, TAR.GZ, or 7-Zip. |
| Naming pattern | No | The name of the archive file. You can include placeholders such as {release_code} and {timestamp} in the pattern. Defaults to SAR_{release_code}_{timestamp}. |
| Include manifest file | No | Adds a small manifest file to the archive listing every item it contains together with file sizes and hashes. Defaults to on. |
| Note | No | A free-text description for your own reference. |
Common uses
- Bundling redacted documents into a single download for delivery.
- Producing a deliverable archive at the end of a Subject Access Request (SAR) pipeline.
- Packaging a folder's worth of pipeline output for hand-off to another system.
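The naming pattern is a template string. As a rough sketch of how the placeholders might expand — the substitution logic and timestamp format here are assumptions; only the {release_code} and {timestamp} placeholder names come from the table above:

```python
from datetime import datetime, timezone

def expand_pattern(pattern: str, release_code: str) -> str:
    # Assumed semantics: literal placeholder substitution; the timestamp
    # format is illustrative, not Quantra's documented format.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return (pattern
            .replace("{release_code}", release_code)
            .replace("{timestamp}", timestamp))

# expand_pattern("SAR_{release_code}_{timestamp}", "REQ-0042")
# produces something like "SAR_REQ-0042_20240501_143000"
```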
Hash
Calculates a similarity hash for each document and stores it alongside the document's metadata. Two documents that have very similar content end up with very similar hashes, so the hash makes it easy to spot near-duplicates — for example, two slightly different copies of the same letter.
What you'll need
- An upstream datasource or tool that supplies the documents to hash.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Minimum file size (megabytes) | No | Skip files smaller than this size. Defaults to 0 (hash every file). |
| Note | No | A free-text description for your own reference. |
Common uses
- Spotting duplicate or near-duplicate documents in a large collection.
- Tagging documents so a downstream review process can group similar items together.
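Quantra does not document which similarity-hashing algorithm it uses; a simhash-style sketch illustrates the general idea — documents with similar content produce hashes that differ in only a few bits:

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    # Toy simhash over word trigrams -- illustrative only, not the
    # algorithm the Hash tool actually uses.
    words = text.lower().split()
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]
    counts = [0] * bits
    for shingle in shingles:
        h = int.from_bytes(hashlib.md5(shingle.encode()).digest()[:8], "big")
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if counts[b] > 0)

def hamming_distance(a: int, b: int) -> int:
    # Near-duplicate documents end up with a small Hamming distance.
    return bin(a ^ b).count("1")
```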
Summary
Reads the content of documents, images, and videos, extracts the text and basic technical information, and — if an AI helper is connected — produces a short, plain-language summary of each item.
What you'll need
- An upstream datasource or tool that supplies the items to summarise.
- (Optional) An AI helper connected by a helper edge if you want plain-language summaries or richer image descriptions.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Only process recognised document types | No | Skip files in formats Summary does not understand. Defaults to on. |
| Extract Exchangeable Image File Format (EXIF) metadata | No | Pull camera, phone, and image metadata such as date taken and location from photographs and video files. Defaults to on. |
| Video sample rate (frames) | No | For video files, sample one frame for every N frames. Lower values mean more frames are inspected. Defaults to 60. |
| Enable Artificial Intelligence (AI) Vision | No | Send images to the connected AI helper to produce richer descriptions of their content. Defaults to off. |
| Note | No | A free-text description for your own reference. |
Common uses
- Generating short summaries of long documents to help reviewers triage a backlog.
- Pulling text and EXIF data out of mixed photographs, scans, and videos.
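The video sample rate behaves as a stride over the frame sequence; a one-line sketch of the rule implied by the setting:

```python
def frames_to_inspect(total_frames: int, sample_rate: int = 60) -> list[int]:
    # One frame per `sample_rate` frames; lower values inspect more frames.
    return list(range(0, total_frames, sample_rate))

frames_to_inspect(300)      # [0, 60, 120, 180, 240]
frames_to_inspect(300, 30)  # twice as many frames inspected
```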
Biometrics
Biometrics covers two related tools that detect human biometric features in documents and images. They are used most often as part of a Personally Identifiable Information (PII) workflow, where a face or fingerprint visible in a document needs to be flagged so it can be redacted.
Face Biometrics
Counts the human faces visible in pictures and videos. It can also extract a small amount of facial information so that downstream tools and workbenches can flag the face for redaction.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Minimum confidence | No | How sure the detector has to be (between 0 and 1) before it counts something as a face. Defaults to 0.6. |
| Also analyse videos | No | Look for faces in video files as well as still images. Defaults to off. |
| Note | No | A free-text description for your own reference. |
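The minimum-confidence setting is a simple filter over the detector's raw scores; a sketch of the assumed behaviour (the detection fields shown are illustrative, not Quantra's actual output shape):

```python
def faces_to_report(detections: list[dict], min_confidence: float = 0.6) -> list[dict]:
    # Keep only detections the model scored at or above the threshold.
    return [d for d in detections if d["confidence"] >= min_confidence]

detections = [{"box": (10, 10, 60, 60), "confidence": 0.91},
              {"box": (200, 40, 250, 95), "confidence": 0.42}]
faces_to_report(detections)  # only the 0.91 detection survives the default 0.6 cut
```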
Fingerprint Detection
Detects images of fingerprints in scanned documents and flags them as PII so they can be reviewed and redacted.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Enable fingerprint detection | No | Turn detection on or off. Defaults to on. |
| Confidence threshold | No | How sure the detector has to be (between 0.3 and 0.9) before it flags an image as containing a fingerprint. Defaults to 0.6. |
| Extract minutiae points | No | Advanced: pull out the small ridge endings and bifurcations of the fingerprint. Defaults to off. |
| Detection timeout (seconds) | No | The most time the detector will spend on a single image. Defaults to 30. |
| Maximum image size (pixels) | No | Skip images larger than this size to keep processing time predictable. Defaults to 40,000,000. |
Common uses
- Flagging photographs of people in documents that will be released externally.
- Spotting fingerprint images in scanned identity documents so they can be redacted.
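The maximum-image-size guard in the fingerprint settings reads most naturally as a total pixel-count check; a sketch of the assumed semantics:

```python
def skip_image(width_px: int, height_px: int, max_pixels: int = 40_000_000) -> bool:
    # Assumed check: skip any image whose pixel count exceeds the maximum,
    # keeping processing time predictable.
    return width_px * height_px > max_pixels

skip_image(8000, 6000)  # 48,000,000 pixels -> True (skipped)
skip_image(4000, 3000)  # 12,000,000 pixels -> False (processed)
```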
NHS Summary
Looks at documents drawn from National Health Service (NHS) records and pulls out the information that is most useful for clinical and administrative review: the type of document (for example, letter, lab result, discharge summary), the dates it covers, any NHS numbers and Medical Record Numbers (MRN) it contains, and a short plain-language summary of its content.
What you'll need
- An upstream datasource or tool that supplies the NHS documents to read.
- An AI helper connected by a helper edge. The NHS Summary tool relies on Artificial Intelligence (AI) to produce its output and will not run without one.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Classify document type | No | Identify whether each document is a letter, lab result, discharge summary, and so on. Defaults to on. |
| Generate summary | No | Produce a short plain-language summary of each document. Defaults to on. |
| Extract NHS numbers | No | Find and validate any NHS numbers in the document. Defaults to on. |
| Extract Medical Record Numbers (MRN) | No | Find any local hospital identifiers and patient record numbers. Defaults to on. |
Common uses
- Triaging a backlog of clinical letters by type and topic.
- Building a quick index of patient records by NHS number and MRN.
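The "Extract NHS numbers" option validates as well as finds. NHS numbers carry a standard modulus-11 check digit, so validation can be sketched as follows — this is the published algorithm, not necessarily Quantra's exact code:

```python
def valid_nhs_number(number: str) -> bool:
    # Standard modulus-11 check for 10-digit NHS numbers: weight the first
    # nine digits 10 down to 2, and compare the derived check digit with
    # the tenth digit. A derived value of 10 is never valid; 11 maps to 0.
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) != 10:
        return False
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    return check != 10 and check == digits[-1]

valid_nhs_number("943 476 5919")  # a well-known test number -> True
```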
HWT-OCR
Handwritten Text Optical Character Recognition (HWT-OCR) reads the text on a page and turns it into machine-readable words, including the position of each word on the page. It can read both printed text and handwritten text. The position information is what later tools use to draw redaction boxes accurately on the original document.
What you'll need
- An upstream datasource or tool that supplies the documents and images to read.
- For the cloud reading mode (Amazon Textract), the Amazon Web Services (AWS) credentials must be configured by an administrator — see the administrator note below.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Reading mode | Yes | Pick the reading engine: Textract (cloud, best for printed text and tables), PaddleOCR (local, best for printed text in many languages), or TrOCR (local, best for handwriting). |
| Detect tables | Only with Textract | Also extract the contents of tables, with cell positions. Defaults to off. |
| Detect forms | Only with Textract | Also extract form fields as label-and-value pairs. Defaults to off. |
| Primary language | Only with PaddleOCR | The main language of the documents you are reading. Pick from the supported language list. Defaults to English. |
Common uses
- Turning scanned PDFs into searchable text for downstream summarisation, classification, or PII detection.
- Reading handwritten notes on forms or annotated documents.
- Producing word-level positions that feed Q-DACT and other redaction workbenches.
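The word-level positions are what let downstream tools draw boxes on the original page; a sketch of the kind of record involved (field names and the normalised coordinate convention are assumptions):

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    # One recognised word with its position on the page.
    # Coordinates here are assumed to be normalised to the 0-1 range.
    text: str
    page: int
    left: float
    top: float
    width: float
    height: float

def redaction_box(word: OcrWord, padding: float = 0.002) -> tuple:
    # A downstream redaction tool can pad the word's box slightly and
    # draw it at the same position on the original document.
    return (word.left - padding, word.top - padding,
            word.width + 2 * padding, word.height + 2 * padding)
```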
Meta to DB
Takes basic information about each document — filename, size, dates, and so on — and writes it into a database table. This makes a pipeline's worth of documents searchable and reportable from any other tool that can read the database.
What you'll need
- An upstream datasource or tool that supplies the documents whose metadata you want to record.
- A downstream database datasource (for example, Microsoft SQL Server, PostgreSQL, or Snowflake) connected by an edge. The metadata is written to that database.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Note | No | A free-text description for your own reference. |
| Metadata mappings | Yes | One row per piece of metadata you want to record. For each row, pick the metadata field on the left (for example, filename, size, created date) and type the column name to write it to on the right. Click Add mapping to add another row. |
Common uses
- Building a searchable inventory of every document a pipeline has processed.
- Feeding a reporting dashboard with up-to-date document statistics.
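Conceptually, each mapping row pairs a metadata field with a target column. An in-memory SQLite sketch of the idea — the table name, column names, and field names are made up for illustration:

```python
import sqlite3

# Hypothetical mappings: metadata field on the left, target column on the right.
MAPPINGS = {"filename": "doc_name", "size": "doc_bytes", "created": "created_at"}

conn = sqlite3.connect(":memory:")
columns = ", ".join(MAPPINGS.values())
conn.execute(f"CREATE TABLE documents ({columns})")

# One row is written per document, one column per mapping.
document = {"filename": "letter.pdf", "size": 20480, "created": "2024-05-01"}
placeholders = ", ".join("?" for _ in MAPPINGS)
conn.execute(f"INSERT INTO documents ({columns}) VALUES ({placeholders})",
             [document[field] for field in MAPPINGS])

conn.execute("SELECT doc_name, doc_bytes FROM documents").fetchone()
# ('letter.pdf', 20480)
```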
Review
Opens an interactive viewer (the Q-DACT Redaction Viewer) where a person can look at every redaction the pipeline has proposed for a document, accept or reject each one, draw new ones by hand, and approve the document for release. Q-DACT works on Portable Document Format (PDF) files, images, and videos — including time-based redactions on video.
What you'll need
- An upstream tool or workbench that supplies the documents and the proposed redactions to review (typically PII Detect, Biometrics, or HWT-OCR followed by a redaction-suggesting tool).
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
The Review tool has no other configuration on the canvas. Everything else is done interactively when a reviewer opens the viewer.
Common uses
- Letting a person check and adjust every redaction before a document is released.
- Reviewing video footage to redact people, screens, or other sensitive frames.
SAR Release
Applies the approved redactions to each document, packages everything into a single Subject Access Request (SAR) release, and uploads it to a chosen destination. This is the tool that produces the final deliverable at the end of a SAR workflow.
What you'll need
- An upstream Review (Q-DACT) node so that redactions have been reviewed and approved.
- A downstream datasource node (for example, a SharePoint folder or a Network Drive) connected by an edge. The release is uploaded there.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Redaction style | No | The fill colour for the redaction boxes burned into the released documents: Black, White, or Grey. Defaults to Black. |
| Include manifest file | No | Add a small manifest file listing every document in the release together with redaction counts. Defaults to on. |
| Include audit log | No | Add an audit log file recording exactly which redactions were applied to which document. Defaults to on. |
Common uses
- Producing the final redacted package at the end of a Subject Access Request workflow.
- Generating tamper-evident release packages with manifest and audit log for compliance.
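The manifest and audit log are what make a release tamper-evident: each row ties a filename to its size, a content hash, and a redaction count. A sketch of what one manifest row might contain — the field names are assumptions; per-file hashing is simply standard practice:

```python
import hashlib

def manifest_entry(filename: str, file_bytes: bytes, redaction_count: int) -> dict:
    # Illustrative manifest row: name, size, content hash, and how many
    # redactions were burned into this document.
    return {
        "name": filename,
        "bytes": len(file_bytes),
        "sha256": hashlib.sha256(file_bytes).hexdigest(),
        "redactions": redaction_count,
    }

manifest_entry("letter_redacted.pdf", b"%PDF-1.7 ...", 3)
```

Recomputing the hash of a delivered file and comparing it with the manifest reveals any post-release tampering.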
PII Detect
Scans documents for Personally Identifiable Information (PII) — names, addresses, contact details, government identifiers, and so on — and records each piece of PII it finds together with where it appeared. The output feeds review and redaction tools further down the pipeline.
What you'll need
- An upstream HWT-OCR node (or another text-extraction tool) so that PII Detect has text to scan.
- (Optional) An AI helper connected by a helper edge if you want PII Detect to use AI for the harder, more ambiguous cases.
Settings you provide
| Setting | Required | Description |
|---|---|---|
| Display note | No | A label for this node on the canvas. |
| Minimum confidence | No | How sure the detector has to be (between 0 and 1) before it records something as PII. Defaults to 0.5. |
| Context window (characters) | No | How many characters of surrounding text to keep with each detection so that reviewers can see it in context. Defaults to 300. |
| Maximum context samples | No | The most context samples the AI will look at per piece of PII when classifying. Defaults to 5. |
| Enable AI classification | No | Send ambiguous detections to the connected AI helper for a second opinion. Defaults to on. |
| PII types to detect | Yes | Tick which categories of PII you want the tool to look for. Includes general categories (names, addresses, phone numbers, email addresses, payment card numbers) and country-specific identifiers (NHS numbers, National Insurance Numbers (NINO), passport numbers, tax identifiers, and others). |
| Note | No | A free-text description for your own reference. |
Common uses
- Finding and flagging personal data across a document set ahead of redaction.
- Producing the input to a Review (Q-DACT) and SAR Release pipeline.
- Building a register of which documents contain which kinds of PII.
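The context window keeps surrounding text with each detection so reviewers see it in context. A sketch of the slicing involved — whether Quantra splits the window evenly around the match is an assumption:

```python
def detection_context(text: str, start: int, end: int, window: int = 300) -> str:
    # Keep roughly `window` characters of text around the detected span,
    # clipped to the bounds of the document.
    half = window // 2
    return text[max(0, start - half):min(len(text), end + half)]

text = "Please contact John Smith at the usual address."
text[15:25]                          # the detected span: "John Smith"
detection_context(text, 15, 25, 20)  # the span plus ~10 characters either side
```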