Archiving

The archive is scanned. Now what?

Digitising is not scanning. It is deciding based on scanned information and correct metadata.

In the basement there are a hundred metres of archive cabinets. Nobody knows exactly what’s inside. Cardboard boxes with stickers like “HR 2003-2008” and “Northern projects - misc.” The annual storage costs aren’t trivial. The director asks: “Can’t we just get rid of it?” The answer is more complicated than he had hoped.

The Dutch Archives Act 1995 requires public-sector organisations — and bodies with public duties — to keep information in line with retention periods set out in a disposal schedule. Only when a retention period has expired may, and must, information be destroyed. Records of enduring value go to an archive service. The new Archives Act is currently before the Senate, with intended entry into force in 2027; among other things it will shorten the transfer period from 20 to 10 years.

Crucial in both versions: destruction is only permitted on the basis of a structured registration with metadata. Unstructured information formally may not be destroyed, because you can’t demonstrate what you’re destroying. Translate that to a hundred metres of boxes: you can’t simply order a skip. You have to determine, per box, per file, sometimes per document, what it is, when it was closed, which category from the disposal schedule applies, and whether the period has expired. And if personal data are involved, the GDPR also kicks in — which says personal data may not be kept longer than necessary. The opposite is just as awkward: information that should have been destroyed but is still lying around must simply be disclosed when a Woo (FOI) request comes in.
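The per-file decision described above can be sketched as a small lookup. This is a minimal illustration, not a legal rule set: the category names, the retention periods in `RETENTION_YEARS`, and the convention that a period starts on 1 January after the closing year are all assumptions made for the example. Your own disposal schedule determines the real values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical extract of a disposal schedule: category -> retention in years.
# None marks records of enduring value, which are transferred, not destroyed.
RETENTION_YEARS = {
    "personnel_file": 10,
    "project_correspondence": 7,
    "council_decision": None,
}


@dataclass
class FileRecord:
    box: str
    category: Optional[str]    # None = not yet classified
    closed_on: Optional[date]  # None = closing date unknown


def disposal_status(record: FileRecord, today: date) -> str:
    """Return 'unknown', 'transfer', 'keep' or 'destroy' for one file."""
    # Without a category and a closing date nothing can be demonstrated,
    # so the file may not be destroyed.
    if record.category not in RETENTION_YEARS or record.closed_on is None:
        return "unknown"
    years = RETENTION_YEARS[record.category]
    if years is None:
        return "transfer"
    # Assumption for this sketch: the period starts on 1 January
    # following the year in which the file was closed.
    expires_on = date(record.closed_on.year + years + 1, 1, 1)
    return "destroy" if today >= expires_on else "keep"


hr_box = FileRecord("HR 2003-2008", "personnel_file", date(2008, 6, 30))
misc_box = FileRecord("Northern projects - misc.", None, None)
print(disposal_status(hr_box, date(2025, 1, 1)))    # destroy
print(disposal_status(misc_box, date(2025, 1, 1)))  # unknown
```

The "unknown" outcome is the point: a box labelled "misc." has no category and no closing date, so no retention period can be shown to have expired.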

The classic solution is a scanning operation: boxes out of the basement, industrial scanning, PDFs back onto a server. Done, everyone thinks. In reality you have now replaced a hundred metres of paper with terabytes of unstructured PDFs that are neither findable nor legally defensible to destroy. The problem has moved, not been solved. The real challenge is the second step: determining per document what it is, which retention period applies, and whether it can go. A hundred metres of archive quickly amounts to hundreds of thousands of documents, and that means years of manual work.

We treat a physical archive as a three-step pipeline in a single pass. First comes scanning, with automatic rotation correction and quality control, and OCR that also works on old typewritten letters, carbon copies and handwritten notes. Then comes the intelligent layer: AI classification recognises document types (contract, letter, memo, personnel file, construction drawing, financial report), links them to the relevant categories in your disposal schedule, extracts the relevant metadata and works out the retention period from there. For each document a recommendation follows: keep, transfer, or destroy, with the reasoning attached. The third step is human validation where it matters. At 95% accuracy on 500,000 documents you still have 25,000 potential errors, so doubtful cases go to an archivist who takes the final decision. 90-95% is automated; the doubtful 5-10% gets exactly the extra attention it needs.
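The hand-off between the automated layer and the archivist can be sketched as a confidence threshold. This is an illustration under stated assumptions, not a description of a specific product: the `Classification` record, the `confidence` score and the 0.90 threshold are hypothetical. The `expected_errors` helper only reproduces the arithmetic from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    doc_id: str
    doc_type: str        # e.g. "contract", "memo", "personnel file"
    recommendation: str  # "keep", "transfer" or "destroy"
    confidence: float    # 0.0-1.0, reported by a hypothetical classifier


# Assumed cut-off: anything below it counts as a doubtful case.
REVIEW_THRESHOLD = 0.90


def route(c: Classification, threshold: float = REVIEW_THRESHOLD) -> str:
    """Send doubtful cases to an archivist, handle the rest automatically."""
    return "archivist" if c.confidence < threshold else "automatic"


def expected_errors(total_documents: int, accuracy: float) -> int:
    """Number of documents expected to be misclassified at a given accuracy."""
    return round(total_documents * (1 - accuracy))


print(route(Classification("doc-001", "contract", "keep", 0.98)))  # automatic
print(route(Classification("doc-002", "memo", "destroy", 0.71)))   # archivist
print(expected_errors(500_000, 0.95))                              # 25000
```

Where to put the threshold is a trade-off: a higher value sends more documents to the archivist and lowers the risk of an unlawful destruction, a lower value does the opposite.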

By the end, three things have been achieved at the same time. The physical archive is gone: only what really has to remain physical stays. What has been digitised has not only been scanned but structured: searchable, with metadata, linked to the disposal schedule. And there is a destruction certificate, as the Archives Act requires, with a complete specification of what has been destroyed, under which category, and when. The director gets his basement back. The DPO and the archivist get a demonstrably lawful destruction process. And the next Woo (FOI) request can be answered from an archive that knows itself, instead of a hundred metres of ‘let’s hope it doesn’t become relevant this afternoon’.

See also

The privacy leak nobody saw coming

If your dossier isn't in order, you pay twice.