DiAGRAM - The Digital Archiving Graphical Risk Assessment Model

Version 1.0.0

DiAGRAM is an online tool designed to help archivists manage the risks to their digital collections. You answer a set of questions about your archive, covering topics such as storage media, system security and technical skills, and the tool uses statistical methods to calculate the probability that your digital material is preserved.

Who created it?

The tool was designed by archivists from a range of organisations, from a large national institution to local authorities, universities and businesses.

Who should use it?

The tool is based partly on data from UK sources and will give the best results for archives based in the UK. However, it will also be useful to archivists working anywhere in the world.

You can use the tool to:

  • Understand the risks involved in digital preservation as defined in the model and how the risk events are linked together
  • Create a model that reflects the records and practices of your digital archive
  • Test alternative scenarios to see how they affect the risk score
  • Download your models and a summary of the results to include in a report or business case
  • Upload a model from a previous session and continue exploring scenarios from there

How do I use it?

In order to get the most meaningful results from DiAGRAM you will need to have assessed your archive against some digital preservation standards in advance. See 'How to use the tool' for the information you will need to use DiAGRAM effectively.

If you are using DiAGRAM for the first time, it will take about 45 minutes to complete, provided you have the information you need with you.

Your results will only be saved for twenty-four hours. You can click on any of the other pages of the tool and return to your question or your results.

How does it work?

DiAGRAM takes a quantitative approach to digital preservation risk, based on a statistical method called a Bayesian network. This is the first time a quantitative approach to measuring digital preservation risk has been attempted. A Bayesian network represents the conditional relationships between variables and uses them to compute the probability of risk events occurring. To calculate these probabilities, DiAGRAM combines real evidence with data acquired through a structured expert judgement workshop.
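
As a rough illustration of how a Bayesian network turns conditional relationships into a probability, the sketch below computes the chance of a single risk-related node from two of its parents. It is a simplified fragment only: it treats every node as yes/no, leaves out Information Management (the third parent of Integrity in the model), and every number in it is invented for the example rather than taken from DiAGRAM.

```python
from itertools import product

# Hypothetical prior probabilities for two root nodes (invented numbers,
# not DiAGRAM's data).
p_security = {True: 0.8, False: 0.2}
p_checksum = {True: 0.6, False: 0.4}

# Hypothetical conditional probability table:
# P(Integrity is maintained | System Security, Checksum).
p_integrity_given = {
    (True, True): 0.95,
    (True, False): 0.70,
    (False, True): 0.60,
    (False, False): 0.20,
}


def marginal_integrity(p_sec, p_chk, cpt):
    """Sum P(security) * P(checksum) * P(integrity | both) over all parent states."""
    return sum(
        p_sec[s] * p_chk[c] * cpt[(s, c)]
        for s, c in product([True, False], repeat=2)
    )


# Probability of integrity under the assumed current practice.
baseline = marginal_integrity(p_security, p_checksum, p_integrity_given)

# Alternative scenario: checksums are always generated.
scenario = marginal_integrity(
    p_security, {True: 1.0, False: 0.0}, p_integrity_given
)

print(f"baseline: {baseline:.3f}, with checksums: {scenario:.3f}")
```

Comparing the two values is the same kind of "what if" comparison the tool offers when you test an alternative scenario, although the real model has many more nodes and states.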

For more information about how the tool works see 'Learn about DiAGRAM'.

Project Team

DiAGRAM was built by The National Archives and the University of Warwick with support from the National Lottery Heritage Fund and the Engineering and Physical Sciences Research Council.

DiAGRAM structure

Flow diagram of the DiAGRAM Bayesian network. The diagram shows the nodes of the Bayesian network and how probabilities flow through the network from the root nodes downwards, finally outputting scores for renderability and intellectual control.

Node relationships

The image describes the relationships between the nodes that make up the dependency structure of the Bayesian network model. The nodes in the diagram of the model are arranged into five rows, the first being the inputs and the last being the outputs.

The first row includes the following nodes, each of which relates to the input questions you answer when creating a model. Each node has one or more child nodes and subsequent grandchild nodes, and these relationships describe the conditional probability structure of the model.

  • Operating Environment, your archive's policy on the storage location of your digital material (child: Storage Life).
  • Replication and Refreshment, your archive's policies on making copies and regularly moving digital material on to newer versions of the storage media (child: Storage Life).
  • Physical Disaster, the risk of a flood at your archive's primary storage location. For this first version of the tool, we will only consider a flood as a physical disaster (child: Storage Life).
  • Storage Medium, the type of media on which your digital material is stored such as USB hard drives, CDs or the Cloud (children: Storage Life, Obsolescence).
  • Technical Skills, bespoke digital preservation skills such as awareness of technological trends, detailed knowledge of storage media, hardware and software, skills to perform file format migration, skills to find emulating software, etc. (children: Obsolescence, Tools to Render, Technical Metadata).
  • Digital Object, the proportion of your archive made up of born-digital, digitised and surrogate files (children: File Format, Conditions of Use, Content Metadata).
  • Information Management, internal systems and support for coherent information management and documentation of preservation actions. This is needed to ensure integrity and provenance of the digital object (children: Integrity, Identity).
  • System Security, a secure system can protect data from deletion or modification from any unauthorized party, and it ensures that when an authorized person makes a change that should not have been made, the damage can be reversed. Definition from Forcepoint (child: Integrity).
  • Checksum, a unique numerical signature derived from a file that can be used to compare copies (definition from the DPC handbook). A checksum is needed to ensure integrity of the digital object (child: Integrity).
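
To make the Checksum node more concrete, here is a minimal sketch of comparing two copies of a file by checksum. The choice of SHA-256 and the function names are assumptions made for illustration; DiAGRAM itself does not prescribe an algorithm or compute checksums for you.

```python
import hashlib


def sha256_checksum(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hexadecimal string.

    The file is read in chunks so that large files do not need to fit
    in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """Return True if both files produce the same checksum."""
    return sha256_checksum(original) == sha256_checksum(copy)
```

A checksum recorded when material enters the archive can later be compared with a freshly computed one; a mismatch indicates that the stored bits have changed.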

The second row contains just one node: File Format. A file format is a standard way that information is encoded for storage in a computer file. It tells the computer how to display, print, process, and save the information. It is dictated by the application program which created the file, and the operating system under which it was created and stored (definition from Wikipedia). The file format can be described as ubiquitous if it is widely known and used by non-specialists. If the file format is not proprietary, it can be described as open. The availability of tools and software to render a digital object depends on the file format ( parent: Digital Object; child: Tools to Render ).

The third row contains seven nodes:

  • Storage Life, the length of time for which the physical storage device is expected to store the digital object's bit stream. The end of storage life can be described as the point at which you can no longer store or retrieve data due to hardware defects or malfunction ( parents: Operating Environment, Replication and Refreshment, Physical Disaster, Storage Medium; child: Bit Preservation ).
  • Obsolescence, essential equipment, hardware or software required to access the bit stream becoming out of date or unusable. For example, hardware no longer being produced or essential software to access the media no longer supported ( parents: Storage Medium, Technical Skills; child: Bit Preservation ).
  • Tools to Render, availability of tools and software to render the digital material and the expertise to use them ( parents: Technical Skills, File Format; child: Renderability ).
  • Technical Metadata, technical information that describes how a file functions and that enables a computer to understand it at the bit level, so that it can be rendered in a way that is faithful to its original content. Definition adapted from NEDCC - Fundamentals of AV Preservation - Chapter 4 ( parent: Technical Skills; child: Renderability ).
  • Conditions of Use, knowledge of the conditions of use and any restrictions on the digital material, including the legal status, copyright, who owns the intellectual property and Freedom of Information restrictions ( parent: Digital Object; child: Intellectual Control ).
  • Content Metadata, describes the intellectual entity through properties such as author and title, and supports discovery and delivery of digital content. It may also provide an historic context, by, for example, specifying which print-based material was the original source for a digital derivative (source provenance). It also includes provenance information of who has cared for the digital object and what preservation actions have been performed on it. This definition is adapted from the Digital Preservation Metadata Standards publication ( parent: Digital Object; child: Identity ).
  • Integrity, the assurance that the bit-stream is identical to when it was added to the archive ( parents: Information Management, System Security, Checksum; child: Bit Preservation ).

The fourth row contains two nodes:

  • Bit Preservation, a term used to describe a very basic level of preservation of digital resource files as they were received from the depositor (literally preservation of the bits forming a digital resource). Activities may include maintaining onsite and offsite backup copies, virus checking, fixity-checking, and periodic refreshment to new storage media ( parents: Storage Life, Obsolescence, Integrity; child: Renderability ).
  • Identity, knowing what the material is and where it is from. Specifically: Can you locate the file? Is it sufficiently described for you to know this is what you want? Can you understand its context within the archive? Can you find other versions of the file which were created by preservation actions? Can you find the provenance of the file? ( parents: Information Management, Content Metadata; child: Intellectual Control ).

The fifth and final row of the network diagram contains two nodes, the outputs of the model.

  • Renderability, the object is a sufficiently useful representation of the original file ( parents: Tools to Render, Technical Metadata, Bit Preservation ).
  • Intellectual Control, having full knowledge of the material content, provenance and conditions of use ( parents: Conditions of Use, Identity ).