Learn about DiAGRAM

DiAGRAM tells you how well you are managing your digital preservation risks. It does this by calculating two probabilities:

  1. intellectual control, defined as the probability that you have full knowledge of the file’s content, provenance and conditions of use
  2. renderability, defined as the probability that you can provide a sufficiently useful representation of the original file

You can then use DiAGRAM to explore how these probabilities would change if you made different decisions or your collection changed. The tool uses a statistical method called a Bayesian network to quantify digital preservation risks. This method uses the conditional relationships between the nodes (things like System Security and Technical Skills) to compute the probability of risk events occurring.

[Figure: Flow diagram of the DiAGRAM Bayesian network. The diagram shows the nodes of the Bayesian network and how probabilities flow through the network from the root nodes downwards, finally outputting scores for renderability and intellectual control.]

Node relationships

The image describes the relationships between the nodes that make up the dependency structure of the Bayesian network model. The nodes in the diagram are arranged into five rows, the first being the inputs and the last being the outputs.

The first row includes the following nodes, each of which relates to the input questions of the "create a model" process. Each node has one or more child nodes and subsequent grandchild nodes; these relationships describe the conditional probability structure of the model.

  • Operating Environment, your archive's policy on the storage location of your digital material (child: Storage Life).
  • Replication and Refreshment, your archive's policies on making copies and regularly moving digital material on to newer versions of the storage media (child: Storage Life).
  • Physical Disaster, the risk of a flood at your archive's primary storage location. For this first version of the tool, we will only consider a flood as a physical disaster (child: Storage Life).
  • Storage Medium, the type of media on which your digital material is stored such as USB hard drives, CDs or the Cloud (children: Storage Life, Obsolescence).
  • Technical Skills, bespoke digital preservation skills such as awareness of technological trends, detailed knowledge of storage media, hardware and software, skills to perform file format migration, skills to find emulating software, etc. (children: Obsolescence, Tools to Render, Technical Metadata).
  • Digital Object, the proportion of your archive made up of born-digital, digitised and surrogate files (children: File Format, Conditions of Use, Content Metadata).
  • Information Management, internal systems and support for coherent information management and documentation of preservation actions. This is needed to ensure integrity and provenance of the digital object (children: Integrity, Identity).
  • System Security, the ability of a system to protect data from deletion or modification by any unauthorized party, and to ensure that when an authorized person makes a change that should not have been made, the damage can be reversed. Definition from Forcepoint (child: Integrity).
  • Checksum, a unique numerical signature derived from a file that can be used to compare copies (definition from the DPC handbook). A checksum is needed to ensure integrity of the digital object (child: Integrity).

The second row contains just one node: File Format. A file format is a standard way that information is encoded for storage in a computer file. It tells the computer how to display, print, process and save the information. It is dictated by the application program which created the file, and the operating system under which it was created and stored (definition from Wikipedia). The file format can be described as ubiquitous if it is widely known and used by non-specialists. If the file format is not proprietary, it can be described as open. The availability of tools and software to render a digital object depends on the file format (parent: Digital Object; child: Tools to Render).

The third row contains seven nodes:

  • Storage Life, the length of time for which the physical storage device is expected to store the digital object's bit stream. The end of storage life can be described as the point at which you can no longer store or retrieve data due to hardware defects or malfunction (parents: Operating Environment, Replication and Refreshment, Physical Disaster, Storage Medium; child: Bit Preservation).
  • Obsolescence, essential equipment, hardware or software required to access the bit stream becoming out of date or unusable. For example, hardware no longer being produced or essential software to access the media no longer supported (parents: Storage Medium, Technical Skills; child: Bit Preservation).
  • Tools to Render, availability of tools and software to render the digital material and the expertise to use them (parents: Technical Skills, File Format; child: Renderability).
  • Technical Metadata, technical information that describes how a file functions and that enables a computer to understand it at the bit level, so that it can be rendered in a way that is faithful to its original content. Definition adapted from NEDCC - Fundamentals of AV Preservation - Chapter 4 (parent: Technical Skills; child: Renderability).
  • Conditions of Use, knowledge of the conditions of use and any restrictions on the digital material, including the legal status, copyright, who owns the intellectual property and Freedom of Information restrictions (parent: Digital Object; child: Intellectual Control).
  • Content Metadata, information that describes the intellectual entity through properties such as author and title, and supports discovery and delivery of digital content. It may also provide an historic context, by, for example, specifying which print-based material was the original source for a digital derivative (source provenance). It also includes provenance information of who has cared for the digital object and what preservation actions have been performed on it. This definition is adapted from the Digital Preservation Metadata Standards publication (parent: Digital Object; child: Identity).
  • Integrity, the assurance that the bit-stream is identical to when it was added to the archive (parents: Information Management, System Security, Checksum; child: Bit Preservation).

The fourth row contains just two nodes.

  • Bit Preservation, a term used to describe a very basic level of preservation of digital resource files as they were received from the depositor (literally preservation of the bits that form a digital resource). Activities may include maintaining onsite and offsite backup copies, virus checking, fixity-checking, and periodic refreshment to new storage media (parents: Storage Life, Obsolescence, Integrity; child: Renderability).
  • Identity, knowing what the material is and where it is from. Specifically: Can you locate the file? Is it sufficiently described for you to know this is what you want? Can you understand its context within the archive? Can you find other versions of the file which were created by preservation actions? Can you find the provenance of the file? (parents: Information Management, Content Metadata; child: Intellectual Control).

The fifth and final row of the network diagram contains two nodes, the outputs of the model.

  • Renderability, the object is a sufficiently useful representation of the original file (parents: Tools to Render, Technical Metadata, Bit Preservation).
  • Intellectual Control, having full knowledge of the material content, provenance and conditions of use (parents: Conditions of Use, Identity).
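
The parent and child relationships listed above can be written down as a small data structure and queried. The following is a minimal sketch in Python, not DiAGRAM's own code: the names PARENTS, ancestors and input_nodes_affecting are illustrative assumptions, and File Format is included as a parent of Tools to Render because the File Format entry names Tools to Render as its child.

```python
# Parents of every non-input node, transcribed from the node lists above.
# Nodes that never appear as a key are the first-row input nodes.
PARENTS = {
    "File Format": ["Digital Object"],
    "Storage Life": ["Operating Environment", "Replication and Refreshment",
                     "Physical Disaster", "Storage Medium"],
    "Obsolescence": ["Storage Medium", "Technical Skills"],
    # File Format is listed with Tools to Render as its child.
    "Tools to Render": ["Technical Skills", "File Format"],
    "Technical Metadata": ["Technical Skills"],
    "Conditions of Use": ["Digital Object"],
    "Content Metadata": ["Digital Object"],
    "Integrity": ["Information Management", "System Security", "Checksum"],
    "Bit Preservation": ["Storage Life", "Obsolescence", "Integrity"],
    "Identity": ["Information Management", "Content Metadata"],
    "Renderability": ["Tools to Render", "Technical Metadata", "Bit Preservation"],
    "Intellectual Control": ["Conditions of Use", "Identity"],
}


def ancestors(node, parents=PARENTS):
    """Return every node that can influence `node` through the network."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def input_nodes_affecting(output, parents=PARENTS):
    """Return the first-row input nodes that feed into `output`, sorted by name."""
    return sorted(n for n in ancestors(output, parents) if n not in parents)


if __name__ == "__main__":
    for output in ("Renderability", "Intellectual Control"):
        print(output, "<-", ", ".join(input_nodes_affecting(output)))
```

Walking the structure this way shows which input questions can change each output score, which is the same information the diagram conveys visually.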

The Bayesian network consists of nodes which define the various factors that affect the preservation of digital material.

For example, the diagram of the network shows that improving Storage Medium will impact Renderability (through Storage Life and Obsolescence), while improving Information Management will impact Intellectual Control (through Identity).

For every node, probabilities are assigned to each of its possible states, taken from existing data where it was available. As historical data were not available for some risks, the remaining probabilities were established using a formal expert elicitation procedure involving a number of UK digital archivists and preservation experts.

These probabilities are used to calculate the final two probabilities of having full intellectual control and renderability of your digital material, the two key factors in the successful preservation of digital material.
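
To illustrate how node probabilities combine, the sketch below computes the probability of one intermediate node, Integrity, from its three input parents by summing over every combination of parent states. It is an illustration only: it assumes each node has just two states, all the numbers are made up rather than taken from DiAGRAM's data or the expert elicitation, and the function name is hypothetical.

```python
from itertools import product

PARENT_NAMES = ["Information Management", "System Security", "Checksum"]

# Hypothetical probabilities that each input node is in its "good" state.
priors = {
    "Information Management": 0.6,
    "System Security": 0.7,
    "Checksum": 0.5,
}

# Hypothetical conditional probability table:
# P(Integrity = yes | Information Management, System Security, Checksum),
# keyed by whether each parent (in PARENT_NAMES order) is in its good state.
integrity_cpt = {
    (True, True, True): 0.95,
    (True, True, False): 0.70,
    (True, False, True): 0.75,
    (True, False, False): 0.45,
    (False, True, True): 0.80,
    (False, True, False): 0.50,
    (False, False, True): 0.55,
    (False, False, False): 0.15,
}


def probability_of_integrity(priors, cpt):
    """Marginalise over the parent states to get P(Integrity = yes)."""
    total = 0.0
    for states in product([True, False], repeat=len(PARENT_NAMES)):
        # The input nodes have no parents, so their probabilities multiply.
        weight = 1.0
        for name, good in zip(PARENT_NAMES, states):
            weight *= priors[name] if good else 1.0 - priors[name]
        total += weight * cpt[states]
    return total


if __name__ == "__main__":
    baseline = probability_of_integrity(priors, integrity_cpt)
    # "What if" scenario: checksums are far more likely to be in place.
    improved = probability_of_integrity({**priors, "Checksum": 0.9}, integrity_cpt)
    print(f"baseline: {baseline:.3f}, with better checksums: {improved:.3f}")
```

The full network repeats this kind of calculation node by node until it reaches Renderability and Intellectual Control, which is why changing one input answer can shift the final scores.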

For a more technical explanation of the methodology, see this paper from Barons, Wright and Smith on Eliciting Probabilistic Judgement for Integrated Decision Support Systems with an example of Food Security.

You can explore how the network works and the data we have used in more depth on the Advanced Customisation page. There you can also update the probability tables with your own data for the other nodes (the ones you do not provide data for by answering the questions). This allows you to create a more bespoke model. The advanced option assumes you are familiar with conditional and marginal probabilities and the theory behind a Bayesian network.
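
If you prepare your own probability tables, one basic check is that, for each combination of parent states, the probabilities of the node's states lie between 0 and 1 and sum to 1. The sketch below shows such a check in Python. The table layout, the state names and the function invalid_rows are assumptions made for illustration and do not reflect the format the Advanced Customisation page expects.

```python
import math


def invalid_rows(cpt, tolerance=1e-9):
    """Return the parent-state combinations whose probabilities are not valid."""
    bad = []
    for parent_states, distribution in cpt.items():
        values = list(distribution.values())
        in_range = all(0.0 <= p <= 1.0 for p in values)
        sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tolerance)
        if not (in_range and sums_to_one):
            bad.append(parent_states)
    return bad


# Hypothetical table for Identity, keyed by the states of its parents
# (Information Management, Content Metadata). All numbers are made up.
identity_cpt = {
    ("good", "good"): {"yes": 0.9, "no": 0.1},
    ("good", "poor"): {"yes": 0.6, "no": 0.4},
    ("poor", "good"): {"yes": 0.5, "no": 0.5},
    ("poor", "poor"): {"yes": 0.2, "no": 0.7},  # deliberately wrong: does not sum to 1
}

if __name__ == "__main__":
    print("rows to fix:", invalid_rows(identity_cpt))
```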