Archiving Principle 1: Assets are maintained for long-term preservation with content owner-approved fixity[1] verification
For content owners to be comfortable with any storage medium, we need to state at the outset that we expect the integrity of every data byte: each byte must be returned exactly as it was submitted, with zero data loss or corruption over time.
As we expect multiple infrastructures and systems to be used to provide resiliency and choice, and technology will no doubt evolve, we believe it's important that content owners be able to select their own verification method to check that their files are still bit-perfect, and that they also define the frequency of checking. Today each cloud infrastructure provider and storage system runs its own fixity checks on its own schedule. However, if studios want to check their own files on a public cloud they need to create a request to restore the file (reconstitute it, download it), move it to a compute instance and generate a checksum to compare with their master list. This is expensive, time consuming, inconsistent between clouds and ultimately a barrier to mass cloud adoption in archiving.
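To make that comparison concrete, here is a minimal sketch, in Python, of the kind of fixity check a studio performs today after restoring a file: stream the restored bytes, hash them and compare against the master list. The manifest structure and the use of SHA-256 are assumptions for illustration; the content owner would choose the approved hash algorithm and checking frequency.

```python
# Minimal fixity-check sketch (illustrative only).
# Assumes a master manifest mapping relative file paths to SHA-256 hex digests;
# the real hash algorithm and manifest format are whatever the content owner approves.
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so very large assets never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(restored_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths whose current checksum no longer matches the master list."""
    failures = []
    for rel_path, expected in manifest.items():
        if sha256_of(restored_dir / rel_path) != expected:
            failures.append(rel_path)
    return failures
```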
Archiving Principle 2: Assets for the archive are selected according to each studio's criteria; ongoing curation of them is a continuous process throughout the content lifecycle
Even though we highlight the automated systems that can assist archivists in maintaining policy and ensuring compliance, with this principle we are making it clear that the creation and maintenance of those policies, unique to each studio and potentially to different use cases, falls to the archiving function, as it does today. As content moves through its lifecycle and technology continues to evolve, the archivists are there to maintain the archive, ensure the correct files are retained, and remove unnecessary files that are being stored. The same is true of the associated metadata and the links/associations between files, which are vital parts of maintaining a useful and searchable archive.
Archiving Principle 3: Management of assets is driven by preservation policies that are enforceable regardless of the infrastructure on which they are hosted or the applications that manage them
The Preservation Policy, on a per-object basis, defines how each asset should be stored. To make systems easy to manage across multiple clouds, that policy should be commonly understood and implemented. This enables scale and choice, yet gives the archivist confidence that if, for example, the Preservation Policy says an asset must be stored across 2 different cloud availability zones, with 1 copy always immutable and access limited to 3 users, then that is what is maintained, regardless of the vendor providing the storage or the management system. With this approach, contracts for storage systems and asset managers may come and go over the very long life of the archive, but the policies that govern each file can pass seamlessly between systems.
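As an illustration of how such a policy could be expressed and checked independently of any vendor, here is a hypothetical sketch in Python using the example above (2 availability zones, 1 immutable copy, 3 authorized users). The field names are ours, not a proposed standard.

```python
# Hypothetical, vendor-neutral Preservation Policy sketch (field names are illustrative).
from dataclasses import dataclass, field


@dataclass
class PreservationPolicy:
    asset_id: str
    min_availability_zones: int = 2     # copies spread across at least this many zones
    immutable_copies: int = 1           # at least one copy can never be altered
    authorized_users: list[str] = field(default_factory=list)

    def is_satisfied_by(self, zones_used: int, immutable_count: int, users_granted: list[str]) -> bool:
        """Check a deployment against the policy, regardless of which vendor hosts the copies."""
        return (
            zones_used >= self.min_availability_zones
            and immutable_count >= self.immutable_copies
            and set(users_granted) <= set(self.authorized_users)
        )


policy = PreservationPolicy(
    asset_id="urn:example:asset:feature-ocf-0042",
    authorized_users=["archivist.a", "archivist.b", "restoration.lead"],
)
```

Because the policy travels with the asset rather than living inside any one storage product, the same check can be run whenever an asset moves between systems.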
Archiving Principle 4: Assets in the Repository are stored with minimal reliance on proprietary formats
We live in a digital world and digital files come in a myriad of types and subtypes. Some of those are well documented and broadly supported, with applications that read and write them and even open-source implementations, such that they are known and likely to be usable in the future (not just the near future but the distant future – 50 or more years away). It is these sorts of file and metadata formats that are ideal for archiving, because there is likely an ecosystem of software that can read and manipulate the data, and if one product or provider withdraws from the market there will be other options.
There are of course other data and metadata files and formats that are proprietary to certain applications or ecosystems, often for good reason, because they contain critical or protected functionality that the owners of the software believe to be differentiating. The problem for the archivist is that those formats (while they may be the original source or highest quality version of the asset or expression) present a risk, if stored in the archive, that they may not be usable in the future. Because archivists like to reduce risk, they tend to avoid, wherever possible, formats that may provide false confidence that an asset is protected for the future.
So in the 2030 Archive we should always try, within reasonable boundaries, to store in perpetuity the highest quality version of the data, in a format that will continue to be supported and accessible in the future, or one that can be easily translated into a common format. There will always be cases, for example Original Camera Files (OCFs) from different camera manufacturers, where it may not be possible to archive in a common format without losing some data precision; in those cases we may need to consider also archiving the codec, application or environment that would allow the file to remain usable in the future.
Archiving Principle 5: Common ontologies and common metadata formats are used to support discoverability and automation
Automation systems work best with common formats that are consistent across domains and allow interoperability. Selecting extensible common data formats also enables code reuse for systems that connect to the archive, reducing implementation costs and avoiding complexity as different systems parse and transform data. Future technologies, which we can't even dream of today, may require different metadata fields, file formats, terminology or standards, so we need to select approaches that enable extensibility to adapt to an unpredictable future. For example, until very recently archive file formats and databases (e.g., Asset Management Systems) didn't distinguish between High Dynamic Range (HDR) and Standard Dynamic Range (SDR) versions of assets. For the first 100 years we have been storing movies, the dynamic range was that of film, and we have not needed to record that a particular copy of a film has its dynamic range constrained in a creative process. Defining an asset as HDR or SDR is a recent issue caused by the creation of HDR mastering for consumer delivery. The same is true of other technologies like resolution (2K, 4K, 8K), immersive audio (spatial/object audio versus 5.1 mixes), etc. Many other future technologies could create the same sort of challenge, and while we don't know what those innovations will be, we do know they'll come and we should prepare for them with extensibility now.
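A small sketch of what this extensibility might look like in practice follows; the record layout and field names are purely illustrative, not a proposed standard. The core fields stay stable while newer technical characteristics live in an open extensions map, so older consumers simply ignore what they don't yet understand.

```python
# Illustrative extensible metadata record (field names are assumptions, not a standard).
asset_record = {
    "identifier": "urn:example:asset:ep101-master",
    "title": "Episode 101 - Master",
    "created": "2012-06-01",
    "extensions": {
        # Added years after the record was created, without a schema migration.
        "dynamicRange": "HDR",       # vs. "SDR"
        "resolution": "4K",
        "audio": "object-based",     # vs. "5.1"
    },
}


def dynamic_range(record: dict) -> str:
    """Older records simply lack the key; fall back to the historical assumption."""
    return record.get("extensions", {}).get("dynamicRange", "SDR")
```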
Archiving Principle 6: The archive repository provides a secure mechanism so that any authorized application can access assets via a common search and discovery API
Many archives today are accessed via MAMs or other administrative gateways, which require a user to find the assets they want and then manually copy them into their application to manipulate them – incurring egress costs and creating duplicates, with the attendant risk of version control issues, data corruption or loss. When archivists are confident that the read/write permissions of their Preservation Policy are being correctly enforced, then in the 2030 Archive they can enable creative applications to directly access archive assets. For example, an editor working on a new show would be able to seamlessly review or reuse archival assets from prior shows directly within their Non-Linear Editor, without lengthy requests to find content or copy it from archives[2]. The 2030 Archive would allow this sort of access without fear that the user could delete, corrupt or edit the original asset (as they would not have sufficient permissions) and without the costs and delays of the current approach.
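As a sketch of what such direct, authorized access might look like from a creative application, here is an illustrative client in Python. The endpoint, routes and token handling are assumptions, not a published API; the point is that the application locates assets by searching the archive and reads them in place, with permissions enforced by the archive rather than by making copies.

```python
# Illustrative client for a common search and discovery API (endpoint and routes are hypothetical).
import requests

ARCHIVE_API = "https://archive.example.com/v1"   # hypothetical archive endpoint


def find_assets(query: str, token: str) -> list[dict]:
    """Search the archive; results carry identifiers, not storage paths."""
    resp = requests.get(
        f"{ARCHIVE_API}/search",
        params={"q": query},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]


def open_read_only(asset_id: str, token: str) -> bytes:
    """Stream an asset for review; the archive rejects writes for this role."""
    resp = requests.get(
        f"{ARCHIVE_API}/assets/{asset_id}/content",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content
```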
Archiving Principle 7: Assets are protected for their lifetime by security that allows only authorized actions
Common interfaces and security enable a seamless workflow with a security system that interweaves access and authorization, so that tasks can be approved and security provisioned simultaneously. This commonality lowers development costs and also enables administrators to build trust that the archive is truly secure from malicious and accidental damage. Creating trust in this new system is our first objective, so that authorized and intended participants can do what they need to get their work done quickly and efficiently, while bad actors are prevented from accessing, corrupting, stealing or otherwise attacking the archive. We take such security very seriously and believe the Common Security Architecture for Production (CSAP) should be considered for such protection, as it starts with Security by Design and is implemented as a Zero Trust Architecture.
Archiving Principle 8: Applications request access to assets, independent of their storage location
MovieLabs proposes an approach to asset management that uses identifiers to find and retrieve content rather than fixed file names/paths, locations or database instances. By using identifiers, assets in the archive can refer to one or more 'resolvers' which can look up where those assets can be found. The MovieLabs blog on resolvers (Through the Looking Glass – MovieLabs) proposes an architecture that allows all systems, whether original or newly added, to consistently access assets independently of the underlying cloud infrastructure(s). This approach supports multiple clouds (public, private, hybrid) and storage types, all addressed in a simple and consistent manner that can be easily integrated into role-based security and policy-based management systems and expanded to new applications as they are added over time.
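A minimal sketch of identifier resolution, under the assumption that a resolver maps each identifier to one or more current locations and that applications never hard-code paths, might look like the following (the identifier scheme, resolver table and URLs are hypothetical):

```python
# Hypothetical resolver table: one identifier, several current locations across clouds.
RESOLVER = {
    "urn:example:asset:feature-ocf-0042": [
        "s3://studio-archive-us-east-1/ocf/0042.mxf",   # public cloud copy
        "gs://studio-archive-eu/ocf/0042.mxf",          # second cloud copy
        "file:///mnt/onprem-vault/ocf/0042.mxf",        # private/hybrid copy
    ],
}


def resolve(identifier: str) -> list[str]:
    """Return every known location for the asset; callers never hard-code storage paths."""
    locations = RESOLVER.get(identifier)
    if not locations:
        raise LookupError(f"No locations registered for {identifier}")
    return locations


# An application retrieves by identifier and can fail over between locations
# without changing its own configuration when storage contracts change.
for url in resolve("urn:example:asset:feature-ocf-0042"):
    print(url)
```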
Conclusion
MovieLabs believes the new approach to archiving described in this blog series will enable a more secure, resilient, cost-efficient and usable archive, while still supporting all of the archival use cases. The good news is that most of what is proposed here is not novel technology that needs to be invented, although the industry will need to rally to implement it in ways that support this new 2030 Archive. What is required is the education, change management and industry alignment around these archival principles and concepts to enable us to start deploying 2030 Archive-aligned platforms.
We welcome input, so if you have read this series and have questions or comments, share them with MovieLabs (office@movielabs.com) or, better yet, join an industry forum like the Academy Digital Preservation Forum to discuss and debate these concepts and how we can jointly implement them.
[1] Fixity is defined as the state of something being unchanging or permanent. For example, today content owners generate a checksum (or hash) of a file before submitting the file to a storage system. By pulling the file out later and generating a new checksum, they can check whether the file has changed or is exactly the same as what was stored. Fixity checks can tell you if a file is corrupt, although they can't tell you how or where the file is damaged. Our objective is to increase fixity by minimizing any loss of data over time.
[2] Disney has already enabled this for editors within the Marvel Cinematic Universe to reference or reuse material from any of the prior movies or TV shows; see the 2030 Showcase case study here.