So, let’s dig in with a look at the foundational thinking behind our approach to the 2030 Archive.
Archive as part of the Content Lifecycle
In the 2030 Vision we assume all media is created in the cloud, so archiving it in the cloud simplifies ingestion: the content is already there and doesn’t need to be moved to be “archived”[1]. However, the 2030 Archive requires a new way of looking at the role of archiving media. We no longer view the archive as the “end of the road,” the last step in the process where media is placed in a secured vault in perpetuity. Instead, the archive becomes part of the overall content lifecycle. Assets in the archive sit in secure stasis until they are needed again, at which point they can be quickly accessed – the concept we discussed in the vision of making the archive an active “library”[2], so assets are not locked away but readily accessible by the teams that need them, all managed via the preservation roles and policies that define the 2030 Archive (see below for more details).
In fact, the archiving lifecycle in the 2030 Archive can start as soon as productions create new assets (from the first concepts to the final masters). Those assets can be automatically tagged as archival candidates, eliminating the crush at the end of production to find and source all the assets the studio decides it wants to archive.
The Importance of Policy
Some assets created during production are transient and can be readily deleted; others can be “tagged” for long-term preservation as soon as they are created, if the content owner knows they will likely want to keep them. The key to enabling this future vision of archiving is a “Preservation Policy” that defines how each production asset should be treated for long-term preservation, with each asset tagged with this policy as soon as it is created or ingested into the cloud. This Preservation Policy could define, for example:
- which assets should be added to the archive
- how they should be protected while in the archive
- how many copies should be stored
- which clouds and storage tiers they should be stored on (and whether one or more of those copies should be entirely offline)
- which geographic region(s) the copies should be stored in
- whether the asset should be marked as immutable (so it cannot be accidentally changed)
- who has permission to see or withdraw the asset[3]
- how the encryption keys should be managed and preserved
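To make this concrete, here is a minimal sketch of how such a Preservation Policy might be captured as structured data. The field names, values, and the PreservationPolicy class are purely illustrative assumptions, not a proposed industry schema:

```python
# Hypothetical sketch of a Preservation Policy as structured data.
# Field names and values are illustrative only, not a proposed schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PreservationPolicy:
    archive: bool = True                  # should this asset be added to the archive?
    copies: int = 3                       # how many copies should be stored
    storage_targets: List[str] = field(   # which clouds/tiers (one may be entirely offline)
        default_factory=lambda: ["cloud-a:cold", "cloud-b:deep-archive", "offline:tape"]
    )
    regions: List[str] = field(default_factory=lambda: ["us-west", "eu-central"])
    immutable: bool = True                # marked immutable so it cannot be accidentally changed
    access_roles: List[str] = field(default_factory=lambda: ["archivist"])  # who can see/withdraw it
    key_management: str = "escrow-with-dual-control"  # how encryption keys are handled

# Example: a studio-wide default for final masters
final_master_policy = PreservationPolicy(copies=3, immutable=True)
print(final_master_policy)
```

Expressed this way, a policy could be serialized (to JSON, for example) and carried alongside an asset’s metadata regardless of which cloud holds the copies.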
This Preservation Policy would likely not be embedded in the file itself but stored in a database or metadata file alongside other information, such as the provenance of the file and its relationships to other files. The management of those assets can then be automated at enormous scale by changing the Preservation Policies and allowing the underlying systems to enforce them. If these Preservation Policies were expressed in a common way across the industry, they could be enforced regardless of which cloud the assets are stored on, which studio owns the asset, or which application created it.
Let’s look at an example of how this Preservation Policy could be used in a workflow scenario involving an Edit Decision List (EDL). The studio may set a policy that the final version of the EDL should be retained in perpetuity. During production it can be hard to define which version is “final” because the file is constantly iterated, sometimes by multiple teams, so identifying the final version and where it is stored is typically left until production wraps. With an automated preservation policy, however, we can take a different approach:
- As an EDL is created it is tagged with a Preservation Policy that defines how the studio wants it to be maintained – perhaps 3 copies in 2 different cloud regions, with read-only access and 1 copy being immutable.
- The current version of the EDL is effectively treated as final.
- As content creation proceeds, the EDL is frequently superseded by a newer version; in this example the “final copy” policy moves to the latest iteration, which is protected at the highest level.
- Prior versions of the EDL can be automatically tagged with a different policy because they are no longer current (automation systems can then decide whether those prior versions should be retained at all, or for a certain period, and automatically remove or reduce the protection on those versions).
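As a rough illustration of how that automation could work, the sketch below re-tags the latest EDL version with the highest-protection policy and downgrades the prior version. The policy names and the tag_asset helper are hypothetical, not an existing API:

```python
# Hypothetical sketch: automation moves the "final copy" policy to the latest
# EDL version and re-tags the superseded one.
from typing import Dict, Optional

ASSET_POLICIES: Dict[str, str] = {}  # asset_id -> policy name (stand-in for a policy store)

def tag_asset(asset_id: str, policy: str) -> None:
    ASSET_POLICIES[asset_id] = policy
    print(f"{asset_id} -> {policy}")

def register_new_edl_version(new_version_id: str, prior_version_id: Optional[str]) -> None:
    # The newest version is treated as "final" and protected at the highest level.
    tag_asset(new_version_id, "edl-final:3-copies-2-regions-immutable")
    # Prior versions are automatically re-tagged; downstream systems can then
    # decide whether (and for how long) to retain them.
    if prior_version_id is not None:
        tag_asset(prior_version_id, "edl-superseded:retain-90-days")

register_new_edl_version("edl-v1", None)
register_new_edl_version("edl-v2", "edl-v1")
```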
It is this efficiency in enabling automation that will allow the studio to realize many of the benefits of the 2030 Archive – assets or asset types can be automatically tagged with a policy and then systems are empowered to enforce that policy without humans needing to manually manage each asset.
Storage Systems and Multi-Cloud
MovieLabs has been advocating for production systems that are independent of any single cloud (public, private, or hybrid) and are inherently multi-cloud, where all applications can store, find, and retrieve assets regardless of the infrastructure they are operating on. By enabling this multi-cloud infrastructure, we provide long-term choice in where media should be stored and reduce barriers to exercising that choice – so if a new or more efficient storage solution is offered, archival assets can be seamlessly moved to it. Today there are many differences between cloud storage providers and the capabilities of their storage and data management tools, services, policies, buckets, file-level permissions, and so on. The Preservation Policy we are proposing here would allow storage service providers to manage data across this myriad of potential infrastructures and ensure the policies are correctly enforced, whatever storage is chosen.
MovieLabs has also proposed abstracting the physical storage location of an asset away from each application by using a unique identifier for each file, which can then be resolved to find its actual location(s). Using this model of resolvable identifiers means applications do not need to know where any particular file is stored across a number of clouds, and it frees systems from being tied to legacy file systems and file paths. For the archive, this future-proofs against changes in storage platforms and provides a more durable reference when used for inter-asset references.
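A minimal sketch of the resolvable-identifier idea, assuming a simple lookup service: applications keep only the asset ID, and the resolver (here just a dictionary, with hypothetical URIs) returns the current location(s):

```python
# Illustrative resolver: asset identifiers map to current storage locations.
# A real system would be a service; these IDs and URIs are hypothetical.
from typing import Dict, List

RESOLVER: Dict[str, List[str]] = {
    "asset:edl:final:1234": [
        "s3://cloud-a-archive/project-x/edl-final.edl",
        "gs://cloud-b-archive/project-x/edl-final.edl",
    ],
}

def resolve(asset_id: str) -> List[str]:
    """Return the current storage location(s) for an asset identifier."""
    return RESOLVER.get(asset_id, [])

# If the asset later moves to a new storage platform, only the resolver entry
# changes; the references held by applications (the asset IDs) stay valid.
print(resolve("asset:edl:final:1234"))
```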
“Virtualizing” the Archive
The multi-cloud, policy-driven approach to managing the 2030 Archive also opens a new opportunity to ‘virtualize’ archives: by applying different policies, the same assets can be treated as different archives. This separation of policies provides the characteristics of separate departmental archives, each with its specific use cases, while still delivering on the single-source-of-truth model and its efficiencies in file management and storage costs – store a file once but allow it to appear differently depending on the policies of each department.
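As an illustration of this ‘virtual archive’ idea, the sketch below stores a single asset identifier and applies a different (hypothetical) departmental policy to produce each department’s view of it:

```python
# Illustrative "virtual archive": one stored asset, several policy-defined views.
# Department names and policy fields are assumptions for illustration only.
asset_id = "asset:feature:final-master:0001"

department_policies = {
    "post":      {"access": "read-only",  "retention": "in-perpetuity"},
    "marketing": {"access": "proxy-only", "retention": "5-years"},
    "legal":     {"access": "read-only",  "retention": "in-perpetuity", "hold": True},
}

def view_of(asset: str, department: str) -> dict:
    """Return how the single stored asset appears to one department."""
    return {"asset": asset, **department_policies[department]}

for dept in department_policies:
    print(dept, view_of(asset_id, dept))
```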
We recognize this is a different approach and will come with change-management challenges in explaining, designing, and deploying such systems and processes, but it offers potentially enormous efficiency and resiliency benefits that will outweigh the short-term complexities.
Securing the 2030 Archive
We believe the Common Security Architecture for Production (CSAP), with its zero-trust approach to security, provides the most robust way to protect assets in the 2030 Archive. CSAP ensures that every asset, and every action on that asset, must be both authenticated and authorized, every time. By integrating CSAP’s approach into workflow managers and archival management systems, the archivist can define security policies that control which roles or individual people/systems can access which assets, with robust protection from rogue actors or systems. In addition, we believe the encryption keys for assets secured in the archive must be subject to the same, or an even higher, level of diligence and protection as the files they unlock – because an active threat vector for highly valuable files is to ransom the keys or release them to the internet. Any long-term cloud archive therefore needs a plan for asset protection, metadata integrity, and encryption key resilience.
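The sketch below illustrates the zero-trust principle described above – every action on every asset is authenticated and authorized on every request. It is not CSAP’s actual API; the functions, roles, and tokens are assumptions for illustration only:

```python
# Simplified zero-trust check: both authentication and authorization run on
# every request. Roles, tokens, and helpers here are hypothetical.
ALLOWED = {("archivist", "withdraw"), ("archivist", "read"), ("editor", "read")}

def is_authenticated(token: str) -> bool:
    # Stand-in for real identity verification (e.g., against an identity provider).
    return token.startswith("valid:")

def is_authorized(role: str, action: str) -> bool:
    return (role, action) in ALLOWED

def request_action(token: str, role: str, action: str, asset_id: str) -> bool:
    # Nothing is trusted by default; both checks run for every asset, every time.
    ok = is_authenticated(token) and is_authorized(role, action)
    print(f"{role} {action} {asset_id}: {'allowed' if ok else 'denied'}")
    return ok

request_action("valid:abc123", "editor", "read", "asset:edl:final:1234")
request_action("valid:abc123", "editor", "withdraw", "asset:edl:final:1234")
```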
A full discussion of CSAP and how it can be paired with Preservation Policies may be the subject of a future blog, as it’s beyond the scope of what we can cover here.
Separating Assets from Metadata
Another core tenet of the MovieLabs approach to implementing Software-Defined Workflows is to treat metadata as a first-class citizen and store it separately from the assets, but in such a way that it is broadly available. The metadata can then be both protected over the long term and kept accessible without touching the media assets at all – there are many use cases for the archive (searching for references, catalog management, changing permissions and policy, etc.) that do not necessitate the inherent (if small) risk and cost of pulling media assets out of the archive. Metadata should also be expressed in open formats (such as the extensible Ontology for Media Creation) so that all systems, now and in the future, can address the archive in a common way, and access to the archive is not hobbled by proprietary data formats or legacy asset management systems.
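As a simple illustration, the metadata record below can be searched, and its policy or permissions updated, without ever retrieving the (much larger) asset from archival storage. The fields are hypothetical and only loosely follow the spirit of an open ontology such as the Ontology for Media Creation:

```python
# Illustrative metadata record kept separate from the asset it describes.
# Catalog operations run against records like this; no asset retrieval needed.
metadata_record = {
    "asset_id": "asset:feature:final-master:0001",
    "title": "Project X - Final Master",
    "relationships": ["asset:edl:final:1234"],
    "provenance": {"created_by": "finishing-system", "created": "2027-05-01"},
    "preservation_policy": "final-master:3-copies-2-regions-immutable",
}

def search_by_title(records, term: str):
    """Search the catalog by title using metadata only."""
    return [r for r in records if term.lower() in r["title"].lower()]

print(search_by_title([metadata_record], "final master"))
```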
What’s Next?
Now that we’ve covered the core concepts behind the 2030 Archive, in Part 3 of this series we’ll distill them into eight core 2030 Archiving Principles that can help those implementing archival systems or processes stay aligned with these concepts.
[1] Of course, there may be a change in storage tier or an intentional move from one cloud to another (for example from a private cloud to a hyperscale cloud provider, or vice versa), but our point here is that the media would already be in a “cloud”, defined as storage/compute services connected to the public internet.
[2] Skywalker Keeps the Humanity in Automated Soundtrack Mastering – MovieLabs
[3] Note – we’re calling these Preservation Policies to differentiate them from security access policies and other elements of authorization and authentication policy enforcement. Security policies are important for protecting these assets and are discussed in great detail in the Common Security Architecture for Production (CSAP). In an archive implementation, security policies need to be created and enforced to protect the assets, in conjunction with these specific Preservation Policies that define how the asset should be managed for long-term archiving.