Section 3.3
SOFTWARE-DEFINED WORKFLOWS
EXTENDED INSIGHTS: Exclusively Online
Building the Future of Media Production
Watch Jim Helman introduce the concepts behind Software-Defined Workflows
OVERVIEW
As technology advances at a blistering pace, the demands of constantly redesigning production pipelines to accommodate new technologies are becoming untenable. Furthermore, dependence on an agglomeration of legacy tools results in a fragile environment, susceptible to failures that can ripple through an entire production process. This principle mitigates these issues by establishing standardized building blocks for workflow processes, with common data file types, descriptive metadata and interfaces through which applications can connect to those systems. By adopting a modular methodology for production pipelines, creatives can quickly construct and adapt workflows from these building blocks. Each block will have its own defined minimum data, metadata, input formats and output formats, and will communicate easily with the others through consistent underlying data systems.
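To make the idea concrete, here is a minimal sketch of how a building block might describe its own minimum data, metadata and formats so that blocks can be checked for compatibility before being chained together. All names and fields here are illustrative assumptions, not a proposed standard:

```python
from dataclasses import dataclass

@dataclass
class BuildingBlock:
    """Illustrative self-description of a workflow building block."""
    name: str                 # e.g., "dailies"
    input_formats: list       # asset file types the block accepts
    output_formats: list      # asset file types the block emits
    required_metadata: list   # minimum metadata fields it needs
    produced_metadata: list   # metadata fields it adds or updates

# Two hypothetical blocks that can be chained because the first block's
# outputs satisfy the second block's inputs and required metadata.
dailies = BuildingBlock(
    name="dailies",
    input_formats=["camera-raw"],
    output_formats=["proxy-video"],
    required_metadata=["scene", "take", "camera-id"],
    produced_metadata=["timecode", "proxy-uri"],
)
editorial = BuildingBlock(
    name="editorial",
    input_formats=["proxy-video"],
    output_formats=["edl"],
    required_metadata=["timecode"],
    produced_metadata=["cut-decisions"],
)

def compatible(upstream: BuildingBlock, downstream: BuildingBlock) -> bool:
    """A pipeline can be assembled when formats and metadata line up."""
    formats_ok = any(f in downstream.input_formats for f in upstream.output_formats)
    metadata_ok = all(m in upstream.required_metadata + upstream.produced_metadata
                      for m in downstream.required_metadata)
    return formats_ok and metadata_ok

assert compatible(dailies, editorial)
```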
We also include the notion of non-destructive workflows; that is, whenever possible, the original asset is maintained in its original state. Production processes layer on modifications that are described in metadata files. In that way, the original assets can always be retained, and any changes or enhancements can be rolled back by peeling away layers of modifications.
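A minimal sketch of that layering, assuming modifications are recorded as ordered metadata entries rather than baked into the asset (the operations shown are hypothetical):

```python
# The original file is never touched; edits live in an ordered stack of
# metadata-described changes that can be peeled away at any time.
original_asset = {"uri": "s3://show/shot042/plate.exr", "immutable": True}

modifications = []  # ordered layers of metadata-described changes

def apply(change: dict) -> None:
    modifications.append(change)   # record the change, not a new file

def revert(layers: int = 1) -> None:
    del modifications[-layers:]    # "peel away" the most recent layers

apply({"op": "color-grade", "lift": 0.02, "gain": 1.1})
apply({"op": "reframe", "crop": [0, 0, 3840, 2160]})
revert()   # the reframe is gone; the grade and the pristine original remain
```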
We can envision an industry interface layer, likely in the form of a series of standardized application programming interfaces (APIs). Workflows would consist of processes exchanging asset data and associated metadata through this interface layer with other processes. This model would support a marketplace-style environment in which providers compete to offer the best components and/or services as modules that plug into a specific workflow/pipeline. Content creators could select, mix and match any of these services to design a workflow, or swap them out without having to redesign their pipeline from scratch.
Figure 2: Standardized building blocks contain agreed common data and metadata for key processes in production. The API interface layer abstracts that information such that any application, tool, portal or service can interface through the API layer with any other tool without needing to know about it in advance.
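The marketplace idea can be sketched as a simple registry: tools publish themselves against a named process in the interface layer, and a pipeline invokes the process without caring which vendor implements it. The registry and message shape below are assumptions for illustration, not a published API:

```python
from typing import Callable

registry: dict = {}

def register(process: str):
    """Publish a tool against a named process in the interface layer."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        registry[process] = fn
        return fn
    return wrap

@register("color-pipeline/v1")
def vendor_a_grade(package: dict) -> dict:
    # A hypothetical vendor's implementation of the "color-pipeline" process.
    package["metadata"]["graded"] = True
    return package

def run(process: str, package: dict) -> dict:
    # The pipeline addresses the process name, not the vendor behind it,
    # so swapping vendors is a registry change, not a pipeline redesign.
    return registry[process](package)

shot = {"asset": "s3://show/shot042/plate.exr", "metadata": {}}
shot = run("color-pipeline/v1", shot)
```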
Each major process used in production – from on-set and dailies through to dubbing and mastering – uses distinct file types and metadata as inputs to the creative process. The resulting work may be altered assets (e.g., a composited image) or altered metadata (e.g., an edit decision list, or EDL). For the interface layer to work most effectively, we will need to describe both the standardized asset files and a minimum set of metadata. Similarly, there are certain standardized outputs (e.g., data, metadata) for each production subprocess, which are held in data storage. Each content creator may have additional sets of metadata that they want to track on a per-project basis, and the system will need to accommodate such ancillary datasets.
Over the years, the list of standardized metadata and data for the building blocks may grow, but for now, there are enough similarities in production processes that an initial set of data and an extensible metadata schema can be developed across studios and productions to start to bring some order to the chaos.
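One way to picture such a schema is a validator that insists on an agreed core set of fields while passing any per-project ancillary fields through untouched; the field names below are illustrative only:

```python
CORE_FIELDS = {"title", "scene", "take", "created_by"}

def validate(metadata: dict) -> dict:
    """Require the agreed core set; pass ancillary fields through intact."""
    missing = CORE_FIELDS - metadata.keys()
    if missing:
        raise ValueError(f"missing required core metadata: {sorted(missing)}")
    return metadata

record = validate({
    "title": "Example Feature",
    "scene": "12A",
    "take": 3,
    "created_by": "camera-unit-b",
    "studio_internal_tag": "vfx-hero",   # per-project field, preserved as-is
})
```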
EXTENDED INSIGHTS: Exclusively Online
Current MovieLabs Projects Delivering on Software-Defined Workflows
MovieLabs has a number of ongoing programs to deliver on these principles of interoperable media workflows.
EXAMPLES
A new company is formed that creates a niche production tool to track color management information from on-set. Instead of creating its own API, the company plugs into the existing interface layer, which quickly and easily allows any content creator to integrate the tool and immediately begin ingesting its information.
A VFX vendor delivers element packages back to the studio for archive. The interface layer would understand the various elements (3D models, textures, etc.) and extract the metadata from each asset to present to the studio’s databases for further processing. Because this metadata extraction is standardized, the studio does not need to create its own normalization of the data; the VFX building blocks already contain the typical data fields for each asset.
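A sketch of that archive flow, assuming each element type carries an agreed list of typical fields, so extraction becomes a lookup rather than per-studio normalization code (field names are illustrative):

```python
# Agreed "typical fields" per element type; illustrative only.
TYPICAL_FIELDS = {
    "3d-model": ["polygon_count", "rig", "lod_levels"],
    "texture":  ["resolution", "color_space", "uv_set"],
}

def extract(element: dict) -> dict:
    """Pull the standardized fields for this element type."""
    fields = TYPICAL_FIELDS[element["type"]]
    return {f: element["metadata"][f] for f in fields}

package = [
    {"type": "texture",
     "metadata": {"resolution": "4096x4096", "color_space": "ACEScg",
                  "uv_set": "udim", "vendor_note": "final"}},
]
rows = [extract(e) for e in package]   # ready for the studio's database
```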
We cannot hope to predict every nuanced data field that may be required for future productions, but by defining an extensible schema, we can accommodate a new camera technology in the capture building block and allow it to interface with legacy software applications that have no prior knowledge of the new technologies or file types.
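The read path of such a schema might look like the following sketch, in which a legacy tool consumes only the fields it understands while a new camera’s extra metadata travels through unharmed; all field names are hypothetical:

```python
def legacy_read(metadata: dict, known_fields: set) -> dict:
    """A legacy tool keeps only the fields it understands."""
    return {k: v for k, v in metadata.items() if k in known_fields}

new_camera_clip = {
    "scene": "12A", "take": 3, "fps": 24,
    "lightfield_depth_maps": "clip0042_depth/",  # unknown to older tools
}
legacy_view = legacy_read(new_camera_clip, {"scene", "take", "fps"})
# The legacy app behaves normally; the new fields remain in storage for
# tools that do understand them.
```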
IMPLICATIONS
Standardized building blocks coupled with an industry-wide interface layer enable the best of both worlds: economies of scale from consistency, and freedom and creativity for each production. By creating an industry-standard interface layer, we enable any number of web applications, creative tools, bots and other software to be developed that understand the underlying files, their structure and associated metadata.
By creating standards for consistent media and metadata nomenclature, hierarchy and storage interfaces, we can define what is the same for every production (scripts, production notes, video files, audio files, etc.) and where those items can be found in the cloud.
By 2030, we may have an entirely object- and metadata-based storage system, which would make filenames irrelevant. In the interim, however, an early step toward building-block workflows would be to at least normalize naming systems with an open data model, so that productions could be consistent in how they describe foundational pieces, such as a scene or take, and how they name a 3D asset’s mesh versus its other component pieces.
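A sketch of naming by data model rather than by filename convention might look like this; the structure is illustrative and is not the OMC itself:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AssetIdentity:
    """Identity expressed as structured data instead of a filename."""
    production: str
    scene: str
    take: Optional[int]       # None for assets not tied to a specific take
    asset: str                # e.g., "hero-robot"
    component: Optional[str]  # e.g., "mesh" versus "texture" or "rig"

mesh = AssetIdentity("example-show", "12A", None, "hero-robot", "mesh")
plate = AssetIdentity("example-show", "12A", 3, "camera-plate", None)
# Any tool can now resolve "the mesh of the hero robot in scene 12A"
# without parsing a filename like EX_SH12A_hrobot_msh_v003_final2.obj.
```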
We can envision physical studio facilities adapted to support these building blocks. They would have the ability to configure equipment and software to perform certain functions in a room one day, and the next day, rapidly modify the configuration using different building blocks to perform a different function.
EXTENDED INSIGHTS: Exclusively Online
The MovieLabs Ontology for Media Creation (OMC)
Since the publication of the 2030 Vision, MovieLabs has been actively building the OMC, the extensible metadata schema mentioned in this document.
Explore the OMC in detail: from the current version to its features and scope
OVERVIEW
Currently, many creative processes happen without real-time feedback. For example, VFX renders can take 24 hours or more to create final composited frames, which makes iteration slow. However, in late 2018, the industry saw dramatic improvements in the quality of video game engines with the addition of hardware-accelerated, GPU-based ray tracing on affordable workstation and cloud graphics cards. In the future, a new suite of filmmaking tools will evolve from today’s game-creation engines – so as not to confuse these tools with game-creation processes, we refer to them as real-time engines (RTEs). These new tools, plus the new cloud foundation principles, will dramatically change the creation and economics of filmmaking, potentially upending the sequence of creation and enabling new workflows in preproduction, production, postproduction and potentially even the delivery of filmed media.
Today’s (2019) game engines are increasingly used to previsualize film sequences (usually the complex, action-packed scenes). In some cases, entire movies have been previsualized on a game engine. Game engine renderers have also been used to create final renders on some movies. Beyond these examples, there are many more opportunities for real-time iterative workflows in the future.
EXAMPLES
Preproduction
By 2030, traditional camera-based productions will look increasingly like the workflow for today’s animated features, that is, a world without unneeded setups or unproductive production days. In animation today, the process is iterative. The movie is first storyboarded scene by scene, then performed with “scratch audio,” and finally with increasingly advanced animatics. At all times, the director can arrange and rearrange the scenes, timing, characters and dialogue. With each revision, the movie is progressively locked, then performed with the final voice talent, animated at full fidelity and rendered with realistic lighting. We foresee the early production steps for live-action movies using a similar approach built on RTEs. The show, potentially with interim actors, can be designed, animated, correctly lit and edited, with accurate camera angles set, in the pre-photography stage and potentially before it is greenlit. The title can iterate in quality throughout this stage, helping a production make complex decisions and spot potential issues before they become critical and before the move to principal photography.
Principal Photography
We do not foresee RTE tools being used only in pre-photography; we also expect to see them used during production as content is shot with traditional cameras. By using RTE tools combined with XR technologies, directors will be able to see photorealistic versions of digital characters or objects interacting on-set with physical actors and objects. By looking through the camera lens or via head-mounted displays (HMDs), the cast and crew will no longer see green screens or stand-in representations of digital characters. Two physical actors could act opposite each other despite being thousands of miles apart. A director and cinematographer would be able to make lighting and camera decisions with absolute confidence, knowing how the final output of the scene will look with digital and physical elements blended seamlessly together.
Postproduction
The process of postproduction and VFX in 2030 could be shorter than it is today because real-time engines will take away the pain of waiting for lengthy and expensive offline render farms to finish before artists can see the results of their work with confidence. The production could also deliver final plates to VFX vendors with digital objects or characters already composited in, removing much of the work required in current postproduction processes. For some smaller productions, these RTE-rendered scenes may be sufficient for final rendered pixels. The resulting time and cost savings could be reallocated elsewhere, perhaps forward to pre-photography, or used to allow more time for iteration and improve the final product.
Distribution
We can also envision scenarios in which the rendering step occurs on the consumer’s device or, just upstream of it, at the edge of the internet during distribution (perhaps using a cluster of GPUs in local neighborhoods with burstable capacity to handle shared graphics compute, in much the same way that bandwidth is contended among neighbors in a local zone today). This enables a world of dynamic media that adapts to the consumer’s unique playback environment. If consumers are engaging in more immersive entertainment experiences (like video games), then it is possible that the finished form of the media is not a single piece of narrative video but an entire CGI environment that contains the storylines, objects and characters the director wants to use. That experience could change and react to the audience and their viewing device (for example, matching the native display resolution, color gamut and frame rate) rather than being fixed every time it is consumed – much like a video game today.
Because all the required assets will already be securely in the cloud (Principles 1 & 7), linked to each other in asset packages (Principle 8) and designed with a new policy approach to publishing (Principle 3), we can then confidently unlock the power of edge compute to deliver these new experiences.
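As a rough sketch of that adaptation step, an edge renderer might negotiate settings against the capabilities of the playback device before rendering just-in-time; the device fields and settings below are hypothetical placeholders:

```python
def choose_render_settings(device: dict) -> dict:
    """Match the experience to the consumer's playback environment."""
    return {
        "resolution": device["native_resolution"],  # match the display
        "color_gamut": device["color_gamut"],       # e.g., "rec2020"
        "frame_rate": min(device["max_fps"], 120),  # cap for this title
    }

living_room_tv = {"native_resolution": "3840x2160",
                  "color_gamut": "rec2020",
                  "max_fps": 60}
settings = choose_render_settings(living_room_tv)
# An edge GPU cluster would render the CGI environment with these settings
# just-in-time, instead of streaming one fixed mezzanine file.
```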
EXTENDED INSIGHTS: Exclusively Online
See Principle 10 in Action in the MovieLabs Showcase
Mathematic Accelerates Productions, Reduces Costs and Goes Green with Hammerspace
…the adage “we can fix it in post” may change to “we can fix it before we shoot.”
IMPLICATIONS
These examples illustrate how the adage “we can fix it in post” may change to “we can fix it before we shoot.” We can also expect changes in the structure and scheduling of major productions, with perhaps less time devoted to postproduction; instead, those people, budgets and time will shift forward into much more robust and fully formed visualizations in pre-photography.
A new open-standard real-time engine package format would need to be developed that could deliver the full range of digital assets to the consumer to be rendered in real time as they watch the experience.
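Such a package might carry something like the following manifest, so that a consumer-side or edge renderer can assemble the experience; the schema is purely illustrative, since no such standard exists yet:

```python
# A hypothetical manifest for an open real-time engine package. The
# "open-rte" identifier and all fields are assumptions for illustration.
rte_package = {
    "title": "Example Feature",
    "engine_api": "open-rte/1.0",   # hypothetical standard version
    "assets": [
        {"id": "env-city", "kind": "environment", "uri": "pkg://env/city.usd"},
        {"id": "hero",     "kind": "character",   "uri": "pkg://char/hero.usd"},
    ],
    "narrative": {"timeline": "pkg://story/main.otio"},  # the director's cut
    "render_hints": {"max_ray_depth": 4, "target_fps": 60},
}
```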
The impact of the RTE will be considerable and broad, but we can already see the need to standardize some core components of the rendering, translation and packaging of digital assets, and that will require broad industry collaboration across tool vendors, GPU providers and creatives.