Is the Cloud Ready to Support Millions of Remote Creative Workers?

Assessing the readiness of Virtual Desktop Infrastructure (VDI) to support creative users for the 2030 vision

The MovieLabs 2030 Vision for the future of production technology includes the principle that media will be stored in the cloud and applications will come to the media (not the other way around). That principle anticipates that many creative applications will be rebuilt to be “cloud native” or will require high-powered virtual machines/workstations (VMs) running in the cloud where the media resides. We expect it will be many years before our most-used creative applications are rearchitected to be “cloud native,” and therefore we focused our attention on assessing the current versions of those applications on cloud-based VMs, using Virtual Desktop Infrastructure (VDI) to stream those experiences to users.

2020 readiness assessment

We have looked especially at the work needed to enable the full 2030 Vision using these virtualized machines. Our benchmark is the quality, latency, and performance that a user can experience today on a physical machine in a production facility – what is required to replicate that experience from a cloud-based VM? We have largely assumed the same quality levels used today for these experiences. (For example, editing of 4K material is often done using 1080p proxies, with editors using 2-3 screens: 1 or 2 for UX and 1 for media playback.) Of course, the quality and size of files will continue to increase, and work that is currently done at 1080p and 8-bit color will no doubt move to 4K and 8K with 10-bit or greater color precision, but our assessment is based largely on whether we can replicate today’s quality levels with cloud-based infrastructure.

In this post we’ll summarize the key findings from that research and call for the industry to accelerate innovation in areas that are otherwise inhibiting an industry-wide migration of creative tasks to VDI. VDI is a mature technology used by millions of workers globally, but typically not for the demanding tasks of the media creation industry, which raise unique issues of their own.

COVID solutions get us only part way there

It’s worth noting that the global pandemic has accelerated remote creative work, largely in a “work from home” mode. However, we do not view this temporary situation in the same way as a wholesale movement of all workflow tasks to a cloud-powered infrastructure. The COVID response has been a series of temporary installations, workarounds, and workflow adjustments to accommodate social distancing. While COVID has in some ways prepared creatives for a future where work does not require physical co-location with the media and workstations, the 2030 Vision includes a much bolder version of cloud-based “work from anywhere” (including at a primary place of work). Productions and creatives should be able to tap into the many additional benefits that cloud-based workflows offer: when assets reside entirely in the cloud and do not move between workplaces, any user can work on any machine with security, permissions, and authorizations intact. It is that vision, rather than the narrower 2020 COVID scenarios, that informs our cloud readiness assessments.

Creative work profiles

To assist our readiness assessments, we defined several categories of creative user profiles, along with some broad requirements for enabling that work to be performed remotely by VMs and VDI. These categories expand on typical worker use cases and address some unique requirements of our industry:

Creative Worker Type 1: General Production Use Cases (Task or Knowledge Workers)
  Example use cases: data entry & management, MAM operations, production accounting
  Max tolerable latency: 100-300 ms
  Downlink bandwidth per user: 5-20 Mbps

Creative Worker Type 2: Base GPU Creative Workstations
  Example use cases: general workloads including video and sound editing, compositing, etc., using mostly mouse & keyboard
  Max tolerable latency: <30 ms for frame-accurate control; <250 ms for review
  Downlink bandwidth per user: 20-60 Mbps

More specialized use cases modify the demands of the base Creative Workstation:

GPU-A Color Accurate
  Example use cases: 10-bit, for color-accurate editing, compositing, review of HDR/WCG content, color grading on a broadcast monitor
  Max tolerable latency: <30 ms for frame-accurate control; <250 ms for review
  Downlink bandwidth per user: 40-60 Mbps

GPU-B Color Critical
  Example use cases: 12-bit minimum, for color grading on a projector (DI), final color review and approvals, color QC
  Max tolerable latency: <30 ms for frame-accurate control; <250 ms for review
  Downlink bandwidth per user: expected 60-90 Mbps, when systems can be tested

GPU-C Ultra-Low Latency Workstations
  Example use cases: tablet, pen, or touch interface users such as VFX artists (need pixel-perfect rendering, with no softening from compression)
  Max tolerable latency: <25 ms, stable and sustained
  Downlink bandwidth per user: 20-60 Mbps


Note 1: The estimates are generally based on 1080p streams at 24-30 fps and 8-bit color.

Note 2: Among existing codecs, H.264 seems prevalent. To achieve higher color bit depths, we anticipate that other codecs such as H.265 will be better suited, which will affect bandwidth requirements (possibly reducing them).
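As a rough, back-of-envelope illustration of where bandwidth figures like these come from, the streamed bitrate can be related to resolution, frame rate, and bit depth. The compression ratio below is our assumption for illustration (on the order of 100:1 for H.264 at typical VDI quality settings), not a measured figure:

```python
# Back-of-envelope bandwidth estimate for a VDI stream.
# The 100:1 compression ratio is an illustrative assumption, not a measurement.

def uncompressed_mbps(width, height, fps, bits_per_channel, channels=3):
    """Raw (uncompressed) video bitrate in megabits per second."""
    return width * height * fps * bits_per_channel * channels / 1e6

def streamed_mbps(width, height, fps, bits_per_channel, compression_ratio=100):
    """Estimated streamed bitrate after codec compression."""
    return uncompressed_mbps(width, height, fps, bits_per_channel) / compression_ratio

raw = uncompressed_mbps(1920, 1080, 30, 8)   # ~1493 Mbps uncompressed at 1080p/30fps/8-bit
vdi = streamed_mbps(1920, 1080, 30, 8)       # ~15 Mbps at an assumed 100:1 ratio
```

At 1080p, 30 fps, and 8-bit color this lands around 15 Mbps – the same ballpark as the ranges above; lower compression ratios or 10-bit color push the figure upward.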

The table helps us assess industry readiness to run creative applications in the cloud. Virtual Machines are generally available across clouds for Creative Worker Types 1 and 2 (General Production Use Cases and Base GPU Creative Workstations). However …

We need to be better than today for creative tasks to fully migrate

Our stretch cases in GPU-A, GPU-B, and GPU-C are a different story. It is here that we have identified a number of gaps that must be filled to enable migration of full creative workflows to cloud-based virtual workstations. We can’t hope to migrate entire production workflows to the cloud without finding a way to run these creative applications, at mass scale, with performance similar to what an on-premises artist experiences today.

We’ve identified the following gaps the industry needs to close in order to migrate all tasks to cloud-based VMs and support the GPU-A, B and C users:

  1. No standards exist for measuring or comparing the quality of video from streaming VDI systems (which tend to use subjective terms such as “High”, “Medium”, and “Low” to describe quality settings). The lack of objective metrics for the various dimensions of video quality (color, resolution, artifacts, frame jitter) makes it difficult to compare the performance of solutions or establish where errors arise in a system. If a consumer sees a video glitch in a streaming OTT show, it will likely be ignored. However, a creative making or approving that content will not ignore a glitch, and will need to know whether it was caused by the VDI system or is resident in the native content. In that circumstance creatives have little choice but to rewind and replay the content to ascertain (hopefully) where the problem lies. This uncertainty introduces delays in production that likely would not occur in an on-prem studio environment. To provide creatives with measurable certainty and trust in the VDI ecosystem, the industry needs to develop guidelines and agreed quality standards.
  2. VDI systems offer good support for 8-bit content with standard dynamic range in the legacy broadcast format Rec. 709. However, production is now moving to 10-bit or greater color depth, HDR, and wider color spaces for Color Accurate use cases. No VDI system currently supports those requirements directly, although support for these Color Accurate use cases is available with additional software/hardware, and native support may be imminent in early 2021. It is worth noting that both the codecs employed by the VDI system to stream the VM and the client machine must support 10-bit color; typical thin clients do not support more than 8-bit color. The goal posts are likely to move again as we envision a more challenging environment coming from …
  3. … truly Color Critical VDI systems. We have defined some expectations for these GPU-B capable systems, but are not expecting systems to be available for several more years. Yet 12-bit depth quality will be required for Digital Intermediate (DI) color grading and color review in order to give a director and colorist the full flexibility available with on-prem systems today.
  4. Another issue with a high-end machine running in the cloud far away from a creative user is the connection between the remote machine and the screen available locally to the creative. While standard mechanisms exist that enable local workstations to communicate display modes and color spaces directly to attached monitors, remote VDI systems need to relay that display signaling to distant monitors over the internet. Currently, we are not aware of standards that support communication over VDI infrastructure between a remote PC and a professional video monitor that would facilitate remote calibration and confirmation of display modes and color spaces.
  5. A related issue is the connection of high-end production peripherals – color grading panels, scopes, audio mixing boards. These devices usually directly connect via USB or ethernet to a local PC with predictable latency and responsiveness. Artists need those I/O devices to be extended with connectivity to the remote VM via VDI platforms with predictable levels of responsiveness. Predictability is key – users can adjust to slight latency between input and response, but unpredictable latency can cause frustration and make it difficult for a user to adapt.
  6. Collaboration via VDI is also highly compromised compared to on-prem scenarios where multiple creatives in a physical room can discuss and collaborate in real time. Virtual machines currently do not have integrated support for simultaneous sharing of content with more than one destination. While Over-The-Shoulder (OTS) solutions exist to allow another user to see the output of a VDI session, those solutions typically require additional hardware or software, complicating the overall setup and introducing the potential for additional issues around security and synchronization. Additional complexity can reduce confidence on the part of creatives that the solution outputs the same view of the same content at the same time. Collaboration will need to improve before 2, 3, 5 or even 20 people will feel truly together when sharing streaming content from a single cloud storage source with overlaid communication tools.
  7. Support for multi-channel audio (5.1 and greater) is limited in VDI systems, and there is no support for object-based audio systems. As so many audio tasks rely on surround sound at a minimum, further improvement will be required to ensure that sound editors confidently hear what a consumer will hear over VDI, not a mixed-down stereo version.
  8. Lastly, additional cloud technology will be required to address the dichotomy between a VM needing to be close to the user (to minimize latency) and the 2030 Vision principle that the “application comes to the media”, especially when that media may not be co-located in the same cloud or same cloud region as the VM. If one VDI user is in London and another is in LA, where should the media be stored so that both have a good low latency connection to it? As a compromise the media could reside in NYC, so both users have an equally bad experience, or some sort of local caching could be used to improve the experience of both. Regardless of the solution adopted, pre-positioning of data is a tough challenge that will require additional innovation to address fully. 
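Gap 1 above calls for objective quality metrics. As one illustration, a full-reference metric such as PSNR can be computed wherever both the source frame and the frame decoded from the VDI stream are available; more sophisticated metrics (SSIM, VMAF) follow the same full-reference pattern. The frames below are synthetic, invented purely for the sketch:

```python
import numpy as np

def psnr(reference, received, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between a reference frame and the
    frame decoded from a VDI stream. Higher is better; identical frames
    give infinity."""
    mse = np.mean((reference.astype(np.float64) - received.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical example: a 1080p frame lightly degraded by uniform noise,
# standing in for compression artifacts introduced by a VDI codec.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(1080, 1920, 3), dtype=np.uint8)
noisy = np.clip(ref.astype(np.int16) + rng.integers(-2, 3, size=ref.shape),
                0, 255).astype(np.uint8)
score = psnr(ref, noisy)   # a single scalar summarizing the distortion
```

A metric like this, agreed across vendors, would let a creative distinguish “the VDI stream degraded this frame” from “the artifact is in the native content” without rewinding and replaying.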
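Gap 8’s placement question can be framed as a minimax problem: among candidate storage regions, choose the one that minimizes the worst user’s latency – the “equally bad experience” compromise described above. The region names and latency figures below are illustrative assumptions, not measurements:

```python
# Sketch of the media-placement trade-off. Rough, assumed one-way network
# latencies in ms from each candidate storage region to each user location.
LATENCY_MS = {
    "eu-west (London)": {"London": 5,   "LA": 140},
    "us-west (LA)":     {"London": 140, "LA": 5},
    "us-east (NYC)":    {"London": 75,  "LA": 70},
}

def best_region(latencies):
    """Pick the storage region that minimizes the worst user's latency."""
    return min(latencies, key=lambda region: max(latencies[region].values()))

choice = best_region(LATENCY_MS)   # the minimax choice among the candidates
```

Even the optimal minimax placement leaves every user worse off than local storage would, which is why caching and pre-positioning of data – rather than a single storage location – will likely be needed to close this gap.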

A flag in the sand …

We are hopeful that when we repeat this assessment in a few years, cloud VDI systems will be much improved and the areas outlined above will have been addressed. The 2030 Vision describes a world where thousands of workers on a major movie or TV series collaborate entirely online via virtual desktops using cloud-based media and existing media tools. The infrastructure is forming to enable that future, but there is work to be done across cloud infrastructure, video compression, creative applications, and VDI service providers to enable cloud collaboration to deliver the same experience online that creatives enjoy offline today.

#MovieLabs2030, #ML2030Cloud