How to Avoid Identifier Mayhem: Concepts (Part 2 of 3)

Posted on May 20, 2025

This is Part 2 of the Identifier Blog series. In Part 1, we examined why we care about identifiers in media creation workflows. In this part, we will explore several concepts related to identifiers as well as how the MovieLabs Ontology for Media Creation (OMC) defines and makes use of them. This will help us navigate the final part of this blog series, which focuses on best practices and recommendations.

This blog article is written in the form of Q&A. We discuss the general concepts and terminology and related OMC aspects. Let’s jump right into the Q&A.

What is an identifier in OMC?

  1. An identifier uniquely identifies an entity within a particular scope and is typically a string of characters. OMC specifies the kinds of entities that can be given an identifier. A narrative scene, a slate, an asset, a task, and a participant are examples of such entities; many more are defined in OMC. OMC also defines when a collection of entities can be given its own identifier.
  2. Identifiers are useful for finding entities, or descriptions of entities, so that we can reason about or interact with them [4], as discussed in the “What makes a good identifier?” question below. Identifiers are also useful for differentiating one entity from another.
  3. In this article, we use ‘thing’ interchangeably with ‘entity’. Referent is the technical term for this thing – it’s what the identifier refers to.

In the world of computers, far more things are consumed, processed, and interacted with, and at a much higher scale, than in the physical world. That makes identifiers more important now than ever before.

A thing could have more than one identifier, each in a different scope, and that is fine if the systems around the thing are designed to handle them. In general, however, we should create a new identifier for a thing only if we cannot use an existing identifier in our workflows.

Conversely, when two or more things are referred to by the same identifier, it is ambiguous which thing is meant. In computer systems, and especially in the world of data, ambiguity leads to bottlenecks, confusion, and often the overhead of engaging humans to resolve it, which is a frustrating and time-consuming process.

How do you figure out the boundaries of a thing?

The principle of Appropriate Granularity is useful here. It boils down to this: you should allot an identifier to a thing only if it needs to be distinguished from related things. For example:

  1. An identifier does not have to be associated with an individual file. It could, for example, be associated with a bucket of files. This will work if the process that is consuming the bucket need not distinguish between individual files, like when crunching website logs for the month to perform some analytics. On the other hand, if each file in the bucket is processed individually by tasks, such as when reviewing concept art files, then there is a reason to identify each file separately.
  2. Indeed, the identifier could be at a level below an individual file – for example, it is appropriate to allot an identifier to each narrative scene in a script in order to refer to any scene without ambiguity, or to each video frame within a sequence if we need to isolate each frame.
  3. In some situations, each version of a file may have its own identifier. This is especially true when, say, a downstream workflow needs to know which specific version of an asset it should include within a composition. The 2030 Vision strives for non-destructive workflows: when a change is made, we create a new version of the asset so that the change can be ‘undone’ back to the original source. The result can be a proliferation of versions, each of which needs its own identifier.
  4. In digital media, a review aggregator only needs an identifier for “the film,” but digital distribution of that film has to differentiate the North American theatrical release from the German dub of the director’s cut to make sure the distributor and the customer access the right thing.

In general, to associate the identifier at the right level of specificity, you have to know how the thing, or things, are going to be used.

Where are identifiers used in software systems?

Identifiers are used across software systems to consistently track and reference entities in different applications and platforms. For instance, eBay listing numbers help buyers and sellers locate specific items, Amazon’s ASIN codes uniquely identify products, and employee ID numbers allow HR systems to manage personnel records. In media and entertainment, EIDR IDs track audiovisual content, asset management systems use identifiers to track assets and associated metadata, and software-defined workflows use identifiers to track tasks, input data, output data, and participants. No matter the industry (e-commerce, enterprise, or media production), identifiers help with referencing, finding, and using information.

What makes a good identifier?

The “goodness” of an identifier depends on the circumstances. For instance, in distributed systems, identifiers may need to be time-sortable so that data created on different machines can be ordered by creation time. In the Transmission Control Protocol (TCP), sequence numbers cannot be too short; otherwise, the amount of traffic that could be exchanged during a single session would be limited. In the case of memory addresses, longer pointers consume more memory. So, understanding the use case and the appropriate identifier constraints is important.

The set of features to consider when designing identifiers therefore depends on the application, but here are commonly considered features of identifiers in most system designs:

  1. Uniqueness: One identifier should be used to refer to just one thing, not two or more things, at least within the scope where the thing is relevant.
  2. Persistence: The thing must be continuously bound to its identifier(s) for the duration of the thing’s relevance. Unbound identifiers lead to ambiguity about things. For example, if a person’s name is their identifier and the person changes their name, the identifier will no longer be relevant. Similarly, if a participant’s public key (from public key cryptography) is used as the participant’s identifier, that identifier will no longer be relevant once the participant updates its public key. Therefore, identifiers should be independent of any properties liable to change over time.
  3. Resolvability: As the point of an identifier is to find the thing, consume it, reason about it, interact with it, etc., we usually need to be able to arrive at or access the thing (or information about the thing) from the identifier. This could be as simple as providing users a list of all things along with their identifiers (e.g., a filesystem with a mechanism to list all files along with their filenames) or as complex as a global resolution system (e.g., DNS to resolve each domain name on the internet to its record). Which mechanism to choose depends on the context (e.g., where are the users with respect to the things? how many things are there?), but some sort of identifier resolution is expected in most use cases.

What are OMC entities?

OMC defines several concepts, their properties, and the relationships between them. Those concepts include Creative Work, Context, Assets, narrative objects, production objects, Participants, Tasks, Infrastructure, and many more. When information is captured using these concepts, the resulting data instances are known as OMC entities. Practically speaking, production participants record information at various stages by employing these defined concepts, which results in several instances of OMC entities. OMC entities are designed to be identified using one or more identifiers, and the rationale for allowing multiple identifiers is discussed later in this article. OMC entities are typically serialized following the rules of OMC-JSON or OMC-RDF.

What are OMC identifiers?

OMC identifiers are not arbitrarily created just to represent data in OMC; rather, they serve as the definitive identifiers for the entities depicted within the production. For example, when an asset is represented in OMC, its OMC identifier is the asset’s own identifier along with its scope (as explained below). Typically, applications and processes generate or assign their own identifiers for the entities they manage. When OMC data is generated or captured, these existing identifiers are incorporated into the OMC representation. If an entity lacks an identifier, OMC requires that one be created. In essence, for OMC data to be valid, every represented entity must have an identifier. You can use the OMC Validator to validate your OMC-JSON data.

An OMC identifier consists of two parts – scope and value – which are discussed next.
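The two-part structure can be sketched in a few lines of Python. This is only an illustration: the class name, the example scope and value, and the `entityType` field are hypothetical, and the key names `identifierScope` and `identifierValue` are assumed here and should be checked against the OMC-JSON schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OmcIdentifier:
    """Hypothetical in-memory form of an OMC identifier."""
    scope: str  # the universe within which the value is unique
    value: str  # the identifier natively assigned by the application

    def to_json_obj(self):
        # Key names are assumed; confirm them against the OMC-JSON schema.
        return {"identifierScope": self.scope, "identifierValue": self.value}


# Hypothetical asset that already has an application-assigned identifier.
asset_id = OmcIdentifier("app.com/project-academic/editorial", "shot-0042")
entity = {
    "entityType": "Asset",
    "identifier": [asset_id.to_json_obj()],  # a list, so more can be added
}
serialized = json.dumps(entity, indent=2)
```

Keeping the identifier as a list, even when there is only one, leaves room for the multiple identifiers discussed later in this article.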

What is scope in an OMC identifier?

The OMC defines scope as the universe within which the identifier is unique and valid. It serves at least two key purposes.

Ensures Identifier Uniqueness: In media creation, where hundreds or thousands of processes and applications generate identifiers, avoiding collisions (i.e., the same identifier being incorrectly assigned to different things) is a significant challenge. This challenge can be addressed by first ensuring the identifier values created by a participant are unique and then associating a scope, which itself is unique, with those identifier values. This two-step process of creating unique identifier values within a scope and creating a unique scope will ensure identifiers created by any participant are globally unique. Using a string for scope that is in the participant’s control, such as the domain name controlled by the participant, will ensure the scope is unique. However, we can do better than a domain name…

Simplifies Entity Discovery: A well-defined scope attached to the identifier of the entity at the time of its creation enables data consumers to filter, narrow down, and locate the correct entities within the data based on the scope. That is, scope can serve to separate identifier domains of use; for example, workflow management systems may adopt one set of scopes while editorial workflows use another. This separation not only aids in efficient entity discovery but also helps delineate the context in which each identifier is relevant.
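As a rough sketch of scope-based discovery, a data consumer might filter entities by a scope prefix. The entity layout, key names, scopes, and values below are all hypothetical and chosen only for illustration.

```python
def scope_matches(scope, prefix):
    """True if scope equals prefix or sits beneath it at a '/' boundary."""
    return scope == prefix or scope.startswith(prefix + "/")


def entities_in_scope(entities, prefix):
    """Return entities carrying at least one identifier under the prefix."""
    return [
        entity for entity in entities
        if any(scope_matches(i["scope"], prefix) for i in entity["identifiers"])
    ]


# Hypothetical data: two editorial assets and one workflow task.
entities = [
    {"name": "cut-v1", "identifiers": [
        {"scope": "app.com/project-academic/editorial", "value": "e-1"}]},
    {"name": "cut-v2", "identifiers": [
        {"scope": "app.com/project-academic/editorial", "value": "e-2"}]},
    {"name": "render-task", "identifiers": [
        {"scope": "app.com/project-academic/workflow", "value": "t-9"}]},
]

editorial = entities_in_scope(entities, "app.com/project-academic/editorial")
whole_project = entities_in_scope(entities, "app.com/project-academic")
```

Matching at a separator boundary, rather than with a plain string prefix, keeps a scope such as `app.com/project-academic2` from being picked up by accident.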

Given these two uses of scope, create a scope by starting with a domain name that is in your control. Then add other parts, such as the project name and the application name or ID, joining them to the domain name with a dot or a forward slash. For example, the scope could be app.com/project-academic/editorial, which conveys that the identifiers are created within the editorial workflow of project academic by an app or a service hosted at app.com. In general, make sure the combination is meaningful to the identifier creator and consumers so entities can be filtered easily. One way to achieve that is to consistently use the same naming scheme when creating scopes.
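One way to keep the naming scheme consistent is to build every scope through a single helper. The function below is a minimal sketch, not part of OMC; its name and parameters are hypothetical.

```python
def build_scope(domain, *parts, sep="/"):
    """Join a domain name you control with project and application parts."""
    if sep not in {".", "/"}:
        raise ValueError("use a dot or a forward slash as the separator")
    if not domain or any(not part for part in parts):
        raise ValueError("domain and parts must be non-empty")
    return sep.join([domain, *parts])


scope = build_scope("app.com", "project-academic", "editorial")
# scope == "app.com/project-academic/editorial"
```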

Scope should be created using only ASCII characters and should avoid whitespace characters, control characters, backslashes, and emojis. Always validate your OMC data against these rules using the OMC Validator to ensure compliance.
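A quick local pre-check of these scope rules might look like the sketch below. It is an approximation written for this article, not the OMC Validator's logic, and the function name and messages are hypothetical.

```python
def scope_problems(scope):
    """Return a list of rule violations found in a candidate scope string."""
    problems = []
    if not scope.isascii():
        # Emojis are non-ASCII, so this check covers them as well.
        problems.append("non-ASCII character")
    if any(ch.isspace() for ch in scope):
        problems.append("whitespace")
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in scope):
        problems.append("control character")
    if "\\" in scope:
        problems.append("backslash")
    return problems


ok = scope_problems("app.com/project-academic/editorial")
bad = scope_problems("app.com/my project")
```

An empty list means the scope passed these particular checks; the OMC Validator remains the reference for compliance.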

What is the value part in an OMC identifier?

The value portion of the OMC identifier is a string of characters that uniquely identifies the entity within a particular scope. It is essentially what we commonly refer to as the identifier itself. In practice, an OMC identifier is a composite structure that encapsulates this value part along with scope. The scope, as explained above, qualifies and contextualizes the value part, ensuring that the composite identifier [5], aka the OMC identifier, is both unique and meaningful.

We strongly encourage using the identifier values natively assigned by applications when creating OMC data; the ontology therefore does not place significant constraints on the character set. As a matter of best practice, however, certain constraints are recommended, especially to avoid visual ambiguity and to simplify parsing rules. Characters from the Unicode character set are recommended, but avoid emojis, whitespace characters, control characters, and the various kinds of slashes and slash-like characters (the forward slash is permitted) [6].

Again, the OMC Validator validates OMC data against these rules and responds with any errors or warnings. Greenfield applications are recommended to fix identifiers to address both errors and warnings. OMC data can be serialized using JSON and RDF, and UTF-8 is the recommended encoding for the character set.
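The recommendations for identifier values can be approximated with Python's standard `unicodedata` module. This is a heuristic sketch only: the list of slash-like characters is illustrative rather than exhaustive, the Unicode category "So" is used as a rough stand-in for emojis, and none of it reflects how the OMC Validator is actually implemented.

```python
import unicodedata

# Slash-like characters to flag; the forward slash "/" is deliberately absent.
# Illustrative, not exhaustive.
SLASH_LIKE = {"\\", "\u2044", "\u2215", "\u29f5", "\u29f8", "\uff0f", "\uff3c"}


def value_warnings(value):
    """Return one warning per character that the recommendations advise against."""
    warnings = []
    for ch in value:
        code = f"U+{ord(ch):04X}"
        if ch.isspace():
            warnings.append(f"whitespace {code}")
        elif unicodedata.category(ch) == "Cc":
            warnings.append(f"control character {code}")
        elif ch in SLASH_LIKE:
            warnings.append(f"slash-like {code}")
        elif unicodedata.category(ch) == "So":
            # Rough emoji heuristic: "Symbol, other" also includes non-emoji symbols.
            warnings.append(f"possible emoji {code}")
    return warnings


clean = value_warnings("shot-0042/v3")
spaced = value_warnings("shot 0042")
```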

Why are multiple OMC identifiers supported by OMC for an entity?

By design, every OMC entity must have at least one OMC identifier; however, the ontology permits associating multiple OMC identifiers with a single entity (although multiple identifier values within a given scope for the same entity are not recommended). This flexibility is driven by several key factors:

  1. Decoupling Systems: In media production, data often moves between systems with differing policies. Requiring the use of an upstream identifier can tightly couple these systems, limiting flexibility. Allowing each system to keep using the identifiers it has already created, or to generate its own, fosters independence and minimizes cross-boundary dependencies.
  2. Diverse System Requirements: Even within an organization, different systems have unique technical demands for identifier structure. A single identifier may not meet all needs; so, supporting multiple identifiers lets each system use the format that best fits its operational requirements.
  3. Parallel Workflows: In media production, entities are often created concurrently across different departments or workflows, making a single identifier policy impractical. For example, a narrative object might be simultaneously considered by both a pre-visualization department and by a Props department during pre-production. Assigning a single identifier to the narrative object in that scenario would require burdensome central coordination. Allowing multiple identifiers lets each department independently assign an identifier that fits its own context.

Here are a few other examples: consider a physical production prop. It might have one identifier used within the Props department and another used during the production process. Although these identifiers operate within different scopes, they refer to the same physical object. Similarly, a piece of computer graphics geometry might be assigned one identifier by the vendor producing the model and a different identifier once it is accepted and used by the studio. Each identifier serves its own contextual purpose.

Importantly, OMC does not designate any identifier as primary, secondary, or alternate. Instead, the relevance of an identifier depends on the specific needs and perspective of the data producer or consumer, as informed by the scope.
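The prop example can be sketched as a lookup in which every identifier resolves to the same entity, and each consumer picks the identifier in the scope it cares about. The scopes, values, and helper names are hypothetical, and the key names `identifierScope` and `identifierValue` are assumed rather than taken from this article.

```python
def build_index(entities):
    """Map every (scope, value) pair to the entity it identifies."""
    index = {}
    for entity in entities:
        for ident in entity["identifier"]:
            index[(ident["identifierScope"], ident["identifierValue"])] = entity
    return index


def value_in_scope(entity, scope):
    """Return the entity's identifier value in the given scope, or None."""
    for ident in entity["identifier"]:
        if ident["identifierScope"] == scope:
            return ident["identifierValue"]
    return None


PROPS = "studio.example/project-academic/props"
PRODUCTION = "studio.example/project-academic/production"

# One physical prop, identified independently by two departments.
prop = {
    "name": "Brass compass",
    "identifier": [
        {"identifierScope": PROPS, "identifierValue": "PR-0193"},
        {"identifierScope": PRODUCTION, "identifierValue": "item-77"},
    ],
}

index = build_index([prop])
same_thing = index[(PROPS, "PR-0193")] is index[(PRODUCTION, "item-77")]
```

Neither identifier is treated as primary here; a consumer simply asks for the value in its own scope.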

Conclusion

Now that we have established the fundamentals of identifiers and provided the necessary background, we are ready to explore the best practices that asset management systems, workflow designers, and OMC data consumers should adopt to advance toward the 2030 Vision. The final part of this blog series will delve into such practices.

[4] At a conceptual level, identifiers are symbolic representations of a thing. To reason about a thing, we need not have the thing in our hands; we can refer to the thing using its identifier. In much the same way, when we talk about Rick Blaine from Casablanca, we need not be watching the movie at the same time or even have a picture of Rick from the movie to have a discussion. The character name (a symbolic representation) is sufficient. Note, however, that we recommend a more precise form for identifiers than what we use for people’s names, as you will see later in this blog series.

[5] By ‘identifier’, most of the time, we mean the composite structure of identifier value and identifier scope. If by ‘identifier’ we mean the identifier value of the OMC identifier, we will make it clear.

[6] Identifier values will often be encoded within a URL as they are passed around between participants. Therefore, avoiding binary data and using unreserved characters in the URL spec eliminate the need to perform URL encoding/decoding.
