2023-06-23

By Andrew Kim, Research Associate and Nicholas Grous, Associate Portfolio Manager

Now that 3.2 billion players are spending $180 billion per year, the global video game sector is the largest entertainment market in the world.[1] Historically, developing video games has been difficult and costly, requiring software engineering, graphic design, production, distribution, and marketing.

As a result, the studio model - in which one company employs hundreds of people to produce a single game - has dominated. But what if all video gamers could become developers?

Until recently, video game development was too costly to support user-generated content (UGC). In other entertainment media, technology has already turned consumers into creators, pointing the way for games. With minimal upfront costs, online platforms in audio (Spotify), text (Twitter), photo (Instagram), and video (YouTube) have enabled anyone with an internet connection and a smartphone to create songs, podcasts, blog posts, photos, and videos.

Roblox’s (RBLX) CTO, Daniel Sturman, recently suggested[2] that generative AI could be an important catalyst in the video game space, because it can learn patterns, structure data, and generate new 3D content much faster and more cost-effectively than has historically been the case. As a result, the inflection point for gaming could be upon us.

To inform our expectations for gaming, we examined video content production since the birth of film in the late 19th century, as shown below. In the 1930s, scripted TV entered the market and, by 1956, surpassed theatrical releases, as measured by annual minutes.[3]

Similarly, according to our estimates, rapid internet adoption and the debut of Apple’s iPhone early in the 21st century enabled YouTube to scale to more than one billion minutes per year of content by 2011 - in just six years.[4]

By 2022, total minutes of YouTube content uploaded reached ~15 billion, more than 4,000 times the total minutes of scripted theatrical and TV content produced in the same year, according to our estimates.[5] We attribute YouTube’s dominance to meaningful cost declines in video content production that democratized the creative process.

A similar trend seems to be evolving in gaming thanks to the adoption of mobile devices, viable hardware, and game engine platforms like Unity (U) and Unreal Engine, as shown below.

By the end of 2009, the number of mobile games on iOS and Android had overtaken the number of PC and console titles released since Pong debuted 37 years earlier, in 1972.[7]

In 2017, with ~2.5 million virtual experiences on the platform, Roblox Studio overtook the cumulative number of PC, console, and mobile app titles.[8] According to our estimates, it now offers ~470 million experiences, 530 times more than the cumulative number of PC, console, and mobile app games.[9]

Moreover, the cost to produce videos and video games is likely to collapse. According to our estimates, thanks to generative AI, the time to produce UGC should decline at a rate similar to that between professional films and YouTube videos, as shown below.

By shifting the gating factor in video game creation from acquired skills to individual creativity, easy-to-use text-to-2D and text-to-3D models could turbocharge the creator revolution.

Consider the two images below. On the left is a complicated 3D asset dashboard on which creators design their assets today; on the right is a single prompt bar that specifies a 3D asset’s textures using natural language.

We anticipate that game engines, regardless of sophistication, will look more and more like the image on the right over time, deprecating manual tools in favor of AI.

During the past year, innovation in text-to-3D models has decreased the cost to generate 3D assets at an annual rate of ~99%, as shown below.

In 2021, researchers from UC Berkeley and Google Research published Dream Fields, an AI model that generates 3D models from natural language prompts.[13]

While prior methods used a small amount of 3D training data with text-3D asset pairs, Dream Fields’ neural radiance field (NeRF) inferred multiple viewpoints to reconstruct text-labeled 2D images in 3D space, limiting the need to train on sparse 3D data.[14]

Nine months later, the same team published DreamFusion, a higher-fidelity model that requires no 3D training data to generate 3D assets.[15] After normalizing for asset accuracy and calculating based on Nvidia’s V100 spot price as of June 9, 2023, we estimate that the cost to generate a single 3D asset dropped 94% in just nine months, from $196 per asset with Dream Fields to $12 with DreamFusion.[16]
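The 94% figure follows directly from the two per-asset estimates. The minimal Python sketch below reproduces that arithmetic and restates the same decline as a 12-month rate. The function names are illustrative, and the annualization assumes a constant compounding rate of decline, which is an assumption rather than the authors' stated method.

```python
def percent_decline(old_cost: float, new_cost: float) -> float:
    """Fractional cost decline going from old_cost to new_cost."""
    return 1.0 - new_cost / old_cost


def annualized_decline(old_cost: float, new_cost: float, months: float) -> float:
    """Same decline restated over 12 months.

    ASSUMPTION: costs fall at a constant compounding rate over the period.
    """
    return 1.0 - (new_cost / old_cost) ** (12.0 / months)


# Per-asset cost estimates quoted in the article (USD).
dream_fields_cost = 196.0
dreamfusion_cost = 12.0

nine_month_drop = percent_decline(dream_fields_cost, dreamfusion_cost)
yearly_drop = annualized_decline(dream_fields_cost, dreamfusion_cost, months=9)

print(f"Nine-month decline: {nine_month_drop:.1%}")
print(f"Annualized decline: {yearly_drop:.1%}")
```

The nine-month decline comes out just under 94%, matching the rounded figure in the text. Under the constant-rate assumption, the annualized figure is a few points higher.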

In December 2022, OpenAI released Point-E, a text-to-3D model that takes text-to-image model output and produces a 3D point cloud.[17] Point-E sacrifices 3D asset fidelity and accuracy for lower latency, generating a 3D asset in one-and-a-half minutes compared to Dream Fields’ 200 hours and DreamFusion’s 12 hours.[18]

After normalizing for asset accuracy, we estimate that the cost of generating a single 3D asset with Point-E is less than five cents at the V100 spot price as of June 9, 2023, suggesting a cost decline of more than 99% from Dream Fields to Point-E.[19]
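As a rough cross-check, the quoted generation times can be turned into raw compute costs by multiplying hours by a GPU hourly price. The sketch below does that in Python. The $1.00 per GPU-hour rate is a hypothetical placeholder, chosen only because $196 for 200 hours implies roughly that level; it is not a quoted V100 spot price. The calculation also ignores the accuracy normalization the estimates apply.

```python
# Generation times quoted in the article, converted to hours.
GENERATION_HOURS = {
    "Dream Fields": 200.0,
    "DreamFusion": 12.0,
    "Point-E": 1.5 / 60.0,  # one-and-a-half minutes
}

# ASSUMPTION: hypothetical flat GPU price in USD per hour, not a quoted spot price.
ASSUMED_USD_PER_GPU_HOUR = 1.00


def raw_compute_cost(hours: float, usd_per_hour: float = ASSUMED_USD_PER_GPU_HOUR) -> float:
    """Unnormalized cost of one asset: generation time multiplied by hourly price."""
    return hours * usd_per_hour


costs = {name: raw_compute_cost(hours) for name, hours in GENERATION_HOURS.items()}

# The hourly price cancels out of this ratio, so the decline depends only on time.
decline = 1.0 - costs["Point-E"] / costs["Dream Fields"]

for name, cost in costs.items():
    print(f"{name}: ~${cost:,.3f} per asset")
print(f"Dream Fields -> Point-E decline: {decline:.3%}")
```

Under the assumed rate, the raw figures land near the estimates in the text: about $200, $12, and a few cents per asset. The decline stays above 99% for any flat hourly price, because it depends only on the ratio of generation times.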

If advances in text-to-3D models were to continue at the same pace, then the cost of generating a 3D asset would drop to that of the computation necessary for a game within the next year.[20]

Despite impressive cost declines in text-to-3D models, the next wave of reductions in video game development cost and time will require more model flexibility.

While current models can generate various 3D assets with ease, their output remains monolithic, requiring game artists and developers to break apart the output generated into smaller segments and create dynamic game assets.

Existing models also focus on the generation of discrete 3D items, rather than large-scale environments. We believe Epic Games, Roblox, and Unity are well-positioned to break those barriers.

As readily available game engines integrate text-to-3D models, creators and developers should be able to generate and edit their assets on these platforms. The platforms that train models on finished assets should have proprietary data advantages, as shown below, and proprietary data is likely to separate the winners from the losers.

Recent advances in text-to-3D generative AI should be gaming’s next inflection point, merging the roles of user and developer to accelerate adoption dramatically.

