Using ScyllaDB for Distribution of Game Assets in Unreal Engine

Joakim Lindqvist | 23 minutes | February 15, 2023

How Epic Games is using ScyllaDB for distribution of large game assets used by Unreal Engine across the world — enabling game developers to more quickly build great games.

Video Transcript

Welcome to my talk about using ScyllaDB for distribution of Game Assets in Unreal Engine. My name is Joakim Lindqvist. I’m a senior tools programmer at Epic Games.

So I’m going to talk about Unreal Engine today. Unreal Engine is software used to make computer games, previsualization in movies and TV shows, architecture, automotive; pretty much anything that you can do in real-time 3D, there’s a fairly good chance that Unreal Engine is used for it. A lot of the games that you might play on PlayStation, PC, or Xbox use Unreal Engine as their engine of choice. And Epic, outside of selling Unreal Engine, actually makes games itself. The best known one is probably Fortnite, which ships on a bunch of different platforms, ships super frequently, and actually uses a lot of the features we add into the engine. So it acts as a bit of a test bed for us, on top of being a super large game. We’ve been using all of the technology that I’m talking about today to ship Fortnite and a bunch of other stuff that we’ve been doing.

What I’m going to talk about today: first, I’m going to give you an introduction to what it’s like building games, because I assume most people don’t really know what actually goes into it. Then I’m going to talk more about the technology we built, called Unreal Cloud DDC, and how it uses Scylla. And I’m going to end by talking about some of the future work we want to do using both Unreal Cloud DDC and Scylla.

So let’s start by talking about what actually goes into building games. A game is a fairly complicated piece of software, and there are two things that go into it. The first part is the typical software part: source code. The source code is used both in the game itself, in the runtime that goes onto the disk where the game is, and in a bunch of tools surrounding it, like the Unreal Editor and other things that you’re going to need. But outside of the source code and typical software stuff, there are also game assets, which are
essentially data that you display in the game. These can vary in what they actually are: a mesh (a 3D model), a texture (a description of the surface of the object), sound, music, or a super specialized kind of asset like a particle system. So lots of different things, and the size of each asset can vary quite a lot as well: they can be a couple of kilobytes or multiple gigabytes. So varying types of data, but we need to have all of these available to us to be able to play the game.

On top of these assets being large, it actually takes quite a bit of time to build them as well, so we typically have a fairly large team working on building assets. It’s not unusual for a game close to its final stage, in full production, to have more than 500 people working on building assets for the game. So a fairly large team. And to find the kind of people that you need to build that, because it’s a fairly specialized skill, you tend to end up being spread out all across the world, with a bunch of different offices in a bunch of different places. So not only are these assets large, they also need to be available in a bunch of different places.

Just to give you an example of what it looks like when people are building the game: this is a screenshot of the Unreal Editor that people work with. I think this is the level editor. You can see a 3D scene in there, and to the right you can see some assets, like the lemon that’s highlighted. The lemon is actually not one asset; it’s multiple assets: the 3D mesh of the lemon, textures (probably multiple textures), some shaders. There’s a lot that goes into it. And you can see in the 3D scene here that there’s just a bunch of stuff in
there, right? There are so many different things inside one single scene in a game. You can probably recognize this if you’ve tried to use Blender or Maya or some 3D tool like that; it’s similar in how it looks. But this is just one view in the editor. There are a bunch of other ones where you can produce your own audio, or do a bunch of other things. So it’s a big piece of software that does a bunch of different things.

But the data that we see here: once you download this texture or something like that, it’s not in a format where the editor can just present it to you. It needs to do some transformation of that data, and that transformation is part of a process that we call cooking. It varies from asset to asset exactly what kind of transformation needs to happen: it could be a conversion of file formats, it could be compression, it could be compilation. But the common thing across all assets is that they are cooked, or a lot of them are. The reason for this is that when we share assets between each other, they’re submitted into source control. So you will get them through source control, but they will not be in the format that you want to use at runtime. They will be in some sort of lossless format, the best quality you would ever want to use in a game, and then we need to apply these conversions to make them into a runtime-compatible format.

It can actually take quite a long time to do all of this conversion. If you haven’t cooked anything before and you’re going to cook a game, it could take as much as 24 hours to cook the entire game, which would be completely crazy; you couldn’t really work with that. So what we do is apply caching on top of this, called the DDC, the derived data cache, and the idea
with the derived data cache is just to contain a copy of all of the data that you’ve cooked before, so that if you go to cook it again you can get it from the DDC instead.

But that local DDC is not the only caching layer we have. We also typically have a shared cache, and that historically relied on on-premise network file shares. So essentially, when I cooked an asset, I would upload it to the filer and someone else in my office would be able to download it, so they didn’t need to cook it again. That worked quite okay. There are problems with filers: they can be a bit slow when lots of people hit them, you need to set them up, and there are lots of headaches when it comes to scaling up and so on. But it worked okay.

But as we’ve talked about, we’re distributed across a bunch of different offices, and these filers are per office. And while it worked okay, we were starting to see more offices popping up more quickly, contractors spinning up for a couple of weeks of work, those kinds of things. There’s just such an overhead in managing these. And of course we saw a trend even before the pandemic of people working more from home, being more remote, and so on. With the pandemic this became a much bigger problem, where people are working from home and need infrastructure that can work for them even though they’re not in the same physical building as everyone else. We tried to solve that by using our VPNs, but that just doesn’t work. It’s too slow, and there’s too much data going over the VPN, which bottlenecks the VPN proxy and so on. So that didn’t really work.

What we figured out is that we need another layer in our caching system that is built for the cloud, that is geo-replicated, and that can feed everyone all of the data that they need, on top of these other layers that we had in the past.
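
As a rough illustration of the layered lookup described above (local DDC, then a shared cache, then a cloud layer, and cooking only when every layer misses), here is a minimal Python sketch. It is not Unreal Engine code: `CacheLayer`, `get_or_cook`, and the backfilling of faster layers on a hit are assumptions made for illustration, and each layer is just an in-memory dict.

```python
from typing import Callable, Dict, List, Optional


class CacheLayer:
    """One layer of a derived data cache, backed here by a plain dict (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value


def get_or_cook(key: str, layers: List[CacheLayer], cook: Callable[[str], bytes]) -> bytes:
    """Check layers from fastest to slowest; cook only if every layer misses."""
    for index, layer in enumerate(layers):
        value = layer.get(key)
        if value is not None:
            # Assumption: copy the hit into the faster layers so the next lookup is cheaper.
            for faster in layers[:index]:
                faster.put(key, value)
            return value
    # Nobody has cooked this asset yet: do the expensive work once and share the result.
    value = cook(key)
    for layer in layers:
        layer.put(key, value)
    return value
```

With two users who each have their own local layer but share the other layers, the second user gets the cooked result from the shared layer and the cook function runs only once.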

So we decided to build something called Unreal Cloud DDC to support exactly this. It’s fairly self-describing: it’s for Unreal, and it’s a DDC that is built to run in the cloud. It was about a year ago that we rolled this out, and it’s been doing quite well. So this is our solution to the problem of distributing all of these assets across the world.

So let’s look at what we actually store in the DDC. We store cache records that are in a format of ours called compact binary. That’s a self-describing binary format, so think of it like a JSON file, or actually more like a BSON file: a binary serialized, self-describing format. You can see an example here to the right, with a couple of different types of fields. The major thing that we add, on top of being really compact in terms of serialization, is a type called attachments. An attachment means that you have that object, but there’s also a handle to some other thing that is related to the object. Logically they’re part of the same thing, but it’s something that you can fetch if you need it. You can see the attachment field there, and the hash. The way we reference these attachments is by content addressing them, which means that we take the hash of the content and use it as the identifier of the content. This is a fairly common scheme that you see in a bunch of different things; Git is perhaps the most famous example of it. It leads to some really nice characteristics. It means that when we download a cache record, we can see whether we need to download this content or already have it locally. It also works on the opposite side, where someone’s uploading something. If you have two cache records that reference the same texture, they will have the same attachment hash, so we’ll only need to
upload it once. We can deduplicate it, we know we don’t need to replicate it again, those kinds of things. So content addressing can be quite valuable in both download and upload scenarios. That’s the fundamental object that we store. The cache object itself, the metadata about it, the actual cache record, goes into Scylla, while all of the attachments go into our blob storage.
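
The content addressing described above can be sketched in a few lines of Python. This is only an illustration: the talk does not say which hash function compact binary uses, so SHA-256 is an assumption, and `BlobStore`, `make_record`, and `missing_attachments` are hypothetical names, with a plain dict standing in for a real cache record.

```python
import hashlib
from typing import Dict, List, Set


class BlobStore:
    """Hypothetical blob storage keyed by the hash of the content it holds."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self.uploads = 0  # counts how many payloads were actually stored

    def put(self, content: bytes) -> str:
        # Assumption: SHA-256 stands in for whatever hash the real format uses.
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = content
            self.uploads += 1
        return digest


def make_record(key: str, payloads: List[bytes], store: BlobStore) -> dict:
    """Build a cache record that references its payloads by content hash."""
    return {"key": key, "attachments": [store.put(p) for p in payloads]}


def missing_attachments(record: dict, local_hashes: Set[str]) -> List[str]:
    """Return the attachment hashes a client still has to download."""
    return [h for h in record["attachments"] if h not in local_hashes]
```

Two records built from the same texture bytes end up with the same attachment hash, so the store keeps a single copy, and a client that already holds that hash has nothing left to download.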

So Unreal Cloud DDC is an API that we run in Kubernetes in a bunch of different places. We run about five regions for Epic right now; you can see on the world map there on the right where we have our different regions. These are the web API servers, and they just serve the content that people are requesting. As I mentioned, the metadata goes into Scylla, and all of the blobs go into S3. We actually front that with a local NVMe cache of about seven terabytes, so when you go to fetch the actual payload, we’ll typically fetch it from the NVMe cache instead. But that’s not large enough to fit our entire data set. Right now we store about 50 terabytes in S3 for this data, and it should be said that it’s last-access tracked, so we will delete data that hasn’t been accessed for a while. So 50 terabytes is the amount of data that we need in about two months for building our games. That is quite a lot of data, and it’s worth noting that the 50 terabytes is for one region, so it’s times five in terms of the amount of data that we have to store in the system.

Our region sizes vary a bit. Some of our smaller regions have about 10 users working with them, so they’re fairly small, but they’re in some separate network where they would have a really bad experience if they tried to go to another region. A perfect example is Australia. Then we have quite a lot of people in North America, so we have two regions there, and so on. Peak users is, I think, even more than 500, something like 700 users in our most used region. So quite a lot of users, spread all around the world.
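
To make the blob read path and the last-access tracking above a little more concrete, here is a small Python sketch. It is an illustration under assumptions, not the actual service: `TrackedBlobStore` stands in for the S3 bucket, a plain dict stands in for the NVMe cache, time is passed in explicitly as `now`, and recording an access even on an NVMe hit is a design choice made for this example.

```python
from typing import Dict, List, Optional


class TrackedBlobStore:
    """Stand-in for the S3 bucket; remembers when each blob was last written or read."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._last_access: Dict[str, float] = {}

    def put(self, blob_hash: str, data: bytes, now: float) -> None:
        self._blobs[blob_hash] = data
        self._last_access[blob_hash] = now

    def touch(self, blob_hash: str, now: float) -> None:
        # Record an access without reading the payload.
        if blob_hash in self._blobs:
            self._last_access[blob_hash] = now

    def get(self, blob_hash: str, now: float) -> Optional[bytes]:
        self.touch(blob_hash, now)
        return self._blobs.get(blob_hash)

    def delete_stale(self, now: float, max_idle_seconds: float) -> List[str]:
        """Delete blobs that have not been accessed within max_idle_seconds."""
        stale = [h for h, t in self._last_access.items() if now - t > max_idle_seconds]
        for blob_hash in stale:
            del self._blobs[blob_hash]
            del self._last_access[blob_hash]
        return stale


def fetch_blob(blob_hash: str, nvme_cache: Dict[str, bytes],
               s3: TrackedBlobStore, now: float) -> Optional[bytes]:
    """Serve from the local cache when possible, otherwise fall back to the bucket."""
    data = nvme_cache.get(blob_hash)
    if data is not None:
        s3.touch(blob_hash, now)  # assumption: cache hits still count as an access
        return data
    data = s3.get(blob_hash, now)
    if data is not None:
        nvme_cache[blob_hash] = data
    return data
```

A real NVMe cache would also need its own eviction policy to stay within its seven terabytes; that part is left out because the talk does not describe it.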

Let’s have a look at what the actual architecture looks like. You can see the Unreal Engine logo there on the left; that’s supposed to be the editor, or some other case where someone needs to get hold of this content. It will talk over HTTP to our web API, Unreal Cloud DDC, which will go to Scylla to fetch the metadata. In this case it’s mostly resolving the cache key: I have this key, it’s supposed to map to this object, does it exist, and what is the object? That’s the request that we do against Scylla. You can see the thing there on the right that’s supposed to be another region, and the lines between the Scylla instances there. Scylla will replicate data between regions, as it always does, and we just let it do its work there. Then it gives us back: this is the cache key that you wanted, here’s the blob that you stored. What we do is take the actual object that we store and inline it into the Scylla database if it’s below 64 kilobytes, which it typically is. That’s the high-level description of these fields that you saw. That works really well, honestly; we get sub-millisecond responses for that most of the time, so that’s really efficient.

That then goes back to the web API, which is able to determine: did people ask for the attachments, should we send those along as well, or do they want to fetch them later? If they do want those blobs, we’ll check the local NVMe storage and then go to S3 to fetch the data that they need. How long that takes depends on the size of the asset and so on.

One kind of weird thing that we do here: as I mentioned, Scylla does its own replication, but for S3 we do not rely on S3 replication. Instead we actually rolled that
ourselves. Whenever we write anything into the system, we write a journal in Scylla that says: I wrote this asset, and this key now got updated. Then people will know that it needs to get replicated. We follow that journal in all of our other regions and just pull data across as we need it. We do that quite aggressively, which means we actually replicate data quicker than S3 would do out of the box with its built-in replication. It also allows us to do selective replication, replicating only the things we need. So it ends up being quite valuable for us. It works out quite well, even though it’s a bit of a funky setup, and we’re quite happy with it.

So why did we decide to use Scylla? There are multiple reasons. I think the primary argument was cloud agnosticism. When we first built this as a prototype we used DynamoDB on AWS. It worked quite okay, but when we wanted to be able to provide this to licensees, we decided that we wanted something that wasn’t bound to a particular cloud provider. One of our licensees might not even be on any cloud; they might be completely on premise. So having a solution that could work anywhere, like Scylla, was quite key for us. But also, as I mentioned before, this is a fairly performance-sensitive workload, so it’s important for us to get a response quickly, and Scylla’s performance is just so much better than what we got out of Dynamo. Honestly, the cost was much better as well. So everything was in its favor, really.

Initially, when we looked at databases, a geo-replicated database was a requirement for us, which was why we started with Dynamo and then moved on to Scylla. Honestly, as I mentioned, we do the blob replication ourselves, so we could probably have built
replication ourselves for that as well. But the fact that we don’t need to is just great. We get it out of the box, it’s faster than anything we would do, and it works really well. The other potential drawback you could imagine would be the eventual consistency. But because we have to replicate these really large payloads anyway, and that just takes time to physically move from one place to another, we have to build eventual consistency into the API anyway. So that doesn’t really end up being a problem for us.

I do think that by normal Scylla standards we run fairly small deployments. We run in a multi-AZ fashion, so we have two or three AZs in one region, which means two to three nodes of Scylla per region, and we pick the node type based on the expected load. Right now we typically run i4i.xlarge in our smaller regions and i4i.2xlarge in our larger regions, and those typically have more AZs as well. That works really well, honestly. We started out with i3en machines, but then we saw the blog post about i4i and moved to that as soon as we could. It’s been really good; I highly recommend using it if you can on AWS. I think the biggest problem we had was that i4i didn’t exist everywhere in terms of AWS offering, but we’ve recently migrated to it in all the regions that we use.

In terms of number of requests, we see about 3,000 requests per second per node. It’s not insignificant, but it’s not a huge load either. Interestingly enough, our load is relatively consistent over time. That tends to be because a lot of the load is produced by our build farm, and the build farm is pretty much producing new cached assets 24/7, because there’s pretty much always someone somewhere around the world who has changed something that results in a
bunch of new data that needs to go into the system. So there’s not a lot of ebb and flow in the system; it’s a fairly high load. Even if there are low periods of people submitting things, there will usually be things scheduled in the build farm to do bigger work that might use unusual assets or something like that. So a fairly consistent load for the entire system, I’d say.

We’ve been using the system around Epic for a bit over a year. We use it for all of our different projects and it’s been working really well, so the next big step for us is getting this out to more licensees. There’s definitely future work we can do. We’ve found that this inlining of blobs into Scylla is working really well; it usually gives us a blob within a millisecond when they’re really small. So that seems like a perfect opportunity to consider putting our small blobs into Scylla instead of putting those into S3, as S3 isn’t particularly fast with small blobs. There’s a trade-off there: S3 storage is quite cheap, and the storage that we run for Scylla isn’t really. So it’s unclear exactly when it’s worth doing that, but it’s an area that we definitely want to consider, especially as it would let us fetch these small things really quickly, which could be really valuable.

Another big thing we’re looking at is working with licensees to get this set up for them. We work quite closely with a lot of Microsoft studios, and they’re obviously running on Azure. We’ve been making improvements to Unreal Cloud DDC on Azure. They’re currently using Cosmos DB as their database there, just using the Cassandra API, and we’re not seeing the performance that we see out of Scylla, and the cost isn’t as great either. So we’re trying to get a project going where we try
running ScyllaDB as well so we can compare. It’s quite hard to compare these numbers between different games, because they change so much depending on what kind of content you have and so on, so it’s hard to give a fair comparison. That’s why we want to try to get something set up on Azure.

There are also performance improvements we can make, both to the API and elsewhere. We got some really good feedback from the Scylla guys on how we could change some of our tables and so on to generally improve performance, so that’s definitely something we’re going to look at.

And this storage that we’ve talked about is a caching system right now, for DDC, but it’s actually built as a general structured storage system. That’s on purpose, because we think there are multiple workloads that we could put into the system that we currently do not. So future work that we’re going to do is extending this to way more workloads: game asset type stuff for sure, but other things than just the pure cache.
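
The journal-driven blob replication described earlier in this part of the talk (write a journal entry on every write, have the other regions follow the journal and pull what they need) can be sketched roughly as follows. Everything here is a hypothetical stand-in: the journal is a Python list rather than a Scylla table, `Region` holds blobs in a dict rather than S3, SHA-256 is an assumed hash, and the `wanted` predicate is one possible way to express selective replication.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class JournalEntry:
    key: str        # cache key that was written or updated
    blob_hash: str  # content hash of the payload that belongs to it


@dataclass
class Region:
    name: str
    blobs: Dict[str, bytes] = field(default_factory=dict)
    cursor: int = 0  # how far into the journal this region has read


def write_blob(origin: Region, journal: List[JournalEntry], key: str, data: bytes) -> str:
    """Store a payload in the origin region and record the write in the journal."""
    blob_hash = hashlib.sha256(data).hexdigest()
    origin.blobs[blob_hash] = data
    journal.append(JournalEntry(key, blob_hash))
    return blob_hash


def replicate(follower: Region, origin: Region, journal: List[JournalEntry],
              wanted: Callable[[JournalEntry], bool] = lambda entry: True) -> int:
    """Pull blobs named by unread journal entries; return how many were copied."""
    copied = 0
    for entry in journal[follower.cursor:]:
        # Skip entries this region does not want, and blobs it already holds.
        if wanted(entry) and entry.blob_hash not in follower.blobs:
            follower.blobs[entry.blob_hash] = origin.blobs[entry.blob_hash]
            copied += 1
    follower.cursor = len(journal)
    return copied
```

Because blobs are identified by content hash, a follower can skip anything it already holds, and the predicate lets a region pull only the keys it cares about.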

All right, so before I go, I just wanted to mention that Unreal Engine is source available. So if you’re curious about how we’re using Scylla, or just generally how to make games, you can sign up on our GitHub. Go to our GitHub and you’ll find the information you need on how to create an account and so on. It’s free, and then you can look at all of our source code, figure out what you want to do with it, and build any kind of games you want to. So enjoy, and thank you for listening to me. Here’s my contact information if you want to stay in touch. See you around.

[Applause]
