Welcome to my talk about using ScyllaDB for distribution of Game Assets in Unreal Engine. My name is Joakim Lindqvist. I’m a senior tools programmer at Epic Games.
So we decided to build something called Unreal Cloud DDC to support exactly this. It’s fairly self-describing: it’s for Unreal, and it’s a DDC that is built to run in the cloud. It was about a year ago that we rolled this out, and it’s been doing quite well. This is our solution to the problem of distributing all of these assets across the world.
So let’s look at what we actually store in the DDC. We store cache records in our format called compact binary, which is a self-describing binary format. Think of it like a JSON file, or more accurately like a BSON file: a binary-serialized, self-describing format. You can see an example here on the right, with a couple of different types of fields. The major thing that we add, on top of being really compact in terms of serialization, is a type called attachments. An attachment means that you have the object, but it also holds a handle to some other thing that is related to the object. Logically they’re part of the same thing, but the attachment is something that you can fetch if you need it.
You can see the attachment field there, and the hash. The way we reference these attachments is by content addressing them, which means that we take the hash of the content and use it as the identifier of the content. This is a fairly common scheme that you see in a bunch of different things; Git is perhaps the most famous example. It leads to some really nice characteristics. When we download a cache record, we can check: do I need to download this, or do I already have this content locally? It also works on the opposite side, when someone is uploading something. If you have two cache records that reference the same texture, they will have the same attachment hash, so we only need to upload it once. We can deduplicate it, and we know we don’t need to replicate it again. So content addressing can be quite valuable in both download and upload scenarios.
So that’s the fundamental object that we store. The cached object itself, the metadata about it, the actual cache record, goes into Scylla, while all of the attachments go into our blob storage.
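The content-addressing idea described above can be sketched in a few lines. This is only an illustration, not the Unreal Cloud DDC implementation: the `BlobStore` class, the record layout, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class BlobStore:
    """Toy in-memory content-addressed store (a stand-in for real blob storage)."""

    def __init__(self):
        self._blobs = {}
        self.uploads = 0  # number of payloads that actually had to be stored

    def put(self, payload: bytes) -> str:
        # The identifier is the hash of the content (hash function is an assumption).
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = payload
            self.uploads += 1
        return digest

    def has(self, digest: str) -> bool:
        return digest in self._blobs

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


store = BlobStore()
texture = b"pretend these bytes are a texture"

# Two hypothetical cache records that reference the same texture as an attachment.
record_a = {"name": "material_a", "attachment": store.put(texture)}
record_b = {"name": "material_b", "attachment": store.put(texture)}

print(record_a["attachment"] == record_b["attachment"])
print(store.uploads)
```

Because both records resolve to the same hash, the second `put` stores nothing new, and a client holding that hash locally can skip the download with a simple `has` check.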
So Unreal Cloud DDC is an API that we run in Kubernetes in a bunch of different places. We run about five regions for Epic right now; you can see the world map there on the right, with our different regions highlighted. These are the web API servers, and they just serve the content that people are requesting. As I mentioned, the metadata goes into Scylla and all of the blobs go into S3. We actually front that with a local NVMe cache of about seven terabytes, which means that when you go to fetch the actual payload, we’ll typically fetch that from the NVMe cache instead. But that’s not large enough to fit our entire data set.
Right now we store about 50 terabytes in S3 for this data, and it should be said that it is last-access tracked, so we will delete data that hasn’t been accessed for a while. So 50 terabytes is the amount of data that we need over about two months for building our games. It’s also worth noting that the 50 terabytes is for one region, so it’s times five in terms of the amount of data that we have to store in the system. It’s quite a lot of data.
Our region sizes vary a little bit. Some of our smaller regions have something like 10 users working with them, so they’re fairly small, but they’re in some separate network where they will have a really bad experience if they try to go to another region. The perfect example is Australia. Then we have other regions: we have quite a lot of people in North America, so we have two regions there, and so on. Peak users, I think, is even more than 500, something like 700 users in our most used region. So quite a lot of users, spread all around the world really.
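To make the cache-in-front-of-blob-storage idea concrete, here is a minimal read-through cache sketch. The talk only says that a local NVMe cache fronts S3 and that data is last-access tracked; the least-recently-used eviction policy, the `ReadThroughCache` class, and the dictionary used as a stand-in for S3 are assumptions for illustration.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small byte-budgeted cache in front of a slower backing store."""

    def __init__(self, backing: dict, capacity_bytes: int):
        self.backing = backing              # stand-in for the blob store (e.g. S3)
        self.capacity_bytes = capacity_bytes
        self._entries = OrderedDict()       # key -> payload, oldest access first
        self._size = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # record the access
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        payload = self.backing[key]         # fall through to the backing store
        self._insert(key, payload)
        return payload

    def _insert(self, key: str, payload: bytes) -> None:
        if len(payload) > self.capacity_bytes:
            return                          # too large to cache at all
        self._entries[key] = payload
        self._size += len(payload)
        while self._size > self.capacity_bytes:
            _, evicted = self._entries.popitem(last=False)  # drop least recently used
            self._size -= len(evicted)

    def cached_keys(self) -> list:
        return list(self._entries)


backing = {"a": b"x" * 4, "b": b"y" * 4, "c": b"z" * 4}
cache = ReadThroughCache(backing, capacity_bytes=8)
for key in ["a", "a", "b", "c", "a"]:
    cache.get(key)
print(cache.hits, cache.misses, cache.cached_keys())
```

With an 8-byte budget and 4-byte payloads only two blobs fit at a time, so the working set keeps rotating, much like a seven-terabyte cache in front of a fifty-terabyte data set.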
Let’s have a look at what the actual architecture looks like. You can see the Unreal Engine logo there on the left. That’s supposed to be the editor, or some other case where someone needs to get hold of this content. It will talk over HTTP to our web API, Unreal Cloud DDC, which will go to Scylla to fetch the metadata. In this case it’s mostly resolving the cache key: I have this key, it’s supposed to map to this object, does it exist, and what is the object? That’s the request that we make against Scylla.
The thing there on the right is supposed to be another region, and you can see the lines between the Scylla clusters. Scylla will replicate data between the regions, as it always does, and we just let it do its work there. Then it gives us back: this is the cache key that you wanted, here’s the blob that you stored. What we actually do is take the object that we store for the cache key and store it inline in the Scylla database if it’s below 64 kilobytes, which it typically is. That’s the high-level description with the fields that you saw. It works really well, honestly. We get sub-millisecond responses for that most of the time, so that’s really efficient.
That goes back to the web API, which can then determine whether people asked for the attachments as well: should we send those along, or do they want to fetch them later? If they do want those blobs, we’ll check the local NVMe storage and then go to S3 to fetch the data that they need. How long that takes will depend on the size of the asset and so on.
One kind of weird thing that we do here is that, as I mentioned with the lines between the Scylla clusters, Scylla does its own replication, but for S3 we do not rely on S3 replication. Instead, we actually rolled that
ourselves. Whenever we write anything into the system, we write a journal entry in Scylla that says: I wrote this asset, and this key now got updated. Then people know that it needs to get replicated. We follow that journal in all of our other regions and pull data across as we need it. We do that quite aggressively, so we actually replicate data quicker than what S3 would do out of the box with its built-in replication. It also allows us to do selective replication and only replicate the things we need, so it ends up being quite valuable for us. It works out quite well, even though it’s a bit of a funky setup, and we’re quite happy with it.
So why did we decide to use Scylla? There are multiple reasons. I think the primary argument was cloud agnosticism. When we first built this as a prototype, we used DynamoDB on AWS. It worked quite okay, but when we wanted to be able to provide this to licensees, we decided that we wanted something that wasn’t bound to a particular cloud provider. One of our licensees might not even be on any cloud; they might be completely on premises. So having a solution that could just work anywhere, like Scylla, was quite key for us, honestly.
Also, as I mentioned before, this is a fairly performance-sensitive workload, so it’s important for us to get a response quickly. Scylla’s performance is just so much better than what we got out of Dynamo, and honestly the cost was much better as well, so everything was just in its favor really.
Initially, when we looked at databases, a geo-replicated database was a requirement for us, which was why we started with Dynamo and then moved to Scylla. As I mentioned, we do the blob replication ourselves, so we could probably have built
replication ourselves using that as well, but the fact that we don’t need to is just great. We get it out of the box, it’s faster than anything we would do, and it works really well. The other potential drawback you could imagine would be the eventual consistency. But for us, because we have to replicate these really large payloads anyway, and that just takes time to physically move from one place to another, we have to build eventual consistency into the API anyway. So that doesn’t really end up being a problem for us.
I do think that by normal Scylla standards we run fairly small deployments. We run in a multi-AZ fashion, so we have two or three AZs in one region, which means two to three nodes of Scylla in one region, and we pick the node type based on the expected load. Right now we typically run i4i.xlarge in our smaller regions and 2xlarge in our larger regions, and those typically have more AZs as well. That works really well, honestly. We started out with i3en machines, but then we saw the blog post about i4i and moved to that as soon as we could. It’s been really good, and I highly recommend using it if you can on AWS. I think the biggest problem we had was that i4i didn’t exist everywhere in terms of the AWS offering, but we’ve recently migrated to it in all the regions that we use.
As for the number of requests, we see about 3,000 requests per second per node. It’s not insignificant, but it’s not a huge load either. Interestingly enough, our load is relatively consistent over time. That tends to be because a lot of the load is produced by our build farm, and the build farm is pretty much producing new cached assets 24/7, because there’s pretty much always someone somewhere around the world that has changed something that results in a
bunch of new data that needs to go into the system. So there are not a lot of ebbs and flows in the system; it’s a fairly high load. Even if there are low periods of people submitting things, there will usually be things scheduled in the build farm to do bigger work that might use unusual assets or something like that. So it’s a fairly consistent load for the entire system, I’d say.
We’ve been using the system around Epic for a bit over a year, and we use it for all of our different projects. It’s been working really well, so the next big step for us is getting this out to more licensees.
There’s definitely future work we can do. We’ve found that this inlining of blobs into Scylla is actually working really well. It gives us a blob within a millisecond, usually, when they’re really small. So that seems like a perfect opportunity to consider putting our small blobs into Scylla instead of putting those into S3, since S3 isn’t particularly fast with small blobs. There’s a trade-off there: S3 storage is quite cheap, and the storage that we run for Scylla isn’t really, so it’s unclear when exactly it’s worth doing that. But it’s an area that we definitely want to consider, especially as that would let us fetch these small things really quickly, which could be really valuable.
Another big thing we’re looking at is working with licensees to get this set up for them. We work quite closely with a lot of Microsoft studios, and they’re obviously running on Azure. We’ve been doing improvements to Unreal Cloud DDC on Azure, and they’re currently using Cosmos DB as their database there, through its Cassandra API. We’re seeing that the performance isn’t at the scale we see out of Scylla, and the cost isn’t as great either, so we’re trying to get a project going where we try
running ScyllaDB as well, so we can compare. It’s quite hard to compare these numbers between different games, because they change so much depending on what kind of content you have, so it’s a bit hard to give a fair comparison. That’s why we want to try and get something set up on Azure.
Then there are performance improvements we can do, both to the API and elsewhere. We also got some really good feedback from the Scylla guys on how we could change some of our tables to generally improve the performance, and that’s definitely something that we’re going to look at.
As for the storage we’ve talked about: this is a caching system right now, for DDC, but it’s actually built as a general structured storage system. That’s on purpose, because we think that there are multiple workloads that we could put into the system that we currently do not. So that’s future work that we’re going to do: extending this to many more workloads. Game-asset type stuff for sure, but other things than just the pure cache.
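Going back to the blob replication described earlier in the talk (every write appends a journal entry, and the other regions follow that journal and pull blobs across), the idea can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the `Region` class, the in-memory `journal` list, the `wanted` filter, and the key names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    blobs: dict = field(default_factory=dict)  # local blob store (stand-in for S3)
    cursor: int = 0                            # next journal index to process


# Stand-in for the journal that the database replicates to every region.
journal = []


def write(region, key, blob_hash, payload):
    """Store the blob locally and record the write in the journal."""
    region.blobs[blob_hash] = payload
    journal.append((key, blob_hash, region.name))


def follow(region, regions_by_name, wanted=lambda key: True):
    """Process unseen journal entries and pull the blobs this region wants."""
    pulled = 0
    while region.cursor < len(journal):
        key, blob_hash, origin = journal[region.cursor]
        region.cursor += 1
        if origin == region.name or blob_hash in region.blobs or not wanted(key):
            continue  # our own write, already present, or filtered out
        region.blobs[blob_hash] = regions_by_name[origin].blobs[blob_hash]
        pulled += 1
    return pulled


us = Region("us")
au = Region("au")
regions = {"us": us, "au": au}

write(us, "game/hero_texture", "hash-1", b"texture bytes")
write(us, "tools/scratch", "hash-2", b"scratch bytes")

# Selective replication: this region only pulls keys under "game/".
pulled = follow(au, regions, wanted=lambda key: key.startswith("game/"))
print(pulled, sorted(au.blobs))
```

The per-region cursor is what lets each region replicate at its own pace, and the `wanted` filter is one simple way to express replicating only the content a region needs.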
All right, so before I go, I just wanted to mention that Unreal Engine is source available. If you’re curious about how we’re using Scylla, or just generally how to make games, you can sign up on our GitHub. Go to our GitHub and you’ll find the information you need on how to create an account and so on. It’s free, and then you can look at all the source code, figure out what you want to do with it, and build any kind of game you want to. So enjoy, and thank you for listening to me. Here’s my contact information if you want to stay in touch. See you around.