Sessions is temporarily moving to YouTube, check out all our new videos here.

The Fisher Tree: a Realtime Map Data Structure

Jim Fisher speaking at The Realtime Guild in July, 2017
152Views
 
Great talks, fired to your inbox 👌
No junk, no spam, just great talks. Unsubscribe any time.

About this talk

The newly invented Fisher tree is a kind of radix tree (“trie”). Radix trees represent “maps”, and so compete with hash tables in standard libraries. Jim will show the awkwardness of hash tables in a realtime environment, and how radix trees provide much more predictable performance. The key observation behind the Fisher tree is that length-prefixed strings are prefix-free. By grouping keys based on length, the Fisher tree becomes simpler than other radix trees, and gets binary-safe keys for free.


Transcript


So the title I gave for it is The Fisher Tree, my name is Jim Fisher and the bold claim was that this is a new data structure. Since actually writing the talk, I've decided that maybe that was a bit too bold a claim and so I've given it a subtitle which is, like why you should maybe not use hash maps everywhere and just give a bit of thought to it before you do. It's not as catchy. I'm gonna start with a kind of framing question which, since we're at The Realtime Guild, what actually do we mean by realtime and it's a term that we kind of apply to event systems, so systems that have events in them are kind of reactive, so one obvious example is Pusher, the company that works here, which is a message bus. You could also think of games or trading systems, so the events here are sort of messages or say, frames in your game or trades in your trading system. So we can think of these on a timeline. One thing that we think of with realtime systems in they're often high throughput, so you just have lots of events going on. That's not really what we mean by realtime, it's kind of orthogonal to that. It's all about latency. So you have some events that are happening which are causing other events, so here we have like these orange things causing these green things and in this example, this is not a particularly high latency system because these arrows are very long. What we want is something that looks like this. This is your ideal realtime system, lots of events are happening and the thing that's happening after that is happening very quickly and reliably quickly so this is a kind of very smooth progression. What actually tends to happen though in real world systems is something more like this. It's not that sort of buttery smooth look, it's kind of choppy and bunched up, so you have these groups kind of here and here where nothing's really happening and everything's kind of happening here and here. So, for example, you might play a game where you have some frames and then it pauses for a little bit and you get a choppy experience. So you want to avoid that. So what's happening in those gaps, well, there are lots of things. So we did a talk here before where this was garbage collection, so your programme does some work and then, at some point, it needs to collect the garbage that is a result of that and then it has to pause. There are a lot of other things though that can cause these pauses. Another might be filling a buffer so, if you're buffering something to go over the network and then eventually sending it out when it's full, then you're increasing the latency of those messages. In general, this is sort of debt collection in these pauses, so the analogy here is that your programme has some credit that it builds up in these pauses and then it spins it in the time that it sort of does useful work. So we can think of this as sort of like a credit versus time thing and then your undesirable system, your credit is kind of going up and down and up and down and you get that choppy experience. What we want is a realtime system that looks more like this, so you're doing your work, as you go you're paying off your debts as you go. So this talk is one particular case study in thinking about this kind of choppy versus smooth distinction and we're talking about map implementations. So, what is a map? This is where you laugh because it's not one of these, that was a rubbish laugh. That was slightly better. It's not one of these. It's something more like this so, you know what this is. You create a map, you can set things in it to equal other things, you can remove things from your map and you can get stuff out of it. It's in every programme language but the question here is, how are these actually implemented? So you've got your set, you've got remove, you've got get, how do you implement these? There's lots of different ways to implement them. So, you might have the hash map, you might have a red-black tree, you might have an array, which is, if you squint, sort of a map, you have a radix tree. If you go on Wikipedia, the list is extremely long, there are so many variants on map types and what I want to do in this talk is talk about a couple of those, I'm gonna start by talking about hash maps and specifically in the context of real time systems and then I'm going to go on and talk about radix trees, which I want to convince you are sort of nice in some ways, particularly in this context. So what is a hash map? If you've done Computer Science 101, you've probably seen something like this, so the idea is you have some hash function which take the keys of your map and spits out some sort of pseudo-random deterministic number and what you do with that is you interpret it as a bucket in a list of buckets or an array of buckets and using that, you allocate, you put each of your keys in a bucket. So, I should say with these diagrams that these kind of blocks are supposed to be like memory allocations, these are typically like a byte or a pointer size thing and these arrows indicate your pointers. What is the performance of a hash map? Well, we usually think of it as sort of constant time, so constant time get, set, remove, well, in this context I'm expressing it in terms of the length or size of your key k so, normally when we say constant time, that's because we're thinking of your key being fairly small and bounded, potentially it's not. So, we should really think of the case bit of the performance being the length of the key, but this looks pretty good, right? It's good performance. Well, there are some problems with hash maps and particularly there are problems in realtime systems. So the one that you're probably familiar with is hash collisions, it's a necessary consequence of the hash function that you get collisions, two different things mapped to the same bucket, so what do we do with that? One option, just kind of the most common one, I think is to kind of chain them up into a link list, so here we've got lots of things which happened to get put into this bucket three and we put them all in a reg list. It's kind of a problem because you're worst case here is actually just that your hash table becomes a linked list, which has pretty bad performance. Now, you may think, okay, that's a, sort of theoretical possibility, but in practise, that doesn't really happen. Well, it can happen sort of maliciously, so, if someone knows your hash function then they can just feed things into your system which force everything into one bucket, give you terrible performance and potentially kill your system. So, there are ways to deal with that. Well, okay, so, here I've kind of expanded my table so, we've got our average case performance here, which is pretty good, it's kind of just the size of your key but your worst case is really bad. It's the size of everything in your whole map, which could be everything in memory if you're just storing a giant hash table. But thinking about that kind of malicious user who is trying to destroy your performance, one common way to get around that is to kind of randomise your hash function by feeding something into it, so you would keep a random number for your hash table and this example is 42, which I chose randomly and you feed this into your hash function here. The reason that you're doing that is so the attacker doesn't know your hash function anymore. One disadvantage of that is that to make your map now, you need an entropy source in order to generate your random number and you kind of have to hope your attacker doesn't have control of that. The second problem though I want to talk about with hash tables, which is kind of the bigger and yes, slightly less well known one is you have to resize your hash table. So, here is that same hash table after you have inserted a bunch more stuff into it and you can see that everything kind of, each bucket turns into a linked list with bad performance. So, what you really need to do is start expanding your number of buckets to be kind of roughly proportional in some way to the number of keys that you have. So how do you do that? Well, what you really have to do is take your set of buckets here and move everything from each bucket into a larger set of buckets, so we might double the size of these buckets and go for 16 buckets, then you go through everything in every bucket, rehash it and move it into the second table. Now, the problem there in a realtime setting is that that is a very expensive operation. It takes time, proportional to the number of keys in your table. So, this is what the performance of a sort of naive, well not naive, but real world hash map looks like. So Python dictionaries, Ruby hash maps and I'm guessing other common implementations of hash maps, they all have this kind of stop the world, resize the table and then continue, characteristic which gives you this choppy performance where you're doing your resize. So now, your worst case for your hash table is not just the theoretical possibility, it's actually the very common case where you resize your table as you go and also this average case becomes amortised so, it is still sort of constant time but only if you squint and think about the entire run of your programme and not think about the individual operations themselves which can become very expensive. Again, there are kind of ways to fix this problem and there it just kind of becomes even more messy, so, you can resize your hash table gradually, so this is what Redis does, this is what Go's map structure does and the way that that works is you keep your two tables concurrently and gradually, as you do, say, an insert or a remove or a get, you'll do a little bit of work to move stuff over from here to here. It does become pretty hairy though if you look at the code. Like, you've got to decide how many buckets to move each time. You've got to make sure that if there's a lot of things in your bucket, in a particular bucket, that that doesn't take too much time. You've got to think about, another thing which is really kind of painful is you've got other things which are iterating over your map, so iterate as in other places in your programme and you've got to make sure that they're all pointing to the same things while you're moving stuff around and it's, it's not nice. So, what'd I'd like to convince you of is that rather than taking the hash table and then just trying to fix up all of these problems with it, we can just take a completely different approach which, and my preferred way is radix trees. So what is a radix tree? Well, so here is a map, here's it's radix tree representation and the way that it works is, let's take an example, food maps to eight, so the way that that works is, you've got f, o, o, d, points to eight so, you work your way down this tree and at each point you look at the possible next characters and each character gives you a new subtree. If you look at the performance of a radix tree compared to that naive hash map, like the Python or Ruby or other, the performance is a lot more like this ideal smooth, constant, good performance, rather than this choppy thing with potential resizes and long lists in each bucket. So now things are looking kind of a lot better for the radix tree than the hash map, if you're thinking about realtime systems. Here we've got this bad worst case, we've got this amortised average case but for the radix tree, everything is kind of just proportional to the size of the key that you're inserting or removing or getting. In the average case, in the worst cases, they're the same. Right, so, the reason that I think that that is the case and is kind of the property that you should for in realtime data structures is that in the radix tree example there is a single canonical representation of this map. There's really only one way to draw it out whereas in other data structures like the hash table, there are lots of ways to draw it out and when you change the structure, it can potentially change dramatically to the next representation. It's also kind of true of others algorithms like balanced trees, there are lots of different representations and it can change quite a lot from one to the next. Okay, so one thing that I kind of glossed over with radix trees is that the map represented here, all of the keys are so-called prefix free, so none of them, no one key is a prefix of another. Why is that a problem? Well, if you look at this example, so we take this tree here and then try to insert foo into it, well, we kind of go f, o, o and then we want to kind of have a pointer out here to a value but this diagram doesn't really have the ability to represent that, it just has a pointer to more things below it. So, how do we deal with that prefix problem? There are kind of two ways that I know of and a third one which I'm calling The Fisher Tree, we'll get onto it. The first one is, is kind of null byte terminator so if you think of c strings, you have a load of characters and then you have a null byte, which terminates your string. You can do the same thing in your radix tree, so here I'm representing the null byte with a dollar and the way that this works is if you think about foo, well, in the tree that would be represented as f, o, o, dollar so f, o, o, dollar and now we can actually represent that prefix and these ones just continue on here, this ending in a dollar is not a prefix of any of the other ones. A big problem with this representation is that you can't represent like binary data in general, like data that has a null byte in it, which the string's probably alright for things like if you want the keys in your map to be integers or structs or just most other data types. In general they contain null bytes and so you can't really pick any one byte as your terminating character. So there is another approach which is I think the most common approach that I've seen which instead marks each possible point in the tree with a little marker that says does a string end here or not and if so, here's a pointer to the value associated with it. So this is the same tree represented in a different way and each of these blocks here, like the green on grey ones is a marker that says, does a kind of prefix end here? So if we think about foo, again, we've got f, o, o and then we've got this marker here which points out to foo, the value four which is associated with it. This is good for general binary data but it's kind of expensive in that you have to do these checks of the markers saying whether there is a terminating prefix here and you have to have these pointers out to other, out to values even if there is no value out that position. So, here is a kind of case study in the kind of hash table versus radix tree thing which is sort of how I started looking at this. So Redis kind of uses hash tables in a lot of places and goes to a lot of effort to make those hash tables realtime. Recently, it started to replace little bits of Redis with radix trees, in particular as a map in Redis Cluster, which tells you which node each key is on and they experience some performance problems associated with that, kind of the ones that I've talked about and they replaced it with a radix tree to give them a lot more predicable performance. Apparently they're planning to replace a lot of the other hash maps with radix trees in Redis and Redis uses this reprsentation here so you have a marker and a pointer at each possible location. But it is quite fiddly if you look at it, like having to maintain this pointer everywhere and I think that there is a nicer representation to deal with these prefixes which is where The Fisher Tree comes in and I've just given it a name because I haven't really done any extensive research to see if this actually is used anywhere else, it probably is 'cause it's kind of obvious but I'm just gonna put my name on it anyway. The starting point is to just forget about like, variable length strings and just deal with constant length strings. So, here, for example, is a tree which represents strings of length, three bytes, so got bar, foo, loo, and because they're all the same length, it's not possible for one thing to be a prefix of another so you don't have to worry about that in your representation. So you don't need any null terminating characters, you don't need any kind of pointers out in the middle of your tree. Here's another example of the same thing, this is a radix tree with keys of length four bytes and in this example these are no longer strings, these are numbers and so for example, this is a 32 bit zero, so you got zero, zero, zero, zero, zero and here you've got the 32 bit key for this big number here which happens to be one, three, zero, nine and what that does it allows you to represent more than just traditional strings. You can represent any struct or integer and these are pointing out to some values. What's the point of all this? Well, because what I would like to do is take those and arrange them into what I'm calling a forest because there are multiple trees and the construction is that you have at the top this tree which maps integers to pointers to other trees and below them you have trees which have your agile strings in them. So, what we're doing is taking your variable length strings and grouping them by length. So each of these sub-trees contains strings of a particular length and you're grouping them and organising them in another tree at the top, which is keeping track of those lengths. So, for example, food has length four, so this would have zero, zero, zero, four in it with 32 bit integer for four and then this maps out to a tree of length four, which contains food as before. Here's a kind of more detailed representation of that. We've already seen these trees, so you've got zero, zero, zero, four, f, o, o, d. So one reason that this is nice is because in most run times nowadays, we don't actually represent strings in this kind of null terminating c way. We represent them in a way that looks more like this, so you, at the start of your structure, have a thing which says, how long is the string and then you have the string. So, this might be some representation in say, Go or Redis for a string, foo, which at the start has zero, zero, zero, three, the 32 bit integer three, and then three bytes foo and the way that you look this up in this tree is literally just to go through all of this, zero, zero, zero, three, f, o, o, so in a sense, this tree here is just a normal radix tree but it happens to be prefix free because this representation of strings happens to be prefix free. Okay so, we can do some optimizations to this structure to kind of make it more memory efficient and kind of make us do less look-ups, this is not particular to the representation I'm suggesting, it's common to all radix trees. You will have seen that there were kind of lots of allocations for little bytes here and that's kind of bad for look ups so what you can do is compress those into things like this where you've got the whole prefix here in one allocation so zero, zero, zero and then a branch which says where it's going next. So, everything in this tree kind of takes on this shape, which is a bit like a rocket or whatever you're thinking. We can go further down the rabbit hole and think about how to actually represent those in memory because it's one dimensional, it's not that rocket shape, so we represent it as sort of a here is your prefix and here is your branch and here are some pointers out to more nodes. There are lots of ways to represent it, this is one suggested way. Okay, so, that's basically it. If I, hopefully you take one thing away from this talk is just like don't mindlessly use hash maps because they can get quite hairy and ugly, especially in realtime settings and there are other data structures out there, go on Wikipedia if you want hundreds more and here are the reasons you should maybe pick a radix tree over a hash map. You've got the operations which are kind of variable in the hash map and quite hard to make reliably constant. You've got additional requirements for your hash map, so it needs some kind of magic hash function, which you kind of hope works in the way that you expect a hash function to work and it needs an entropy source in order to stop potential attackers from destroying the performance of your system, whereas the radix tree is just kind of constant performance, doesn't require those things and also, The Fisher Tree is a radix tree, which is good. It's not a hash map, so don't use one. That's it, so, any questions? Yep. I don't think that there is because, well, so one thing that, I think the worst thing than an attacker could do is kind of give you a set of strings in such a way as to kind of break up the string into lots of little blocks, yeah, so you could have like, I don't know, one, one, one, one, one, and then break that up into like, one, zero, one, one, zero. The worst case that you're getting there though is kind of the length of your string so if you restrict the length of that string so the height of your tree then, then your worst case performance is still that number of look ups in memory, I guess. Does that make some sense? I've implemented it in Haskell and in Go and I have an implementation in C which is not actually the one that I suggested there. It's an extremely naive one which just has like 256 pointers in each node, which is not ideal. Yeah, working on it. So yeah, the space complexity for the hash table is fairly predictable. You're potentially using the kind of variable amount of space if you're taking this hash training approach and you get lots of collisions but it's fairly predicable overall. With the radix tree, it is more variable and depends on the kind of amount of like structural sharing at the top or start of your strings, I guess, but it's still sort of roughly proportional to the number of characters that you have.