
GraphQL at Facebook

Adam D.I. Kramer speaking at GraphQL London in March, 2017

About this talk

A sneak preview of GraphQL Server implementation at Facebook using Hack. We'll be reviewing internal GraphQL Server infrastructure, and giving a sneak preview into its latest iteration, powering enough queries per day to choke all the sheep in Wales and New Zealand combined!


Transcript


Thanks. I strained my voice this past weekend, so I promise you the passion is there, it's just gonna sound rather understated. So, this is about Facebook's GraphQL server written in Hack, which is a programming language. I know that building a server takes a lot of time, we did that, we put time and effort into it. So I just wanted to sort of bring some of that back to the community, for those of you building servers. And so, for those building servers, I'm gonna talk about some of the ways that we've made things work at scale. And then in the second part, I'm gonna talk about how we're building our new schema definition system in the programming language Hack going forward. And so, for the server developers out there I'm actually interested in your feedback, thoughts on whether this seems like a good approach, useful approach, that sort of stuff. Cuz we're sort of in the middle of it right now. So first, Hack, what is Hack? Hack is the sort of thing that... The word has many meanings, we named our programming language this because we have what we consider a hacker culture at Facebook. But really what Hack is, is a static typing system for PHP. So a lot of folks have maligned Facebook for being written in PHP. It's like this old backwards language from the 90s or whatever. PHP is a scripting language that lets you write code really fast, that lets things work. It's very effective for a number of reasons, that's why people used it so much. But we built a static typing system on top of it, so that it's fully type safe. That typing system is called Hack and that's open source, you can find that on the internet. And so, there's an example here, like in this case where you're adding two to i, and then catenating it with strings, whatever. Like, who knows what i is when you come in, what is it when you return it? Hack is a static typing system that lets us have a lot better guarantees of what that is.
And then HHVM is a VM runtime that does Just in Time compiling for PHP, so when we actually run Facebook, it's run on PHP, without all that static typing stuff. So it's like we take these two things, put them together, and our code has far fewer bugs, believe me. So, part of this talk, and since we've developed Hack over the course of the last several years. Part of this talk is about building a type safe system in PHP, to serve GraphQL queries. Okay, so part one is Queries at Scale. We have an interesting model for serving queries. This is it. And that's the talk. Okay, I'm gonna break this down into about six different parts, so you gotta transport your query over to the server, you gotta parse and lex the query, you have to validate the query to make sure that it makes sense to even answer it, you need to defragment, which is to say like, get all those fragments understood well so that you can put the right information into the response, where it belongs. Then you've gotta actually get that information to put into the response, and then you have to submit the response. So, for transport, so here's an example. This is not actually the newsfeed query, the newsfeed query is a little bit more complicated. But so for transport, the number one thing we can recommend for transporting queries, especially over mobile devices, is to not do it. The notion of what we call the persisted query means that we take every query that we're going to run on our mobile devices and we, you know it's written, and then we save it on our server. At that point there's an ID number that represents the query, and that's all that needs to get transported back and forth. So this can actually save you a lot of time. There's also some additional benefits to persisting queries that I'll get to later in the talk, like this notion that the device doesn't exactly know what the query is going to be, it just knows what response it's gonna have. It can have a lot of benefits also. 
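The persisted query idea described above can be sketched in a few lines. This is a minimal Python illustration, not Facebook's Hack implementation: the `PersistedQueryStore` class, its method names, and the example query are all hypothetical, and an in-memory dictionary stands in for whatever storage a real server would use.

```python
class PersistedQueryStore:
    """In-memory stand-in for a server-side table of persisted queries (hypothetical)."""

    def __init__(self):
        self._by_id = {}    # query ID -> query text
        self._next_id = 1

    def persist(self, query_text):
        """Save a query ahead of time and return the ID the client will ship."""
        query_id = self._next_id
        self._next_id += 1
        self._by_id[query_id] = query_text
        return query_id

    def lookup(self, query_id):
        """Resolve an ID sent by a client back to the stored query text."""
        return self._by_id[query_id]


store = PersistedQueryStore()

# Build time: the query is written once and saved on the server.
profile_query_id = store.persist("query { me { name profilePicture } }")

# Run time: the device sends only the ID (plus any variables), never the text.
request = {"id": profile_query_id, "variables": {}}
resolved_text = store.lookup(request["id"])
```

The request body stays a few bytes long no matter how large the query grows, which is where the saving over mobile connections would come from.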
For parsing, we've actually found that writing our own parser saved us a whole lot of CPU. We thought about doing things like just using deserialization or something. Like get our AST built, like have our objects that represent objects, all that sort of stuff. But it turned out that parsing every query, when we wrote it the way we expected to, ended up saving a decent amount of time compared to standard JSON parsing, and compared to deserialization having stored those queries, the persisted queries, in a database. So I should say, first of all, if you're gonna do parsing, you should write your own parser. However, I would also recommend that you not parse, if you can avoid it. So, one of the cool things about Hack is that Hack is a type safe system, so we can define what are called shapes. So here we've got a fragment, which has a lot of information necessary to represent a fragment on a server. What we do is we represent this fragment as a shape, which is a, it's basically a magic array, but it's not magic because you've defined what the magic actually is. So when you define a shape like this, it's got a name, it's got a type. But then, in the run time in HHVM, it's really just an array. And the cool thing about arrays, in PHP at runtime, is that you can cache them. So when I say avoid parsing, what I'm actually talking about us doing is that we take a fully parsed query and then we just cache the query itself. So we can just access that query whenever we need to, we don't need to parse it. So, if you have enough memory to save all of your queries, by all means do so, but we do a... We take an approach of just caching the top several queries, it saves us a lot of CPU for parsing. Another thing we do is... sorry about the voice guys this is really, it's hard to talk. Another thing that we do is we parse fragments on demand. So if we go back to the... Can I go back? I'm not...
So if we go back to this here, we only parse the initial query, and then when we end up at the defragmentation step, where we're taking apart the fragments, figuring out what goes in them, we actually only parse the fragment to figure out what it's about, if someone actually calls it. So if you've got your parser system going on there, you don't need to parse what you don't need, which is to say that you can be as lazy as you'd like to be. And then, once we've got these shapes that are representing objects of various sorts, they're really just data objects, and so to actually have functioning classes that you wanna use, classes that you wanna manipulate in order to actually execute queries, hoist those shapes up into objects that do things on demand. So basically, what I'm suggesting is be as lazy as possible. We found that that is very helpful for us. So, the next part, and I'm moving a little faster cuz there's the new definition system that I wanna talk about at the end. But if I end up unclear, you can interrupt me and be like, what the hell are you talking about? That's definitely okay. So validation is the question of whether this query can work. Validation's really important because if you have different engineers writing queries, trying to build their mobile app, trying to get something done, then, if their query doesn't quite work right, you need to tell them why. So you need to be able to provide a useful error message to engineers who are actually running a query. But you don't actually need to provide those error messages necessarily to the client, so once again, I'm gonna recommend that we don't validate. What I'm actually recommending though, is that there are times to validate and there are times not to validate. So based on what I was saying earlier, we validate our queries before we persist them, so we've got the query, we've written the query, we've tested the query, lots of error messaging there. 
It's all internal, it's just some people working on what they're doing. Then we know the query is valid, we know the query works, persist the query at that point, and then just, you know, stop validating it every time you load it. This also gives us an opportunity to normalise the query, so if you've got a query, you can take any query, and deconstruct it, reconstruct it on the backend, so long as it's gonna do eventually the same thing. And that normalisation step allows us to, to hash the query to make sure we don't persist the same query more than once. Right, cuz persisting the same query again and again and again would just waste a whole lot of space. And then what we do at run time, is if there's something invalid we're just like, well we don't know. So, part of the GraphQL spec allows you to distinguish between null meaning the answer is null, and null meaning there is no answer, and if you think about it you can really sort of unroll that like conceptually from a programming language standpoint, to say well there's null and then there's nothing, and then there's something, but we don't know what it is, and then there's whatever. But we're intentionally avoiding all of this on our server and we're just saying if something's invalid, then we can't answer, so the answer is null. And the answer of null and not there and error are all sort of mixed together. And then our clients will deal with that, they're like, well I don't have this answer, what am I gonna do? Cuz for the client, whatever the answer is, you know they have to make do with that anyway, they can't look to the server, they can't get that information back. So in this way, we're able to actually avoid the notion of something that's invalid, by just saying like, alright no, your implementation of this specific field is not good, so you don't get anything and then we make do. 
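The validate-once, normalise, then hash flow described above might look roughly like this. It is a Python sketch under loud assumptions: `normalise` only collapses whitespace and `validate` only checks brace balance, both toy stand-ins for the real steps, and `QueryRegistry` is a hypothetical name. SHA-256 is simply a convenient hash here; the talk does not say which one is used.

```python
import hashlib


def normalise(query_text):
    """Toy normalisation: collapse whitespace. A real server would rebuild
    the query from its parsed form so that equivalent queries match."""
    return " ".join(query_text.split())


def validate(query_text):
    """Toy validation: balanced braces. Stands in for full schema validation."""
    depth = 0
    for ch in query_text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unexpected closing brace")
    if depth != 0:
        raise ValueError("unbalanced braces")


class QueryRegistry:
    def __init__(self):
        self._by_hash = {}  # hash of normalised text -> normalised text

    def persist(self, query_text):
        validate(query_text)  # validated once, before persisting
        canonical = normalise(query_text)
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        self._by_hash.setdefault(key, canonical)  # same query is stored only once
        return key

    def load(self, key):
        return self._by_hash[key]  # no re-validation at load time

    def __len__(self):
        return len(self._by_hash)


registry = QueryRegistry()
key_a = registry.persist("query { me { name } }")
key_b = registry.persist("query {\n  me {\n    name\n  }\n}")
```

Both calls return the same key because the two texts normalise to the same string, so the registry holds a single entry.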
Defragmentation, I'm not gonna suggest that you not do this, I just wanted to call out defragmentation, which is the process of figuring out from which fragment, what field you need translating it back to the type, translating it from the type to the implementation, and then using the implementation to get the data. This is like, there's a lot of work here, but for a server engineer, this is like 90% of what makes GraphQL an API focused language, instead of a database focused language. So like the defragmentation part and like all of the work of slogging through the muck to figure out what this query is actually trying to be, like what function do I call to actually get the data, is plumbing. And that plumbing is what allows us to actually have a language that makes sense for our client developer to ask questions about a graph, using. So, don't skimp on this. Also I mean society needs plumbers, right. Like without plumbers we wouldn't really have a functioning society. Server engineering, I think, is a lot like plumbing in this way. The other note I just wanted to make here is that as you're doing the defragmentation, you're gonna wanna memo-ize everything, you're gonna wanna say like you know, well I've got these four different pieces of information coming from the same function, we need to put those all together cuz we're returning the same thing and like, six different places in the query. But, as your server serves more and more clients, and here by clients, I mean engineers writing code. As there are more and more people writing code that uses your server, the implementations of the actual fields that they write on their own are gonna end up having side effects, right. So you can say something's read only but it's not really gonna be read only, there's gonna be like writes. But the reads are from the app's perspective and the writes are from the, oh I just want to log something along the way perspective, that's how it starts out. It starts out with logging. 
I spent five years writing logging APIs before I started working on GraphQL servers so it's my fault, but now I have to pay for that. So, be careful with the memo-ization. Alright, then you gather the data. You actually get the information from whatever your data store is. You can avoid this. Just returning the same ID whenever someone asks for an ID is actually very fast if you just return... I mean, I'd recommend zero if you're gonna choose a number to just return. But clients, meaning the apps and the engineers who write the apps, get pretty angry if you do this. Yeah, so the team is motivated by passion, but the rewards for our jobs as engineers are based on performance. The way that we're now achieving fast data gathering is via a really cool codegen system. So we've got our notion of the server, but the server is operating not just on the definitions that the engineers who are exposing a GraphQL object, or defining an object that has certain fields, write. We also generate code based on those definitions, which the server then uses. And that generated code... Do I have pictures of the generated code? No. You'll see it later. But the idea here is that it's not only safer, because we can add the Hack type safety annotations to any function that we create and then just pull in the definitions that people hand to us. Which is actually pretty important, cuz as you're all aware Facebook, the company, is older than GraphQL the language. So we have some really really old and ancient code that isn't necessarily super type safe. So grabbing that code and pasting it into a function, or into a class or a file that has good type safety, actually gets our GraphQL server to be safer than the code that backs it, which I think is pretty good.
And it's faster too, because instead of having to deal with a tonne of different objects that you're manipulating, you know, getting the type from, the schema, that sort of stuff, you can just directly look at a file which has a name that you understand because you generated the file, and then just accessing stuff directly. And then HHVM, remember I talked about the virtual machine that runs PHP for us, which is also open sourced, has a notion of an async function. An async function is a function that can get most of its work done, but then eventually it's gonna need to hit the database. Hitting a database is something that you can batch, so to speak. So if I'm asking a lot of questions about a single user, and that user is represented on a single database, I'd wanna make one query to that database. And so HHVM has a really nice system that allows you to just effectively write a function that is supposed to operate asynchronously, so you say you need the result from the function sometime, and then you've got it when you need it. But by building all of this together, we've got a codegenned file with a whole lot of async functions that'll just fetch the data when you need. And then, for submitting your response, it would be faster to not submit a response to the client. I really really believe that you're gonna start laughing at this joke by the end of the talk. But effectively what we're doing here in order to avoid submitting responses is that notion of keeping null and missing the same. Which is to say, since keys take time to ship, anything that we don't need to submit to the client, anything that is null just doesn't get passed through. So if you make a query on a field, and that field is not in the response, we just say straight up, this field is not in the response means that the field is null, so the clients treat it as null. 
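Treating a missing key as null can be sketched like this in Python. The `strip_nulls` helper and the sample response are hypothetical, and the choice to leave nulls inside lists alone (so that positions are preserved) is an assumption of this sketch, not something stated in the talk.

```python
import json


def strip_nulls(value):
    """Drop dictionary keys whose value is None, recursively.
    Nulls inside lists are kept so that list positions do not shift."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(item) for item in value]
    return value


response = {
    "user": {
        "name": "Ada",
        "nickname": None,
        "friends": [{"name": "Bob", "nickname": None}],
    }
}

compact = strip_nulls(response)

# Client side: a key that is absent is read as null.
nickname = compact["user"].get("nickname")

saved_bytes = len(json.dumps(response)) - len(json.dumps(compact))
```

Every dropped key saves both the key and its `null` on the way to the device, and the client reads the same value it would have read anyway.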
And the reason I say keys take time to ship is that the shape, or the struct that you're sending back to the client is gonna need to have keys encoded into it, so if you can just avoid shipping keys and nulls down the, I wanna say down the wire, but really it's over the air, then go ahead and do that. Okay, so I'm moving along, this is a pretty fast talk. That's basically a lot of the dos and don'ts for our server implementation, things that we've sort of found along the way. And so, I'll take questions about that at the end, after I've also talked about the new way that we're actually defining a schema, which I will do now. So, what is an object type? Like if you've got a GraphQL object on your server, like how are you actually gonna represent the notion of an object? And our old approach was to represent objects using objects, people love to do this sort of thing. And so we created a PHP object named something like, GraphQL object, and had people implement it. So it's like if you want to expose a type through GraphQL, write a file that extends GraphQL object, and then we'll collect all those files together later and call it a schema. The downside was this resulted in tonnes of boilerplate, just to complete, or just to reasonably extend the GraphQL object that we defined. So the thing is, like some types, some objects, are very simple, some are much more complicated, and the more features we build for people to be able to express different things through GraphQL, the more trouble we ended up in because everyone else had more cognitive overhead in order to understand how to do even a simple thing. So to quote Lee, who's gonna talk after me, whenever someone asks for a feature you just say, "We ain't gonna need it." And this effectively would have saved us if we had thought of that earlier. 
It's also hard to mass manage objects, so if people are defining their own GraphQL objects and there's a tonne of different engineers, and some of them have quit, then figuring out what to do with that object when you're trying to update it, when you're trying to figure out what it's actually supposed to mean, cuz it's named poorly, doesn't have descriptions, that sort of thing. It's really hard with these custom objects. So our new plan is to borrow from the Java annotation system and to say literally any object is a GraphQL object, all you need to do is to say which one. So, do I get a pointer with this clicker thing? There's an antenna, there's no laser. Alright, nope, no pointer. So, in this case, like let's say that there's this notion of a Facebook user, and that's just one PHP object, and it represents a Facebook user. To say that this... Oh, cool, does it work? It's got an on, off button that I bet is the laser. Red button, am I holding it wrong? Does it go this way? Oh, there it is, there it is, I found it. Is it work? There it is, okay. It just doesn't show up on the screen. Awesome. Just yeah, okay. - [Man In Audience] You ain't gonna need it. I ain't gonna need it, thank you. Perfect. So in this implementation, we're actually taking the class of a Facebook user and simply annotating it, saying GraphQL object of user and the description is a user. And then under the Facebook user class, we've got a bunch of methods on that class. If we wanna say that the generate profile picture function is what you call on a user, to get a profile picture, then we just annotate that function as being a GraphQL field called profile picture. And in this case we say, we add an additional annotation to say that profile picture is self descriptive in exactly the way that users are not. And the key here is that this allows us to actually write a script, since remember, we're generating all of this code, this is all codegen code. 
We write a script that's effectively a build step for a given Facebook class. We just point the script at the class, and using PHP reflection, which is basically like introspection for the programming language itself, you can say, you know, get me this specific class. We look at the class, we look at the attribute, and then we generate the GraphQL code for that specific object, based on the specific class that someone passes in. So that's just like grab at all the information about the class, generate all the information, and create a completely different class that works exactly how we want it to work based on the annotations. So, names and descriptions are clearly one benefit of doing this. It's actually amazing how much engineers prefer to explain why something is self descriptive, as opposed to writing a description. Like if we actually made people write a user every time, they'd get very angry at our team, but if we give them something that says self descriptive, even though it's way more characters, they're much happier to do it. Another note about engineering for engineers, don't use the prefix GQL, it's really hard to type on a qwerty keyboard. Definitely use the extra letters RAPH. Type isomorphisms via reflection is another really good thing. We've got the ability to go from the Facebook user PHP class to the GraphQL type of user because we can reflect on the class and get the user, and then once we've got a GraphQL type, we can go back from the user types class that we generated, and we know the name of that class because we generated it, back to the Facebook user class. So we actually have this full connexion that connects our object types to our PHP types. We also get that via PHP reflection for the return types. So it says awaitable of image, that's cuz this is an async function, I talked about that earlier, and this is all just in the Hack specification. 
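The annotate-then-reflect build step can be approximated in Python, where decorators play the role of Hack attributes and the standard `inspect` module plays the role of PHP reflection. Everything named here (`graphql_object`, `graphql_field`, `describe_type`, the `FacebookUser` stand-in and its URL) is hypothetical, and the sketch collects a description of the type rather than writing out a generated class file as the real system does.

```python
import inspect


def graphql_object(name, description):
    """Class annotation: marks a class as a GraphQL type."""
    def mark(cls):
        cls._graphql_type = {"name": name, "description": description}
        return cls
    return mark


def graphql_field(name, description=None):
    """Method annotation: marks a method as a GraphQL field."""
    def mark(func):
        func._graphql_field = {"name": name, "description": description}
        return func
    return mark


@graphql_object("User", "A user")
class FacebookUser:
    def __init__(self, user_id):
        self.user_id = user_id

    @graphql_field("profile_picture", description="Self descriptive")
    def gen_profile_picture(self, size: int = 50) -> str:
        return f"https://example.invalid/{self.user_id}/{size}.jpg"

    def internal_helper(self):
        return "not exposed"  # no annotation, so never becomes a field


def describe_type(cls):
    """Build step: reflect on an annotated class and collect its fields,
    including argument defaults and the declared return type."""
    fields = {}
    for method_name, func in inspect.getmembers(cls, inspect.isfunction):
        field_meta = getattr(func, "_graphql_field", None)
        if field_meta is None:
            continue
        signature = inspect.signature(func)
        args = {
            param.name: param.default
            for param in signature.parameters.values()
            if param.name != "self"
        }
        fields[field_meta["name"]] = {
            "method": method_name,
            "args": args,
            "returns": signature.return_annotation,
        }
    return {**cls._graphql_type, "fields": fields}


schema_entry = describe_type(FacebookUser)
```

Because the field entry records which method backs it and the class records its GraphQL name, the mapping can be walked in both directions, which is the isomorphism the talk relies on.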
But even for those inputs, if you're sending arguments to your profile picture field, we know what arguments you need so we can translate back and forth, and create additional types and that sort of thing, so that all of it is powered via reflection. We get arguments, we get default values, I just said that, and we get type safety for free from Hack. So the thing is that we know that all of the code that's generating the GraphQL responses from the queries is literally the code that we use to generate the responses to the method calls. So we end up with type safety built in from the very beginning. I'll sort of show you how that works. So in this case, we take this definition, and then we'd create a class called GQL user (only because it lets me display the font bigger on the screen for you, don't name things GQL) that extends our object, and then we've just generated a function that returns a user, which is something that we got out of the user function, and then we've got a getFieldNames function, that's gonna tell us what field names are there, so that we can power GraphQL introspection correctly. And we can do the same thing for GraphQL field descriptions, we're just generating these classes directly. And separately, we generate the methods, and the cool thing about the methods is we can write a method that takes a literal Facebook user object in, as well as the arguments, and we've type generated a... Or we've code generated a specific type representing the arguments to the user function, and then we can use that definition to only process the arguments that we need to at run time. So this is where we're gaining all of that extra speed. By running profile picture directly like this, we're effectively being as thin of a layer on top of the PHP code as we possibly can be. For type introspection, we just generate a new function, or a new method for our generated class, that ends with type, that just returns the type.
And of course the type is something that we get from the definition as well, because if we're annotating a getProfilePicture method as being the profile picture field, we can look at the return type image and go directly to what GraphQL type is represented by the image. Then in the code generation step, we can just generate that code to have it. So at this point you just grab your image right from there, we know it's an image, we know what it... That's represented in GraphQL and PHP because we have that isomorphism. And then the result is when we wanna get results for a specific type, when we wanna just figure out what fields we want. It's like six lines of code, and it's extremely simple. If you know how to use... If you know Hack code then this looks very simple to you, but the key here is that PHP has a system that lets you just call strings as methods, and it lets you create classes from strings. So here we have new type name, new type name is basically just saying, we literally know what the type name is based on the GraphQL type cuz we generated that class. And then we can just instantiate one of those classes for free, just right there. And then call a method on it, that we know exists, because we generated all the code to promise that it exists, which is extremely fast. And then we just call that function right there, we don't even need to know what the function is, at run time we can catenate strings in order to get the right thing the first time. This is not type checked, so the Hack typechecker doesn't know what that string is in all possible cases, given that a lot of that information is coming in as part of the query. But it's been... So it's not type checked, but it is type safe, because we used this generation system to cobble everything together. There's a literal file sitting there, representing a class with full type safety, that does everything we need it to do.
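Calling classes and methods by string name, as described above, has a close Python analogue in a name lookup plus `getattr`. The `GeneratedUserType` class, the `GENERATED_TYPES` table and the `resolve_` prefix are all hypothetical; in the talk the generated class is a real file on disk, and here it is written by hand only to keep the sketch self-contained.

```python
class GeneratedUserType:
    """Stands in for a code-generated class; every resolver it promises exists."""

    def resolve_name(self, source, args):
        return source["name"]

    def resolve_profile_picture(self, source, args):
        return f"pic-{source['id']}-{args.get('size', 50)}"


# The generator knows every class name it emitted, so a plain lookup is enough.
GENERATED_TYPES = {"User": GeneratedUserType}


def resolve_field(type_name, field_name, source, args):
    type_class = GENERATED_TYPES[type_name]  # class from a string
    # Method from a concatenated string; the fixed prefix keeps the lookup
    # limited to resolver methods rather than arbitrary attributes.
    method = getattr(type_class(), "resolve_" + field_name)
    return method(source, args)


user_row = {"id": 4, "name": "Zuck"}
picture = resolve_field("User", "profile_picture", user_row, {"size": 100})
name = resolve_field("User", "name", user_row, {})
```

As in the talk, nothing here is checked statically at the call site; the safety comes from knowing that the generator produced a matching method for every field it advertised.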
And since the run time's not gonna pay attention to any of the type annotations anyway, it's just a PHP executor, we literally don't need to know. So that's the idea. GraphQL interfaces are just as simple, we can take a PHP interface and then we can say it's a GraphQL interface, and then later, when we've annotated Facebook user as a GraphQL object called user, we use PHP reflection again to look at the interfaces, and if any of those interfaces are GraphQL interfaces, we just know that the user type represents the IGQL agent, GraphQL interface, right. So we just pull this directly out of the code, all we need is that one annotation, thanks to the way that PHP gives us information about PHP. Traits are super cool too, cuz in PHP you can just say here's my trait. If you annotate a function in a trait, and you use the function in a class, when we reflect on the class to get its methods, if the methods are annotated, we know that they're fields. But in the classes that import the trait, that aren't annotated as being types, you just never look at the annotation, so it just works all the time. And it means whenever someone adds a trait that they're using, they've automatically exposed, through GraphQL, everything that they already said they wanted to expose through GraphQL. And this actually can solve a lot of problems that you'll find if you end up having multiple schemas. I would, generally speaking, recommend against having multiple schemas, like really just one schema is all you're gonna need. It's faster if you have zero schemas. So I mean, maybe you're not gonna need it, in which case you've wasted a couple hours tonight. But, if you've got a few different schemas, that are fairly closely related to each other, you're gonna want to end up reusing code. Code reuse, I think we're still agreeing that's a good thing.
So the trait annotations can matter, or they don't, you just annotate something and say, if you're importing this and exposing it via GraphQL, then you can just slap on the trait. So that's it. Actually, that's actually it. Got one question already. Is all the swag gone, do I hand these out to people? Who's in charge? This guy raised his hand first. - Well, thank you for the talk. It was awesome, thanks for that. And now we have free swag we can share, so, what was the question? I missed it. - It's this guy here. The first person who realised I wasn't joking about the talk being over in only 30 minutes. - So first thing, really cool, I mean how you can just make so many things redundant in first place. Closer, okay, why do you need to use Hack in the first place, if you can just let everything go and you have so many resources in Facebook to kind of... - Yeah, you know, I kind of... There is a time in every Facebook engineer's life when they say to themselves, why? Why, Mark? Why PHP? But it seemed like a good idea at first, and then it seemed like a fine idea until it was too late to just rewrite the site in Python. So Facebook started as a website. I know that now people understand it to be a mobile app, but it started as a website written in PHP, cuz PHP is a web scripting language. And so the website is implemented there, all of our infrastructure is in PHP to make the website work. So the reason that we back our GraphQL queries with PHP is that all the stuff's already there. Like all of our infrastructure, it's connected to the databases, people are used to writing the code. And it means that anyone who's developing for the website, is accidentally developing for the mobile site, all they need is a mobile engineer to slap an annotation onto a type. Yeah, thousands and thousands of PHP developers. 
I mean like really I'm saying that we started with PHP and then we invested so much into it that the cost of switching to a new language at this point is way too much. Like even though some of us would like to have it written in Python, others want it written in Haskell, some of them want it written in C++. Like everyone has their other second favourite language, and like just getting us to agree on it would take years. Oh, this guy's got a microphone, sorry. - Three part question. You mention writing your own parser. Facebook actually publishes a C parser that they're encouraging people to use. You're not using that internally? And, or do you not encourage people to use it? - So I mean a query comes in, we have to parse the query, and so... - It's libgraphqlparser, it's like a Facebook C parser for GraphQL [inaudible]. - Yeah, I mean if you're using C. - Well no, you can write extensions for any language, I don't know, it doesn't matter. - Basically reading each thing and then just knowing what something is, and then having it right at that moment, it's already parsed directly into the shape. The shape is what we then cache. Faster, it's just faster. I mean that parser, I'm certain is going to work, and it's going to do a good job, and if your goal is to have something that works, by all means use something that works, right. The Apollo team likes this philosophy, don't build a server, right, but we care a lot about performance and so this was a little bit faster, and so we did it this way. - So you mention all the tooling and using a way to kind of batch stuff, but are you guys doing any more to actually optimise your queries like, you know, considering joining databases and what not, you guys have approached that and you share or?
- So, our, like the GraphQL server team specifically, is trying to be as thin of a layer as possible, so that's a great question for like how does Facebook answer a web query, but that's sort of beyond the scope of this talk and my knowledge. - [Man] Do I need to reach the.. - Hey, thanks for the talk. That was awesome. I work at Twitter and we did the other thing where we were like, ah it's ruby and then rewrite it into... If people want to hear about the other side of that then I'll tell you about it. - [Presenter] How'd that go? - I'm a front end engineer, I've got nothing to do with it. So my question was about the distinction between the shape of these objects on the service side and the shape that the client wants them to be. And seeing the GraphQL technology about building clients and I wondered how you deal with the situation where the objects is represented on the server would be inefficient or uninteresting for client engineers, and you might want a different shape using your annotation system? Or if you just don't use it for that, and instead you keep your existing objects? - I mean we have a lot of different ways to get the data from the server to the client, but at the end of the day, how we represent it on the server, it's gonna need to be translated into something that the client can read so once you've got your data you're gonna need to put it into, or you're gonna need to serialise it somehow in order to get it to the client. So I mean, for the server versus the clients, we do a whole lot of code generation so that the clients just have access to an infrastructure that's given to them by the server, so it's actually really easy for them to just use it. Yeah, I'm just gonna keep hitting the codegen hammer, and just say generate more code. We've released a codegen library, it's also open sourced for code generation. 
- So the example, I'm sorry, I probably phrased the question wrong, the example I would think of is say, let's say that getting someone's friends is a method on the user, but, actually let's say it's not a method on the user, and it's a method on your graph store that you want to represent on the user object and use your connexions and that stuff, how do you wrangle that? - Great question, great question. The answer is to write a function that does that. I'm sorry, that was rude of me, I'm naturally rude, I'm American. But, like this was actually one of the big things that we fought about, me and the one other guy on the GraphQL server team, who's giving this talk in two days in San Francisco, at a meet-up, there's lots of meet-ups, but it's like well what if we don't want that function? What if that function is the wrong function? And then it's like, well we'll just write another function, cuz we'd need to write another function right? Like if we've got a graph store that has some way to say, you know, get friends, then chances are that class that you have, that's supposed to represent a user, is gonna need to call that function in order to return the friends. Or you'll call some other function, some other place, but if you're calling that other function, in that other place, you may as well just put it into a trait, and then you're done. And this is actually one of the answers to a lot of different questions along the same lines. Which is, well what if you wanna change your class? What if you wanna develop differently for the GraphQL exposed versions than the PHP functions that you're actually using? Create another class, just make another class that's a thin wrapper and you are strictly no worse than us forcing you to write an entire class that extends GraphQL object. - [Man 3] Cool thank you.
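The answer given here (put the function in a trait, or write a thin wrapper class) could be sketched as follows. This is Python rather than Hack, with a mixin standing in for a PHP trait; `GraphStore`, `FriendsFieldMixin`, `ExistingUser` and `GraphQLUser` are hypothetical names invented for the illustration.

```python
class GraphStore:
    """Hypothetical store that knows about friendships; not a Facebook API."""

    def __init__(self, edges):
        self._edges = edges  # user ID -> list of friend IDs

    def get_friends(self, user_id):
        return sorted(self._edges.get(user_id, []))


class FriendsFieldMixin:
    """Plays the role of a trait: one reusable method to expose as a field."""

    def friends(self):
        return self.store.get_friends(self.user_id)


class ExistingUser:
    """The class the rest of the codebase already uses."""

    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name


class GraphQLUser(FriendsFieldMixin):
    """Thin wrapper: the shape clients see, backed by the existing class
    and the graph store, without changing either of them."""

    def __init__(self, user, store):
        self._user = user
        self.store = store
        self.user_id = user.user_id

    def name(self):
        return self._user.name


graph_store = GraphStore({1: [3, 2]})
wrapped = GraphQLUser(ExistingUser(1, "Ada"), graph_store)
friend_ids = wrapped.friends()
```

The wrapper holds no logic of its own beyond delegation, so the existing class keeps its shape while the exposed type takes whatever shape suits client engineers.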