
ECMAScript Proxies in Node.js

James Wright speaking at The JS Roundabout in September, 2017

About this talk

The Proxy constructor, first defined in the ECMAScript 2015 standard, allows one to intercept and alter the behaviour of existing JavaScript APIs and other operations. After briefly introducing this capability, James will tell us how it can be used in conjunction with Node.js in a variety of scenarios.


Transcript


I'm sure you've figured out already, I'm James. I'm a contract software developer at a JavaScript consultancy called YLD, which is amazing. And today I'm going to tell you about proxies. I've titled it ECMAScript Proxies in Node.js but you can just call it JavaScript Proxies really. It's just because it was defined in the ECMAScript 2015 standard. But I'm going to introduce you to this capability and then I'm gonna show you some pragmatic real-world examples in Node.js.
So let's begin with the obvious starting point. What is a proxy? Generally speaking, it's a means of intercepting and overriding fundamental operations upon an object, such as property lookup and assignment. So if you had a person object, for example, the access of a property could be person.name, or the assignment, so person.name equals Roy. And potentially also function invocations as well. And in JavaScript, this is exposed as a global constructor. So it's window.Proxy in the browser and global.Proxy in Node.js. And this capability has actually been in Node.js since version 0.8, but it wasn't a standards-compliant implementation. That API has since been changed, and it's been conformant since Node 6. And as I've said previously, it was first introduced in ECMAScript 2015.
So let's take a look at a basic example. So here we declare an object using the object literal syntax, which is a person with a name of Bob. And this person has an age of 50. And below this we declare a constant called proxy and assign to it a new Proxy instance. And into this we pass two arguments. The first argument is the object which we wish to virtualize or facade with a proxy, in this case, person. And the second argument is an object. And this has a collection of methods such as get, which is invoked when you access a property, or set, when you update a property. So before I point out the de facto terminology, I'll run this example now, and I can show it to you in action.
So here's essentially the same code, but I formatted it with four spaces, so that's my preference, deal with it. So I declare the person in the same way. I create a proxy again in the same way. So in the object for the second parameter, there's the get method, so what we do is we log when we're accessing a particular property and then once we've logged that we return the property. So the property's a string, as we'll cover. So I'm using the string to look up the property on the virtualized object. To actually experiment with this, at the bottom, we log the properties. So you interact via the proxy because the idea is it's transparent and consequently you don't know that it's a proxy per se, which is the beautiful thing about this pattern, in my opinion. So you access the name and age through the proxy as if it were the person itself. Then we're going to update these and then we're gonna log the updated values. So we should see the original values, we should see a log when we request access to the properties, we should see a log when we set the properties, and then we should see these log statements as well.
So, if I run node, so it's intro get and set. Beautiful. So, the get method in the handler object has written to standard out twice, so once for the access of the name and secondly for the access of the age. And then through the proxy it's falling back to the actual object we're virtualizing. So it says Bob is 50 years old. And then we have the assignment underneath and we log the actual assignment, the property, in the handler so we can see when it's been set. And then because we're printing the updated values again, we're printing that we've accessed these two properties, and finally, we're outputting the new values. And also, I printed the original target, the original person reference, and you can see it's mutated that. So back to the presentation. So I've given you like a high-level overview of what you can do with a proxy.
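The demo code itself is not reproduced in the transcript, so the following is only a minimal sketch of what the get and set example described above might look like. The log messages and the new values (Roy, 20) are assumptions for illustration.

```javascript
const person = { name: 'Bob', age: 50 };

const proxy = new Proxy(person, {
    // Invoked whenever a property is read through the proxy
    get(target, prop) {
        console.log(`Getting ${String(prop)}`);
        return target[prop];
    },
    // Invoked whenever a property is assigned through the proxy
    set(target, prop, value) {
        console.log(`Setting ${String(prop)}`);
        target[prop] = value;
        return true; // reports that the assignment succeeded
    },
});

console.log(`${proxy.name} is ${proxy.age} years old`);

proxy.name = 'Roy';
proxy.age = 20;

console.log(`${proxy.name} is ${proxy.age} years old`);
console.log(person); // the original object has been mutated via the proxy
```

Because the set trap writes to the target, the final log of person shows the updated values, which matches the mutation the speaker points out.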
But there's some terminology associated with it, or rather MDN told me there was some terminology and I think it makes sense, so I'll go with it. So typically with a constructor, when you're creating an instance of something this is just known as instantiation. So because it's a Proxy constructor you create an instance with the new keyword. And the first argument is known as the target. And this is the object that the proxy virtualizes. So it's that facade over the object. And then these methods in the second parameter, which is the object with get and set, these are referred to as traps. And these are methods that are invoked by certain operations such as getting and setting. And the object that has ownership of these traps is called a handler. So as I've just explained, they're invoked by various operations. It's important to reiterate that all traps are optional. So if a trap is not defined by the handler, say you have a get but you don't have a set, and you set a property on a proxy, that by default will fall back to the regular operation on the target object. So you don't have to be too explicit and define every operation under the sun.
There are plenty of traps, but to keep things moving I'm just gonna briefly introduce you to five and then I'm gonna show you some real-world examples. So the first, which we've seen, is get. So this takes the target as the first parameter and the prop as the second parameter. So target is the virtualized object, and prop is the string, or the property name, being accessed. But this third parameter, we're not gonna cover this today but just to bring it to your attention, this is called the receiver. So it's typically the proxy instance itself, but what you can do with proxies is you can create classes or function prototypes that inherit from proxies. So whatever potentially inherits from your proxy instance could be the value of this receiver parameter.
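To make the two points above concrete (traps are optional, and the receiver can be an object that inherits from the proxy), here is a small sketch. The variable names are hypothetical and the talk does not show this code.

```javascript
const person = { name: 'Bob' };
let lastReceiver;

// Only a get trap is defined; every other operation falls back to the target
const proxy = new Proxy(person, {
    get(target, prop, receiver) {
        lastReceiver = receiver; // remember who the property was read on
        return target[prop];
    },
});

proxy.name = 'Roy'; // no set trap, so this is a regular assignment on person
console.log(person.name);

proxy.name;
console.log(lastReceiver === proxy); // read directly: receiver is the proxy

// An object whose prototype is the proxy becomes the receiver instead
const child = Object.create(proxy);
child.name;
console.log(lastReceiver === child);
```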
And essentially, we have the same with set except the additional third parameter is the value and then the receiver's the fourth param. So that's what we've covered already.
So on to something new now. Another useful trap, in my opinion, is deleteProperty. So in JavaScript you have a delete keyword which one can use to remove a property from an object. And there's a trap for it. So this receives simply two properties, two parameters, excuse me. So the first is the target, which is again the object that's virtualized by the proxy. And the second is the name of the property to delete, as a string.
So on to example number two. So this is quite simple. So I've written an incredibly primitive cache. You might want to do this with a Map, for example, or take it out of your Node layer completely, but let's just pretend this is some legacy code, for example. So I've got a very basic cache map of foo, bar and baz, really useful values. I'm proxying this cache, and the idea is when you delete something from the cache it will log which property was deleted. So under the hood I use the delete keyword myself to remove it from the surrogate reference but then I log what property is being removed. And according to the spec, you have to return a boolean. I'm just returning true in this case. That indicates the success or failure of removing this property. So I initially log the cache. I delete a property via the proxy and I log the cache again. I mean really I should probably be logging the actual proxy itself, but it should be to the same effect because objects are reference types in JavaScript. So if I... ah. So if I run this, so I've invoked delete. I'll clear this and do it again, because that stack trace is incredibly annoying and I can't seem to type this evening either. There we go. So we have a property. I used the delete keyword to remove foo. I've got access to that property name as a string in the intercepting deleteProperty method, the trap.
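A minimal sketch of the cache example described above. The cached values are placeholders, and this version returns the result of the delete operation rather than a hard-coded true, which is a small deviation from what the speaker describes.

```javascript
// A primitive cache; the values are placeholders
const cache = { foo: 1, bar: 2, baz: 3 };

const cacheProxy = new Proxy(cache, {
    deleteProperty(target, prop) {
        console.log(`Deleting ${String(prop)} from the cache`);
        // The trap must return a boolean indicating success or failure
        return delete target[prop];
    },
});

console.log(cache);
delete cacheProxy.foo;
console.log(cache); // foo has gone from the underlying object
```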
And you can see it's updated the underlying reference.
A really cool one, in my opinion, is apply. So functions in JavaScript, they're derivative of an object, essentially. So it's not like you can just virtualize objects. You can virtualize functions as well. So if the target's a function and it's invoked, this apply trap will be called. So target's the function that's virtualized. thisArg is the value of the this keyword. So this is ultimately dependent upon the invocation context, whether you're calling it without association to its object, for example, or you're invoking it as a method. And then you have the arguments. So this is an array of the arguments that are ultimately passed into the function.
So, the apply example. So what we have here is I've just created a proxy for Math.round, and what this will do is, if there are no arguments specified, we throw an error: please provide an argument. In real life I wouldn't recommend adding behaviour to other people's APIs like this because maybe they should do it themselves, but it's just a basic example. So no args, throw an error, otherwise we call the apply method on the function instance and pass to it the thisArg, which in this case will be undefined because we're calling it without that context. And we pass the arguments array to that. So if I remove this, and I run apply. Throws an error. Please provide an argument. So let's clear that. I go back, put 5.5. So it falls back to the actual method because I've got that reference, that function reference, and I'm calling apply upon it with the original arguments.
And one more before I get on to the juicy real-world examples is construct. So it's invoked when the target's a constructor function or a class and it's invoked with the new keyword. So three parameters. The first is the target. It's not just the object that's virtualized by the proxy, but it also has to be constructable in some way.
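The Math.round proxy described above could be sketched as follows. This is an illustrative reconstruction, not the speaker's exact code; the constant name round is an assumption.

```javascript
const round = new Proxy(Math.round, {
    // Called whenever the proxy is invoked as a function
    apply(target, thisArg, args) {
        if (args.length === 0) {
            throw new Error('Please provide an argument');
        }

        // Fall back to the real Math.round with the original arguments
        return target.apply(thisArg, args);
    },
});

console.log(round(5.5)); // 6

try {
    round();
} catch (error) {
    console.log(error.message); // Please provide an argument
}
```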
You have to be able to utilise the new keyword as you invoke it. Then the arguments array, just like apply. And what you have as the third parameter is called newTarget, which is quite interesting. So back to the point I had about classes being able to inherit from a proxy instance. This'll either be the proxy itself or it'll be the constructor inheriting from that.
So one more basic example, if I go to construct. So I've essentially written this pattern, it's basically a proxy pattern, where you have a car, it's really dumb, it just holds the data and has a stringification method, and then I've written a proxy around that constructor. And this trap is invoked when the proxy is called with the new keyword, and if you don't have the right arguments then it throws a hissy fit again. Otherwise I spread the arguments into the constructor. So node, intro, construct. So it throws a hissy fit because cars must have at least three wheels. So if I bump this bad boy up, clear this, run it again. It doesn't throw an error, and then you're able to invoke the toString method on the instance and that just proxies through.
So now we have a high-level understanding of what the API actually is and some of the traps as well, let's look at some examples that might actually benefit us in reality. So I've specified six. We'll see what happens with time, but we're hopefully going to cover debugging setInterval, a fancy console, memoisation, schema validation, lazy require, and preventing XSS attacks with proxies.
So the first one is debugging setInterval. So setInterval, it's a great API for scheduling recurring tasks, but the problem is if, for example, you're using Node to run an HTTP server and you have libraries and many dependencies that are creating intervals, then this can clog up the event loop and consequently it hampers your ability to serve as many requests per second as you may possibly want.
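A rough sketch of the construct example described above. The Car fields (make, wheels) and the name ValidatedCar are assumptions, since the transcript only mentions that the car holds data, has a stringification method, and needs at least three wheels.

```javascript
// A deliberately dumb class: it holds data and can stringify itself
class Car {
    constructor(make, wheels) {
        this.make = make;
        this.wheels = wheels;
    }

    toString() {
        return `${this.make} with ${this.wheels} wheels`;
    }
}

const ValidatedCar = new Proxy(Car, {
    // Called whenever the proxy is invoked with the new keyword
    construct(target, args) {
        const [, wheels] = args;

        if (typeof wheels !== 'number' || wheels < 3) {
            throw new Error('Cars must have at least three wheels');
        }

        // Spread the original arguments into the real constructor
        return new target(...args);
    },
});

const car = new ValidatedCar('Reliant', 3);
console.log(car.toString()); // Reliant with 3 wheels
```

Calling new ValidatedCar('Reliant', 2) would throw before the Car constructor ever runs.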
So imagine we have a library that's just registering loads of timers and it's really annoying us. Let's replace the global setInterval method with a proxy to track how many timers have been registered. So what we have here is a method that will override the global setInterval method. So first, we create a closure. This has an interval count. And we replace setInterval with a new proxy, with a proxy, excuse me, that facades setInterval. And when this is called we take the arguments that are passed. So this is a compliant API. So like setInterval in reality, you have a callback as the first argument, the delay is the second argument, and then you can specify any subsequent arguments that are passed into the callback. So I'm using destructuring here just to make that more readable and assign those indices to constants. So then what we do is we ultimately call the original method. We invoke the callback passed into that with the arguments, and we set the same delay. But what we're able to do is, because we're intercepting this, we're logging that a new interval's been registered and then we're incrementing the interval count, and then also we return the timeout, which in Node is an instance of the Timeout class. And then we also override clearInterval. So this means that when you remove a timer you can then decrement that number as well.
So I call the method to replace it. I create an interval and I'm just demonstrating the API's compliant here. So you can pass in subsequent arguments that go into the callback. Just as a simple log. And then I create another interval. And so we create the intervals, they have slightly different durations or different repeat times, and then we clear them after a certain number of seconds. So if I run this. Debug setInterval. So we've created two intervals. They're actually working as expected. The first one clears, it decrements the count. That one logs and then we clear it again. And then it decrements again.
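The interval-tracking approach described above might be sketched like this. The delays, log messages and variable names are assumptions, and the counting is naive: clearing the same timer twice would decrement the count twice.

```javascript
let intervalCount = 0;

// Replace the global setInterval with a proxy that counts registrations
global.setInterval = new Proxy(global.setInterval, {
    apply(target, thisArg, args) {
        const [callback, delay, ...callbackArgs] = args;

        intervalCount++;
        console.log(`Interval registered, ${intervalCount} active`);

        // Call the original method and return its Timeout instance
        return target.apply(thisArg, [callback, delay, ...callbackArgs]);
    },
});

// Replace clearInterval so that the count is decremented on removal
global.clearInterval = new Proxy(global.clearInterval, {
    apply(target, thisArg, args) {
        intervalCount--;
        console.log(`Interval cleared, ${intervalCount} active`);
        return target.apply(thisArg, args);
    },
});

// Extra arguments after the delay are passed to the callback, as normal
const first = setInterval(message => console.log(message), 200, 'first tick');
const second = setInterval(() => console.log('second tick'), 300);

setTimeout(() => clearInterval(first), 700);
setTimeout(() => clearInterval(second), 1000);
```

Once both intervals are cleared there is nothing left on the event loop, so the process exits by itself.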
The event loop is then clear and then the program terminates.
So the second example is a fancy console, as I like to call it, which is maybe a bit pretentious, I don't know. So the Node implementation of the console API, which abstracts standard out and standard error, outputs strings to those streams for parity with the browser. It's not made to be a logger, for example, nor is it made to be that complex, but potentially we want a local development logger or we might want a server logger with more detailed information like times of events, but we can maintain parity with the console API and that provides familiarity for developers.
So if I show you the code. So I've abstracted the logic that formats the date. It just uses the Intl API. This is on GitHub by the way, so you can have a look afterwards, but we won't step into that for now. I have an array of formattable methods, so we're gonna format calls to console.log, console.info, console.warn and console.error. And then I have a map of colours, so if one invokes console.warn, it'll output in the corresponding colour. So these are just like UNIX escape sequences. You'd probably use a library like chalk in reality, but it's just a brief example. Error's red, and the default, that's just really a reset back to the operating system standard. So before I show you the format method, here we create a proxy to the console global. And I'm using a get trap, because when you ultimately invoke console.log or console.info you're actually getting a property from the object, which happens to be a function, or console.warn, for example. So what we're doing here is we're looking up the property name, and say one's invoking fancyConsole.log. That's in the formattable methods array. What we do is, because it's a function that's then invoked once we've accessed it, I return an anonymous function that receives N number of values and then I call the virtualized console, so the original console reference. The original method, which would be log in this case.
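A condensed sketch of the get-trap approach described here. The format helper is a stand-in that uses toISOString rather than the Intl API the speaker mentions, and the yellow colour for warn is an assumption.

```javascript
const formattableMethods = ['log', 'info', 'warn', 'error'];

// ANSI escape sequences; default resets the terminal colour
const colours = {
    warn: '\x1b[33m',
    error: '\x1b[31m',
    default: '\x1b[0m',
};

// Stand-in formatter: prepends a colour, a timestamp and a hyphen
const format = (method, values) => {
    const colour = colours[method] || colours.default;
    return [`${colour}${new Date().toISOString()} -`, ...values, colours.default];
};

const fancyConsole = new Proxy(console, {
    get(target, prop) {
        if (formattableMethods.includes(prop)) {
            // Return a wrapper that formats the values, then calls the real method
            return (...values) => target[prop](...format(prop, values));
        }

        // Everything else (console.table and so on) is passed through untouched
        return target[prop];
    },
});

fancyConsole.log('Server started on port', 8080);
fancyConsole.warn('Disk space is low');
fancyConsole.error('Something went wrong');
```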
And then we invoke the format method, which prepends a date and a hyphen to the values being logged, and it does colour output as well. So if I run node fancy console. So we've maintained that same API. So if you have a look at the bottom you can pass it any number of arguments and it concatenates them with a space, but also we've got some additional information as well. So I mean on dev you might only want the colours, or in production you might only want the times, because escape sequences in Jenkins logs are just horrible. But it just shows what you can do while maintaining parity with this API.
The next is memoisation, which is the caching of a function's output according to its arguments, usually. So this can boost performance, particularly around heavy computation. I know a lot of people who do this for simple things and it probably doesn't have much benefit. Kind of like the example I'm gonna show you, but for heavier computation it's quite beneficial. And again, it might not surprise you that it's achievable with the Proxy API. So let's have a look. So I have this function, memoise. So for each function I create a new Map, which represents the computations. So the key will be a stringification of the parameters. I'm just using primitives for now, such as numbers. And then that will store the result and we'll be able to retrieve that if it exists. So when the function one wants to memoise is called we stringify the arguments. And if the map for this particular function has that computation already, just for visibility and transparency, I log that the memoised result has been found. And then we return it from the cache. Otherwise we assign the result to a result constant, we apply the function with the thisArg, because you might be invoking it any number of ways. And then we store it in the cache. And because you most likely want the return value we then return that back directly.
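The memoise function described above could look roughly like this. JSON.stringify as the key function and the call counter are assumptions; the add function is a simple variadic example of something to memoise.

```javascript
const memoise = func => {
    // One map of computed results per memoised function
    const results = new Map();

    return new Proxy(func, {
        apply(target, thisArg, args) {
            const key = JSON.stringify(args); // fine for primitives such as numbers

            if (results.has(key)) {
                console.log(`Memoised result found for ${key}`);
                return results.get(key);
            }

            const result = target.apply(thisArg, args);
            results.set(key, result);
            return result;
        },
    });
};

let calls = 0; // counts real invocations, for demonstration only

const add = (...numbers) => {
    calls++;
    return numbers.reduce((total, number) => total + number, 0);
};

const memoisedAdd = memoise(add);

console.log(memoisedAdd(1, 2, 3)); // 6
console.log(memoisedAdd(2, 4, 6)); // 12
console.log(memoisedAdd(2, 4, 6)); // 12, served from the cache
```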
So what I've done, very simple example, I've written a function that just adds numbers, so you pass it whatever number of arguments you want. I'm using the rest syntax to create it as an array. Or to coerce it to an array, sorry. And then I'm reducing said array just to add whatever number of arguments altogether. So what I do is once that's declared I then declare a memoisedAdd, where I invoke memoise and pass add into it. And then I log the output. So you can see I call it once and then the second and third calls are invoked with the same arguments. So you get the result. So for the first one, I can't remember the parameters. And then the second call works as expected, but then because the third call uses the same arguments as the second one, two, four, six, it's found a cached result for that and will therefore return that without invoking the inner function.
Schema validation. So validating objects such as incoming HTTP request bodies or API responses coming from some foreign API can help us to filter malformed payloads and it can also allow us to report errors as well. So we can use a proxy to create an object that will validate itself. So I declare a function that will return a proxy for whichever object you want to pass in. And when a property is requested or a property is updated it will look up the schema. So this is just a simple typeof check. Just for demonstration purposes, but you'd probably want something more complex in reality. So if the types don't conform then it'll throw an error for get and set respectively. So in the example I create a simple schema, so the name should be a JavaScript string and the age should be a JavaScript number, and ultimately I then create a proxy of a person, so I have this object literal here with values just to begin with, and then I pass the schema as the second argument. I access those properties without any problem but then I try to set the age to a string. So clear this. Node. Schema validation.
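A minimal sketch of a self-validating object along the lines described above. The function name validated and the exact error wording are assumptions; the check is a plain typeof comparison, as in the talk.

```javascript
const validated = (object, schema) => {
    // Throws if the schema declares a type for prop and the value doesn't match
    const check = (prop, value) => {
        if (!Object.prototype.hasOwnProperty.call(schema, prop)) {
            return; // properties outside the schema are not validated
        }

        const expected = schema[prop];
        const actual = typeof value;

        if (actual !== expected) {
            throw new TypeError(
                `Expected ${String(prop)} to be of type ${expected} but got ${actual}`
            );
        }
    };

    return new Proxy(object, {
        get(target, prop) {
            check(prop, target[prop]);
            return target[prop];
        },
        set(target, prop, value) {
            check(prop, value);
            target[prop] = value;
            return true;
        },
    });
};

const schema = { name: 'string', age: 'number' };
const person = validated({ name: 'Roy', age: 20 }, schema);

console.log(`${person.name} is ${person.age} years old`);

try {
    person.age = 'twenty';
} catch (error) {
    console.log(error.message);
}
```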
So actually I'll just scroll up. So the initial property request works, so Roy is 20 years old. But as soon as I try to set the age to a string it throws this error saying expected the age to be of type number but got string. So if you're working with foreign API data or someone's sending you some payload and you're needing to act upon that, it can provide a useful means of validation of said payload.
Now the last two examples are a bit hit and miss for me, but I'm gonna throw them out there nonetheless. So the second to last one is a lazy means of achieving lazy require. So loading in CommonJS modules. So usually you would load all your dependencies at startup before you needed them and then refer to them subsequently in, say, request handlers. But, for example, you can have large modules that can dramatically increase the startup time. So potentially you want to find a middle ground sometimes between the two. So we can create a proxy that only loads a dependency when a valid property is accessed. So I've just used a JSON object for now, which no matter how big it is probably wouldn't take a long time to load because you're just accessing an object. It does some deserialization but I don't think it could potentially be an issue. But say you have like a large CommonJS module that you install via npm and it has a hell of a lot of logic in there and it needs to be evaluated before you even hit the exports. I would never try to use a module like that, but it happens sometimes. So it's very naive, you have a result object. You have a flag to determine if it's loaded, so it has some state. You request a prop. So this is a JSON file of a person. So when I request person.name, if it hasn't loaded the actual object it will log that it's loading it. It will require that file into the result and it will update the state to determine that it has indeed loaded, and then we return the actual property from the loaded object.
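The lazy loading approach described above could be sketched as follows. To keep the example self-contained it first writes a small JSON file to a temporary directory; that file, its path and its contents are assumptions, while lazyLoadObject is the function name mentioned in the talk.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a small JSON file so that there is something to load lazily
const personPath = path.join(os.tmpdir(), `lazy-person-${process.pid}.json`);
fs.writeFileSync(personPath, JSON.stringify({ name: 'Roy', age: 20 }));

const lazyLoadObject = modulePath => {
    let result;
    let isLoaded = false;

    return new Proxy({}, {
        get(target, prop) {
            if (!isLoaded) {
                // Only hit require the first time any property is accessed
                console.log(`Loading ${modulePath}`);
                result = require(modulePath);
                isLoaded = true;
            }

            return result[prop];
        },
    });
};

const person = lazyLoadObject(personPath);

console.log(person.name); // triggers the load
console.log(person.age);  // uses the already-loaded result
```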
So I've gone with JSON loading in this case, but you could potentially adapt it for other means as well. So to actually consume this I declare person and assign to it the return value of lazyLoadObject, which is the proxy. And as soon as I request person.name it won't have the result, so it'll go, oh, I need to go actually load in this data through the require function, and then we'll log another property to show it's not loading again. So clear, node, lazyRequire. So because we accessed person.name initially it logs that it's loading the person. We log the name because then we have the result, and then we've also requested the age, which would consequently invoke the get trap again, but because we've already loaded in that result and we have the state updated, there's no need to do that loading again. So our proxy handler is aware of that.
And one more example I would like to show you is how to prevent XSS attacks. So this is when one can inject malicious content into a website, say via query parameters, or if they write to a database and they send like some JavaScript in a script tag, for example. And we can use a proxy to intercept database writes to remove the dangerous content before storing. So this is also achievable with abstraction, which hey, it's been around for a long time, but I just want to keep this theme alive, so I'm just showing you an alternative here. So what we effectively have is I've just written a little HTTP server and it has two routes. It has a GET for slash, which will just return the home page content from our fake DB abstraction. And then there's also a POST method, sorry, a POST handler for update home content. Really naive endpoint, no authentication or anything, where it will store into the DB the content that's sent in the body. So if I run this server. Node. Prevent XSS. So yeah. So, okay, a new tab. So at the moment this is loading the page but there's no content in our DB. So as a result it just doesn't render anything.
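The idea of intercepting writes to strip dangerous content can be sketched without the HTTP server. Here the data store is a plain Map and stripTags is a naive regular-expression stand-in for a real tag-stripping library, so treat it as an illustration only and not as a security measure.

```javascript
// In-memory stand-in for the database connection
const store = new Map();
const unprotectedSet = (key, value) => store.set(key, value);

// Naive stand-in for a tag-stripping library; not suitable for real use
const stripTags = html => html.replace(/<[^>]*>/g, '');

const protectedSet = new Proxy(unprotectedSet, {
    apply(target, thisArg, args) {
        const [key, value] = args;

        // Sanitise the value before handing it to the underlying function
        return target.call(thisArg, key, stripTags(value));
    },
});

protectedSet('homeContent', '<script>alert("Muhahaha")</script>');
console.log(store.get('homeContent')); // alert("Muhahaha")
```

A request handler would then call protectedSet in place of unprotectedSet, which keeps the change to the calling code minimal.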
So if I open. Oh, where is Postman when you need it? There we go. So it's updating content. So if I do the POST to localhost:8080, update home content. If I decide to put some malicious content into this. Oh, still got some old stuff left over. So if I do script alert Muhahaha. If I now send this to my endpoint I get a 200 OK. So now if I go to my page and refresh we now have an injection attack. And I mean that's just an alert, but someone could potentially do a window.location.href assignment and take the user to somewhere incredibly dodgy. We don't want this, so what we could potentially do is, you can see up at the top I'm creating a pretend DB connection. It's just a memory store, really. And what we have is unprotectedSet. So if I proxy this function and make this the set method that's invoked in the POST handler, what this will do is, we have a very minimal change. It will destructure the arguments list, it'll get the database key. It's more of a data store really, I guess. A data key and a value. It will then use the strip tags library to escape that HTML and then it will invoke the underlying function, which is the data store set. So now if I restart this server. - [Man] Need to change your proxy target. - Yes, I do, thank you very much. Ah, not the whole line. This needs to be unprotectedSet because this is the method I'm virtualizing, but yeah, thank you very much. So now, fingers crossed, there we go, it's listening. I send the same malicious content. Still get an OK. Refresh it, it's stripped the tag so it just outputs the inner text or the text content. So nice try, but you're not gonna get past our impenetrable security, I'm afraid.
Okay, so that's all the examples covered. So I just want to cover two gotchas and then we can wrap up. So first of all, trying to detect if an object is a proxy, and then publicly accessing the original target that one's virtualizing. So detecting if an object is a proxy, don't.
So the idea of a proxy is, it's designed to be transparent, so as a consumer of an API, or if you imagine the browser, for example, if someone is choosing to virtualize a download, which is a very common scenario for this, then you're not really supposed to be aware it's a proxy. That's more of an implementation detail. It's still conforming to that same API and thus that's all that should matter to you. And if you do try to do an instanceof check, so if you create a new proxy and use the instanceof keyword to determine if it is a Proxy, it will throw a TypeError in the latest V8 because the Proxy constructor doesn't have a prototype. So really, generally speaking, that's not the point of a proxy anyway. So avoid this.
However, certain libraries or APIs such as, back to the browser admittedly, Web Audio, they require the original target because they will do an instanceof check under the hood. For example, in Web Audio I was writing a library where I was proxying audio nodes and connecting these audio nodes together to create an audio graph. But because these were proxies, Web Audio was throwing an error and screaming at me that this is not an audio node. So the only work-around I'm aware of is you would have to put in a conditional branch in your get handler where if the property is a particular token then you return the virtualized target. So it does go against the grain of what I said, but sometimes you have no choice, unfortunately.
Well, that's all I have to say. Thank you very much for listening. The Git repo and this presentation are available at these links, if you want to run the examples yourself. And if you have any questions you can ask me now, otherwise you can hit me up on social media. Thanks for listening.
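The two gotchas covered in the talk can be illustrated with a short sketch. The TARGET symbol is a hypothetical token chosen for this example, and the audioNode object is a plain stand-in, not a real Web Audio node.

```javascript
// Hypothetical token that callers use to request the underlying target
const TARGET = Symbol('target');

const audioNode = { type: 'gain' }; // stand-in object

const proxy = new Proxy(audioNode, {
    get(target, prop) {
        // Escape hatch: hand back the real object when the token is requested
        if (prop === TARGET) {
            return target;
        }

        return target[prop];
    },
});

console.log(proxy[TARGET] === audioNode); // true

// Proxy.prototype is undefined, so an instanceof check against Proxy throws
try {
    proxy instanceof Proxy;
} catch (error) {
    console.log(error instanceof TypeError); // true
}
```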