
CSP in JS

Joe Harlow speaking at The JS Roundabout in September, 2017

About this talk

In this talk we'll explore the implementation of CSP (Communicating Sequential Processes) in JavaScript with async/await, its use cases, and how it is leveraged at Paybase.


Transcript


Hello everyone, I'm Joe, I'm Head of Engineering at Paybase, and if you care, those are my details for Twitter and GitHub; if you don't care, don't worry. Today we are talking about CSP in JS. Now, what the eff is CSP? It's a very good question. CSP stands for Communicating Sequential Processes. CSP is described in Tony Hoare's paper of the same name from 1978, and it is a model for orchestrating concurrency. Now, what does that really mean?

Okay, so what the eff is a process, in this context? A process is a piece of software designed to fulfil a specific task. A process can complete a unit of work independently, and processes can be run concurrently: think threads. But, I hear you ask, JS is single-threaded, so how does this work? We can pause execution now with async/await (we used to be able to do it with generators, but I do not like generators; async/await is a lot nicer), and so we can fake threads.

So, I hope everyone can see, but here we have a simple async function, which is defined as a fake thread. It's an infinite loop, because we can pause execution: if we are awaiting an input at this point, we are pausing execution until we get that input. Then we can fulfil the task by processing the input and emit a result as output. Now, if we have multiple fake threads, this is where CSP fits in: it's a mechanism for communicating between those fake threads in JS land.

CSP is based on the idea of using channels to communicate between these processes. One process may put a message onto a channel that another process takes from. So let's talk about ping pong. This is a standard example that lots of people use when they talk about CSP. Bear with me, okay. Each bat is a process. Each bat's task is to hit and return a ball over a channel. And the ball is a message. So here's a very small snippet of code, snipped out of a CSP implementation. Basically it exposes three methods: channel, take, and put.
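The ping-pong demo might be sketched like this. This is a hedged reconstruction, not the slide's exact code: a minimal channel (three FIFO stacks, as the talk describes) is inlined so the sketch runs, the 500ms swing is shortened, and the rally is bounded to six hits so the demo terminates, where the talk's version loops forever.

```javascript
// Minimal channel: three FIFO stacks, as described in the talk.
const channel = () => ({ messages: [], putters: [], takers: [] });
const put = (ch, msg) => new Promise((resolve) => {
  ch.messages.unshift(msg);
  ch.putters.unshift(resolve);
  if (ch.takers.length) { ch.putters.pop()(); ch.takers.pop()(ch.messages.pop()); }
});
const take = (ch) => new Promise((resolve) => {
  ch.takers.unshift(resolve);
  if (ch.putters.length) { ch.putters.pop()(); ch.takers.pop()(ch.messages.pop()); }
});

const whiff = channel();
const whaff = channel();
const createBall = () => ({ hits: 0, sound: 'whiff' });

// Each bat is a process: await the ball inbound, hit it, put it outbound.
// `await put(...)` blocks until the other bat is ready to take the ball.
const createBat = async (inbound, outbound) => {
  while (true) {
    const ball = await take(inbound);                 // wait for the ball
    ball.hits += 1;
    ball.sound = ball.sound === 'whiff' ? 'whaff' : 'whiff';
    console.log(`${ball.sound}! hits: ${ball.hits}`);
    if (ball.hits >= 6) return ball;                  // bounded for the demo only
    await new Promise((r) => setTimeout(r, 50));      // time to swing the bat
    await put(outbound, ball);
  }
};

createBat(whaff, whiff);  // the bat that whiffs whaffs
createBat(whiff, whaff);  // the bat that whaffs whiffs
put(whaff, createBall()); // serve
```

Because each bat awaits `take` before it can hit and awaits `put` before it can loop, the two processes rendezvous on every exchange, exactly like a rally.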
Channel is a factory method, and then there are two functions which take channels: one takes a value to put onto a channel, and one just takes from a channel. So we're going to create two channels here, one for whiffing and one for whaffing. We're going to create a ball factory here, and a bat factory here. This bat factory actually takes two channels, an inbound and an outbound channel. And again, we're infinitely looping here, because we are using the take method to await an inbound value on that channel. Then we increment the ball's hit count, change whiff to whaff and whaff to whiff, and assume that maybe it takes 500 milliseconds for someone to actually hit a ball in table tennis (probably not right, but anyway ...). And here we await the put method onto the outbound channel. Now, the funny thing here is that we are awaiting put, which means we are waiting for someone to be ready to take from that channel. So this will not actually resolve until something is ready to take this message from the channel.

And here we create two bats. We create a bat that will whiff whaffs, and a bat that will whaff whiffs, and then we put a ball on the whaff channel, and what we get is an infinite loop of two processes playing ping pong. And you'll see that's a bit useless; it's just infinitely looping. So why? What for? Initially I wasn't so sure. But I like a challenge, and I like async/await, so let's look at how it's implemented.

Now, the funny thing is: that's it. With async/await, that is it. We have a channel factory which just returns an object containing three FIFO stacks: messages, takers, and putters. The put method takes a channel and a message to put on it, and returns a promise. At this point we essentially push the message onto the front of the messages FIFO stack, and we push the putter's resolver onto the putters stack.
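Here is a reconstruction of the implementation being walked through, based purely on this talk's description (three FIFO stacks per channel; put resolves only once something takes the message). It is not necessarily the speaker's exact 21 lines.

```javascript
// A channel is just three FIFO stacks: queued messages, pending putters
// (resolvers of `put` promises), and pending takers (resolvers of `take`).
const channel = () => ({ messages: [], putters: [], takers: [] });

// `put` pushes the message onto the front of the messages stack and its own
// resolver onto the putters stack. If a taker is already waiting, the oldest
// putter is resolved and the oldest message is handed to the oldest taker.
const put = (ch, msg) =>
  new Promise((resolve) => {
    ch.messages.unshift(msg);
    ch.putters.unshift(resolve);
    if (ch.takers.length) {
      ch.putters.pop()();                  // whoever awaited put(...) proceeds
      ch.takers.pop()(ch.messages.pop());  // whoever awaited take(...) gets the message
    }
  });

// `take` mirrors `put`: push the resolver onto the takers stack, and if a
// putter is already waiting, resolve it and deliver its message.
const take = (ch) =>
  new Promise((resolve) => {
    ch.takers.unshift(resolve);
    if (ch.putters.length) {
      ch.putters.pop()();
      ch.takers.pop()(ch.messages.pop());
    }
  });
```

Because `put` only resolves once a taker arrives, `await put(ch, msg)` is a rendezvous; calling `put` without awaiting it simply leaves the message queued on the channel, which is the trick the bathroom example later in the talk relies on.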
And if there is something ready to take a message, we immediately resolve that putter, pop the message off the stack, and hand it to the taker. The take method is just as simple: it takes a channel, returns a promise, and puts the resolver on the beginning of the takers stack; and if there is actually a putter ready with something on that channel, we resolve the putter and then resolve the taker with the message. And it really is that simple. It's very small code, 21 lines of code to do this. It's pretty impressive stuff.

Now, this implementation has caveats. There's no error handling, there is no back pressure or buffer control, and there's no alts mechanism, which is a mechanism for racing two channels against each other. But they are actually quite trivial to implement.

At this point, when I wrote this, back in December 2015, I wasn't sure what to use it for. I played with Flux in CSP. It was interesting, but it didn't feel right. And then many months later, at Paybase, we had an idea: what if we put interfaces on those channels? An interface with a method that invokes network I/O, a database call, or some kind of IPC. Anything, anything at all.

Let's talk about toilets. Bear with me a second. A bathroom is a channel. Each cubicle is an interface that resides within that channel. Someone waiting to use a cubicle is an independent process, I hope. And that is an implementation of a bathroom in CSP. I know it's a bit small, so we'll go through it. We're defining the number of users and the number of cubicles. The cubicle factory is just a simple object that returns three methods; this is the main one, the cubicle usage algorithm, where we're basically saying that it takes between five and ten seconds for someone to use the toilet, which is okay I guess. We then create the bathroom factory. In this we are creating a channel for the bathroom, and then we're filling it with cubicles, however many cubicles we define.
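The bathroom might be sketched as follows. This is a hedged reconstruction: the minimal channel from the talk is inlined so it runs, the five-to-ten-second cubicle visits are shrunk to tens of milliseconds, and the names (`createCubicle`, `createBathroom`, `useBathroom`) are illustrative rather than the slide's exact identifiers.

```javascript
// Minimal channel: three FIFO stacks, as described in the talk.
const channel = () => ({ messages: [], putters: [], takers: [] });
const put = (ch, msg) => new Promise((resolve) => {
  ch.messages.unshift(msg);
  ch.putters.unshift(resolve);
  if (ch.takers.length) { ch.putters.pop()(); ch.takers.pop()(ch.messages.pop()); }
});
const take = (ch) => new Promise((resolve) => {
  ch.takers.unshift(resolve);
  if (ch.putters.length) { ch.putters.pop()(); ch.takers.pop()(ch.messages.pop()); }
});

const delay = (ms) => new Promise((r) => setTimeout(r, ms));

// A cubicle is an interface that lives on the channel.
const createCubicle = (id) => ({
  id,
  lock: () => console.log(`cubicle ${id} locked`),
  use: () => delay(50 + Math.random() * 50), // 5-10 seconds in the talk, shrunk here
  unlock: () => console.log(`cubicle ${id} unlocked`),
});

// A bathroom is a channel pre-filled with cubicles. The puts are not awaited:
// a free cubicle just sits on the channel until someone takes it.
const createBathroom = (cubicles) => {
  const bathroom = channel();
  for (let i = 0; i < cubicles; i++) put(bathroom, createCubicle(i));
  return bathroom;
};

// Each user is a process: wait for a cubicle, lock, use, unlock, and put it
// back WITHOUT awaiting (no standing around with the door open).
const useBathroom = async (bathroom, user) => {
  const cubicle = await take(bathroom);
  cubicle.lock();
  console.log(`user ${user} is using cubicle ${cubicle.id}`);
  await cubicle.use();
  cubicle.unlock();
  put(bathroom, cubicle); // just leave
};

// A mad rush for the loo: 10 users, 3 cubicles.
const bathroom = createBathroom(3);
Promise.all(Array.from({ length: 10 }, (_, u) => useBathroom(bathroom, u)))
  .then(() => console.log('everyone is done'));
```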
And then this is the mechanism for using that bathroom. It takes the bathroom and a user; the user waits for a cubicle to become available, then uses the cubicle interface by locking the door, using the cubicle, and unlocking the door. Now, what we're doing here is not awaiting the put, because when you use the toilet, you don't want to have to stand with the door open waiting for someone to turn up to use that toilet. You want to just leave, basically. So here is how we're using it. We are defining the bathroom with the number of cubicles, we're creating a mad rush for the loo by queuing up 10 people all at once to go to the toilet, and we are then awaiting them all to finish. So this is what it looks like. You can see multiple people using cubicles. And when they're done, it's done.

So, hang on a sec. If we put spawned child processes on the channel, that's a process pool. We're passing processes around on a channel for our fake threads to use. Enter @paybase/pool, a package available on npm for creating pools, simplifying network I/O parallelisation. This simple implementation creates a pool of two parallelised request processes, and essentially allows us to create a simple mutex lock. It's not actually a process; it's just a locking mechanism that locks the availability of this process for one network I/O operation at a time. And this is the handler. We ignore the initial process that's passed in, but we get an input from the run command, which is the query. The query is passed off to a fetch, and then, as soon as we resolve the promise the handler returns, this process is ready to be taken by something else. And this is how we run it. We are creating 20 requests, we are running them all at the same time, and they will run with a parallelisation of two. And it's quite simple really.
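The pool-as-mutex idea can be sketched like this. To be clear, this is not the @paybase/pool API, just a hedged reconstruction of the pattern the talk describes: lock tokens sit on a channel, each task takes one before running and puts it back after, capping parallelism at the pool size. The minimal channel is inlined, and the `active`/`peak` counters are additions of this sketch, there only to make the cap observable.

```javascript
// Minimal channel: three FIFO stacks, as described in the talk.
const channel = () => ({ messages: [], putters: [], takers: [] });
const put = (ch, msg) => new Promise((resolve) => {
  ch.messages.unshift(msg);
  ch.putters.unshift(resolve);
  if (ch.takers.length) { ch.putters.pop()(); ch.takers.pop()(ch.messages.pop()); }
});
const take = (ch) => new Promise((resolve) => {
  ch.takers.unshift(resolve);
  if (ch.putters.length) { ch.putters.pop()(); ch.takers.pop()(ch.messages.pop()); }
});

// A pool is a channel pre-filled with lock tokens. The puts are not awaited:
// the tokens just sit on the channel until a task takes one.
const createPool = (size) => {
  const pool = channel();
  for (let i = 0; i < size; i++) put(pool, i);
  return pool;
};

// Instrumentation (not part of the pattern) so the cap is observable.
let active = 0;
let peak = 0;

// Take a token (blocking until a slot frees up), run the task, release.
const runTask = async (pool, task) => {
  const token = await take(pool);
  active += 1;
  peak = Math.max(peak, active);
  const result = await task();
  active -= 1;
  put(pool, token); // release the slot; not awaited
  return result;
};

// 20 requests against a pool of two, as in the talk's example. Each "request"
// here is just a delay standing in for a fetch.
const request = (i) => () => new Promise((r) => setTimeout(() => r(i), 20));
const pool = createPool(2);
Promise.all(Array.from({ length: 20 }, (_, i) => runTask(pool, request(i))))
  .then(() => console.log(`done; peak parallelism: ${peak}`));
```

Swapping the integer tokens for spawned child processes gives exactly the process pool the talk moves on to next.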
It's quite nice, it's quite effective. Child process pooling: again, this one's slightly more complicated. Here we're creating a pool size of 10, limiting the number of spawned processes. We define a create-process function that returns an actual spawned child process; in this case it's cat, which will just echo anything that you pass into it, and the handler just wraps the process's event emitter, resolving when we get our result back. And running this, we actually run 100 requests against a pool of 10, we await all of those, and it does it pretty quickly, so there you go.

Now, we use this for various things. We use it for migrating lots of data and for request parallelisation. We have thousands of transactions that we sometimes need to migrate, and it can take upwards of three hours, but we don't want to hammer the new system, and we don't want to hammer the old system; we want to migrate with a rolling buffer of x items at a time. And it gives us just that power. But it also allows us to be pragmatic in our use of languages for certain things. For cryptography, we use Go: we run a binary, we spawn up a pool of binaries, and we offload the work to those binaries. So it's very powerful, it's very powerful.

In closing, CSP is very powerful. It's a very powerful abstraction. Its implementation can be deceptively simple (you saw it's only 21 lines), yet it has multiple applications within async control flow. I haven't really even scratched the surface of this; there are so many more applications, and it's just an exploratory thing. You need to find out where it fits in. But it does fit into a lot of things. And it's all possible natively in Node 8, so we don't Babel anything to be able to do this, we just run it in Node 8. Thanks for coming everyone, I appreciate it. This is my first talk by the way, so I hope you all enjoyed it, and yeah, any questions afterwards, let me know. I'll be around. Cool.