
Treeshaking in Ember CLI

Alex Navasardyan speaking at EmberFest in October, 2017

About this talk

Treeshaking is a popular topic nowadays. Popular bundlers, like Webpack and Rollup, both support it. But how does it work and what does it actually mean “to shake trees”? We are going to learn about compilers, optimization and experiments with treeshaking in Ember CLI.


Transcript


So I wanted to talk to you about treeshaking in the context of Ember CLI. It's an interesting topic nowadays; everyone talks about it, everyone mentions it one way or another. But before that, I kind of wanted to talk about Berlin. I really like the place, it's a great town, and it's known for many things. I was actually googling Berlin, sightseeing and all that, and one thing popped out. Not all people know about it, but this is actually a cinema. And they don't market it at all. You have to, like, climb through a window to see it, and I thought it was really cool. And a bunch of other things: graffiti's probably my favourite form of art, and it's all over Berlin. This is the East Side Gallery, well, a part of it, so if you haven't gotten a chance to check it out, please do, it's really cool. And there are some things that are just pure gold. That apparently was taken somewhere in Berlin. You know, just a bunch of people hanging out in a bathtub on the street. No big deal. This is a weird thing, an astronaut just hanging out in Berlin. It's cool, too. And apparently Berliners don't really like hipsters. Well, anyway.
So, hi once again, I'm Alex, better known as twokul on the internet. It's my first time here at EmberFest, first time speaking at EmberFest, so I'm pretty excited. I work for a company called LinkedIn. I'm a software engineer there, and I primarily do Ember.js and Ember CLI kind of work. And just on a side note, how many of you use LinkedIn here? Just by a raise of hands. Alright, that's cool.
So there are primarily three things I want to talk to you about. First, treeshaking in general and what treeshaking is: the origins, and the term itself, I guess. And make some parallels to the "grown-up" languages like Java or Erlang.
Secondly, I'm gonna try to tie it into Ember and talk about what treeshaking means in terms of Ember and Ember CLI: what it's gonna give us, what the benefits are, and what the path forward is there. Thirdly, I wanted to chat about tools and things that could help us better understand our applications and the dependencies that our applications have. And fourthly, it's mostly a bonus part, I guess: I'm gonna give a couple of demos, mostly around code splitting and dead code elimination. I also budgeted some time at the very end for questions, but if you have an urgent question or something, just please raise your hand and ask.
Alright, so to dive in. How many of you have heard about treeshaking? Alright, so everyone kind of heard of it. Do you guys know what it is? Like vaguely, read about it, kind of, maybe. So I recently got really interested in the subject and I started googling, and just playing around with ideas: how can we bring this thing into Ember CLI, and what would it entail?
And first thing, alright, you can see it well, alright. One of the first things I found was a tweet by mixonic, which is Matthew Beale, he's a core team member, and he mentions this in the context of smaller builds. Which makes sense, right? We all want small builds. You ship fewer bytes to the browser. We all want that. And, you know, I kept googling, and one other thing popped up. This is Axel Rauschmayer, and he's asking Rich Harris, who's the creator of Rollup, where the term originated from. And someone mentions that treeshaking is just another way of saying dead code elimination. Okay, so at that point I was really confused. I was like, "Okay, so you guys are all discussing this treeshaking thing, but there's still no clarity in what it is and where it came from." Until I found this article by Rich Harris where he explains conceptually what it means to treeshake things within Rollup.
So, it turns out that the term originated in Lisp. And that's how I picture Lisp developers nowadays. The gist of the idea was how to save an image using only the code that you actually need. Alright, and it makes a lot of sense. 'Cause, again, all of the grown-up languages like Java, they all do that, and it's not a brand new kind of thing. But the weird part to me was this whole term, treeshaking. Why, why? What does it have to do with trees? It makes no sense. But it turns out it does make sense.
So, Common Lisp has this interactive environment which we call a REPL. It's much like Ruby, or Elixir, or Node, they all have that. What makes the Common Lisp REPL unique is that you can define, redefine, remove, and add functions and variables, and then, you know, when you're happy with what you created, you can save it. So, here I am just creating a function, add-numbers, which basically sums up two integers. And then I call it, and I get a result of 4.
So, when the image is saved, the way it saves is that it starts with one or several roots, as they call them, or entry points. And then from there, it follows all the functions, all the dependencies it needs to include. And the process is recursive, so it'll try to traverse the dependencies of those functions, etc., etc., until all of the functions are included. So in this particular case I guess the root would be the image, and then one of the functions that we would include would be the plus function, and then the add-numbers function, alright? And by doing so, you're essentially building a graph. And it's a graph without cycles, in a computer sciency sense, so it looks just like a tree. To give you just an example of what it is, there ya go. Like it's a pretty, okay, that's weird... Basically the graph traversal looks just like that. It's very straightforward, there's nothing magical about it.
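The image-saving traversal described above can be sketched in a few lines. This is a minimal illustration, not how any Lisp implementation or bundler actually does it: the `CallGraph` shape, the `collectReachable` name, and the `main` and `unused-helper` entries are all assumptions made up for the example; only `add-numbers` and `+` come from the talk.

```typescript
// Hypothetical call graph: each function name maps to the functions it calls.
type CallGraph = Record<string, string[]>;

// Start from the roots (entry points) and keep following dependencies
// until nothing new turns up. Everything never reached is left out.
function collectReachable(graph: CallGraph, roots: string[]): Set<string> {
  const included = new Set<string>();
  const pending = [...roots];
  while (pending.length > 0) {
    const name = pending.pop() as string;
    if (included.has(name)) continue; // already included, skip it
    included.add(name);
    for (const dep of graph[name] || []) pending.push(dep);
  }
  return included;
}

// Assumed example image: "main" is an invented root for illustration.
const image: CallGraph = {
  "main": ["add-numbers"],
  "add-numbers": ["+"],
  "+": [],
  "unused-helper": ["+"], // defined in the REPL but never called from a root
};

const saved = collectReachable(image, ["main"]);
console.log([...saved]);
```

The `included` check is only there so that a function shared by several callers is visited once; it also means the sketch would not loop forever if the graph did contain a cycle.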
Okay, so having said all of that. The first assumption that people make when they talk about treeshaking is that it's the same thing as dead code elimination. Which it is definitely not. You can think of dead code elimination as taking the finished product and then just imperfectly removing things out of it that you don't need. Whereas treeshaking asks the opposite question: given the thing that I wanna build, what kind of things do I need to include? It just so happens that both of them try to solve the same thing. They have the same goal: how do we reduce the code size? And treeshaking, to be quite honest, is not a very good name, it's very confusing. Rich Harris mentions another possible term, which is live code inclusion, which is kind of confusing as well. But I don't think we have, in the JavaScript community right now, a good way of framing it, so there's really no good term for now.
Another assumption that people make is that treeshaking is a form of optimisation, just like dead code elimination. Which is not true either. If you start thinking about it, again: given the thing that I wanna build, what are the things that I need to include? It's more of an analysis, and it just so happens that smaller code size is a side effect, really. So if it is an analysis, what are the things we can get out of it, right? If we put it simply, it's an exact understanding of what your code looks like, or what your code needs. So, I guess in a way, you could just call it an application AST.
And this is where I wanted to make some parallels to VMs and systems such as the JVM, which is the Java Virtual Machine, and Erlang, I should mention. These platforms were developed for years, and they deal a lot with code. For example, pretty much all of the optimisations in JavaScript VMs were taken from JVMs, like code inlining, or dead code elimination, or loop-invariant code motion and all that.
One thing particularly from the JVM that I think is relevant is that the JVM can optimise your code by tailoring it to the target architecture. One other thing that I think is relevant, from the Erlang VM to JavaScript, and to Ember in particular, or rather, I guess, to all web apps, is that the Erlang VM can do hot code replacement. Meaning that it can replace parts of your running code, in production, without actually redeploying the service. 'Cause right now we do this dance when you have to deploy a new version of your back end service, for example: you spin the node up, and then you direct traffic from one node to another, and you have to do it gradually, because you know you don't want any downtime and things like that.
So, having said that, why do I think it's relevant? Take, for example, tailoring your code to the target architecture. There was a recent blog post by the Chrome team where they introduce the script tag with type module. So this is one of the things we can actually do in Ember: we can just ship straight ES6 modules to the browser. And it would reduce the code size quite a bit, mostly because you don't need the loader or the resolver, it's just there. You don't need any of these define functions we have right now in the code. So we can definitely do that, and this is one of the advantages of having something like treeshaking to tell you that this is what the code looks like, this is what you need to do, this is how you can structure it in a better way.
Going back to the Erlang hot swapping example, we can theoretically do that with JavaScript apps right now. Given service workers, given HTTP/2 push, we can start invalidating only parts of a code base. So every time we wanna ship a fix, as opposed to invalidating the whole app and redeploying the whole app, you can just push a fix to the module. So it's just a much better environment.
To bring it back to Ember, what are the challenges we have right now?
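Before turning to those challenges, the "invalidate only parts of a code base" idea can be made a little more concrete. The sketch below assumes the build emits a manifest mapping each module path to a content hash; the `Manifest` type, the `modulesToRefresh` function, and the paths and hashes are all hypothetical, and nothing here is an existing Ember CLI or service worker API.

```typescript
// Hypothetical build manifest: module path -> content hash.
type Manifest = Record<string, string>;

// Compare two deploys and list only the modules that changed,
// disappeared, or were newly added. Everything else can stay cached.
function modulesToRefresh(previous: Manifest, next: Manifest): string[] {
  const stale: string[] = [];
  for (const path of Object.keys(previous)) {
    if (next[path] !== previous[path]) stale.push(path); // changed or removed
  }
  for (const path of Object.keys(next)) {
    if (!(path in previous)) stale.push(path); // newly added
  }
  return stale.sort();
}

// Assumed example deploys.
const deployedYesterday: Manifest = {
  "app/routes/drivers.js": "a1",
  "app/routes/races.js": "b2",
  "app/services/session.js": "c3",
};
const deployedToday: Manifest = {
  "app/routes/drivers.js": "a1",
  "app/routes/races.js": "b9",
  "app/components/lap-chart.js": "d4",
};

const refresh = modulesToRefresh(deployedYesterday, deployedToday);
console.log(refresh);
```

In this example only three of the four distinct paths need attention, and `app/routes/drivers.js` is left alone. A service worker could plausibly use such a list to decide which cached entries to evict, though that wiring is outside this sketch.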
So David Baker, he's on the learning team, he just recently gave this talk, I think it was called Ember CLI as a compiler, that's what it was. He outlines some of the things I'm gonna talk about really nicely, so I would highly encourage you to go and check out his talk, or at least go through the slides.
So one of the things... Oh, you can't see it, it says legacy. So we need to support legacy things like globals, for example. There's just no good way of dealing with globals; we probably should just move away from them, but we still have to keep them in mind.
Another thing is dependencies. To expand on that, there are two types of dependencies; you can think of them as strong and lazy dependencies. To put it simply, it's the dependencies I load when I'm loaded, and the dependencies I load when I'm used. So, the dependencies that I need in order to execute, to be loaded, right, and then the dependencies I load when I'm used. An example of a lazy dependency could be a dynamic import, or some of the Ember-level things, like DI.
There's another thing that I think needs to improve to support treeshaking: the build time resolver. There was a recent blog post by Chad Hietala about the Glimmer bundle compiler, and he mentions ahead-of-time template compilation. This is something that uses a build time resolver, because given the module unification structure, it knows where templates are, and it can compile them during build time. Some things it doesn't do right now, for example, is figuring out the dependency injection. So in this particular case, imagine you have a session service, right, and then you have an initializer and whatnot that you imported from somewhere. And it ends up in your app, it ends up in your vendor. But let's assume you don't use it anywhere.
You still have it, you still have the code, you still have the initializer, etc., etc., and the only way to remove it as of now is pretty much to boot up your app and see if anyone's actually going to require it. Or possibly make it, as I said, a lazily loaded dependency. When you try to resolve it, it's just gonna say, oh, I need to fetch it, so it's an async kind of thing. Pretty much the same thing goes for initializers; they kind of go hand in hand, I guess.
Another thing to consider would be something in Ember Data. It's a default way, but it could be any data layer. Let's say this car model is defined elsewhere, in a different library. Should we load it, or should we not load it, kind of thing. The same goes for serializers, normalizers, and all that, and it's not necessarily that you need them right away.
And then, this is probably the most exciting part, and probably the hardest problem: the component helper. It's a dynamic resolution, so you don't know what kind of component you're gonna get; it's only known at runtime. This is where you get into really exciting conversations about how we hint to a compiler that this type might have those values, and you start getting into the conversation about packages, and things like that. Packages much like in module unification, so it aligns really well. But this is another problem that we need to solve, too, to support treeshaking. It's about having the inherent knowledge of our code and what kind of stuff we use.
Alright, so talking about tools. There are three tools that I kind of want in the Ember ecosystem right now, and I don't think we have. This is Bundle Buddy. This is something that Google engineers put together, I think.
It's quite nice. It basically lists a bunch of bundles over here, shows you the size and the distribution, and basically the overlap, all the code they share; you see the lines of code and things like that, which is really nice. I think especially bigger apps, for example LinkedIn, could benefit greatly from it. You can quickly figure out the best strategies to share your code: how do you structure your shared bundles, kind of thing.
The other tool that I think is really cool is this. It doesn't look that sexy, but basically the idea is that if you're starting on a big project, it's really hard to ramp up, and this is a tool for Java. You basically get a decent outline of what the code looks like. So you should be able to click around and see what kind of methods are within the class, or even within the package, kind of thing. So, if you're working with a large code base in Ember, that could be very useful too. And overall you can see, you know the, what is it, contribution graph on GitHub? That's the same thing here.
And some of the tools that we actually already have are quite useful. This is something that Stefan Penner put together, broccoli-concat-analyser, which is basically a breakdown of a bunch of files in your app and their sizes. So I think if treeshaking by itself is a form of analysis, it should come with a bunch of tools that could help you figure out what you're looking at. Make sense out of the data, if you will.
This is something that sort of falls into the category of shower thoughts, I guess. I was thinking through the problem of treeshaking, and it kind of just occurred to me. Yeah, shower thoughts. This is the discussion that's been going on, not just in Ember but in the general development community, for quite a while, which is whether to use inheritance or composition; in Ember, it's mixins versus extend.
And there are pros and cons there, but the reason it's important here is that if you overuse inheritance, for example, and that's very evident in the Java community, you get these deeply nested structures. And it's really, really hard to maintain. And the running joke in the Java community, and the reason that this picture is relevant, is that when you ask for one service, or rather you ask for a banana, you don't just get a banana, you get a monkey holding a banana in the forest. You know, it's Java humour, I guess, I don't know. By just requiring one service, you get ten classes that it inherits from, or maybe like a hundred classes. And it's just really bloated. So in that particular case, it kind of makes sense to use composition, because you only mix in the behaviour that you want. And the reason I think it's interesting in terms of treeshaking, or the way you think about your code, is that composition makes a little more sense when you try to figure out what are the things that I need to create a particular bundle, or a particular app. It's much, much easier to extract functions than to traverse an inheritance structure. It makes it a really interesting problem in terms of bundling.
I'm gonna demo a couple of things, as I said, so: dead code elimination. So if you run ember build, actually, let me do this. I don't know if you guys know about this, but let me just quickly show you. There's a Broccoli debug flag now in CLI, and what it does is spit out a couple of things for you. So you run it like that: you select the Broccoli debug flag, assembler. And then you open the debug folder. So, it does a couple of things for you. The first one is what we see pretty much every time we run ember build; this thing is just dist. But there's another thing, which is the intermediate application tree. It has all the add-on trees that you import, so you can pretty much inspect what the dependency list is.
But then, you can also see things like this: this is your app, your actual app, and all of the things that it imports. So, you can see this is basically one of the data-adapter initializers, and things like that. We see it's 6 kilobytes, right, and 735 kilobytes.
So one of the experiments that I did was to write a very simple treeshaker that would go through the modules from the add-on trees, figure out which ones are unused, and remove them. It doesn't give you too much, but it goes from 735 to 731, which is nice. These are the modules that you don't use within the dependencies, right? So, the things that we talked about, over here you can see it, things like DI and services and initializers, we're not actually touching, right. It's just hard to do right now. It's hard to do with all the use cases in mind. It's hard to treeshake the app tree. But assuming that we could. Run it again. Alright, so we dropped to 587 from 731. And this is just the out-of-the-box app; the app doesn't actually do much, it has two routes. It might not be that impressive, but I think almost 200 kilobytes is pretty neat, and especially with bigger apps, it will work pretty great. You can quickly drop things you don't need, you can figure out what are the things you share, and things like that. So, this is in a nutshell what treeshaking could be.
I'm going to do a code splitting demo now; I think I can check out master. Let's see if that works. Oh, I remember, there we go. Code splitting actually uses the strategies RFC implementation that I recently put out, so I highly encourage you guys to check that out. Alright, that's cool. Think I might be missing dependencies. Oh, yeah, okay, cool. Okay. Alright, alright, alright. That's what happens when you have too many branches. So what I was going to show you guys is pretty much this. Alright, so we just loaded the page, right, and there's only one bundle.
You go to races, which is one of the routes, and you get another bundle loaded. Go to drivers, you get another bundle loaded. Reload drivers, you get drivers and then emberfest, which is the main bundle. These are just some of the things that you can do with treeshaking. Cool. I think with that, I'm going to open it up for questions, if you have any, but that's all I have.
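The bundle loading seen in the code splitting demo can be approximated with a small simulation. This is only a sketch of the observable behaviour: the `routeBundles` mapping and the `BundleLoader` class are hypothetical names invented here, the bundle names are taken from the demo, and real loading would fetch scripts asynchronously rather than push strings into an array.

```typescript
// Hypothetical route-to-bundle mapping, mirroring the demo's routes.
// "emberfest" is the main bundle that every route needs.
const routeBundles: Record<string, string[]> = {
  index: ["emberfest"],
  races: ["emberfest", "races"],
  drivers: ["emberfest", "drivers"],
};

class BundleLoader {
  private loaded = new Set<string>();

  // Returns only the bundles that had to be fetched for this visit.
  visit(route: string): string[] {
    const fetched: string[] = [];
    for (const bundle of routeBundles[route] || []) {
      if (!this.loaded.has(bundle)) {
        this.loaded.add(bundle);
        fetched.push(bundle);
      }
    }
    return fetched;
  }
}

// One browsing session: land on the index, then navigate around.
const browsing = new BundleLoader();
const onIndex = browsing.visit("index");
const onRaces = browsing.visit("races");
const onDrivers = browsing.visit("drivers");

// A hard reload on the drivers route starts with nothing loaded.
const afterReload = new BundleLoader().visit("drivers");
console.log(onIndex, onRaces, onDrivers, afterReload);
```

Navigating within a session fetches one extra bundle per new route, while a reload on drivers fetches both the main bundle and the drivers bundle, which matches what the demo shows.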