Zipping Files Fun - Server Side

Clemens Helm speaking at viennaJS in August, 2017

About this talk

A follow-up talk to Zipping Files Fun - Browsers (April 2017).


Hi. A few of you may think we've had this talk before. I gave a talk in May about zipping files in the browser, and today I'm giving the sequel: zipping files on the server. So it's number two. My name is Clemens. Yeah, that's me in real life, without lederhosen; this is just my professional outfit. I'm leading the dev team at ChillBill, and I'm very happy that Pilar is here today, who is working with me; she is responsible for our automatic voice recognition at ChillBill. I love JavaScript technologies, especially Meteor and React. I also like machine learning and JSZip, which I'm going to introduce today.
JSZip is a library for creating and reading zip files. We had these examples in May where I showed you how to select zip files in the browser and unzip them, or at least list the files inside, and the other way around. Today I'm going to show you how to zip files on the server, because the other way around is probably not what you need that much in your everyday life. The code I'll show you is more or less the same code that we use in production. I'm going to give a little tour from how I first thought zipping files on the server would work to the ideal code that I came up with after some time, and this is actually how I discovered that JSZip is pretty amazing. So, to have something else: code time. Cool.
I've got something prepared. Just one thing before I start: we're quite mixed here, we've got beginners and more experienced people, and that is very good. Hold on, I'm just going to increase the font size a little bit. So if something is unclear in the code, don't hesitate to interrupt me and ask any question; I would love to explain things if I am not clear enough. So what are we looking at here? I've got some files, some logos that are on S3, but it could be any web server that is publicly available, and I want to download them, zip them, and deliver the zip to the person who makes the request in the browser.
So I'm just going to give you a demo of how this works. I'm going to start my application here. Oh no, that's not it. Let's do it this way. Okay, I've got this download link, and when I click it, there's the zip file, and when I click the zip file it unpacks and there are these four logos in there. This all happened by downloading the images from S3, zipping them and then delivering them to the client.
So what does the code actually look like? I've got this route that gets called when I click the link. I want to download all of the images, so I iterate over these logo file names and compose the URL out of them: the S3 URL plus the file name. I also pass encoding null, because this tells the request package that it's binary content. Once this is downloaded, I take the content and put it into my zip file, which I created up here, so I say zip.file and pass the file name and the content. And what this actually does is it returns a promise, so the promise... Yeah, sure. More. So the promise will resolve as soon as the file got added to the zip archive. All these promises are stored in this zip promises array, and at the end I say Promise.all, so I wait until all of these promises have resolved. Then I actually generate the zip file, because before it was just some kind of JavaScript object, but here I generate the real file with type nodebuffer. There are several types available; you can check them out yourselves. And then finally, when I've got this content, I send it to the client by saying response.end and passing the content.
So what is bad about this code? It's not error handling, yeah, but... That's not so important. Okay, other than that? Yeah, how would you get around that? Caching would require that we downloaded it before, probably. Pardon? Yeah, possibly. Yes, that's actually what I was going for. I have to make them brighter next time.
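The first version described above could look roughly like the sketch below. It is a reconstruction from the spoken description, not the code from the slides: `buildZipInMemory` and `download` are hypothetical names, `download` stands in for the request call that resolves to a file's binary content, and `zip` is assumed to be a `new JSZip()` instance.

```javascript
// Buffered approach: download everything, then build the whole zip in memory.
// Assumptions: `zip` is a JSZip instance (new JSZip()) and `download` is a
// hypothetical helper returning a promise for one file's content as a Buffer.
async function buildZipInMemory(zip, fileNames, download) {
  // Start all downloads; each file is added once its content has arrived.
  const zipPromises = fileNames.map(async (fileName) => {
    const content = await download(fileName);
    zip.file(fileName, content);
  });

  // Wait until every file has been added to the archive object.
  await Promise.all(zipPromises);

  // Only now is the real zip file generated, as one Node.js Buffer.
  return zip.generateAsync({ type: 'nodebuffer' });
}
```

In a route handler the resulting buffer would then be sent with `response.end(content)`. Both the downloaded files and the finished archive sit in memory at the same time, which is the weakness the talk turns to next.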
Yes, the problem is that when I wrote this code I came from the Ruby world, and this is more how code in the Ruby world works: everything is synchronous, you download everything, maybe you create temp files or keep it in memory, then you build the zip file as a whole and deliver it to the client. But we don't have to do that in Node.js, because everything is streams all over and we can make a lot of use of that. So the bad thing is that we're downloading all the files, we're putting them into memory, and we're also putting the zip file into memory, so we've got at least double the size of the files in memory afterwards. That is not very much in this example, but you can imagine a web server with a lot of requests where you build zip files of 100 MB or larger, which we do every few minutes; then it's going to crash.
So one thing we could do immediately is put the whole request thing into the zip.file call, because JSZip is so smart that it accepts promises as well. This doesn't save us a lot of memory, but it will save us some code. Now I'm in parentheses hell, hold on. I think, yeah, could work. I'll take it out. Okay, this application is pretty basic, so I need to kill it and restart it every time. All right, but it's very fast. And it worked; let's open it and see if everything is still there. Yes. Okay, so that's a little optimization. But it's pretty nice, because the files already get added to the zip file before the download is even finished, so it probably optimises things a little bit. But the really cool thing is that if we do it this way and put a promise in there, we don't have to say Promise.all anymore, because we passed the promise to JSZip up here, and it will take care that all promises are resolved before it generates the zip file in the end. So I can probably also get rid of this.
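A minimal sketch of this simplification, under the same assumptions as before: `zip` is a JSZip instance, and `download` is a hypothetical helper that returns a promise for the file content. The function name is made up for illustration.

```javascript
// Promise-based approach: hand the pending download straight to JSZip.
// Assumptions: `zip` is a JSZip instance and `download` is a hypothetical
// helper returning a promise for one file's binary content.
function buildZipFromPromises(zip, fileNames, download) {
  for (const fileName of fileNames) {
    // JSZip accepts a promise as file data and resolves it when generating.
    zip.file(fileName, download(fileName));
  }

  // No explicit Promise.all: generation waits for the pending file data.
  return zip.generateAsync({ type: 'nodebuffer' });
}
```

The archive is still generated as a single buffer here, so this mainly saves code rather than memory.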
Yeah. Okay. Something worked. If we look in here, yeah, they're still there, great. So we already saved a few lines of code, and the cool thing is, if you imagine your files finishing at different times, then as soon as a promise resolves and its content is put into the JSZip file, the garbage collector can already clean up the download and this memory can be freed again, ideally.
But what we're still doing is building the entire zip file on our server: waiting until all promises have resolved, building one large zip file and then pushing it to the client. And that's not necessary at all. Do you know what a zip file looks like internally? I didn't before I looked into JSZip and had to debug some stuff. In the end there's an entry saying, okay, this file is starting now, it has this checksum and this size, then there's the binary content of the file, and then the next file starts, and so on: metadata, file content, metadata, file content.
Yes? I don't, in this example, but JSZip also supports it, so you could just add it as a configuration parameter and it would compress. So with this metadata, file content structure it's of course also possible that the file content is compressed. As far as I understand it, not, no. That was your question? Hmm. Yeah. Hmm. Yeah, I think it uses something like, what's it called, I had it in computer science in the first semester, adaptive Huffman code or something like that. Or dynamic Huffman code. Pardon? Chant? Yeah. Mm-hm. Chance. Yeah, I think it's like it starts with... I totally forgot how it works, but... Yeah. Yeah. I can look it up.
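To make the "metadata, file content" picture concrete, here is a small sketch that builds one uncompressed entry by hand, using only Node.js built-ins. The helper names `crc32` and `localFileEntry` are made up for this illustration, the timestamp fields are simply left at zero, and this is not how JSZip is implemented internally; it only shows the byte layout of a local file header followed by the content.

```javascript
// Sketch of one stored (uncompressed) zip entry: header metadata, then content.

// Standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
function crc32(buffer) {
  let crc = 0xffffffff;
  for (const byte of buffer) {
    crc ^= byte;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

function localFileEntry(fileName, content) {
  const name = Buffer.from(fileName, 'utf8');
  const header = Buffer.alloc(30);
  header.writeUInt32LE(0x04034b50, 0);      // signature "PK\x03\x04"
  header.writeUInt16LE(20, 4);              // version needed to extract (2.0)
  header.writeUInt16LE(0, 6);               // general purpose flags
  header.writeUInt16LE(0, 8);               // compression method 0 = stored
  header.writeUInt16LE(0, 10);              // last modified time (zero in this sketch)
  header.writeUInt16LE(0, 12);              // last modified date (zero in this sketch)
  header.writeUInt32LE(crc32(content), 14); // checksum of the content
  header.writeUInt32LE(content.length, 18); // compressed size
  header.writeUInt32LE(content.length, 22); // uncompressed size
  header.writeUInt16LE(name.length, 26);    // file name length
  header.writeUInt16LE(0, 28);              // extra field length
  return Buffer.concat([header, name, content]);
}
```

A complete archive also ends with a central directory that lists all entries, which this sketch leaves out.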
Okay, but what I was going for is that after every file that gets written to this zip file we can already deliver it to the client and free up the memory, because it's already streamed to the client; the next file whose promise resolves gets put in as the next entry, and so on. So this is a really scalable solution, and this is how we use it in production: you can build zip files of gigabytes, it doesn't matter, because everything just gets piped through your server. Maybe there are two or three files present in parallel on your server, but that doesn't really matter; it's really memory efficient this way.
The way we can do it, and maybe you've discovered it already down here, is to use the generateNodeStream command with the option streamFiles: true. I'm going to get to that later on; let's do it without this option first, and then I'll explain what it does. What this does is create a stream, and when a promise is resolved, so when a file has finished downloading, its contents get zipified, and this metadata and file content gets put into the stream. The stream is piped to the response, so this already gets delivered to the client, and as soon as the next file resolves, the same thing is done again, and so on. So let's see how this works. Yeah, you just have to believe me that this really makes a difference; the files are so small that you can't notice it here, but trust me, it's a huge difference when you use it like that in production. It's really fast. Yes?
Actually it's created on the server, but you have to imagine the client downloading the zip file. What I was referring to is: you start the download in your browser, you click it, and of course the file isn't delivered to you in one chunk. With the approach we had before we acted as if it were, but in the end, when you click on download, the server will of course send the file incrementally, especially when it's a large file, until you have all the contents there. Yes. Exactly. No, no, yeah, maybe I haven't been clear about it. Yeah, sure. Definitely. No, this is server code here.
When we call this link, hold on, I'll show you. This link is very, very little here and you can't see it because it's down there, so you have to believe me; I'm going to show you in the developer tools. It links to zip-file.zip, so I could also copy this and put it into the bar up there, okay? And then it starts the download again, because when I hit this in the browser, the server recognises: okay, you want to execute this function here, because it is related to the path zip-file.zip. Then this code gets executed on the server, and just this last line here is responsible for transferring the result back to the client, where you receive your download.
And what we did right now is that we didn't build the whole zip file on the server at once and then send it to the client; we already started transferring parts of the zip file to the client while some of the files we want to zip hadn't been downloaded yet. That's the advantage here: we grab files from anywhere on the web, put them together into a zip archive, and we can already start streaming as soon as we receive the first one. Okay, did we already look at whether it worked? Probably yes, right? Or you just believe me? Yes, sure.
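The streaming variant discussed in this part of the talk might be sketched as follows. It assumes `zip` is a JSZip instance that has already been filled via `zip.file(...)`, and `response` is a Node.js `http.ServerResponse`; the function name is hypothetical and the header is an addition for illustration, not something mentioned in the talk.

```javascript
// Streaming approach: pipe the archive to the HTTP response while it is generated.
// Assumptions: `zip` is a JSZip instance with files (or promises) already added,
// and `response` is an http.ServerResponse.
function streamZipToResponse(zip, response) {
  // Illustrative header so the browser treats the body as a zip download.
  response.setHeader('Content-Type', 'application/zip');

  // generateNodeStream returns a readable stream; entries are written to it
  // as their data becomes available instead of building one large buffer.
  return zip
    .generateNodeStream({ type: 'nodebuffer' })
    .pipe(response);
}
```

Because `pipe` ends the response when the zip stream finishes, no separate `response.end` call is needed in this sketch.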
I think it kind of bubbles up, so when the request, for example, doesn't resolve or is rejected, then this gets rejected as well, and in this case we've got an unhandled rejection error. And yeah, what happens when the client has already received some parts: it will have the zip file, but it won't be able to open it. I can try it. I don't know. I have no idea, honestly. Yes. There's the zip file standard, so when I create a file here in JavaScript with JSZip and afterwards open it with my unzipper on a Mac, it knows how to unzip it, because there's this zip standard. It's kind of the same protocol, yeah. Mm-hm. Yeah. Probably that's what's used here. But in my use case I don't even compress the files; I just care about having everything in one archive to download. Yes. But it's also possible to turn on compression here. Yeah. Our customers are tech advisors, they have Windows, they download it with their Internet Explorer and then... Yeah, I don't know. Also, in common language people talk about zip files and know what a zip file is, so we chose zip files so no one is confused.
Okay, we wanted to break it, and that's the most fun, so we remove the e here; I don't like it anyway. Let's see what happens. I need to restart it again. Ah, probably I cheated before because I didn't restart. So let's see. Oh no. Okay, so here we see what the payload actually is, what we get. Yeah. Sure. I can also bend. Uh-huh, yeah, it's a 404 error that we get from the request library because the file wasn't found. So what happened in the browser? We would know if I had cleaned the downloads before, but we're just going to do it again. Okay, we're going to remove this, so when I click it there's a new one, and I'm pretty sure it gives me an error when I open it. No, it opens it, and... Yeah, but this one is broken. Yeah, but of course we could've also caught it and presented a nicer error message to the customer.
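One possible way to keep a failed download from ending up as a broken entry is to wait for the downloads first and add only the ones that succeeded. The sketch below is an illustration under assumptions, not the code from the talk: `addSuccessfulDownloads` and `download` are hypothetical names, and `zip` is assumed to be a JSZip instance.

```javascript
// Fallback approach: only add files whose download succeeded.
// Assumptions: `zip` is a JSZip instance and `download` is a hypothetical
// helper returning a promise that rejects on errors such as a 404.
async function addSuccessfulDownloads(zip, fileNames, download) {
  // allSettled never rejects; it reports the outcome of every download.
  const results = await Promise.allSettled(
    fileNames.map((fileName) => download(fileName))
  );

  const failed = [];
  results.forEach((result, index) => {
    const fileName = fileNames[index];
    if (result.status === 'fulfilled') {
      zip.file(fileName, result.value);
    } else {
      failed.push(fileName); // caller can log this or show a nicer message
    }
  });
  return failed;
}
```

The trade-off is that all downloads must finish before the archive is generated, so this gives up part of the memory benefit of streaming.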
Yes, I think so, because I've got little thumbnails, which you probably can't see. These work; just this one is broken. Yeah, you would probably have to fall back to the other approach then: waiting for the promise of the request, and if it works, putting the file into the zip, and otherwise not. That's what we had initially; I've also got a GitHub repo, which I'm going to show you afterwards so you can see the original code as well. Because here, with this zip.file and the file name, I think it already creates an entry for that file or something, so that's why we've actually got it in there, and the promise just didn't give it any content, so that's why it failed, something like that. Okay, let's make it work again.
Yeah, and now about this streamFiles: true parameter. I'm just noticing that it doesn't have any effect here. Anyway, it's possible to pass a read stream to zip.file. I tried this with the requests; somehow it didn't work, I couldn't read the files afterwards. But you could do it, for example, when you have files locally: you could open read streams to all of the files you want to put in your zip, and then it would start streaming the files into the zip while it's still reading them from disk or from any other source. So it doesn't wait for the files to be complete in memory but just pipes them right through. The advantage is of course even less memory consumption, because just the current buffer that got loaded needs to be in memory, and everything that has been flushed out can already be deleted. But the other thing is, I told you that the zip file is built like this: there's some metadata for the file, like the size and the checksum, and then there's the content. And this isn't possible anymore, because JSZip doesn't know the size or the checksum of the file in advance if it already starts streaming it, right?
So it will just put this metadata afterwards, after the file, which is, and now we're at compatibility, not really in the standard, so some zip extraction applications will still work with it, or most, but it's not guaranteed. So if you're really aiming for high performance, this is the way to go, if you make it work, unlike me. Otherwise just do it like I do it: not like this, like this, yeah. That's it. I think now we've got the same code as down here, and this would already be the end, unless you have any questions. Thank you.
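For local files, the read-stream variant mentioned at the end could be sketched like this. It was not demonstrated working in the talk, so treat it as an untested illustration: `addLocalFilesAsStreams` is a made-up name, `zip` is assumed to be a JSZip instance, and the directory and file names are whatever the caller supplies.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Stream-based approach: feed read streams into the archive instead of buffers.
// Assumptions: `zip` is a JSZip instance; `directory` and `fileNames` point at
// existing local files.
function addLocalFilesAsStreams(zip, directory, fileNames) {
  for (const fileName of fileNames) {
    // JSZip accepts a Node.js readable stream as file data.
    zip.file(fileName, fs.createReadStream(path.join(directory, fileName)));
  }

  // With streamFiles: true the size and checksum are written after each
  // file's content, because they are not known when the entry starts.
  return zip.generateNodeStream({ type: 'nodebuffer', streamFiles: true });
}
```

The returned stream can be piped to a response or a file. As noted in the talk, some extraction tools may not accept archives written this way.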