Monitoring Front-End Performance with New Relic

Richard Moore speaking at JS Monthly London in June, 2017

About this talk

A short introduction to monitoring front-end performance using New Relic, with a high-level walkthrough.


Transcript


- I'm going to give you the quickest overview of the New Relic platform, in the interest of time. New Relic is more than just a platform that provides visibility into your applications from a front-end perspective. We provide full-stack instrumentation: we monitor the back end of your applications, the health of your servers, the services they connect to, and the user interactions occurring in modern browsers. To give you an idea of the scale we're able to handle, we are pretty much subject to the biggest denial-of-service attack every single day. We process more data than Twitter and Facebook combined. We also look after 15,000 of the largest customers in the world, whether that's Major League Baseball or customers like Airbnb. And we have a real-time analytics engine that lets you access all of this information. So, rapidly going through this to give you a quick idea of what makes up our platform: for customer experience, we've got New Relic Mobile; we have New Relic Synthetics, which is about simulating user interactions with your application so that you can be proactive about issues; and we have New Relic Browser, which gives you visibility into what's occurring in the browser. From the back-end perspective, we have our APM product, which covers everything running on the back end of your servers. So if there are methods that, for example, process orders, if that's what your application does, APM gives you visibility into what's occurring there: database calls and the whole stack of any services you're connecting to. And we also have our Infrastructure product, which provides visibility into server health, AWS integrations, et cetera.
All of this ties together with our Insights platform, where you can build a customised view. I'll show you the Insights tool today: you'll be able to build a dashboard, interact with all that data, and have a single view of your whole application in one place. And all of this is built to scale, as I mentioned before. So that was a really rapid overview, and now we're going to dive straight into a product walkthrough. I'm going to exit out of this and dive in here. What I'm showing you at the moment is New Relic Browser. This is a JavaScript agent (let's get rid of that bar at the bottom) that can be auto-instrumented into your site if you have the APM product. If you don't, you can use a copy-and-paste method, and it gives you visibility into what's occurring on that page. Here I'm monitoring a standard e-commerce application that we provide as a demo, called Storefront, and we're currently looking at the last 60 minutes of data flowing through it. We also support single-page apps. The application I'm showing you isn't a true single-page app. With one, we would typically see the initial page load and then the subsequent route changes, and you'd be able to see the performance of each route change. But in the interest of time today, I'm just going to go through the standard page view aspect. What we can see here is where we're spending the majority of the time. Because we're monitoring this application with APM as well, we can see the time being spent in the web application. We can also see the network time, which is this area in brown.
And then we've got the DOM processing phase here in yellow, which is clearly where we're spending most of the time on this page. We can also see the page rendering occurring here. What I should mention about this chart is that it aggregates every single page loaded in your application. Clearly, if I want to make an improvement here, I want to reduce that DOM processing phase. If I want to focus on just that, I can turn off the other elements and look only at DOM processing. If I see an interesting spike in my application, I can also highlight that segment. Before, we were looking at 60 minutes; now we're zoomed in on a nine-minute window of everything that occurred. We also have the Apdex score, which gives you an easy value that tells you the performance of your application, based on us benchmarking each of your pages. This page also shows throughput by browser, but in the interest of time I'm going to jump to the other sections. If we want to get granular and see the performance of each page, we can go to the side menu and open Page views. What we're seeing at the moment is based on the single-page app view, using the interaction names. Here the most time-consuming page in this application is the index page. I could also sort this to see which page has the slowest average response time, or which has the highest throughput.
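The Apdex score is based on the industry-standard Apdex formula. You pick a response-time threshold T. Samples at or below T count as satisfied, samples up to 4T count as tolerating at half weight, and anything slower counts as frustrated. The TypeScript sketch below implements that formula. The function name and the 500 ms threshold are illustrative assumptions, and New Relic's own thresholds and bucketing details may differ.

```typescript
// Minimal Apdex sketch: score = (satisfied + tolerating / 2) / total.
// thresholdMs is the "T" value; any value passed here is an assumption, not a New Relic default.
function apdex(durationsMs: number[], thresholdMs: number): number {
  if (durationsMs.length === 0) {
    throw new Error("apdex needs at least one sample");
  }
  let satisfied = 0;
  let tolerating = 0;
  for (const d of durationsMs) {
    if (d <= thresholdMs) {
      satisfied += 1; // fast enough: full credit
    } else if (d <= 4 * thresholdMs) {
      tolerating += 1; // slow but usable: half credit
    }
    // anything slower than 4T is "frustrated" and earns nothing
  }
  return (satisfied + tolerating / 2) / durationsMs.length;
}

// Four page loads against a hypothetical 500 ms threshold:
// two satisfied and two tolerating, so the score is (2 + 1) / 4 = 0.75.
console.log(apdex([100, 300, 900, 2000], 500));
```

A score of 1 means every sample was satisfied, and 0 means every sample was frustrated. This is what lets a single number stand in for a whole distribution of page loads.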
So in this case I'll click on the index page, which is taking 5.2 seconds, and I can see exactly where the time is being spent. I can also go into this view and see the historical performance, comparing now with this time last week or yesterday. If I had any back-end transactions associated with the delivery of this page, I can click Page load transactions, which shows me the transactions linked to it, and I can dive in and see what was occurring. In this case, delivery of part of that page was delayed by 479 milliseconds because of back-end performance. If I go back to the performance view, I can also look at the throughput. Then I have what we call session traces. A session trace on the front end gives you a view of exactly what occurred within the browser. If I click on one of these, I can see exactly what occurred during this specific 7.95-second page load. I can see that 388 milliseconds were spent in the back end, the DOM processing phase completed at 3.3 seconds, and the page load completed at 7.9 seconds. I can also see whether my users interacted with the page and how long they stayed on it. In this case, the user stayed on this page for 8.7 seconds. Further down the page I can see all of the events. These are performance and navigation API events raised by modern browsers, which we hook into. For example, this area in purple is the back-end time. Then we have the DOM processing time here, shown as a timeline, and the page rendering occurring at this point.
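The phases in a session trace build on timestamps that browsers expose through the W3C Navigation Timing API. In a browser, these are available as performance.getEntriesByType("navigation")[0]. The sketch below derives a rough phase breakdown from a subset of those fields. The phase boundaries are an approximation chosen for this sketch, not New Relic's exact definitions. The sample numbers are invented for illustration and only loosely echo the trace described above.

```typescript
// Subset of PerformanceNavigationTiming fields (milliseconds relative to navigation start).
interface NavTiming {
  startTime: number;
  responseStart: number;
  responseEnd: number;
  domContentLoadedEventEnd: number;
  loadEventEnd: number;
}

interface PhaseBreakdown {
  timeToFirstByte: number; // network + back end until the first response byte
  responseDownload: number; // transferring the HTML document
  domProcessing: number; // parsing the document until DOMContentLoaded finishes
  pageRendering: number; // remaining work until the load event finishes
  total: number;
}

// Approximate phase split; these boundaries are an assumption of this sketch.
function breakdown(t: NavTiming): PhaseBreakdown {
  return {
    timeToFirstByte: t.responseStart - t.startTime,
    responseDownload: t.responseEnd - t.responseStart,
    domProcessing: t.domContentLoadedEventEnd - t.responseEnd,
    pageRendering: t.loadEventEnd - t.domContentLoadedEventEnd,
    total: t.loadEventEnd - t.startTime,
  };
}

// Invented sample values, in milliseconds.
const sample: NavTiming = {
  startTime: 0,
  responseStart: 388,
  responseEnd: 450,
  domContentLoadedEventEnd: 3300,
  loadEventEnd: 7900,
};
console.log(breakdown(sample));
```

In a browser, you can pass the navigation entry straight to breakdown, because PerformanceNavigationTiming has all of these fields. Read it only after the load event has finished, since loadEventEnd is 0 until then. The four phases always sum to the total, which makes it easy to see which one dominates.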
At this point we see the DOM interactive event being raised, telling us that the page has started to become interactive. As we go further down, you'll notice that our New Relic agent is loaded at this point. That's always done at the end of the load event, because we don't want to impact the delivery of your page. We load the agent afterwards and then start monitoring the flow of data coming through. So that's a session trace. We can also look at the asynchronous calls made from your pages and see their performance, down to the granularity of each page within my application. From the front end we can also look at JavaScript errors. In this case I don't have any, so I'm going to increase the time period. Again, using the time picker at the top of the page, if there's an incident in the past that I want to look at, I can choose a custom date and put in the time period I'm interested in. Here I'm just going to increase it to the last seven days. Now I see a number of errors; I have some generic errors here in my view. I can click on those errors, and it gives me the occurrences of these errors and a percentage error rate. I can also see which browsers are affected by this JavaScript error. If any session traces were captured, I'd see those here, and they would take me into the waterfall view where I can see those errors occurring. I can then look at the error instance details to see what the error was. If I have minified JavaScript, which most people are striving to implement, I can drag in the unminified version and it will point me to where the error is in my code.
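A percentage error rate per browser is, in essence, the number of errors divided by the number of page views for that browser. The sketch below computes it from raw counts. The PageView and JsError record shapes are hypothetical and much simpler than New Relic's data model, so treat this as an illustration of the arithmetic only.

```typescript
// Hypothetical raw records; real agents report far richer data.
interface PageView { browser: string }
interface JsError { browser: string; message: string }

// Counts how many records belong to each browser name.
function countByBrowser(records: { browser: string }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const r of records) {
    counts[r.browser] = (counts[r.browser] ?? 0) + 1;
  }
  return counts;
}

// Error rate per browser, as a percentage of that browser's page views.
function errorRateByBrowser(views: PageView[], errors: JsError[]): Record<string, number> {
  const viewCounts = countByBrowser(views);
  const errorCounts = countByBrowser(errors);
  const rates: Record<string, number> = {};
  for (const browser of Object.keys(viewCounts)) {
    rates[browser] = ((errorCounts[browser] ?? 0) * 100) / viewCounts[browser];
  }
  return rates;
}

// Invented example: 200 Chrome and 50 Firefox page views, with 2 and 5 errors respectively.
const views: PageView[] = [];
for (let i = 0; i < 200; i++) views.push({ browser: "Chrome" });
for (let i = 0; i < 50; i++) views.push({ browser: "Firefox" });
const errors: JsError[] = [];
for (let i = 0; i < 2; i++) errors.push({ browser: "Chrome", message: "TypeError" });
for (let i = 0; i < 5; i++) errors.push({ browser: "Firefox", message: "Script error." });
console.log(errorRateByBrowser(views, errors)); // Chrome: 1 (%), Firefox: 10 (%)
```

Normalising by page views is what makes browsers comparable. Firefox has fewer errors in absolute terms here, but a far higher error rate.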
I also have the ability to look at performance for each individual browser. For example, I can see that in this account the predominant browser is Chrome. I can change the sort to show the slowest front-end load time. If I select Chrome, I see a version breakdown as well as the page load performance for each of the browsers interacting here. Then I have Filterable geography. This is a filtering engine we have within New Relic, and it lets you get really granular with your data. For example, to understand the performance for users coming in on mobile, I just choose device type and filter the view to mobile. When I go back to my grouping list, I see all of this data filtered down to mobile, so whatever filter I want, I can get that view. I can also see the average network time involved here. So that's a quick overview of New Relic Browser, which lets you visualise where the performance issues are. When it comes to becoming proactive with your applications, we have New Relic Synthetics, which lets you create tests against your applications. We provide four different types of test that you can run in the account. I'm not going to go through creating one; I'm just going to walk you through and explain them. We provide a ping, which gives you the response time of a page in milliseconds. It's very useful just to know that a page is up, and you can also add validation to check that certain text is present on the page. We also have a simple browser.
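Filtering to one device type and then grouping by browser is a filter followed by a group-by average. The sketch below shows that shape on in-memory data. The PageViewSample record and its field names are hypothetical and are not the attributes New Relic actually stores.

```typescript
// Hypothetical page-view record; field names are illustrative only.
interface PageViewSample {
  browser: string;
  deviceType: "desktop" | "mobile" | "tablet";
  networkMs: number;
}

// Keeps only samples for one device type, then averages network time per browser.
function avgNetworkByBrowser(
  samples: PageViewSample[],
  device: PageViewSample["deviceType"],
): Record<string, number> {
  const sums: Record<string, { total: number; count: number }> = {};
  for (const s of samples) {
    if (s.deviceType !== device) continue; // the "filter" step
    const acc = sums[s.browser] ?? { total: 0, count: 0 };
    acc.total += s.networkMs;
    acc.count += 1;
    sums[s.browser] = acc;
  }
  const averages: Record<string, number> = {};
  for (const browser of Object.keys(sums)) {
    averages[browser] = sums[browser].total / sums[browser].count; // the "group by" step
  }
  return averages;
}

// Invented sample data.
const samples: PageViewSample[] = [
  { browser: "Chrome", deviceType: "mobile", networkMs: 100 },
  { browser: "Chrome", deviceType: "mobile", networkMs: 300 },
  { browser: "Chrome", deviceType: "desktop", networkMs: 50 },
  { browser: "Safari", deviceType: "mobile", networkMs: 400 },
];
console.log(avgNetworkByBrowser(samples, "mobile")); // Chrome: 200, Safari: 400
```

The filter runs before the aggregation. This is why the desktop Chrome sample has no effect on the mobile averages.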
A simple browser is a more interesting type of check: it goes off to a page and monitors all of the JavaScript and resources loaded on that page. I've set a synthetic script up against the meetup.com website, and we'll look at the results of that in a moment. We have a scripted browser, which is a Selenium script that lets you script a curated journey through your application. If you have, for example, an e-commerce application with a critical path, you can create a journey through it and monitor each step of that journey. When that journey fails or its performance deviates, you can be notified by an alert. You can also visualise the problem and rapidly identify what caused it. And lastly we have an API test, so if you're connecting to any REST APIs or similar, you can hit those with our API test. If I now look at the synthetic journey here: this is a scripted browser I created in the account that goes off to the Meetup website. Since I created it about two hours ago, the failure rate has gone up to 92.8 percent, so meetup.com actually had an outage. We can also see that their average page load time is 2.93 seconds, and their 95th percentile page load is 11.9 seconds. I'm running this check at a frequency of one minute. I can then dive into this specific check and see the results. Looking at the last three hours, there was a spike at about 6:40. If I increase the time window to the last six hours, I can also see all of the failures that have occurred. We'll look at a failure in a moment, but first, here's a case where a page load on the Meetup homepage took 23.3 seconds.
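Headline numbers such as failure rate, average page load, and 95th percentile page load are simple aggregates over the individual check runs. The sketch below assumes a hypothetical CheckRun record. It uses the nearest-rank definition of a percentile, and monitoring products may interpolate differently. It also assumes that timing statistics use successful runs only, which is a choice made for this sketch.

```typescript
// Hypothetical record of one synthetic check run.
interface CheckRun {
  ok: boolean;
  durationMs: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
// Returns NaN for an empty list.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

interface CheckSummary {
  failureRatePct: number;
  averageMs: number; // NaN when every run failed
  p95Ms: number; // NaN when every run failed
}

// Failure rate counts every run; timing stats use successful runs only (an assumption).
function summarize(runs: CheckRun[]): CheckSummary {
  const failures = runs.filter((r) => !r.ok).length;
  const durations = runs.filter((r) => r.ok).map((r) => r.durationMs);
  const sum = durations.reduce((acc, d) => acc + d, 0);
  return {
    failureRatePct: (failures * 100) / runs.length,
    averageMs: durations.length > 0 ? sum / durations.length : NaN,
    p95Ms: percentile(durations, 95),
  };
}

// Invented example: ten successful runs (100 ms .. 1000 ms) and two failed ones.
const runs: CheckRun[] = [];
for (let i = 1; i <= 10; i++) runs.push({ ok: true, durationMs: i * 100 });
runs.push({ ok: false, durationMs: 60000 }, { ok: false, durationMs: 60000 });
console.log(summarize(runs));
```

The gap between an average and a high percentile, such as 2.93 seconds against 11.9 seconds in the talk, shows that a small share of runs are much slower than typical. An average on its own hides this.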
If I go and look at that (I should mention this isn't the JS Monthly page on Meetup, just the Meetup homepage), we see DOM processing completing at 13.4 seconds and the page load occurring at 23.3 seconds. I can also see the timeline duration of all the assets loading here. The slowest resource on this page looks to be the HTML document, which took 11.3 seconds to load. If I want to see the slowest resource quickly, all I need to do is sort by duration, and I can immediately confirm that it is that HTML page taking 11.3 seconds. I can also see that it took 6.79 seconds to deliver these images, et cetera. With this kind of deviation, I can set up an alert and go and investigate the issue. So that's what a successful synthetic check looks like. We also have scenarios where the check has failed. Today at around 2:55 we had an outage: we got a 503 response and also a timeout. When we run a synthetic check, we wait one minute for it to complete, and if it doesn't complete within that minute, we fail the check. In this case we got a timeout when requesting the home page. If we get a consistent timeout, we can trigger an alert, be notified, and take action to resolve whatever issue is behind it. So that's New Relic Synthetics. Lastly we have New Relic Insights. All of the data from these agents, whether it's an APM agent monitoring the back end, a browser agent, or synthetic data, gets fed into NRDB, the New Relic Database. We provide Insights as a tool to ask questions of that data.
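The one-minute limit described above amounts to racing the check against a timer. The sketch below shows that idea in TypeScript. The runWithTimeout function and the CheckResult shape are hypothetical, and a real synthetic monitor records far more than a pass/fail outcome and a duration.

```typescript
// Hypothetical outcome of one synthetic check run.
type CheckResult =
  | { outcome: "success"; durationMs: number }
  | { outcome: "failure"; durationMs: number; reason: string };

// Runs `check`, failing it if it has not settled within timeoutMs (default: one minute).
async function runWithTimeout(
  check: () => Promise<void>,
  timeoutMs = 60_000,
): Promise<CheckResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins: the check itself or the timer.
    await Promise.race([check(), timeout]);
    return { outcome: "success", durationMs: Date.now() - started };
  } catch (err) {
    // Covers both a failing check (e.g. an HTTP 503) and the timeout.
    const reason = err instanceof Error ? err.message : String(err);
    return { outcome: "failure", durationMs: Date.now() - started, reason };
  } finally {
    // Stop the timer so a finished check does not keep the process alive.
    clearTimeout(timer);
  }
}

// A check that never completes fails once the (shortened, for demonstration) timeout elapses.
runWithTimeout(() => new Promise<void>(() => {}), 100).then((r) => console.log(r.outcome));
```

In practice, check would wrap a page request or a scripted journey. A run of consecutive failures, rather than a single one, is what you would typically alert on.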
So I'm going to build a simple dashboard. Take the Storefront application we're monitoring with Browser. If I want to know the average page load time just for today, I enter a query here. This is NRQL, by the way, the New Relic Query Language, which is very similar to SQL. I just type SELECT average(duration) FROM PageView SINCE today, where PageView is the type of data we get from Browser. I click Run, and it tells me it takes 2.55 seconds on average to deliver a page load across every application in this account. If I want to compare that to this time last week, I just add COMPARE WITH 1 week ago, and I can see that the page load time has increased by 0.24 percent compared to this time last week. If I want a time series of that data, all I need to do is add the keyword TIMESERIES and click Run. That shows me the time compared to last week as a trend view. In the interest of time, rather than building this up step by step, I'll show you a pre-built dashboard I've created that looks at the data coming out of the synthetic check I set up. Here we're seeing the percentage uptime of the check we're running. We're currently looking at the last 60 minutes, and I'm also looking at the average page load over the last 60 minutes. I can look at the DOM processing phase, so I can rapidly see the effect of any improvements I make. I can see the average page rendering time. I can then get a breakdown of the content: most of the Meetup page we're monitoring is actually images. I can also look at the average duration by domain. In this case I can see that Google Tag Manager is the slowest-performing domain of all the resources, at 427 milliseconds. And I can see that the slowest assets on the page are the JavaScript assets.
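"Average duration by domain" means grouping each resource by the hostname in its URL and averaging within each group. The sketch below does this with the standard URL class. The ResourceSample shape is hypothetical, and the sample URLs and durations are invented.

```typescript
// Hypothetical resource-timing record.
interface ResourceSample {
  url: string;
  durationMs: number;
}

// Averages resource duration per hostname and returns the domains slowest-first.
function avgDurationByDomain(samples: ResourceSample[]): Array<{ domain: string; avgMs: number }> {
  const acc: Record<string, { total: number; count: number }> = {};
  for (const s of samples) {
    const domain = new URL(s.url).hostname; // e.g. "www.googletagmanager.com"
    const entry = acc[domain] ?? { total: 0, count: 0 };
    entry.total += s.durationMs;
    entry.count += 1;
    acc[domain] = entry;
  }
  return Object.keys(acc)
    .map((domain) => ({ domain, avgMs: acc[domain].total / acc[domain].count }))
    .sort((a, b) => b.avgMs - a.avgMs);
}

// Invented sample data.
const exampleSamples: ResourceSample[] = [
  { url: "https://www.googletagmanager.com/gtm.js", durationMs: 400 },
  { url: "https://www.googletagmanager.com/gtag/js", durationMs: 500 },
  { url: "https://static.example.com/logo.png", durationMs: 100 },
  { url: "https://static.example.com/hero.jpg", durationMs: 300 },
  { url: "https://cdn.example.org/app.js", durationMs: 50 },
];
console.log(avgDurationByDomain(exampleSamples));
```

In a browser, you could feed this function from performance.getEntriesByType("resource"), mapping each entry's name (its URL) and duration into a ResourceSample. Third-party domains such as tag managers and ad services often show up near the top of this list.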
Further down, I get a breakdown of the average duration, average response size, max duration, et cetera: all the data I have in the account to query. I can also see the slowest resource overall on the page. In this case it's the DoubleClick advertising service running on the meetup.com homepage, which is taking 758 milliseconds. That's a rapid overview. Lastly, if I want to see the outage that occurred, I can increase the time window to the last six hours and see it specifically there. In this view I can see where the outage happened and the page load increase. At this time they had an issue with their service, and we can see a major increase during this period. And that covers the time I have. Thank you very much for listening.