About this talk
How can you use machine learning and data science techniques in your everyday work? In this talk, Holly will cover recent projects where 383 have used Bayesian machine learning to inform and validate qualitative research, as well as introducing Bayesian statistics to their everyday product management.
It's basically around the practical applications of these methods, so hopefully whether you're client side, freelance, or within an agency you can get something from it. Today I'm going to talk about our team set-up at 383. I won't bore you too much with 383 overall 'cause Cass has kind of done that bit. I'm going to talk a little bit, as a forewarning I guess, about the AI winter and how that might impact us. I'm going to go through two practical examples of using statistics and machine learning to improve the user experience. And then I'll wrap up with some tips, from doing this stuff for a while now, as to what we recommend in terms of dos and don'ts, and then finally, hopefully, a little bit around what the future looks like for data science and machine learning at 383.
Earlier this year 383 formed an insight and data team. This is made up of a qualitative researcher, who's living it up at Web Summit in Portugal whereas I'm here in the lovely Jewellery Quarter in the rain, and the data analyst, which is me. Given we're kind of a cross-discipline team, I actually figured the best way to represent this was with my favourite team of multi-discipline scientists, the Ghostbusters, and as Louise, the qual insight analyst, isn't here you can't fairly judge who Venkman is, but I'm going to say it's me and so joke's on you, Louisa. Previously we held positions at 383 in design, I was in engineering for a while, in strategy, and in the UX department when there was one. But eventually we settled on having a team that just looks after data, whether that's qual or quant, working together. So whether that's an audio interview or a spreadsheet, the idea is that you have one centralised department that's really spearheading talking about the user experience and improving it. One of the things that we have as an ethos within our team is moving away from questions like this.
"What can our data tell us about the user experience?", which at first glance seems fine, right? But typically, if you're agency side, that's the data that the client owns, and that's just using data to talk about the user experience, not directly impacting it. We wanted to get here: "How can we use any relevant data to improve the user experience?" The distinction between the two is that we're not just interested in the data that the client has. It can be from third parties, we're super interested in that. It can be from people on the street that we meet for field research, especially from a qual perspective. It can be from social media. It can be from interview transcripts. So the idea is that it's not just looking at customer or client owned data, it's actually everything in between, and also looking at how we can use things like data and text mining to fill in the gaps. The next bit is actively improving the user experience. Not just talking about it, or about how that click-through rate is doing, but actually how we can thread data through. If you land on, say, one of our clients' websites, you have a personalised, pre-filled list based on what we know works well. It's not just looking at what's happening, it's looking at how we can predict and impact what's happening, which is really key.
We're at the beginning of a really exciting phase at 383. It sounds really good, right? We're a new department, new budgets, new stationery, all that stuff. But things might get a little scary, I guess, on the quant side. It looks like we're heading towards what's known as an AI winter. Which means bad things like budget cuts, and presumably The X Factor and Strictly are on indefinitely in some sort of AI hell. Yeah, it's real bad. It's all the worst bits of winter but AI themed. I presume that's just like the Terminator in snow, I haven't really thought this analogy through.
Gartner placed machine learning on the peak of inflated expectations. Have you guys seen the hype curve stuff before? I feel like every strategist at every agency ever brings it out, looks at the thing that's really high and goes, "we definitely need to do this thing". But yeah, essentially, machine learning is right at the top of the peak of inflated expectations, which obviously sounds great for my job security. And what that means is, you'll hit a plateau of productivity within the next two to five years. Unfortunately for people in this field, if these predictions are correct, we're gonna hit a trough of disillusionment first. Which will essentially mean optimism towards machine learning is reduced and then, in real terms, potentially a hiring freeze and budget cuts. This isn't a new thing, right, this sort of push and pull between machine learning and AI being super useful, versus everyone hating machine learning and AI. Back in the 60s, and I wasn't around then, I was an 80s baby, there was unbridled optimism towards what AI, in particular natural language processing, could offer. But in the 70s there was a huge pushback: the Lighthill report, which came out in the UK, basically binned off AI and just said it's incredibly unreliable, especially for NLP. And that had a huge impact on the industry. For the past few decades we've essentially seen this swing from unbridled optimism right at the top to these dives towards the trough of disillusionment. The interesting thing now is this isn't just happening on campuses and in universities and the likes of Stanford, it's actually happening within the industry. I suppose that's the big difference from where we were then to where we are now.
I want to talk a little bit now about our approach to machine learning and data science, and I think it's important to make the distinction that we offer both a data science approach and a machine learning approach, and hopefully you'll see how that fits. This guy probably makes an incredible number of appearances in machine learning talks. Does anyone know who he is? Can I get a shout out? It's Thomas Bayes, it's not like he's, you know, on the cover of Hello. I don't know if this is the audience that reads Hello actually, I probably should've gone for something better there. But Thomas Bayes was essentially a theorist in the 1700s, again I wasn't born then, 80s child and all that. Bayes' theories on inverse probability, so essentially assigning a probability distribution to something you can't see, were kind of revolutionary at the time. And then it was taken up by a guy named Laplace in 1812, who extended this. Broadly speaking, this formed the basis of the Bayesian statistics and theory that we look at these days. You guys have probably seen it in various Medium posts and stuff like that, but it's the equation which is like, P of A given B equals P of B given A times P of A, over P of B, et cetera. I'm not gonna go through the equation and how to do it now, I figure it might actually be more interesting to talk about practical applications of this, given the theme. I'd like to share two examples of Bayesian statistics and machine learning that we've applied at 383 and why we use them.
This first one is essentially around Bayesian AB testing. How many of you guys have run AB tests in the past, whether through third parties or your own in-built systems? I can imagine quite a few people. One of the biggest challenges that we have, especially working agency side with this stuff, is conversations around length of time to run, conversations around sample size, particularly if you have a small product, and also getting buy-in on the results. Up until the beginning of this year we typically ran what are known as frequentist AB tests.
Anyone that's done an intro to statistics course, or had to sit through discrete mathematics while doing computer science, will have an awareness of, say, P values, confidence intervals and stuff like that. The problem is, if you're head of a marketing department and you're tasked with literally making improvements, you probably don't care much about the theory of what a P value is. You probably don't appreciate analysts like myself coming up to you going, "so you know that test that we ran for three months? It was inconclusive, simply because the P value was naught point naught seven, so I know you spent 10 grand but we can't do anything." That's not a great conversation, no one likes to have those, least of all me.
So we actually moved to Bayesian AB testing. Essentially what this means is, we have a bet that this widget is gonna perform better or worse than this other widget. Sometimes we'll have data from when we've rolled that widget out in the past, especially if it's new markets, so we may be able to say, actually, we saw over here it increased it by five percent. So if we put it in Japan or Italy, we can guess maybe it'll increase by five percent again. We take our prior, our bet, and we start to build probabilities around that based on what we see. Then we update our prior with the new data and go, at the end of it, okay, great, I've now reached a conclusion where I'm 80, 90 percent sure that this performs better, and this is by how much. You can instantly see that this is a more compelling conversation to have with a client, as opposed to "the P value is naught point naught seven, we spent 20 grand, you reached a million people, and we can't talk about it". That was our big thing around switching to Bayesian AB testing. There's some interesting stuff online, and there's a lot of debate so, for context, quite a few companies have moved to Bayesian AB testing.
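The prior-then-update idea described above can be sketched in a few lines. This is a minimal illustration, not the team's actual implementation (the talk mentions working in R; Python is used here purely for illustration). It assumes a Beta prior on each variant's conversion rate and binomial data, so the update is a simple addition. The prior strength, visitor counts, and function names are all hypothetical.

```python
import random

def posterior_params(prior_alpha, prior_beta, conversions, visitors):
    """Conjugate update: Beta prior + binomial data gives a Beta posterior."""
    return prior_alpha + conversions, prior_beta + (visitors - conversions)

def prob_b_beats_a(post_a, post_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) and the mean difference."""
    rng = random.Random(seed)
    wins = 0
    diff_total = 0.0
    for _ in range(draws):
        rate_a = rng.betavariate(*post_a)
        rate_b = rng.betavariate(*post_b)
        wins += rate_b > rate_a
        diff_total += rate_b - rate_a
    return wins / draws, diff_total / draws

# Hypothetical prior: earlier rollouts suggested a conversion rate of
# roughly 5%, encoded weakly as Beta(5, 95) for both variants.
prior = (5, 95)

# Hypothetical observed data for control (A) and the new widget (B).
post_a = posterior_params(*prior, conversions=48, visitors=1000)
post_b = posterior_params(*prior, conversions=70, visitors=1000)

p_better, expected_diff = prob_b_beats_a(post_a, post_b)
print(f"P(B beats A) = {p_better:.1%}")
print(f"Expected difference = {expected_diff * 100:.2f} percentage points")
```

The two printed numbers correspond to the "how sure am I that this performs better, and by how much" conversation. Using the same prior for both variants is a simplifying assumption; a prior built from a previous market's results could differ per variant.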
The reasons companies give for that move are that you can get away without a set sample size, because you're constantly updating your prior, your bet, based on the data you receive. In traditional AB tests you have to say 2,000 people for control, 2,000 people for test. In this instance you can actually just make decisions on the fly with the data. The other thing you can do as well, which is super valuable, is you have probabilities at the end of it, you don't just have P values and confidence intervals, so again you can have that conversation with the client and it's a bit more palatable. It's worth noting that Bayesian AB testing isn't a silver bullet for small sample sizes, and it's not exempt from what's known as "no peeking" in the statistics world. With frequentist statistics, there's a rule that you can't look at the test as it's ongoing and make decisions then. There was a lot of talk within the data science community, especially from a guy named Evan Miller, about how you can do this with Bayesian. And in statistics world, which I won't bore you with, that caused a whole lot of drama. I think the takeaway is that Bayesian AB testing allows you to be a bit more fluid with your sample sizes and also allows you to make decisions a bit quicker. We've had a lot of success with it but I do wanna add a disclaimer, it's not the be all and end all. You can't run an AB test with one person each and go "oh look, A won" with respect to two people. Not that anyone's gonna do that, but you never know, right?
The next bit of that is Bayesian machine learning. In the realm of Bayesian classification we've also done some interesting work. We were recently working with a client whose name I can't say, but we'll talk about it as a generic product and service. We were basically interviewing customers about why they used it, why they switched to it, what they thought of it. We identified two to three themes around that. Switching was a hassle.
Once they got in, they kind of felt trapped. And also there was a load of new tech in the market, without giving away who it was, that was confusing them almost; devices were coming into their homes for the first time and they didn't know how they worked, they didn't know if they integrated with other providers, and so on. I feel like I'm being suitably vague. What we found was we had these themes, which was great from the qualitative interviews, but some of the pushback we received from clients is, "okay, you only spoke to 10 people, you only spoke to 15, you only spoke to 20. How do I know if this scales across my business?" So we did an exercise where we scraped the client's Twitter and looked at all of the customer support things that were coming in. From there we used Bayesian machine learning to classify the sentiment of them, whether it's positive or negative, and alongside that we classified them by topic. We were able to see these correlations between what we'd seen in the qualitative interviews and the quant. And then conversely there was another thing, which was that Twitter was becoming almost the last chance saloon for a lot of these customers' queries, so they were frustrated 'cause they were spending 40 minutes on hold, they were not getting their emails replied to, and this never came out in the qualitative interviews 'cause no one we were speaking to was on the phone to this provider trying to get through. But on Twitter, they were irate. So we were able to come up with a new thing of saying, look, yeah, broadly your themes are X, Y and Z, but there's also this huge issue which is causing the blockage here. That was super interesting because, A, we were able to validate that those findings from the qualitative work were actually a thing, and B, we were able to talk about the urgency of fixing customer support. I've talked a lot about Bayesian classification, Bayesian machine learning and Bayesian statistics.
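To make the classification step concrete, here is a small sketch of a multinomial naive Bayes text classifier. This is an assumption about what "Bayesian classification" means here, since the talk does not name the exact model, and the team reportedly works in R rather than Python. The tweets, labels, and class name are invented for illustration; nothing below is client data.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation."""
    words = (w.strip(".,!?'\"").lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Hypothetical hand-labelled tweets, one label per interview theme.
train = [
    ("on hold for 40 minutes and still no answer", "support"),
    ("nobody has replied to my email in a week", "support"),
    ("been waiting on hold again, no reply to emails", "support"),
    ("switching provider was such a hassle", "switching"),
    ("tried switching and the process took forever", "switching"),
    ("the hassle of switching puts me off", "switching"),
    ("does the new device work with other providers", "new_tech"),
    ("no idea how this new device works", "new_tech"),
    ("will the device integrate with my other kit", "new_tech"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("still on hold after 40 minutes"))
```

The same class could be trained on positive and negative labels for the sentiment half of the exercise. In practice the training set would need to be far larger than nine examples, and the predictions spot-checked by a person.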
That said, I'd be remiss not to say that you shouldn't just gravitate towards using a specific model or technique. For us, part of a machine learning project is prototyping other models to see what's the best or worst fit, and then eliminating and improving from there. It shouldn't be like, "hey, we're gonna do neural networks", it should actually be, "this is the problem, and this is the most efficient and accurate way to solve it". There can be a rush, and I think this is probably where the AI winter bit comes in, to go, "this is new stuff, we should totally do it". And one of the big challenges, I think, from shipping machine learning products internally is that actually it might just be Bayes that wins, right, and maybe Bayes isn't the sexiest one compared to Google's TensorFlow and stuff, but if it's the most accurate it's the best one for us. Those are two examples, and I'd be happy to talk about them in a little bit more spoilery detail at the end. Maybe we can grab a drink or something.
I'd like to now talk about our tips for success on these projects. We have a set process, I'd say, within our team for going from basic statistics to a shipped machine learning project, and I'd like to talk about those bits in between. The first one is really around getting the basics right. I know at least one person, two people in this room have seen this, but hands up, who recognises this quartet of charts? Okay. I see a few hands. This was essentially constructed in the 70s by a statistician (which is a terrible word to say when you're presenting, and ironically the thing that's probably closest to what my job title would have been in the 70s) to demonstrate the importance of graphing data before analysing it, and also the effect of outliers on statistical properties.
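The four charts described here match Anscombe's quartet, published in 1973; assuming that is the slide in question, the point can be checked numerically. The sketch below (in Python for illustration; the function name is made up) fits a least-squares line to each of the four datasets and shows that the slope, intercept, and correlation come out nearly identical, even though the plots look nothing alike.

```python
from statistics import mean

# Anscombe's quartet: datasets I-III share x values, dataset IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept, correlation)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

summaries = {name: fit_line(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (slope, intercept, r) in summaries.items():
    print(f"{name:>3}: y = {intercept:.2f} + {slope:.3f}x, r = {r:.3f}")
```

Summary numbers alone would suggest four interchangeable datasets, which is why plotting comes before modelling.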
So what that means is, if you applied a linear model to all of these, which is the blue line, it would all come out the same. But you've got what's probably closest to a traditional linear model at the top, you've got nonlinear stuff going on here, you've got a crazy outlier here, and you've got some weird categorical variable stuff happening here, which no one has got time for. If you're working with these things, this shouldn't be a shock to you. In a rush to apply machine learning models and projects you can actually skip these steps of plotting and graphing your data, but if you're working in machine learning, you should be doing this by default. The things that you should be doing at the beginning of any project, once you get your hands on data, are exploratory data analysis, so looking at what the data physically looks like on charts, and inference, so comparing two proportions; you can just run some basic Bayesian stuff to begin with and figure out what you're dealing with.
The next one, and this is a beautiful janitor's bucket to visualise it, is data cleanup. It was identified in the New York Times actually, and I've seen this pretty much every year in data science surveys, that the biggest hurdle to insights is quality of data. It's all about the cleanliness of that data. I'm not gonna talk about data lakes and stuff, although that's a thing, and I'm sure Josh is gonna boss that later. It's all about how clean and structured your data is and the quality you can get out of that. The New York Times said in 2014 that, hey, it was an issue, and three years later it's still an issue. I'm sure anyone who's worked on these projects can testify to that. The best thing to do in that instance is to have a set methodology for cleaning and processing your data.
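A set methodology can start as one small function that every raw file passes through before analysis. The sketch below is only an illustration of that idea, written in Python rather than the team's own tooling; the column names, sample data, and rules (trim whitespace, normalise headers, strip thousands separators, drop empty and duplicate rows) are all hypothetical.

```python
import csv
import io

def clean_rows(raw_csv, numeric_column="visits"):
    """Apply the same cleaning rules to every raw CSV export."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    seen = set()
    cleaned = []
    for row in reader:
        # Normalise header names and trim stray whitespace from values.
        row = {k.strip().lower(): (v or "").strip() for k, v in row.items()}
        if not any(row.values()):
            continue  # skip rows with no data at all
        # Remove thousands separators such as "1,204" before converting.
        row[numeric_column] = int(row[numeric_column].replace(",", ""))
        key = tuple(row.items())
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append(row)
    return cleaned

# Hypothetical messy export: padded headers, a quoted number containing
# a comma, an empty row, and a duplicate.
raw = 'Customer , Visits\n alice ,"1,204"\nbob,87\n,\nbob,87\n'
cleaned = clean_rows(raw)
print(cleaned)
```

Keeping the rules in one function means new cases can be added in a single place and rerun over old files.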
If you're fortunate enough to work on an ongoing product you can obviously set up rules and stuff like that to make sure that it's continuously processed. We've got stuff in R that we run each time to clean it up, and it can just be boring stuff like getting rid of commas that are in the wrong place and would screw up a CSV eventually. It's really key to set up this pipeline and cleanliness stuff at the beginning, so once you get to the end you have higher quality data.
The last one, and I just wanted to get this in 'cause I've just binge watched Stranger Things and now I'm terrified of going out because of Demogorgons: don't overpromise, or as Eleven would say, friends don't lie. Our most recent application was actually pretty straightforward, it was text mining and then classifying that text. So we got some data from Twitter, we processed it, we figured out what the themes were. We could've gone further with that and we could've said to the client, "hey, so we've got this data, we know what your people are talking about, we can build you a chatbot now and we'll do this, this and this", but it wasn't the right stage of the journey for them. They were still getting to grips with how they could potentially use data to inform their business. So going in and saying, "actually, we've got this chatbot for you now" would've been way too much. It's super key on a lot of this stuff to not overpromise and to also make sure that you can give incremental results. So whether that's in the first five days of a data project you're tasked on, you give some feedback, you say what you're gonna do next. That's super key, because one of the big things about avoiding that AI winter is to not overpromise and not tinker away for months on end perfecting the model. There are a lot of questions online about what is the perfect accuracy for models. And it will vary per client.
I'm sure for someone like Sim, who's working in financial advice and stuff, that accuracy might need to be higher, but actually if you're looking at sentiment analysis you can get away with being right eight times out of 10, that's pretty good.
I'd like to talk now about what's coming next for 383. We've talked about two of the most recent projects we've done, Bayesian AB testing and Bayesian machine learning, and we've talked about how it's important to have clean data, to not overpromise, and to ensure that you do the basics of statistics right. The next stuff that we're working on, and this is as of Friday afternoon when we had a bit of a brainwave, is digitising design thinking artefacts. Or in the real world, reading text off post-it notes and basically taking a journey map and making it a digital thing. But I feel like "digitising design thinking artefacts" makes it sound incredibly smart. It's basically just gonna be using Cloud Vision, so that's Google's own machine learning product, which you essentially hook into through its API. If we have a journey map, so essentially a user experience going from A to B, we can take a photo of that, and then there's no human having to go "this post-it note says this", it's all there in one place. There's actually a really good library for R and there's obviously one for Python as well, but I'd definitely recommend, if you wanna get started with machine learning stuff, Google's Cloud Vision as a really fun way in. I'm gonna give myself a shameless plug: I actually wrote a Medium post about using it. Do you guys know the Funko Pops stuff? It's like a price checker for those, so you can put it in front of your screen, it'll figure out which Funko it is and then go and check the price. Because I, for some reason, have not invested in an ISA and assumed I can make loads of money off Eleven with those Funkos. This is probably my time for.
I'm sure Sim can give me some great advice at the end about how to not do that. And then the next one, which is kind of a meaty project, is working on customer clustering using machine learning. Again, we typically use R for exploratory data analysis and implementing models, and we'll be looking, for a client that has loads of usage data about their product, at what the cohorts look like. From there we'll be spinning it out so we can go, "actually, your clusters are here, here and here, and we can serve those guys on what's important to them". A big thing to note with clustering in machine learning, if you guys ever do it, is that it will find clusters anywhere, right, so it might be completely arbitrary, and this is where that sense checking of your data comes in. Because the clustering's job is to find a cluster, you need a little bit of sense checking to go, "actually, is this a thing or not?" That is 383's data science vision in a nutshell. You can find me on Medium @hollyemblem. I'm also on Twitter there but to be honest I just tweet about RuPaul's Drag Race, so it's probably better to go on Medium. And then you can also check out the 383 blog on 383project.com or sung hyphen forts from hyphen team at 383. That's it from me, so thank you for listening.