About this talk
A developer's journey in the pursuit of a flexible, decoupled and scalable data management and publishing platform. Going from a monolithic platform to a suite of API-first microservices built in Node.js.
...going to talk to you about my personal journey through various approaches to data and content management platforms, and what I've learned from all those different approaches, all the way down to where I'm at today, which you can probably guess from the title of the talk: a microservices approach.

So, going back a few years now, I remember the first time I used a CMS, a Content Management System, and it was something called PHP-Nuke. I don't know if any of you remember this beautiful screen. But at the time, this was a web application that offered you a dashboard from where you could edit everything about your website: add new content, edit sections, users. So you could do everything.

- [Male] Which year was this?
- What's that?
- Which year was this?
- Probably 2002, I think.
- [inaudible]
- Yeah, so you had this dashboard from where you could control everything about your site. And, you know, at the time, it was mind-blowing to me because, for the first time, I could edit content instantly without having to edit HTML files [inaudible] and then push them to a server using FTP, which is all I knew at the time. So this to me was a big game changer.

And even though PHP-Nuke is not a thing anymore, we still use these concepts today. We still have platforms like WordPress, or Joomla, where you have a dashboard from where you can control every aspect of your site. And I've used these platforms on a lot of different projects, both at my job and on personal projects. And at some point I started thinking: it's a lot of responsibility. These things do a lot of things. And I started questioning the actual name that we give to them. So when you think about something like WordPress and we call it a Content Management System, I think that's a bit of an understatement because actually, if you think about it, it's also a Digital Asset Management System.
So when you're creating an article and you drag a bunch of images to that article, WordPress will take care of uploading those assets for you to some location. It'll manipulate those assets, so it generates different crops, different resolutions. And it's also a User Management System, as every type of actor that wants to interact with your system has to go through that platform, through WordPress, for example. The site admin, or a customer, or an editor: they all need to be represented in that system. The same thing can be said about layout. So again, in the WordPress ecosystem, everything about how your site looks, typography, color scheme, the actual layout of visual components, is controlled by WordPress. In this case, by a WordPress theme. And the list goes on. These platforms kind of evolved into being these monsters that control absolutely everything about a project.

And, you know, I started to wonder if this is okay or if this could be a problem somehow. And to answer that question, it's worth having a look at how these traditional, or monolithic, CMSs actually work under the hood. Different platforms have different architectures, but this is a very simplistic diagram that describes how these things work. You have a database where all the content is stored. You have a core application, whose responsibility is to read and write everything from the database and then deliver that data to, on the one hand, the admin UI from where you can control everything about the site, and on the other hand, to the templates, which will deliver rendered HTML to end users.

And there are a couple of issues with this paradigm. One of them being, if you choose WordPress for a project, you're committing to a bunch of things.
The first one is you're committing to an entire tech stack, because WordPress will need to run on PHP, it will need a MySQL database, so you're right away committing to a LAMP stack for your project, which is a big commitment to make right at the start. And another thing you're committing to is this: because this single product handles many, many features of your site (how it handles assets, how it handles users, and layout, and all that kind of stuff), you're committing to how it implements all these features. So if you're happy with how it manages content but you're not happy about how it manages your assets, there's not much you can do about it, because it's one single product that controls everything. That's a big commitment to make.

Another problem is, how do you take this content and integrate it with different types of systems, like a native application or a smart TV or whatever you want? Because like we've seen from that diagram, that's a very low level of communication. You can't plug in a mobile application and connect it directly to your database because that simply won't work. So how do you take this data and deliver it to other types of platforms?

And I've faced this issue at some point in my career. I was working with a publishing company that was using WordPress to manage a lot of digital properties, a lot of magazine websites. And they also had a few native applications, mainly iOS applications. But to publish content to those platforms, they had to use a completely separate publishing interface because WordPress is, you know, a web thing. It created a huge overhead for the editorial team, having to publish the same piece of content in different places, just to get it to different platforms.

And, at the time, our approach was to use what is called a headless CMS. And the concept behind a headless CMS is to take this same approach, this is exactly the same diagram as we've seen before, and make a small alteration to it.
You separate the layout from the rest of the monster, so to speak. So, you get rid of your coupled templates layer and you replace it with an API component that still communicates with the core application, still gets the same data, but instead of exposing it to the world as a fully rendered HTML file, it exposes it using another format like XML or, typically, JSON. And by exposing that in a language that is universal, we can plug in other types of systems that can easily consume that data. For example, a native application or another web server or anything we want, really.

And by doing this, we achieve, first of all, separation of concerns, so our content and our layout are now decoupled, which means that any front-end system can be plugged in. We built this API layer as a RESTful service, which means that anyone that can issue an HTTP request can communicate with our system, and also everyone speaks JSON, so again, it becomes a universal system. And the biggest win of all is that we can have a single editing interface that can be used to publish the same piece of content to multiple platforms and mediums.

And this last bullet point is quite important, and it's the core principle behind a philosophy called COPE, which stands for Create Once, Publish Everywhere. And the idea behind COPE is that, again, we separate data from design, which allows us to have a single editorial process, which reduces the editorial overhead in a team. And most importantly, it means that our content is reusable, so we create a single instance of a piece of content.
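To make the headless, create-once-publish-everywhere idea concrete, here is a minimal TypeScript sketch. It is an illustration only: the `Article` shape, the `results` envelope, and all function names are assumptions invented for this example, not the API of WordPress or of any specific plugin. It shows one content record exposed as JSON and consumed by two different front ends.

```typescript
// One content record, stored once, carrying no layout information.
// (Hypothetical shape, invented for this sketch.)
interface Article {
  id: string;
  title: string;
  body: string;
  publishedAt: string; // ISO 8601 date
}

// What a headless API endpoint would hand out: plain JSON instead of HTML.
function toApiResponse(article: Article): string {
  return JSON.stringify({ results: [article] });
}

// Escape text before placing it inside HTML markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Consumer 1: a web front end turns the JSON into HTML.
function renderForWeb(json: string): string {
  const { results } = JSON.parse(json) as { results: Article[] };
  return results
    .map((a) => `<article><h1>${escapeHtml(a.title)}</h1><p>${escapeHtml(a.body)}</p></article>`)
    .join("\n");
}

// Consumer 2: a native app maps the same JSON to its own view model.
interface ListItem {
  heading: string;
  preview: string;
}

function renderForApp(json: string): ListItem[] {
  const { results } = JSON.parse(json) as { results: Article[] };
  return results.map((a) => ({ heading: a.title, preview: a.body.slice(0, 40) }));
}
```

Both consumers read the same JSON, so an editor publishes the article once and neither front end needs to know how the other presents it.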
And because that content is agnostic of any language or implementation, it can be reused and recycled in as many platforms as we want, which makes it as future-proof as it can possibly be. We're likely to have to integrate our content with platforms that don't exist yet, and keeping that content agnostic of implementation is the best way you can be prepared for that.

And this worked really well. Everyone was happy with the project. But it did make me wonder if this approach is the ultimate COPE platform that is robust enough to withstand the test of time. And I had some concerns about this. The first one being that we still have a web CMS powering the whole system. So even though we've created this API component that exposes our content in a language-agnostic way, at its core this is still a WordPress installation, and that API component is actually a WordPress plugin. And that means that a product that was created to build websites is still powering our whole system. And that can be a problem if we decide to build a project that has a very reduced web presence or doesn't have a web presence at all. In that case, does it still make sense to have a web CMS powering the whole system?

Another problem is that, like I mentioned, the database and the core application and the API are still very much coupled together. Our API component is just a WordPress plugin. So what happens if somewhere down the line the organization changes, and it's very likely that it will change, and our publishing interface can't keep up and we need to change it? In this case, we can't just take WordPress out of the equation and replace it with something else, because WordPress is our entire solution. So we can't expect that the API will just keep working, because it's still tightly coupled with the rest of the system.

And finally, we've managed to create a COPE system, but it's COPE as an afterthought.
So we've taken a concept that is very different from COPE, and we've somehow duct-taped this API on top of it as an afterthought. And that doesn't seem like the most robust way to go about it.

And so, at the time, I experimented with different things. I looked around to see what other people were doing and I started putting together this wishlist of what I thought would be the ideal COPE platform. And on that wishlist, I wanted my data architecture and business logic to be agnostic of any tech stack. So, when starting a project, I wanted to be able to represent the business domain without, at that point, committing to any implementation or any platform.

I wanted my features to be modular and interchangeable components. So again, going back to the WordPress example: if I'm not happy with how it handles digital assets, I want to be able to take that component and replace it with another component that does the job better, or maybe I can build my own component for that. So I want to be able to pick and choose the features that I want using modular components.

I wanted it to use a universal language, like JSON, so different types of systems could integrate with it. And I wanted it to be scalable on a micro level. What this means is that, let's say I'm building a video social network, something like YouTube, for example, where I will be doing a lot of manipulation of digital assets, videos, and images, and whatnot. In that case, I'd probably need my digital asset manipulation layer to scale a lot more than the API layer, for example, or the user management layer. So I wanted to be able to, depending on the nature of the project, scale different components and different parts of the system as I see fit, and change that over time.

And, from that wishlist, microservices seemed like exactly what I wanted to achieve. And this is a good definition by Martin Fowler.
It says that, "Microservices are an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API." So by having my features as modular, deployable, and self-contained services that could communicate with each other using HTTP requests, I would achieve modularization, I would achieve scalability because I could deploy as many services as I want, and yeah, I would keep the separation of concerns.

And at this point, I started working on a platform called DADI, which is what I'm here to talk to you about, and this is a microservices framework that was built around the principles of COPE. And the core aspects of this framework: again, it's COPE by design. So instead of taking a product and trying to figure out how to make it work with the COPE principle, we did it the other way round. We looked at COPE and we built this entire suite of products around it. It's an API-first approach, so every single component, every single feature of this framework exposes an API. Features are modular components, so you can use just one component or you can use all of them, and you can use as many instances of each component as you want. It's built in Node.js. It's scalable, and I'm going to show you in a second what I mean by that. And then the nice detail that separates this from a sales pitch is that it's free and open source, so that's a win.

I'm just going to quickly talk to you about some of the components from the DADI stack. We have API, which is a RESTful API layer, which will effectively be your data layer. We have Publish, which is a content management platform, a kind of management interface that will plug into instances of API to create and edit content. We have CDN, which is a digital asset manipulation and delivery layer, and I'm going to show you a couple of cool features in a second.
And we have Web, which is our templating agent, effectively our web server, which can be plugged in with API to form a full-stack suite or can act as a stand-alone application.

So this is an example of a project using our stack and how you can use the different components to build a project. In this case, we have a data layer, a database. We have an instance of DADI API, which will connect to the database, and then it will feed an instance of Web, which will be our web server. It'll feed an instance of CDN, which will handle all the multimedia assets and which will be connected to a media bucket. It can be Amazon S3 or whatever you want, really. That same API will be communicating with Publish as well, our CMS layer. And it can be feeding a native application as well. And these dashed lines that you see here are just plain HTTP requests, which means that in your native application, you can just request assets from DADI CDN, because everything's just an HTTP request.

So these are just some numbers to show you that this platform is being used in production on large-scale websites, with quite large editorial teams. And one of the things that we've been able to achieve with this approach is scalability. By having all our features as separate components, like I said before, I could create 50 instances of the CDN service if that's the critical part of my project that needs to handle a lot of traffic.

It also gives us developer liberation. And what that means is that if you have a huge monolithic application that handles every single aspect of a project, then you need a team that can understand a little bit of every aspect of that project. But if you have your features as self-contained services, you can have a team focusing exclusively on the front-end piece and another team focusing on the API layer.
And even though they will communicate, an engineer working on the front-end layer doesn't even have to look at the implementation of the API, doesn't need to know how it's implemented. They just need to know what the response formats and the error codes look like, but that's it.

And it's also fast to market. What I mean by that is that by using this approach, you can have multiple teams working simultaneously on different parts of the project without creating concurrency issues. So typically, when we're starting a project, the first step is to define the API layer. We define the business domain, we define what the response looks like, and once that's agreed, different teams can carry on working on different parts of the system. You could have one team working on a set of native applications that consume data from the API. You can have another team working on the web layer, a different team working on a set of content management interfaces, another team working on CDN. And the cool thing about this is that, because everything is just a RESTful API, nothing stops us from creating a native application that is effectively a CMS as well.

So I thought I would just quickly show you some implementation, or at least tell you a little bit more about how these components work. And this one is the main service, DADI API, where you define your business domain, where you effectively define your data structure. And you do that by creating a schema file in JSON, where you define the fields, field types, relationships between fields, and a bunch of other things, like validation rules and other settings. So DADI API holds the data architecture and business logic. We use a NoSQL database, so content is stored as documents and grouped in collections.
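As a rough illustration of what a JSON schema with fields, field types, and validation rules can look like, here is a minimal TypeScript sketch. The shape below is an assumption invented for this example and has not been checked against DADI API's real schema format; `articleSchema` and `validateDocument` are hypothetical names. It only shows the general idea of a collection schema driving document validation.

```typescript
// Hypothetical field types for this sketch.
type FieldType = "String" | "Number" | "Boolean";

interface FieldDefinition {
  type: FieldType;
  label: string;
  required?: boolean;
  validation?: { minLength?: number; maxLength?: number };
}

// A collection schema: a named set of field definitions.
interface CollectionSchema {
  fields: { [name: string]: FieldDefinition };
}

// Example schema for an "articles" collection (made-up fields).
const articleSchema: CollectionSchema = {
  fields: {
    title: { type: "String", label: "Title", required: true, validation: { minLength: 4 } },
    views: { type: "Number", label: "View count" },
  },
};

// Check a document against the schema and return a list of problems.
function validateDocument(schema: CollectionSchema, doc: { [key: string]: unknown }): string[] {
  const errors: string[] = [];
  for (const name of Object.keys(schema.fields)) {
    const field = schema.fields[name];
    const value = doc[name];
    if (value === undefined || value === null) {
      if (field.required) errors.push(`${name} is required`);
      continue;
    }
    // "String" -> "string", and so on, to compare with typeof.
    if (typeof value !== field.type.toLowerCase()) {
      errors.push(`${name} must be of type ${field.type}`);
      continue;
    }
    if (typeof value === "string" && field.validation) {
      const { minLength, maxLength } = field.validation;
      if (minLength !== undefined && value.length < minLength) errors.push(`${name} is too short`);
      if (maxLength !== undefined && value.length > maxLength) errors.push(`${name} is too long`);
    }
  }
  // Reject fields that the schema does not define.
  for (const name of Object.keys(doc)) {
    if (!Object.prototype.hasOwnProperty.call(schema.fields, name)) {
      errors.push(`${name} is not defined in the schema`);
    }
  }
  return errors;
}
```

Because the schema is plain data, the same file can drive validation in the API and form generation in an editing interface, which is part of the appeal of defining the business domain this way.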
DADI API is fully RESTful, and by this I mean that, obviously, you can add and edit documents using the RESTful endpoints, but the schema itself is also accessible and editable using a RESTful endpoint. So I can make a request to an endpoint and change the actual field definitions within this collection, so everything is accessible through a RESTful endpoint. It's also customizable by hooks. So I can apply a hook to this collection and say that every time someone adds a new document, I'm going to run this piece of logic that will maybe interact with a third-party system or manipulate the data that I'm about to create in any way I want.

Another component I'd like to talk to you about is CDN. So, let's say that we have an image that we want to use in different sections of our site. And for those different sections, we're going to need maybe different resolutions of that image. Maybe at some point we need a landscape version of that image and at other times we need a portrait, maybe some other times we're going to need a thumbnail. And so the idea behind CDN is that you store a single version of an asset, a raw version or at least a large-scale version of that asset, and then you have a way of generating different variations of that asset in real time. So, by accessing a specific URL with a set of URL parameters, I can request that asset in a 300 by 200 size. I can ask for that asset to be converted to [inaudible], or I can apply a blur, or any type of transformation to that asset in real time.

And we have a couple of concepts around that, which these slides show a little bit. One of them is the concept of a recipe, which is the screen that you see on the left. A good example of a recipe is a thumbnail. I can create a recipe called "thumbnail" that specifies that a thumbnail is 150 by 150, it's converted to a sepia tone, and it has, I don't know, a 30% blur, or something.
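The recipe idea can be sketched as a named preset of transformation settings. The following TypeScript is a hypothetical illustration: the `Recipe` fields and the `applyRecipe` helper are assumptions made up for this example, not DADI CDN's actual recipe file format. The values mirror the "thumbnail" example from the talk (150 by 150, sepia, 30% blur).

```typescript
// Hypothetical recipe shape: a reusable, named set of image settings.
interface Recipe {
  width: number;
  height: number;
  filter?: "sepia" | "greyscale";
  blur?: number; // percentage, 0 to 100
}

// Registry of named recipes. A Map avoids accidental matches on
// built-in object properties such as "constructor".
const recipes = new Map<string, Recipe>([
  ["thumbnail", { width: 150, height: 150, filter: "sepia", blur: 30 }],
]);

// The work a transformation layer would need to do for one request.
interface TransformRequest {
  image: string;
  settings: Recipe;
}

// Look up a recipe by name and pair it with the requested image.
// Returns undefined when no recipe with that name exists.
function applyRecipe(recipeName: string, image: string): TransformRequest | undefined {
  const settings = recipes.get(recipeName);
  if (settings === undefined) return undefined;
  return { image, settings };
}
```

The point of the indirection is that the definition of "thumbnail" lives in one place: changing the recipe changes every thumbnail on the site without touching the pages that request it.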
And once I specify that, that's the definition of my thumbnail. I can then go to mydomain.com/thumbnail/ followed by the name of my image, and it will generate that image for me in real time using the configuration that I specified for the thumbnail, which is a really cool way of reusing your assets in different parts of your system.

The other concept that I would like to quickly show you is the concept of a recipe... no, I've talked about recipes. I'm talking about routes, yeah, routes, sorry. The concept of a route is that you can define a set of conditions that will determine which recipe will be delivered to a user. For example, I can specify that whenever a request is made from the UK, on a 3G connection, I want to deliver the thumbnail 3G recipe for that asset. When a request is made from France, where the browser has a specific language, I can deliver a different recipe for that same asset. So, that's a nice way of delivering different assets based on a set of conditions: again, connection type, device type, and a bunch of other conditions.

And with that, I'm going to do something really scary, which is sort of a live demo, which always goes wrong. But I thought I would just quickly show you another feature that we have. This is going to be a problem, isn't it? Yeah, this resolution is a bit of a problem. I knew this was going to go wrong.

So one of the features we have on CDN, and I can quickly show you if you look at these URLs... I don't know if you can see this. This is the URL of a CDN instance that I've set up. In here, I'm requesting photo3.jpeg, and in the URL, I'm specifying all the parameters that I want for that image. I'm specifying the quality, which is just a numeric parameter from 0 to 100 and which will, obviously, impact the file size. I'm specifying the width of the image that I want, and the height. And we have a lot of different ways that you can crop an image.
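The route concept described above, where a set of conditions decides which recipe is served, can be sketched as a first-match rule list. This TypeScript is an illustration under stated assumptions: the `RequestContext` fields, the recipe names, and the matching strategy are all hypothetical and are not taken from DADI CDN's route format.

```typescript
// Hypothetical facts known about an incoming request.
interface RequestContext {
  country: string; // e.g. "GB", "FR"
  connection: "2g" | "3g" | "4g" | "wifi";
  language: string; // e.g. "en", "fr"
}

// One branch of a route: every listed condition must match.
interface RouteBranch {
  when: Partial<RequestContext>;
  recipe: string;
}

interface Route {
  branches: RouteBranch[];
  fallback: string; // recipe used when no branch matches
}

// True when all conditions in `when` equal the request's values.
function matches(ctx: RequestContext, when: Partial<RequestContext>): boolean {
  return (Object.keys(when) as Array<keyof RequestContext>).every(
    (key) => when[key] === ctx[key]
  );
}

// Pick the recipe of the first matching branch, or the fallback.
function selectRecipe(route: Route, ctx: RequestContext): string {
  for (const branch of route.branches) {
    if (matches(ctx, branch.when)) return branch.recipe;
  }
  return route.fallback;
}

// Mirrors the two examples from the talk; recipe names are made up.
const thumbnailRoute: Route = {
  branches: [
    { when: { country: "GB", connection: "3g" }, recipe: "thumbnail-3g" },
    { when: { country: "FR", language: "fr" }, recipe: "thumbnail-fr" },
  ],
  fallback: "thumbnail",
};
```

First-match ordering is one possible design choice: it keeps the rules easy to reason about, at the cost of having to list more specific branches before more general ones.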
And this one is the one that I wanted to talk to you about. It's called entropy. I don't know what the original dimensions of this image are, but if I'm asking for a 400 by 600 crop of it, it needs to choose which part of the image it'll use. And I can specify that manually by providing a set of coordinates. But what this parameter called entropy does is analyze the image and find out what the most important regions of that image are: what is the focal point? And based on that information and on the dimensions that I'm asking the final crop to be, it will give me a good crop for that image.

And what that allows us to do, and this is the demo that I was hoping to show you, is if we plug that in with the HTML picture element, you can take a landscape image and you can create different sizes of it, all the way down to a portrait mode. And you can make sure that the important information in that image is always displayed, so you're not cropping out important parts of that image.

So this is an example. And it's a shame that the resolution is not helping me, but if I change the size of this window, you can see that I'm setting breakpoints. See, the image just went to a new crop, but it was smart enough to realize that this girl in the picture is my focal point. So it crops the image in various different ways, whilst always making sure this focal point is respected. And this here I can actually show you. So if I want like a big... sorry. 800 by... this is a bit of a ridiculous size. So 800 by 500. It'll give me this. If I then go, "Oh, I want a square," it knows that this is the important part of the image to maintain. If I go 200, it knows that this is the focal point and always delivers that part of the image.

And I believe this is it for time, I think it is. So, yeah, it's a lot to take in in just a 30-minute presentation. And please, by all means, go check that platform out.
You might find that some of the components, or all of the components, are useful to you. And, yeah, feel free to ask any questions, and I hope you liked the presentation.
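As a footnote to the demo, the combination of per-request crop parameters and the HTML picture element could be sketched roughly as follows. This is a hedged illustration: the host `cdn.example.com` and the query parameter names (`width`, `height`, `quality`, `crop`) are assumptions chosen for readability and are not confirmed to be DADI CDN's actual URL format. Only the ideas come from the talk: a quality value from 0 to 100, a width and height, and an entropy-based crop.

```typescript
// Hypothetical CDN host, used only for this sketch.
const CDN_BASE = "https://cdn.example.com";

interface CropRequest {
  width: number;
  height: number;
  quality: number; // 0 to 100; lower values give smaller files
}

// Escape characters that are unsafe inside an HTML attribute value.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build an image URL that asks for an entropy-based crop.
// Parameter names are assumptions, not a documented API.
function cdnUrl(image: string, req: CropRequest): string {
  const quality = Math.min(100, Math.max(0, Math.round(req.quality)));
  const query = [
    `width=${req.width}`,
    `height=${req.height}`,
    `quality=${quality}`,
    "crop=entropy",
  ].join("&");
  return `${CDN_BASE}/${encodeURIComponent(image)}?${query}`;
}

interface Breakpoint {
  minViewportWidth: number; // in CSS pixels
  crop: CropRequest;
}

// Emit a <picture> element with one <source> per breakpoint.
// Sources are ordered widest first because the browser uses the
// first <source> whose media query matches.
function pictureMarkup(
  image: string,
  alt: string,
  breakpoints: Breakpoint[],
  fallback: CropRequest
): string {
  const sources = breakpoints
    .slice()
    .sort((a, b) => b.minViewportWidth - a.minViewportWidth)
    .map(
      (bp) =>
        `  <source media="(min-width: ${bp.minViewportWidth}px)" srcset="${escapeAttr(cdnUrl(image, bp.crop))}">`
    );
  const img = `  <img src="${escapeAttr(cdnUrl(image, fallback))}" alt="${escapeAttr(alt)}">`;
  return ["<picture>"].concat(sources, [img, "</picture>"]).join("\n");
}
```

A page could call `pictureMarkup` with a landscape crop for wide viewports, a square for medium ones, and a portrait fallback, and leave it to the image service to keep the focal point in each crop.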