
Replicated Object Notation (RON): Like JSON, but for Data Sync

Victor Grishchenko speaking at React Vienna in June, 2017

About this talk

JSON implies static data sets and state-based thinking: no deltas, no versioning, no metadata. That is the opposite of "reactive". RON structures the data as a stream of changes, state snapshots being merely an optimization. For real-time, offline-first and collaborative apps, RON is a much better fit.


Transcript


I'm not a front-end developer. I did a lot of front-end development some time ago, like three years ago. Actually, I'm a researcher and a consultant, like a researcher-consultant. So I have a slightly different perspective on things. First, I will share my questionable track record, just to make the context clear. It is made of paper? It is good I didn't have any beer, because I might not notice.
Basically, I'm a researcher. This was my student paper, Modular Worms. That happened after the Nimda epidemic in 2001. Basically, I was saying in that paper that soon worms, like malware, viruses, worms, everything, will become centrally coordinated and updateable in the wild. And they will actually stay: it will not be like an epidemic and then everybody is cured, but they will stay in the wild forever, because that is profitable. I got reviews for that paper like, "What the crazy shit are you talking about?" "We don't understand it." So basically I was very disappointed. I lost a very general paper. But some actual mobile researchers put it online on their site, so it is still googleable. Everything I will tell here is googleable. And in like three or four years, they said, "Well, this is current reality; the boy was right." Well, I mean, these days it is just market reality: you may buy some botnet for a very reasonable price, and it works more or less like this.
Then, in 2007 actually, I made my PhD thesis, analysing how far and how fast retweets can spread. But the final part I wrote in 2006, and Twitter was founded in 2006. Basically, there was no such word as retweets, but I had a nice mathematical model for it. And obviously, like, "What the crazy shit are you talking about?" and everything. So, the next thing, I was working as a researcher at Delft University of Technology.
I had the task of scanning BitTorrent swarms. So basically, a hundred thousand people, or a million people, come to download their episode of Dexter, and we record all the IP addresses, downlink speed, like everything, everything, the time span of the download. That is not a problem, except you have to maintain several thousand TCP connections outstanding at once, and at that time I didn't have a cluster yet, so I was doing this from my laptop. V8 was just released, so I figured out that if I wrap libevent with V8, I can use JavaScript to write high-performance network applications. What a brilliant idea. So I actually wrapped libevent into V8 and started writing high-performance BitTorrent crawlers in JavaScript. My name for it was BitScript. And it was amazing. It actually worked. It listed like hundreds of thousands of addresses in like 5, 10 minutes. But then I was pressured by my boss, who said, like, what crazy, crazy shit I was doing here, like, "Why don't you learn normal Python?" Everybody was doing Python there. "Can't you learn Python? Here is a book." So basically, a year later Node.js was released. It was a really bad day for me. I was in a really bad mood, like, yes, I had this idea, actually I had line in code. It is still googleable.
Then, the next project was RFC 7574. It is about making the Internet one big data cloud, so basically it is like BitTorrent on the transport level: you say to the Internet, "I want a file with this hash," and the Internet gives you the file, without any connection to any service or something. I mean, without any server-specific semantics. So it is a hot topic, but there is nothing working at the moment. But for example, Rob Pike, who created Go, he is working on a similar project at Google. So of course I am set. Then there is this thing, Replicated Object Notation. I was presenting it at CodeFest in Russia like two months ago.
Well, obviously people say, like, "What crazy shit are you talking about? We love JSON, JSON is good. You're a bad man, go away." So it is a good sign that I am on to something. Like in two, three years this will be like the normal. Then I was thinking about how to structure this talk, because I may go like, "Oh, formal languages. I created a formal implementation." And then 20 minutes about formal languages. So basically, I will be comparing React not to, like, React, but to the most high-impact, most-used, most-deployed Reactive application to date, which is? Which is? - [Audience Member] Facebook. - Huh? - [Audience Member] Facebook. - No no no. Reactive application. Microsoft Excel.
It is a purely Reactive application. I mean, it is like made of little lambdas. I mean, all the code is made of lambdas. You may recompose them any way you want. Basically, if you make some change, it recalculates everything automatically. So basically, it reacts to the input in real time. And it is widely deployed, and people who have no computer science education, who have only basic school education, they use it nicely, they don't need any CSS improvement because it is very natural, and everybody understands it. It works, and it is extremely highly deployed. People actually count their money using this thing. So Microsoft Excel, I think, is the most high-impact Reactive application to date. And it is like a typical Reactive application: lambdas, events, everything. So, real-time updates.
Maybe something changed in the last year, but if you consider a typical React application, normally it is like several megabytes of JavaScript, right? So in the beginning it was like, we will be using four or five core functions, which are non-dep-- but in the end it is typically like two megabytes, I don't know, of one big core function which maps the world into pixels.
I think this kind of monolithic bundles, I think it is ... I mean, consider Excel. It is made of very nice lambdas, which a schoolboy can write, and this is not the case with React.js. Then, repeated data retrieval. If you download your Excel spreadsheet, you have it. With the current React.js world, if you download stuff, you have to download it once again, because most of the time it doesn't cache the ... Actually, there was a very nice tweet by Dan Abramov recently, after React pages, when he said, "Well, finally somebody is using Redux for something it was intended for." Like for offline use. So, well, basically there are no clear offline semantics and most of the time it doesn't work.
And ad-hoc data synchronisation. I even know one company which basically went down in flames because they thought, like, "Ah, synchronisation, that's not the problem. We'll implement it," and one year later, like, "Well, it is difficult. Our best people are working on that." And like, in two years, "Well, we almost solved it." So basically, $230 million of venture money went there. So it doesn't matter. What matters is that even if you look at Excel, we see that there's actually some space for improvement, maybe. And it may be not as Reactive way. I think the idea here is that the data model, this kind of JSON thinking, it kind of makes it worse.
For example, consider a typical node_modules folder, like ... Rule Number One of the club: never look inside node_modules. Because you look, and it is like, what is this? I have no idea, man. Like, lots of stuff, like, "I recognise this one. I recognise this one. This one I installed by mistake." The other 200 things: what is it? Part of the problem, I think, here, is that actually Node.js ... I mean, if you consider Unix tools, they have pipelines, so if you have grep, you may pipe it to, I don't know, awk, or wc or something. Basically, grep may not know where it is piped to.
I mean, there is some interface. One programme sends it to the interface, the other programme takes it from the interface. In the Node.js world, basically, the only interface is require, so if you potentially use something in some case, then you require it, and then it stays with you forever. I never liked that. So more or less I think I listed enough problems to start approaching some sort of proposed solution for maybe some of them.
So, this is a really nice example which shows the difference between JSON and Replicated Object Notation. This is like a simple object in JSON. I mean, it is really simple, really nice, really understandable, right? Basically, it's just context-dependent. I mean, if we put this object inside some bigger JSON structure, we know what it is. If we put it outside of the bigger structure, we don't know. I mean, here are just some values; it doesn't say which object it is. Which version? Which with what? If you want to, basically, serialise some domain object and still maintain some connection to that domain object, then probably you see unique IDs, typically these UUIDs, in this brilliant hex notation. And then we have the actual data. And then suppose ... Besides the object ID, we also want to convey the version. I mean, versioning is very important ...
Actually, when Meteor announced what they were releasing, my first idea was like, they will get like $100 million of VC money, they will do everything, and I must change my job and do something else. But then Meteor released DDP, their data protocol, and I looked into the protocol, and it has no notion of versioning, it has no version. I mean, if you consider, say, Git, everybody knows Git has those hash identifiers for every version, right? And every next version basically references the previous ...
It has versioning, so when you have two Git repos, you may merge them, you may see how things are developing, so you may, basically, do meaningful merging. If you have no versioning information, you cannot merge meaningfully. Then you can only maintain a connection and receive updates, but for example, if you close your laptop and open your laptop, then the server doesn't know where the client is standing. And the client doesn't know where the server is standing. So versioning is very important. So suppose we want to ... Basically I decided to keep this example, because if you add version information, it will be like a couple of other unique IDs.
So, the Replicated Object Notation. It is, like we say, the same object with object ID ... Sorry. Object ID, type ID, version ID, another version ID (two versions, each field was modified separately), and obviously fields and values. So you see, it has more information, but it is more compact. The problem is it is kind of funky, with all these nice symbols, but at least if you look for a hash sign, you know this is the object ID. This is actually a UUID. The same is here. But it uses a different serialisation, which is base-64 with some abbreviations, so it is way more compact. This one is hex, the common one.
So basically, Replicated Object Notation is a notation which represents data as a stream of changes. Even if you have a status object, a state snapshot, it still represents it as a compacted stream of changes. Like all the changes that still affect the end state. This compressed RON record actually unzips ... Sorry. Unzips to this tabular thing. So basically, there are three records here. You have an object, you have a snapshot of an object, version such and such, it has two fields, this field this value, this field this value, and version, version. One nice way to understand it, it is like a stream of ... So it is basically Reactive: everything is a stream of changes; you react to events. Events are immutable operations.
Every immutable operation is a line like this. Once you have many of those operations, we may compress them efficiently, because those enormous IDs typically share a lot of bits, so it is easy to compress them. So it is Reactive. It is based on immutable operations. If you send the state, you may always send updates. I mean, there is actually no difference between state and updates.
And the other thing, the Excel example, that makes it easy to understand, actually. How to understand this in Excel terms? This is an object ID, so we may understand it as a row ID. And this is a field ID, we may understand it as a column ID. And this is a version, so every cell may have multiple versions. It is like a third dimension in our Excel. So basically, you may understand it as a lot of changes in an infinite, worldwide Excel something. Another good example is Google Bigtable, basically the thing Google runs on the back end. Bigtable is more or less made along these lines, so we have, like, an infinite table, an infinite number of rows, an infinite number of columns, and each cell has many versions. There is a huge difference: Bigtable assumes linear history, this one assumes partially ordered history, but that is like ... details ...
So basically, Replicated Object Notation is a notation that sees everything as a stream of changes. And because of the globally unique IDs everywhere, when you receive a single operation, it is completely context-independent. I mean, even if you take this operation out of the snapshot, it is still the same operation, it is completely independent of the context. You still know which object, which place in the object, which value, which version, and which type of the object.
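[Editor's sketch] The idea that a state snapshot is just the compacted stream of changes can be illustrated in a few lines of Python. This is not RON's actual syntax or library API; the `Op` record, the `(timestamp, replica)` version layout, and the `compact` and `materialize` helpers are all hypothetical names chosen for illustration. It also assumes a simple last-write-wins rule per field with totally ordered versions, whereas the talk says RON assumes a partially ordered history.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical version layout: (logical timestamp, replica id).
Version = Tuple[int, str]


@dataclass(frozen=True)
class Op:
    """One immutable, context-independent change."""
    type_id: str      # which type of object
    object_id: str    # which object (the "row" in the Excel analogy)
    field: str        # which place in the object (the "column")
    version: Version  # which version (the "third dimension")
    value: Any        # which value


def compact(ops: Iterable[Op]) -> List[Op]:
    """Keep only the ops that still affect the end state.

    Assumption: last write wins per (object, field), decided by version.
    """
    winners: Dict[Tuple[str, str], Op] = {}
    for op in ops:
        key = (op.object_id, op.field)
        if key not in winners or op.version > winners[key].version:
            winners[key] = op
    return sorted(winners.values(), key=lambda op: (op.object_id, op.field))


def materialize(ops: Iterable[Op]) -> Dict[str, Dict[str, Any]]:
    """Fold a stream of ops into plain JSON-like objects."""
    state: Dict[str, Dict[str, Any]] = {}
    for op in compact(ops):
        state.setdefault(op.object_id, {})[op.field] = op.value
    return state
```

Because every op carries its own object, field, and version, the ops can arrive in any order and `materialize` still produces the same objects; the output of `compact` plays the role of the snapshot.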
And obviously it is possible, it is quite natural, to implement CRDT data structures on top of this Replicated Object Notation, and data structures like, not sure for everything, offline writability, and all the nice things which I don't have time to demo, because actually I have a demo, which I am showing around like since 2014, and probably there must be some people here who saw it in Bratislava, for example. No? No? No? Strange. Nobody was at ReactiveConf? - [Audience Member] Yeah, I was. - Ah yes yes yes. So, more or less, so ... This is the thing. Everything is a stream of changes. Everything is a stream of immutable operations. Operations use unique identifiers to point to various entities, and you may point back to an operation, because every operation has its own unique identifier, which is like globally unique. Like, this is a timestamp with this process ID, it doesn't matter.
I am playing with this notation quite a lot. First of all, it is obviously real time by default. It is quite an interesting feature. Everything subscribes in real time. It is like a distributed Excel. If you change something on one client, it immediately propagates to another client. And it is kind of natural. I mean, if you consider Facebook, they have some sort of real time, right? But they had to build some infrastructure. I mean, real-time synchronisation is not like the default state of all React.js apps. They had to build the infrastructure. Here it becomes kind of a natural thing.
Then something I didn't actually expect, but it kind of happened to me. I mean, once you start thinking in terms of Replicated Object Notation and operations, basically your application starts kind of disintegrating. It starts like ... You take something and it turns out to be a lambda. You put it away like ... The next thing is also a lambda. Well, and then the next thing, it is also a lambda.
I mean, you no longer even have those dependencies, one package on another package; basically, everything depends on the notation, and then it is like a Unix pipeline: you expect some sort of operations to come in, then you process them, and then you produce some other operations. It is also a bit similar to Kafka. I mean, the general idea is there is a shared data bus, and you take the events you need from the data bus, you process them, and you put some of the events back on to the data bus. It is a very nice architecture. I used it in production and it worked quite well, actually.
So, third thing: everything is cached. You may cache all the data you download, because it is versioned, and if you reconnect to the server and want to know whether the server has any updates to the object, you have the object version. So you may always tell the server: object such and such, version such and such, and the server says, "No changes." Good. So basically, everything is cache-able. Everything you download you may cache forever. And finally, it is offline writeable. So you may disconnect from the Internet, do your changes, and then you connect to the Internet and it merges.
I may show the to-do list example, but basically I am kind of tired of showing it at every event I attend. So, but - [Audience Members] Yeah! Yes! - I hope it still works. I mean, I restarted the server like 10 minutes ago, ah-ha. So here it is. Like wien. Like wien. So basically, if you go by VC url, you might start vandalising this to-do list. One second. Ah, it doesn't magnify the address bar, unfortunately. One second. Boom. Normally it works. I mean, I didn't check it recently. I mean, the problem with this application is, if the server crashes, it continues to work, because it works perfectly offline, so there is no way of knowing, but this is not my laptop, so it cannot be cached here. So I suppose it must ... Uh ...
- [Audience Member] Well, it changes, but it doesn't change here. - Oh, maybe it is-- - [Audience Member] It's working on my mobile phone, there are many people doing something different. - Ah yes, I see it is vandalised. Maybe the laptop is slightly offline. - Maybe if you tried a different browser? Maybe try Safari or ... Because if it's triggered by certain hotkeys, maybe the Vimium plugin is ... - Ah so, so something like a ... Okay then, I will use my laptop. So basically, I see a lot of shit. - Just wait a second. - More or less as I expected. Ah yes yes yes yes yes. - Yeah, must use this screen reader - Oh, this version is much nicer than mine. So basically, apparently Wi-Fi, Wi-Fi is misbehaving, so it usually happens at any ... More or less like this. I hate this demo because I am showing it like since 2013. So ... Basically, at that time ... Yeah. That is my demo. So, questions?
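[Editor's sketch] The "everything is cached" and "offline writeable" points from the talk can be sketched as a version handshake between two replicas. This is a toy model under stated assumptions, not the protocol the demo uses: the `Replica` class, its methods, and the `sync` helper are hypothetical, each replica simply tells the other the full set of op versions it already holds (a real system would send something far more compact, such as the latest version per object), and conflicts resolve by last write wins.

```python
from typing import Any, Dict, List, Set, Tuple

Version = Tuple[int, str]            # (logical timestamp, replica id), hypothetical
Op = Tuple[str, str, Version, Any]   # (object_id, field, version, value)


class Replica:
    """A client or server holding a log of immutable ops keyed by version."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.clock = 0
        self.ops: Dict[Version, Op] = {}

    def write(self, object_id: str, field: str, value: Any) -> Op:
        """Record a local change; works with or without a connection."""
        self.clock += 1
        op: Op = (object_id, field, (self.clock, self.replica_id), value)
        self.ops[op[2]] = op
        return op

    def versions(self) -> Set[Version]:
        return set(self.ops)

    def delta_for(self, known: Set[Version]) -> List[Op]:
        """Ops the peer has not seen yet; empty means 'no changes'."""
        return [op for version, op in sorted(self.ops.items()) if version not in known]

    def receive(self, ops: List[Op]) -> None:
        for op in ops:
            self.ops[op[2]] = op                    # idempotent: same version, same op
            self.clock = max(self.clock, op[2][0])  # keep later local writes newer

    def state(self) -> Dict[str, Dict[str, Any]]:
        """Replay ops in version order, so the newest write per field wins."""
        result: Dict[str, Dict[str, Any]] = {}
        for _, (object_id, field, _version, value) in sorted(self.ops.items()):
            result.setdefault(object_id, {})[field] = value
        return result


def sync(a: Replica, b: Replica) -> int:
    """Exchange only the missing ops; returns how many ops were transferred."""
    to_b = a.delta_for(b.versions())
    to_a = b.delta_for(a.versions())
    b.receive(to_b)
    a.receive(to_a)
    return len(to_a) + len(to_b)
```

In this model a reconnect after no activity transfers zero ops, which corresponds to the server answering "No changes", and changes made on both sides while disconnected merge to the same state on both replicas after one `sync`.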