## About this talk

Grakn is a hyper-relational database that has automated inference as one of its core features. In this talk we give an overview of top-down and bottom-up inference algorithms. Then, after introducing some Grakn concepts, we demonstrate how queries involving inference steps are resolved lazily. Lazy inference is crucial to respond to queries quickly and to avoid unnecessary computation.

## Transcript

Hi everyone, thanks for coming. As you can see from the title, this talk is gonna be about inference in Grakn, in particular about lazy inference, and hopefully that's gonna be the interesting part for you guys, for your meet up. But I thought given the occasion I could give you some introduction about what inference is and how it works and a bit of background. So I could say it's 60% introduction about inference and about Grakn and 40% about how we do lazy inference in Grakn. There's a slide about myself. So, I have a bit of an academic background. I did a PhD in computational logic with machine learning. If you've heard of inductive logic programming, that's what I did for my PhD, and then I left academia and I started working in finance. So I did a bit in Bloomberg where I worked in automated news generation, and then I worked in Citi as a quant dev, whatever that means, and then I moved to Microsoft Yammer, and after that Skyscanner. So I did, a bit of my expertise is in distributed systems and micro services and web services. Now I'm at Grakn. I'm lead engineer and I've only joined the company in May, so I'm probably the last one to join Grakn, and so far I focused mainly on the distributed engine for Grakn, but yeah. Giving my background on giving this presentation now. So, okay, what's inference? So this is a very general definition. So we start from a knowledge base. We know something about the world, and given a set of inference rules we can derive conclusions. So inference is basically the set of rules that we use to generate new knowledge from something that we know already. So, for example, it's something that we do as humans. It's inspired from the way we perform inference. So let's say my knowledge base is that assertion number one today is a sunny day and assertion number two if it's a sunny day and, or if it's a bank holiday, then I don't work today. So imagine these are my life rules. Then I perform a type of inference called deduction and then I derive that I'm not gonna work today. This is just one type of inference. It's the probably the one we use the most, but there is a, you know, a number of things that we call inference. Just to give you a few examples, if I was starting from the fact that I don't work today, and my knowledge base contained this rule, I might be tempted to say that I don't work today because it's a sunny day. This is called abductive reasoning. I go from the conclusions and I try to derive the causes. This is called incorrect reasoning. It's wrong. A lot of partitions use it, but that might not be the case. Maybe I'm not working today because maybe it's a bank holiday or there's another reason for that. Abductive reasoning is also very important in scientific inquiry. So you observe something, then you make a hypothesis. You try to find what's the possible cause. Then you get this cause and try to make a few experiments and try to derive a new theory. Going-- Something about inductive reasoning, it's reasoning, it's generalising from instances. So, if, if I talk to some of you and you tell me you are a software engineer and I talk to someone else who tell me you are a software engineer, then I might derive that everyone in this room is a software engineer. That's inductive reasoning. I generalised a rule from that. Obviously again this is wrong. This is a useful way to derive general rules, but it's, there are exceptions. Maybe someone else has a different job here and if we add probabilities to inductive reasoning, we perform some kind of probabilistic reasoning, some, many times, and we're not gonna get into that. This is also what we do usually when we use a database. You know, you can call it inference as well. You have a knowledge base which is a set of tables. Then we perform a query. We have some application logic on top of it, and then I might derive some conclusions from that. Cool, okay, this is the I'd say, probably the boring bit. I need to introduce some notations and concepts, so the rest makes more sense, and I tried to put them in this slide, okay. This is some notation I'm gonna use, and it's basically computational logic notation. So if, by the way, how many of you have used pro log before? Cool. How many of you have used answer set programming before? Oh, cool. Okay, so this is logic programming notation. If I'm saying human bracket Socrates, I'm say Socrates is a human, and it's called a predicate. Human is a predicate symbol. If I'm saying human X and I'm using an upper case letter then that one is a variable. Now by itself, that doesn't mean anything. I need to quantify the variable, and in logic programming every variable is universally quantified. So, if I'm stating this in my knowledge base I'm saying everything is a human, or for every X, X is human. Another bit of notation, the comma means and. So, if I'm putting this in my knowledge base, I'm saying everything is a paradox and Socrates likes everything, which is not what I want to say probably. What I want to say if something is a paradox, then Socrates likes it, and that's-- This is, this basically means if. Just another thing. So, formal logic is, is a very rich language. You have ors, you have ands. Logical connectors. What we use is a subset of that called definite clauses. So if I'm putting something in my knowledge base it's gonna be a definite clause, and the reason why I'm adding this constraint is gonna be clear later. So a bit of deduction. This is something you've seen a million times probably. So, there's my knowledge base, only two simple facts and this is a running example I'm gonna use in the following. So I know that Socrates is a human. Everything that is a human is a mortal, and I derive that Socrates is mortal. This is called modus ponens, and it's called syllogism. Notice something about deduction which is quite interesting is we are not learning anything new from our knowledge base. Every fact that we are deriving is already in it. We are just making it explicit. We are getting an additional fact. We're not adding anything about it. We don't know anything about other philosophers, and about other people. I was looking for funny images to add to the presentation and then I found this one about deduction, but this is a good example of wrong deduction, right? So, the character says, I'm good at understanding numbers, and then the stock market is made of numbers, and he is disappointed that he's not good at understanding the stock market. We can formalise this in logic using these two predicates. I can say good at me numbers, and made of stock market numbers. This kind of deduction doesn't work, because there is a rule missing there, which is this one. So if this was a true fact about the world that if a person is good at something and something else and this something is made of something else then I'm good at this something else. This is, this is not true, of course. So that's why the little guy is disappointed. Okay? A bit more like getting a bit deeper about inference and so what's a Herbrand Base? Apart from all the complicated definition is we get everything that is in our knowledge base and we ground it. It's a set of all possible assertions that we can make given the entities of our knowledge base. So all we know in our knowledge base is about Socrates. There's no one else in here, and all we know about is human and mortal. We have only two things in our Herbrand Base. So it's a set of things. Cool. So, an interpretation is an assignment of truth values to elements in the Herbrand Base. So I can get subsets of the Herbrand Base and those that I, the things that are in the subset are the true things for us. So we have two elements. We have two truth values. Two to the power of two. We have four possible interpretations here. Stop me if this doesn't make any sense, or if, if you have questions, okay? Where I wanted to get with this is how do we know that something is true in a knowledge base? There's something called a model for a knowledge base. So out of all the interpretations, we get one that makes every single assertion in a knowledge base true. So, for the first one to be true we need to have human Socrates. For the second one to be true, since human Socrates is true this must be true. So the model that we have here is this one. Cool. This is a useful fundamental theorem about logic programming. A knowledge base with no negation, so with definite clauses, the way I showed you before, has only one and only one minimal model. So this makes things very easy. Everything is very well defined, and it's easy to understand what you, what a logic programme means. Now things get a bit more complicated. So, I don't know, you know, if you're into this kind of stuff, but people write thousands of papers about what semantics to use, what language we should have, and a lot of the research in computational logic can be summarised in this slide. So, we have our semantics, which is for example that model theoretical semantic I gave you before. So in that way we define what is true, and this is the definition. On the other side we have an algorithm, a proof that we use to go from a knowledge base to derive something else. It's called G because usually this is a goal. So we have something that we want to know whether it's true or not based on our knowledge base, and a lot of research goes into understanding what language are we using here, what computational properties does the language have, what is the semantics represented here, and what proof procedure we use here, and is the proof procedure sound and complete? So if there's something that is implied by the semantics, do we find it by using this? Or if we find something by using this procedure is it something that is implied by our knowledge base? Okay, this is all very abstract, but it's gonna get a bit more concrete soon. You might ask what are the semantics that we need here? We have these models. We have interpretations. We pick the models. Things get much more complicated when we try to get something useful out of computational logic. For example, who knows what closed world assumption is? Okay, cool. So when you query a database, for example, in this case you have a set of flights, you get a result, and cool. Those are things that we know that are true. What about things that weren't returned? Do we know that are there things that don't exist? Are they flights that don't exist? Or is it something that we haven't added yet to the knowledge base, or if, is it something that we don't know about? So whenever you're building an application you're always making one of these assumptions. When you're assuming that what is not in your database is false, then you are making a closed world assumption. Now, everything was very nice with formal logic and, but computational logic people thought of complicating it by putting closed world assumption in together with formal logic and it gets complicated because of things like this. So, imagine this. Take this very simple knowledge base here and consider this, this, consider possible interpretation. Like for example, Adam is a man. That's always true. Take the case that Adam is a husband. In this case, then we are satisfying both rules because he's not single, and that makes husband true, and this rule is false because husband is true. So when the conditions are false, then the rule is true, basically. So here's a possible interpretation. We have another one, and this is where people started to diverge in the community. So, with negation there's no one way to do negation in logic programming. You have people that started saying, well, we do need just one model. So I'd rather have true, false, and unknown, and keep just one model. Another set of people thought, well, this is great. We can have multiple models. We have multiple ways to see this knowledge base. They just use the power of this, which takes me to the first way of doing inference in logic programming. This is bottom up inference, and when you say bottom up inference in logic programming you're usually talk about answer set programming, and if you have never, if you've never used answer set programming this is something I'd recommend to check out because it's actually really powerful, and I remember it came out. People started using it during my PhD. So around six or seven years ago, and the idea there is that we have multiple models. So, okay, let's drop all the constraints that we had to make things simple. So let's add more things in our language. So, people can express things like cardinality constraints, and say in an interpretation of my knowledge base I want from five to 15 humans. Everything outside of the scope is wrong. It's not the model. Another thing is, everything gets typed in, typed in answer set programming, and I highlighted that because it's something we are also using in Grakn. Having types on variables simplifies the inference and also it's better from the perspective of a knowledge engineer. It's the reason why we have classes in Java and why typing thing is useful. In pro log, people didn't do that. And answer set programming, set programming is based on SAT solvers. So it works like this basically. Step number one, you ground the knowledge base. You get rid of all the variables. Step number two, you compute all the models. Grounding the knowledge base means this. These two are already ground. So, we put it there. This one has X. It is a variable. We substitute X with everything that we have in our knowledge base. So we end up having two versions of this rule, one with Socrates and one with Plato. Now, this works well in some cases, but imagine you have a distributed database with terabytes of data. There's no way you're gonna ground everything, put it in memory, and then, and then compute the models. Under, under the hood this is what happens. After we, we ground the theory, everything goes into a big conjunction. This is a big logical formula, and SAT solvers are satisfiable solvers. They check whether there is a possible assignment of truth values to these elements that makes this true. This is obviously a very small, a small example, but imagine to have anything useful, you know, imagine you have something like 5,000 variables, 50,000 variables. This is a more likely real application scenario. In this case, you need a SAT solver that assigns all these truth values. You have two to the power of 50,000 possible assignments that this SAT solver needs to go through. That's like 10 to the power of 10,000. You might think this has no possible useful application, which is false. There are SAT solvers computations around and things in this order of magnitude are solved in sub seconds, fractions of seconds. In ASP, the story that is a bit similar to what happened with, you know, networks, the theory was there for a long time. It was really nice, looks really cool. People couldn't really do anything out of it, and then there was a breakthrough in SAT solvers and it made it so this thing became useful. Again, this has its own applications. If you have something that is very inference heavy and you have very complex rules, you have a lot of intricate constraints and you have cardinality constraints, you definitely want to use ASP. If you have something that is knowledge heavy, so you have a lot of knowledge, a lot of tables with a layer of inference on top of it, you, you better use another type of inference, which is what we do in Grakn. Okay. I'm gonna tell you a bit about Grakn. So, just the essentials. So in Grakn everything is a sub type of these three things and everything is hierarchal in Grakn. This is why we call it a hyper relational database. So even these are sub types of a thing called the thing and so you can see this looks a lot of like entity relationship model. So you have elements, actually, we have another slide for that. Things like Socrates and Plato are entities in our model. Things like mortal can be modelled as, as attributes. You have relationships that connect two entities. In this case, teaches could be a relationship that connects Socrates and Plato, and in addition to what we have the computational logic we assign roles to relationships. So in, in logic there was a meaning in having Socrates before Plato. The semantic of that was given by the order of the two. We label this explicitly saying that Socrates is a teacher and Plato's a student, and that's a role, and I want to show you more concretely what that looks like. Let's see if I try to get my cursor there. So we have this beautiful interface. And if I, okay, here it is. So that's the thing. So this is a, an example from an integration test that I'm using. This is the core model of Grakn. You have a thing, relationship, entity, and attributes, and in this particular example, in this particular knowledge base you have entities that are either G objects or a university and you have sub types of G objects which are region, country, continent, and city. You have one relationship called is located in and one attribute called name. If you, if you poke these things, then you get the countries. So since this is just an integration test and not real data you have only five countries I think. The important ones. There it is. As you can see, everything lays in a hierarchy in Grakn. I'm gonna go back to this example later, when we get into how inference is performed. Try to get to my other screen, yes. Good. Okay, so this is about the inference algorithm. Okay, so there's the rules that I showed you before. We express in Grakn like this. It's a bit of a different syntax. It has some extra bits in it, also because we have to specify other things. We specify roles and we don't have just predicates but we have is a, and attributes, and so on. When I was preparing this slide Mika, who is here, made me notice that this is not how we would model mortal in Grakn. We would probably do something like this. Since everything that is a human is also mortal then human is a sub type of mortal. This makes inference easier for us than going through rules. Okay. So this is parts of the example that I showed you before. We have this one rule in our knowledge base and this is a transitive rule that says if X is located in Y and Y is located in Z then X is located in Z. This is an implicit rule. So having something like this means we don't have to write that London is located in, in Europe. We can just say that London is located in England and England is located in Great Britain and then we can infer this transitive relationship between things. This is the running example. So if we run this in Grakn-- Hopefully I'll get out of this, yes. Yup. So, we get the query saying match X where X is a city, and-- Actually, let's-- Can you read that? Yeah, cool. X is a city and Y is something that is an attribute called Poland and X, so the city is located in Poland. So it gave me everything that is located in Poland. Now you can see where I'm going with this. We didn't add things. We didn't add cities that are in Poland. We have things, and then we have regions that are in Poland. So in this case the result of this query is that we have Warsaw and Wroclaw. I'm not sure if you pronounce it like that. More or less. And these little dots here are inferred relations. They're not things that exist in the database but we infer them by asking this query. If you double click on them, then you-- you get all this stuff coming out, and you can see how the inference was performed. So we have a region called Mazovia and the city is inside this region and this is a generated relation. If you double click on it, we see that this is an actual thing in the database. Cool. So this is a very simple example. How do we do that? Cool. Back here. So this is the, the main algorithm and this is something Casper implemented. So if you have very hard questions, ask this guy over there, and the, the algorithm is basically like this. We get the query. We put it in a stack, and then the query is basically, we can ask for solutions. We say, give me the next solution, and that's an iterator. It's implemented in Java. It exactly implements the iterator interface. So every time we ask next we get whatever is on top of the stack. We expand it. So if it's a conjunction like this, we expand it to single conjunction and we verify, when we need to verify something we look, we look it up on the database, and we continue. At some point we get to the point of an answer. When we get an answer we return. If we don't get an answer, again, we go back, get to the top of the stack, expand what we have until we get there. So, concretely, so this is the main goal and we select one of these. The reason why we got the first, the first part of the query there is just part of the algorithm is called selection rule and it will affect the efficiency of how the inference is performed. So the first thing that we do is, let's go to the database and let's look up something that has name Poland. And we do that. We, we check an index. We get these nodes, and we perform this substitution in the query. So everything that is Y becomes node Poland. The reason why I'm calling it node Poland is that the name Poland is the attribute, where the node represents the country. So as you can see, we change Y with these nodes and the next thing that we're gonna do is again let's look up in the database what, if there is, if there is a relationship called is located in with, with Poland. There are a few other steps in the middle, but this is a simplified version of what we do. So what's next? So we are looking up this node Poland and this is basically where-- Let's get this stuff. So when we do that, we find that we have something connected to Poland and another thing now. Here. Cool. Da da, da da, da da. So this is all the stuff that is located in and it gave me everything because I'm using the inference rule. But basically we are going only one hop. So we got this region here. What happens here is that we put something in the stack that has this region and we continue with our inference procedure. So, the, the lazy part here is that rather than retrieving everything that is located in Poland here, we go to the node, and this is another difference with pro log, for example. We don't consult an index. We have a node there. You can just do a hop hop, well, one hop jump, and get whatever is connected to your node Poland. So, at this point, we keep our pointer here to the node in the database, and we continue with the inference. So, we found that X can be bound to Silesia. Okay, so we substitute X with node Silesia and then we have one last thing to show which is Silesia is a city. This unfortunately is not true, because Silesia is a region. So this thing fails, okay? What happens next is that, so this has failed, cool, we have pointer to the next, to the next nodes. So everything in Grakn is represented as a graph, as you can see in our visualizer. So we went to the first nodes. We have a pointer to the next nodes. We can move to the next one at this point. The next one is Mazovia. Again, we're gonna check if it's a city. That's wrong. Not a city. That's bad. We have a third thing that we can check, which is the rule that we have in our knowledge base. So this rule here, we unify the head of the rule with, with our query. So, we put, we get this X here. We substitute it with the X in our query and we take X and we substitute it with node Poland. This is just a syntactical substitution here. Now we have, we want to prove this, which is the head of the rule, and to prove that we need to find another relationship that connects X to Z, and then we want Z to be located in Poland. So this is our standard resolution step. This is what we do in pro log as well, for example. Now we have these two things to prove if we want this to be true. So we need to find anything that is located in in our database, any kind of relationship there, and we need to find something that is located in Poland. Okay? And again, selection rule. This is a bit where instead of picking this we know that this is much more efficient. We already have a pointer in the database. If we are to do this, we would go there and consult an index and it would be much, much worse. This one, we go to the particular node that we have. We have a pointer there, and we try to get something that is connected through an edge. So, what's connected to Poland there? We have this region I'll call Mazovia. Again, we do the same substitution. We change Z with Mazovia and we put it here. Now, we have two things to prove, and I'm gonna skip all the intermediate steps. You can see that X, you're not gonna copy on the databse X located in Mazovia is another backtracking point where we keep a pointer to a database that we can go back to, and this is, this succeeds. We get Warsaw and Warsaw is a city. That succeeds. So at this point we have an answer on top of the stack. We put it in, in a set of answers, and we return. So, all this happens within a next. At this point, if we wanted the solutions, we invoke another next. So when we say, when we make a query, we say, give me 10 results. We perform 10 next on this thing. Note that this is quite, this is important. We have a distributed database. Every look up is very expensive. In programming implementations for example we would go to our knowledge base, get everything that is located in Poland. We would put it on the stack, and then we would continue the resolution from there. So, in this way, for us it's essential that this kind of computation is lazy because we, every look up we're doing might end up in another node. Here we have some backtracking points. So if we want more answers, again, we take this off of the stack. We go back to this point. We have a pointer, go to the next one, and we're gonna find the other solutions. And I'm almost at the end of this. So, this is, this is a Grakn implementation of the SLD resolution, and there are a few interesting differences here from standard SLD resolution which is what is used in pro log and what we do. First of all, we have things that are represented as nodes and we can avoid look ups. So this is, this makes it much more efficient for domains where our queries go through entities that are connected to each other, and of course, we can't just do resolution, because what we have is not a logic programme. We have a richer knowledge base with entities and attributes. So we have to interleave graph look ups with resolution steps within rules. And we need to do additional things, like whenever we're checking a relationship I did not show you in the, in the relation tree but we need to check that a certain type can play a certain type of role during the relation. This is an additional step, but that's not really our performance concern. What we are concerned here is to have the smallest possible tree, and having unification variables with a certain type really reduces the size of the tree. Again, this is something else that is something that we do differently. We fetch the results lazily. We don't do a query and get everything that unifies with a certain predicate. But we keep a pointer there, and if it's really needed, we get something else, and we cache intermediate results. So if within a sub query we get to something that we already derived then we basically look it up from a map. This is something that is implemented in some implementations of pro log and it's called tabling. So, conclusions. So, we think that inference is a more powerful and general way to derive conclusions from data. SQL is a very specific way to query for data and it's really linked to the way you present things rather than to the way you want to use data. And in Grakn we have this concept of rules that we use to define this implicit concept, and we use a top down procedure to perform inference, and look ups in a distributed database might be very expensive. We need to be lazy in the way we perform them, and the take away point about how we do lazy inference, we have a stack of partial goals, and we only expand goals whenever needed, and whenever someone is asking for the next solution. And that's, that's all for me. Yes? All right, so time, there are two answers to this. How do people model time in logic and how do we do that in Grakn? So time is modelled with integers, for example. So integers are a primitive type in pro log. If you go deep behind what's the number it's basically, it's something I didn't mention, which is a phantom. So you have an element called zero, and you have an element called a successful zero and you have another element called a successful, the success of a zero. If you want to keep everything in formal logic, this how you represent numbers. In, in Grakn, numbers are attributes, and we don't have anything that you can use to perform algebraic operations from numbers. So it's not very supported at the moment, but we're definitely working on that because there are a lot of applications that would benefit from that. - [Man] Thank you.