About this talk
Code is made up of three things: names, spacing and punctuation. With these three tools a programmer needs to communicate intent, and not simply instruct. A good name is more than a label; a good name should change the way the reader thinks. Good naming is part of good design. This session looks at why and what it takes to get a good name.
What am I gonna talk about here? I'm gonna talk about a little bit, the broken windows theory, it's the very simple idea that when you are, it was an observation made in the 1970s, and it has since been corroborated in a number of different ways, that when you see, when a building, a derelict building, is left for some time, it may retain its form, its structural integrity, its appearance, for quite some time. But when the first window is broken, fairly soon after the second window, the third window, and so on, is then broken. In other words, it is almost like it becomes fair game, people don't look after it anymore. And this has been observed in a number of places, a number of social scientists have done various interesting observations, including one actually to do with stealing car tyres and car wheels in various parts of Los Angeles. It actually turns out that the only difference between areas like Beverly Hills and the less well-to-do areas is the amount of time before the first wheel is jacked off, but apart from that, all cars that were left derelict for some time went downhill. So we have this observation, how does it apply to code? Well, yeah well, I leave that to not very much imagination at all. Once you start seeing a piece of code start going downhill, everything else follows. Now we could talk a lot about that, but I'm just gonna talk about naming today, and the value of this stuff. So my parents gave me a very conveniently internet unique name, I like to only half-jokingly, but I'm offering this as real advice, if you haven't had kids yet, consider Googling your candidate names, first and last name combinations. That actually came to be useful a couple of years ago with my oldest son, when he was looking for a domain name and getting on social media. What should I look for dad? And so I said try your first name and last name, and it's just like, it's available, well yes, of course it is, I Googled your name before you were born. 
I checked, I did that for your little brother, as well. There's that opportunity. Now, let me think, what's relevant: this is 97 Things Every Programmer Should Know, a book I edited a number of years ago, crowd-sourced and open-sourced. Plenty of good advice, I'll refer to a couple of pieces there. And a couple of other books, not quite as relevant, but certainly, when it comes to some of the thinking. These are two books with the word architecture, and we might think, oh architecture, that's big stuff. What's that got to do with naming? And people sort of say, oh you're talking about just naming conventions. They'll say stuff like, oh, it's just naming. Names are just labels. There's nothing deep or profound about them. It's just semantics, you ever heard that one? If you're in the middle of a debate, perhaps a profound, meaningful debate, it's just semantics. Go and look semantics up in the dictionary. It's just meaning. Well what else is it that you're actually doing? What is it that you are doing when you are coding? You're creating a system of meaning, that is what you're doing. And so therefore to say it's just meaning sort of misses, pretty much, most of the code. That's what you are doing. You don't have many ways of expressing yourself, and so it is about the relationships. You've got the language offering you relationships between various parts, but it's your choice about how you do that, and it's an act of communication. A good name is not merely a good label, it should change the way that you think. Okay, if it is merely a label then you're talking jam jars here. But jam jars are not a system of meaning, this is a system of meaning, that's what you're getting with this stuff. Now I happen to care a great deal about meaning and words, and things like that. I run a page on Facebook called Word Friday. 
Every Friday I put up an unusual word and its definition, but the rest of the week I put up just other stuff to do with language and linguistics and meaning, so if that tickles your fancy, then go for it. And obviously, you know, I don't just deal with geeky computer stuff, I like taking photographs of books, these are some of the language books I've got, so I've got quite good at taking photographs of books, they're much easier than people. I spoke to a portrait photographer yesterday, it was very interesting, and I said that people just don't, I never have the problem of a book giving me the wrong look. You never catch it and worry about its red eye, or it's just blurred or anything like that. But I do enjoy dictionaries, and so I thought I would cull the dictionaries for the word code. A set of instructions for a computer, hopefully we're reasonably comfortable with that idea, but maybe we've moved a bit above that level. A computer programme, in other words, we think of it more holistically, it's not merely a set of instructions, we're not just scraping our knuckles in the land of assembler of ones and zeroes. It's actually, no, no, I have a construct here, and it may be far, far higher level, a level deeply removed from the machine. Then it gets interesting. Now this reinforces what I was mentioning before, a system of words, figures, or symbols used to represent others. And then we get the problem that we often encounter in legacy code, especially for the purposes of secrecy. In other words, it's very clever that some people take the message of code a little too literally. It's just like, yes, this is coded, it's encrypted, there is a high level of encryption on this, you will never guess what it does. And we end up being software archaeologists at this point. A set of conventions or principles governing behaviour or activity in a particular domain. Well that is also important, because what you're talking about is code of practise. 
A code by which people follow, in other words, their habits, their coding guidelines, their cultural norms within a company or within a language community. So let's pick on one of the, so you've got an agile mindset tour coming up in the next meetup, so let's talk agile. For a lot of people, agile is user stories, and some variant of scrum. And I'm very interested in a very simple idea, a very simple idea, this way of framing it has, I guess, only come to me in the last year or so. Often as developers you forget, well you worry about other people's requirements, and there's this idea of focusing on the customer's requirements. Now that's brilliant. Then there's this question, why are you doing that, because this fulfils the requirement. Why are you doing that, it fulfils this requirement. Ah okay, that's nice and easy, why are you following those coding guidelines? Why are you not following those coding guidelines? Why are you doing this work? It turns out that as developers you also have requirements, you're entitled to have requirements. They're not the same kinds of requirements, but you can shape them, there are so many different ways of naming things, formatting things, organising an architecture in the large. These are not arbitrary. They create a space for you to work in. So if we're gonna say, this is the question, we have as a role I want feature so that benefit, but the role I want to focus on here is as a programmer. I'm entitled to be able to state things in my environment. I'm working in a particular environment, what are my choices in terms of tooling? What do I want that for? Turns out these are surprisingly hard questions. Well, actually no, the questions are really easy, the answers are really difficult. 
When you ask people why are you doing that, and what do you want from it, well a lot of people never really thought about this, and so many of our instincts about code layout and paradigms and things like that are very tribal, and when you actually sort of push them a little bit, and go what purpose does it serve, is that more or less maintainable than that? It's quite a hard, that's quite hard. So I'm gonna simply say I want good identifier names. Well, that seems reasonable. So that I can read code. This one comes up quite a lot, I do a number of workshops, I have a workshop called good code, where I get people to sort of explore this question. So I can read code, and I always point out to people, no, you can always read code, that is never the problem, it is normally written, well, there are cases, I mean I know I've seen some code that's been written using emoji, because identifiers have a very liberal idea of what is acceptable these days, but you can always read it. The difficult bit's the understanding. 'Cause I want to read and understand. I can read German, I can't understand very much of it. But I can read it, that's not enough, there's that something deeper. Let's go a little bit further. Let's start, people are very bad at the so that clause, so let's expand it a bit, let's understand a bit more. I can spend less time determining the meaning of code and more time coding meaningfully. Well that's a fun way of remembering it, and it anchors this idea of meaning. What you're doing it's an act of meaning, it's an act of communication. You're trying ultimately to put a model into somebody else's head. That is a nontrivial task. Yeah, but that's a pretty lame I want, because what do we actually mean by good? 'Cause good is one of those words that stops conversations. I want good stuff, giving code a good name. Yeah, I want good stuff, I don't want bad stuff. Oh, I'm glad we had this meeting. What kind of architecture should we have? 
I think we should have a good architecture. Brilliant, okay. What about naming conventions? A good one, yep, brilliant, this is going really well. The problem is there's a lack of concretion there. We need to explore that. Let's start off with the idea that this is communication. When we talk about communication there is a very simple concept, signal-to-noise ratio. It's our first port of call. A measure used in science and engineering that compares the level of a desired signal to the level of background noise. Okay, we can take it further because signal-to-noise ratio is used in domains and disciplines outside its strict engineering origins. Sometimes used informally to refer to the ratio of useful information to false or irrelevant data in a conversation or exchange. Ah, now, this is getting interesting, 'cause we can start looking at code and going, okay, signal versus noise, what is the noise. One of the greatest sources of noise is identifiers, and the way that we choose or fail to choose appropriately. And it's not because we're being stupid or evil, it's sometimes our understanding of how to communicate is not as good as it could be. So I'm gonna go back a few years, pre-.NET, to Shakespeare, who cunningly encoded within his plays some very interesting knowledge that we've only come to realise relates to code. And this is actually about memory management, to be or not to be, that is the question, okay, so we're talking about object life cycles. There's a whole question here that he's actually resolved within Hamlet, it turns out that Ophelia prefers explicit memory management, whereas Hamlet is much more a garbage collection guy. I'll leave that for another talk, but no really, that's what the whole play is about, it is about object lifetime. All of the blood, all of the death, and oh yeah, this is what it's about, it's a metaphor blown large. 
But anyway, we're gonna take this soliloquy, and this is not how a programmer would write it. There's a book, it was actually featured on the Word Friday page, Long Words Bother Me by Tom Burton. He decided he'd have a go at recasting that in business speak. Continuing existence or cessation of existence, those are the scenarios. Is it more empowering mentally to work towards an accommodation of the downsizings and negative outcomes of adversarial circumstances, or would it be a greater enhancement to the bottom line to move forwards to a challenge to our current difficulties, and by making a commitment to opposition to effect their demise. And this is wonderful, because this captures the very problem, I'm gonna add a longer word, because that makes it sound more important, more meaningful, or maybe I'll make the point more clearly, 'cause those short words don't look like they're carrying their weight, so let me use something with a little more impact. In fact, we could describe it using one such word, impactful. This merely is good, but this is impactful. And that's the problem we end up with, without realising it, our coding guidelines or our coding habits generate this, and a lot of those end up in our names. In Smalltalk Best Practise Patterns by Kent Beck, which is now, it's now about 20 years old this book, if you can read smalltalk, this is a wonderful book, because about half the advice is easily portable to other languages. If you can't read smalltalk, then I recommend learning enough smalltalk to be able to get this book. It's one of those, a wonderful book. People will be using the words you choose in their conversation for the next 20 years. I quite like referring to this, because now we're at that point 20 years later, 20 years after he wrote this. You want to be sure you do it right. Unfortunately, many people get all formal, just calling it what it is isn't enough. So here's one I've seen a few times. 
It's not the longest name you're ever gonna see, we'll talk about those later, public class ConfigurationManager. We just can't resist it all those little affixes, the suffixes and prefixes and bits and pieces, it's configuration manager. And you kind of look at it, and I remember having this conversation with one group, I said, well what is this, what do you use it for? Well it's the configuration, isn't it, well you mean, what, like that? Woah, this is dark magic. I just removed that and actually it now describes what it is, because ultimately all code is a managerial controller of something else, and occasionally the words manager and controller do have value. The problem is if you use them everywhere, well it's like any form of currency, it devalues it. It's just a case of it has no meaning. Otherwise, we're gonna end up with well this is a bit twiddling manager, or this integer is not merely an integer, it's a twos complement 32 bit, it's the 32 bit twos complement manager, and you end up with this kind of ridiculous sort of recession into noise. So they have to tack on a flowerly computer sciencey impressive sounding, but ultimately meaningless word like object, thing, component, part, manager, entity, or item. Just pause for a moment, I want you to think have you done this in the last week, and now that you realise the answer is yes, feel mildly guilty, but not too guilty, because this is a very large majority. We sometimes reach for the word without really thinking, are we saying anything? If I'm working in an object-oriented language, by using the word object on something, am I saying anything more helpful? If this variable is called data, what is this thing that we do in computing, it's all about data. By calling a variable data, it doesn't help. All of these things, they're us grasping, they show us, we're trying to make an effort, but we sometimes don't quite get it. 
Now we'll come back to the naming aspect, because one of the things that we end up doing is this grasping for meaning, how do I convey meaning in code? And well it turns out one of the most popular ways of doing this is to try and offer comments. And this comes from Rob Pike's Notes on Programming in C. It was written in 1989, so it's around the time I was learning C. Rob Pike is a kind of Unix demigod these days, he's probably better known for his work on Go. But I really liked what he had to write here. A delicate matter requiring taste and judgement, I tend to err on the side of eliminating comments for several reasons. First, if the code is clear and uses good type names and variable names, it should explain itself. Now, let's be very clear about this one. When we say that code is self-explanatory, that is a judgement that can only be made by getting another self to do that, you are not the self that makes that judgement. Yeah, my code makes perfect sense, no, you get another human being to look over it, and if they tell you, I have no idea what you've written, then maybe it is not self-explanatory. That requires external judgement. But nonetheless, there's a basic message here. Second, comments aren't checked by the compiler, and as we know, they are not checked by the developers either, so that's, you know, your two main audiences, neither of them is actually interested. As my wife notes, come Christmas time, presents come out and she will observe, she will say, why don't you look at the instructions Kevlin? And I will say something like, oh, no, no, instructions are a kind of sign of defeat. And she'll get on and get things working, because she read the instructions and I'll sit there grappling with the technology and eventually make some breakthrough, or eventually I will look, after I admit defeat, I will look at the comments, the instructions. It's only when everything's gone wrong do you look at the comments. 
And at that point, you realise, they're not right. They are noise, they parrot the code, they tell you things that are not relevant, they tell you things that are not correct. Third, the issue of typography, comments clutter code. One of my favourite examples came from his paper. I love this one because there is more to this story. When I first read this, I laughed, I laughed. And then a few years later, in the mid-90s I was doing a contract in the City of London, and there it was in a piece of code. This one, not that one, this one. And it was just like, you know what, I always thought he was kidding, and actually, no, it is real. So we have this problem, this challenge, when we ask people to comment code, there's no structured advice there, we're not telling them what it is that we want to see. And if you just say tell us what it does, then you run into another problem. This is, I think my most re-tweeted tweet actually, is this observation, a common fallacy is to assume authors of incomprehensible code will somehow be able to express themselves lucidly and clearly in comments. It's the same person with the same mindset, they will tell you the same things that were going through their head when they wrote the code. And there is a chance they will be clearer. But, I'm not gonna put a lot of money on it. I'm gonna say it's a non-zero chance, I have met that programmer who actually was very good at writing documentation, but his code was truly code, it was codified. But he's the only person I've met for whom that is true, 'cause the problem is, what will happen is that somebody else, they will simply write what they have thought, but they aren't even doing it in English. So that's not a guarantee that you're going to do anything other than fill up the space with noise. So how do we find out our thinking and our names, the names in our code? Well, there's a technique I was very keen on that originated with Phillipe Calcado many years ago. 
Take your code and shove it through a tag cloud generator. Strip it of things like comments, first of all, and string literals, and sometimes people prefer to fold case, get rid of any case sensitivity. I advocated this for a bit, and this is a company, a games company in London, and somebody saw me do a talk where I mentioned this, and they thought, alright, let's try that on our code. Absolutely fascinating result, I mean look at that. Public's quite big. Void is quite big. This actually uses the Unity engine, but an earlier version of it. We see that when it comes to domain abstractions, GameObject, and Vector3, that's the whole thing. And GameObject, indeed, if we made it case insensitive, then we would probably end up with that being larger. Wheel, car, okay, so I think this might be some kind of driving game, but there's other things that are fascinating in this code. Look at the size of false, and bool, and else, and true. If you're looking for if, it gets filtered out, 'cause it's two letters, it's too short; this used Wordle, which filters out the very short words. There's a lot of flags in this code. This is C#, there's a lot of flags in this code in C#. Certainly around this era, 2011, you got plenty of polymorphic constructs to play with, this is what flags normally end up being replaced by. Normally, I will give you a small piece of behaviour, look, have this lambda, or have this object through an interface, this is what it does. Rather than here's a flag, feel free to switch an if on it. In fact, it turns out the cases, where's case, there we go, case has some say in this code as well. So in other words, there's a lot of jumping around as opposed to going with the flow. Going with the flow is: here is the object, or the function, the capture that expresses what you need. As opposed to: I'm gonna give you a flag, and then you're gonna decide, so if else, if else, if else, if. So we get a sort of a sense here, but you don't see a lot of the domain. 
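The preprocessing step described above (strip comments and string literals, optionally fold case, drop very short words) can be sketched in a few lines. This is a rough Python sketch and not the tool used in the talk: the function name word_frequencies, the regular expressions and the sample source are all assumptions made for illustration, and the patterns only cover simple C-like comments and literals (C# verbatim strings, for instance, are not handled).

```python
import re
from collections import Counter

# Things to strip before counting: block comments, line comments,
# string literals and character literals in C-like source.
NOISE = re.compile(
    r'/\*.*?\*/'              # /* block comment */
    r'|//[^\n]*'              # // line comment
    r'|"(?:\\.|[^"\\\n])*"'   # "string literal"
    r"|'(?:\\.|[^'\\\n])*'",  # 'c' character literal
    re.DOTALL,
)
WORD = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def word_frequencies(source, min_length=3, fold_case=False):
    """Count keywords and identifiers left after stripping comments and literals.

    min_length=3 mimics a tag cloud tool that drops very short words
    such as "if"; that threshold is an assumption of this sketch.
    """
    stripped = NOISE.sub(' ', source)
    words = WORD.findall(stripped)
    if fold_case:
        words = [word.lower() for word in words]
    return Counter(word for word in words if len(word) >= min_length)


# Hypothetical C#-like sample, invented for this sketch.
sample = '''
// wheel wheel wheel
public void Update() {
    bool braking = false;
    if (braking) { Debug.Log("braking // hard"); } else { wheel.Spin(); }
}
'''

for word, count in word_frequencies(sample).most_common(5):
    print(count, word)
```

The resulting counts can be fed to any tag cloud generator, or simply read as a sorted list: if the biggest words are bool, false and else rather than words from the domain, that is the signal being described here.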
Wheel pops up a couple of times, and car, and you know, I'm gonna guess that jukebox is the music they'll play in the background, so that's not really the core of the domain. So there's sort of a sense here that we're sort of missing something in terms of our communications. - [Male] Swear words. - Swear words, well, yes. I don't think that those tend to turn up that large, because normally people embed them in other words, you know, you're not just gonna sort of swear a little bit, you're gonna swear in an elaborate camel case kind of way. And it is unlikely that you will duplicate those phrases again and again, so if we actually stripped out the camel casing, and actually broke down the words along those lines, I think you might get a little more here actually, so that's your homework in case you were wondering. So let's talk about other cases where we find meaning and struggle, and when we say I want a good name, we need to really understand this idea of meaning, it's not just merely labels. So I'm gonna pick an example from Dan North, this is a fragment of Java code that he talked about in 97 Things Every Programmer Should Know, in Code in the Language of the Domain. Now if you look at this, it's very simple, I quite like this one, because what we see is if (portfolioIdsByTraderId.get(trader.getId()).containsKey(portfolio.getId())). This uses a map of maps. There's not a single abbreviation in there except a domain abbreviation, Id. And when I say domain abbreviation, Id is something that we use in the real world, it's not a programmer abbreviation. And yet it's not obvious what this is doing, even though everything is clearly labelled. We haven't actually told people the meaning, and some people might reach for a comment here. What is the meaning of this code? What is its ultimate intention? 
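As a rough sketch of the shape of that fragment, here it is transliterated into Python rather than the Java of the original. The Trader and Portfolio classes, the sample data and the function name can_view are hypothetical, introduced only for illustration; the function is named for the domain question that the lookup turns out to be answering. One deliberate difference is labelled in the code: an unknown trader is treated as having no portfolios, which the original fragment does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portfolio:
    id: str


@dataclass(frozen=True)
class Trader:
    id: str


# Hypothetical data: trader id -> {portfolio id -> portfolio name}.
PORTFOLIO_IDS_BY_TRADER_ID = {
    "t1": {"p1": "Pensions", "p2": "Equities"},
    "t2": {"p3": "Bonds"},
}


def can_view(trader, portfolio):
    """Domain question: is this trader allowed to see this portfolio?

    Assumption of this sketch: a trader with no entry can view nothing.
    """
    return portfolio.id in PORTFOLIO_IDS_BY_TRADER_ID.get(trader.id, {})


trader = Trader("t1")
portfolio = Portfolio("p3")

# Every name is spelled out, yet the reader has to work out the intent:
if portfolio.id in PORTFOLIO_IDS_BY_TRADER_ID[trader.id]:
    print("mechanism: shown")

# The same check, stated in the language of the domain:
if can_view(trader, portfolio):
    print("domain: shown")
else:
    print("domain: hidden")
```

The map of maps has not gone away; it has moved behind a name, so the calling code reads as a question about traders and portfolios instead of a question about containers.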
Well its ultimate intention is to determine whether or not a particular trader can view a particular portfolio. This is to deal with insider dealing, in other words, hey, as a trader, you can't look at that portfolio. You're not supposed to, you don't have permission for that. Now that, well it's difficult, well what comment would you add for that? Can view, well this is about the trader being able to view a portfolio. No, no, we're just using all the same words, we haven't actually said any more. And yet, this is the other challenge that we have, that all the words here were meaningful, there was nothing wrong there, but their sort of emergent sense was lost. This tells you what is actually meant. So what do we do with stuff? Well, A Theory of Objects. This is a profoundly unreadable book, it's here because, one, I like photographing books, two, I thought it would be quite cute, while my kids played Lego, they are now a little bit old, disappointingly, I mean that's the reason you have children is so you can play Lego, so I'm looking for another excuse. Architectural Lego is my next excuse, I'm gonna use that one. This is a profoundly unreadable book, it was published in 1995, I think it's got six sections, the first section is in English, the remaining sections are in sigma calculus, which they invented. All I'm gonna say is if you have a sleeping problem, then I recommend this book. But however, the English stuff is really good in it. The reason I'm referring to that is actually partly that, and our desire to use agglutination, sticking Lego bricks together. This is one of the most common ways we name things. Now, I did say I'm quite interested in languages, and there is a concept, agglutination is a process in linguistic morphology derivation. So if you're looking for some vocabulary to wow your colleagues with when you go into the office tomorrow, see if you can just sneak that one into a conversation. 
Maybe an enhanced version of buzzword bingo, so this is, take some fairly cryptic words, and I'm gonna say go to top of the class for linguistic morphology derivation. In which complex words are formed by stringing together morphemes, each with a single grammatical or semantic meaning. Okay, so there are languages that are more or less agglutinative than others. English's Germanic history does give it forms of agglutination, we are quite happy to marry words together. Sometimes even when they're not actually English, this is an English word that curiously doesn't use any English morphemes. I'm not gonna try and pronounce any of these, but this means fear of long words. So it's sort of elegantly self-referential, for which the word is autological in case you're wondering. And then again, this word was contrived to be the longest word in English. It is a mix of Greek and bits of Latin, and it's intended to describe a pulmonary condition that you might get through inhaling volcanic ash or other fine dust. But that's as far as we can get. Now you've gotta head north, this is Norwegian. At this point, you start realising, ah ha, now I understand, these long words, this is what they're for, they are there for administration and bureaucracy. This refers to traffic regulation if I recall correctly. Folk, as in people's, traffic safety and, yeah, sekretariat, always a good word to throw in if you're not sure about anything. So there's a notion here that these long words are there for the purposes of bureaucracy, and therefore obviously German's gonna win this one. That, and now there's a question, is that the longest word in German, I think it's 65 letters long, its abbreviation is 12 letters long, and the question is, is it still the longest word, because it refers to a meat packing directive, an EU meat packing directive, that is no longer in force, so it's kind of an ex-word. 
But actually for properly agglutinative languages, you gotta go with a language like Turkish, the Turkic family are very good at this, and this means something profoundly artistic and profound and meaningful, and I have no idea what it is. So what's this gotta do with how we name things. Yeah, this is how we go about code. The world seen by an object-oriented programmer. That's a rather nice cartoon from a couple of years back. So we got indoor session initializer, visitor monitor interface, multi-butt supporter, brilliant one that one, living space separation decorator, entertainment provider singleton, thirst quencher container and so on. An object-oriented programmer, and it's kind of like, hang on, there's something missing here. There's an indirectness, there's this idea that what we're doing is we're actually hiding from the meaning, we're actually shrouding the meaning, it's almost as if we're afraid, too timid to actually say what it's doing. And we've ended up in a situation where like in The Matrix where Morpheus says to Neo, stop trying to hit me and hit me. It's just like stop trying to hit me and hit me. Give me the meaning, what are these things, don't be afraid to say it. If you're dealing with a language where you can nest classes, you have namespaces, you have all of these constructs that allow you to avoid collisions, so feel free to use them. If your concern is oh, well you know, if I use the word door it might collide with another word door. Well I need to see a code base in which that's gonna be true. I've worked in code bases that do not have those benefits. Very few name collisions actually occur in practise. Don't be afraid to say what it is. All of these things, the only one that actually comes out even remotely close to its cryptic form was television remote control. Do we do this elsewhere? Well, yes we do, we've got lots of examples where we just tack on noise words. 
So I rifled through the .NET libraries and looked through some of the exceptions. And the naming was not exceptional, or rather it was, because how do I know that they're exceptions? Well they've all got the word exception on the end. I find this slightly curious, it took me a couple of years to notice this one, and that was even after doing Java, where I sort of, it took me a while, I was blind to it, and I suddenly thought, hang on, why do we put exception on the end of everything? I mean, why am I not putting not exception on all the other classes? Why is it that I'm trying to say, guess what, nothing else is important, but I'm gonna draw your attention to this class, 'cause it's an exception? Really is that the only thing that I was interested in? Obviously I couldn't tell that it was an exception, oh no, because it only appears in a throw statement, or a catch clause, or derives from Exception, I had no clue at all, thank you for adding that extra piece of noise. So therefore put object and value and struct and whatever else on the end of everything else. But the point here is what happens when I take away the word exception. And in these cases, we actually see, there is no ambiguity, the code is actually a lot more direct, it's a habit that we need to see through. Access violation, well that can't be good. I mean just a basic comprehension, I don't need the word exception on the end to tell me this is not good. Argument out of range, bad image format. You know, bad, there's no ambiguity with that one, this is not a euphemistic word bad. Cannot unload app domain, these are all very negative. Entry point not found, well that must be okay then. Invalid operation, oh, I wonder if that's a bad thing or not. There's no ambiguity here at all, these are actually good names. So they're hiding. Now you might say well this doesn't always work, removing exception, and you'd be right. If we go to this lot, and we strip off exception, you're right, it doesn't work. 
But that's not actually a fault of removing exception, what it does is it shows you how bad the name is, because it shows you that there's no sense here. You know, argument, what, you want one, this is one, I don't know. Arithmetic, well, yes, I mean you know, it's a great evil. It's not proper maths. Context marshal, who, and does he carry a gun, I don't know. Field access, format, null reference, well I mean there is a philosophical debate to be had about whether or not null references are good or not, but this is not the place to have it. That's not the issue, there's absolutely nothing wrong, I have seen null references in code, I've compared against them with equals equals, I have assigned them, nothing bad happens, nothing bad at all, it all happens when you dereference them. It turns out that what we've discovered is when you take off these lazy suffixes, when you take them off, and you say, well what am I left with, am I left with something that's actually good, normally it's not, we're missing something. So what's the problem, invalid argument, that's the thing that's wrong, okay. It's an invalid arithmetic operation, okay. It's an invalid format. The problem here is not that it was a null reference, but that it has been dereferenced, you applied a dot and it fought back. It's not merely an array rank, it's an array rank mismatch. So in other words, the notion here is that sometimes some of our habits prevent, I mean feel free to put the word exception on the end if you just think that you've got a little bit more space to the right-hand side of your screen that you want to use up, but the point here is that don't deny yourself a good name. This is the key. The goal here is to omit needless words. So where are our noise words, and this is the bit that sometimes gets people a little bit uncomfortable. I've already suggested, you know, what, tacking the word exception on the end of every exception, not the best use of bandwidth. 
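The strip-the-suffix test described above is easy to try on a list of names. The following is a small Python sketch under stated assumptions: the helper names without_suffix and spoken_name are invented for illustration, and the class names are System exceptions of the kind mentioned in the talk. The code only removes the suffix and splits the camel case; whether what remains still describes a problem is a judgement left to the reader.

```python
import re


def without_suffix(name, suffix="Exception"):
    """Drop a trailing suffix, unless the suffix is the whole name."""
    if name.endswith(suffix) and len(name) > len(suffix):
        return name[: -len(suffix)]
    return name


def spoken_name(name):
    """Render a camel case class name, minus its suffix, as plain words."""
    bare = without_suffix(name)
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', ' ', bare).lower()


names = [
    "AccessViolationException",
    "ArgumentOutOfRangeException",
    "EntryPointNotFoundException",
    "InvalidOperationException",
    "ArgumentException",
    "ArithmeticException",
    "FormatException",
    "NullReferenceException",
]

for name in names:
    print(f"{name:32} -> {spoken_name(name)}")
```

Read the right-hand column aloud. Names such as "argument out of range" still say what went wrong; names such as "argument" or "format" say nothing once the suffix is gone, which suggests the suffix was doing the work the name should have done.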
You need to tell the reader something that they can't already get from the context. Think of it in another way. And there is that idea of adding additional meaning, and sometimes, maybe you're in an environment where you have to put certain prefixes and affixes on, that's fine, but think of those names without them. First of all, get a good name, and then you put the prefixes and the suffixes on. But sometimes what we find is without realising it, we've ended up with very poor names, they don't really communicate something. So one of my favourite pieces of MSDN, if one can actually have a favourite bit of MSDN, I do have a favourite bit of MSDN, it's the line that says do not use Hungarian notation, that is my favourite statement in the whole of it. When I first came across Hungarian notation was in a, oh yeah okay, so maybe I'm not that young, was in the original Byte article where Charles Simonyi introduced it. It's actually pretty much only copy of Byte magazine I've held on to, because I remember reading it and I think it was in October, October 1991, something like that, I remember reading it and thinking, did I just pick up the April issue? 'Cause surely you cannot actually be serious about this as a naming convention, this is so anti-software engineering, I've never seen anything so anti-software engineering in my life. And it is always worth keeping in mind that the reason it was called Hungarian notation is because his colleagues could not read it. There is a very strong clue. It's like, hey, we can't read it, and you're Hungarian, so, funny. It's just like the point there is that no, this is really not a good strategy when the first thing that somebody observes is its unreadability, it's basically line noise. In fact, line noise has actually got a nicer quality to it. 
And there's a point, if you look back over a lot of code that was written using Hungarian notation, what you'll find is the quality of many of the names are quite poor, because all of the effort goes into the type differentiating prefixes, and it's almost as though you feel, oh I've done my work, and then it's just like, well it's a thing, it's a collection, it's a whatever at the end, and you haven't really said much more, and it's just like, ahhh. So the idea is if you are going to add extras, feel free to, but the first thing you need to do is what is the thing that it is? And then see if you can get a good name, and then question these, 'cause we do have some nasty habits. Here's one. Get. It finds its way into all kinds of identifiers, get. And there is this idea that, well you know, it helps me know that this is a function. Well I've got some really, really bad news for you, particularly 'cause I've had this conversation with a couple of people, it's like, well you know, object-oriented programming is all about nouns and functional programming is all about verbs. Actually no, functional programming is not about verbs, that's called procedural programming, and that's the degenerate failure mode of most functional code. Functional code is not about imperatives, get is a command, it is about as imperative as you can get, quite literally. Get is a state-changing command. We use it for queries, which is a major abuse of the English language. My standard line on this one is that, indeed yesterday I went to a cash machine on the way home to get some money. Huge, great side effect on my bank balance. It was not merely a query, disappointingly. There was this whole idea that these are significant state changes, but also when we use certain words, are they as unambiguous as we would hope? 
Well actually, not really, because I mean, this is my copy of the OED, I did say I quite like dictionaries and stuff, they no longer sell the physical printed version, but it's about 20 volumes or so, but here, this is as a dictionary, if you are interested in how to use a word, this is almost entirely useless, this will be the worst product you ever bought, 'cause this is about the history of the word. I mean look at that, get, I mean everything you ever wanted to know about all the inflectional forms going back 1,000 years, and how it relates to Swedish, and old Norse, and even middle Swedish. It's all there. If that tickles you, this is great, otherwise it's entirely useless. But the thing I want to get across to you is that there are four head entries here, and this is proportional. In the printed version, it takes over 30 pages to describe the word get. It is one of the two longest entries in the whole of the dictionary. The other one being, correct. And you can see that one as well. So it turns out these are not unambiguous terms, and in many cases, sometimes there is the right way to say things, and sometimes when we think of certain protocols and we think about http, it has a get command, yeah, but that's it. I mean it's very, very simple, in the context of it, there's no get whatever, it's just called get, that's it, and there's put, and it's really simple, you post and you do things like that. There's a very symmetric and simple model there. But here, in most cases, we're hiding something, or avoiding saying something. There's no concept of relationship, you're trying to find, people love the person class, so let's use the person class, turns out people can get married. So therefore what are you gonna do, set spouse, or? There's some interesting referential integrity issues that are sort of implied there. Actually that is a good example of get married. 
But that has a side effect, you can take a person and a person, and they can get married and that is actually the domain language, but I'm not getting, it's not a query on the spouses and the relationships, it's actually a command to do this. It turns out that sometimes we do use the word get properly. So it doesn't add anything, it turns out that when you go through code and you remove most things that have get and set, you can normally find a better word. And it's this idea we deny ourselves a better opportunity, and we also accidentally fall into the idea that get and set turn up in pairs, and you also see this has a shaping effect on our code. So if you're dealing with C sharp properties, almost without thinking, whenever I run training courses, almost without thinking, people can't help themselves, when they put a get, they put a set in, as well. Particularly, it's good for whenever I do TDD training, 'cause I say, well where's your test for your setters. Oh, well we haven't got any of those. So what have you got setters for? Oh, well we just always put those in. It's like, yeah, there's a big difference between code that has mutability and code that doesn't, that's not a minor detail, it's a very profound decision as to the relationship between state and something that operates on it, and I'd really err on the side of immutability if possible. So accidentally sneaking in set in there is not a good idea. And we get this idea because they rhyme, and they're the same length, get, set, there's this beautiful rhyme thing, and oh they're opposites and they align beautifully, and actually, they're not opposites at all. The opposite of set is either reset or unset. That's the opposite, it's not get. There's this idea that without thinking about it, these words have shaped, or our expectation of these words has shaped our code already and our habits. So there is an idea here. 
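To make the query-versus-command distinction concrete, here is a small Python sketch built on the cash machine anecdote. The `Account` class and its members are assumptions invented for illustration, not code from the talk: the query is a plain noun, there is no setter, and the state change is a named operation that returns a new value.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable account: no get prefix, no setters."""

    holder: str
    balance: int  # a query is named by what it answers, not by "get"

    def after_withdrawing(self, amount: int) -> "Account":
        """Return the account as it stands after a withdrawal.

        The original object is left untouched, so the change is explicit.
        """
        if amount > self.balance:
            raise ValueError("cannot withdraw more than the balance")
        return Account(self.holder, self.balance - amount)


def try_to_set_balance(account: Account) -> bool:
    """Return True if assigning to balance was refused."""
    try:
        account.balance = 0  # there is no setter to sneak in
    except FrozenInstanceError:
        return True
    return False
```

Leaving the setter out is a design decision about mutability, which is why it is worth making deliberately rather than by reflex.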
One of the things I do when I'm not taking photographs of books and messing about with code, is I write, I have a hobby, I write short fiction occasionally. I got this rather nice Christmas present a couple of years back, a book on writing, and it comes with a pack of cards, the pack of cards is the most interesting thing. If you get stuck writing, then pull this out, and I've often been attracted to this idea, the first time I came across this idea was Brian Eno's oblique strategies, which is a stack of cards, if you are stuck in some artistic endeavour, and you find yourself at a mental block, pull out a card at random, and see what it tells you. It's a random piece of advice. There's actually a Twitter feed, which I follow as well, which has got some good stuff there. So this is the idea, you're stuck with writing, okay, pull out the card, choose the right name. Pull out another, turns out half the advice is applicable to code. Pull out, eliminate words, oh yeah, that's gonna be pleasurable. Even better, if it had the specific idea of eliminate lines, I'd be even happier. Get specific, sometimes people are very afraid to name what is going on, and so there's this circumlocution as they add lots of words around the idea without actually hitting the idea itself. And so, there are some examples where we can see how this stuff shapes our thinking. So a workshop example I've used for a number of years, and people have used this going back to the '70s, imagine a book-lending library and I remember a few years ago a colleague of mine said, oh Kevlin, look, book lending libraries, people have been using that example since the '70s, you've really gotta get with the times and use something far more contemporary, like, I don't know, video rental. Still here. So a member has a relationship, they have a loan, a loan is against a book copy. So we've got a reasonably good first point here about names, number of times that people will choose the word book is surprisingly high. 
When you start doing stock management, book copy versus book title, one is a product description, I can have five copies of Harry Potter and the Philosopher's Stone, how do I differentiate these concepts, there's two concepts there. One might be as it were, the stock versus the catalogue. So therefore when you are confronted with an ambiguous term, your challenge is not to choose which one the word means, your challenge is to avoid the word on its own completely. If book is ambiguous, and if you deal with publishers, they deal with book copies, book titles, book manuscripts, the word book can refer to any one of these. When somebody says they are writing a book, they're not writing a book copy, 'cause that's a very inefficient process. Oh you want a book copy, let me just, I'll get my monks on it, and they'll copy it out, that's not the way it works. But the fluidity of human conversation, and the way we switch context is absolutely brilliant, but in a code base, we don't have that same context switch. Code has context, there are nested scopes but human conversation is not nested scopes, so we have much more of a flow. It turns out some of the rules don't apply, and so we need to disambiguate. So book copy, loan, member, great, brilliant. Whenever I've run this, eventually people start saying, well you know I wanna decouple a few things, let's throw some interfaces in there. So we've got member, we've got book copy, what do people name these things? IMember, IBookCopy, I Tarzan, you Jane, you know there's all this. The point here, again, we can quibble over the use of the prefix as to its value, but the thing I want to point out here is whether you use a prefix or a suffix or anything like that, what's actually happened here interestingly is we've denied ourselves the opportunity for a good name. Normally when people use this, you can see this in the libraries, there's an IList and a List. Well, we haven't really said very much. 
Well what's happened is the naming has been motivated by the desire to avoid collision. Nothing more, it's not a desire for communication, it's a desire to avoid collision, and that's it. I'm sure we can say more about it than just that, but this is where it gets interesting. How do we think about these things? Just because of the way that you read things, the way that you align things, is just that, ah, we think of it like this. These are associated, we think of it vertically, we think of class hierarchies like that. It turns out that's not the right way to think about them. People often package these things together. In fact, you can see this going right back into the COM era. In the COM era, it was very common for people to package together interfaces in the same DLL as the implementations of those interfaces. That is completely wrong. An interface is supposed to be a separate concept. I should be able to use an interface without accidentally pulling in the implementation because it happens to be in the same unit of deployment. There's a separation that is only kind of nodded at, but it's never properly fulfilled. And so what we've got here is a kind of interesting case, the way the code is inviting us, the naming is inviting us to think like this. And in many cases we will find that people will package these things up, and group them like that, but that's not how to think about it. Let's go back again, really what we want to do is think about it like this. This is not about a member hierarchy and a book copy hierarchy. This is about a loan, a loan defines a relationship. It turns out that object orientation is slightly misnamed. It's not actually about the objects, naturally our IDEs draw us in towards that view, but actually it's about the relationships between things, that's where the fun happens, that's where the bugs happen. It's where the misunderstandings happen. 
The point is that what we're trying to do is describe relationships, but because languages do not tend to convey those kinds of relationships very strongly, we tend to rest on the ones that they do, which is why things like inheritance end up being that kind of flame to which every programmer moth is drawn. It's just like no, no, no, this is about loans, what we're doing is we're describing a loan. What is a loan, a loan is a relationship between one party and another. How do we describe that? Well, what roles do these play? We're asking the wrong question, we started from a concrete implementation, and now we're doing an extract interface, shove an I on the front and everybody's happy. No, no, no, no, we're asking the wrong question. What is the role? An actor's part in a play. A person or thing's function in a particular situation. Ahhh, this is really good. What is the role of that object in the relationship of loan, is it being a member, is it important that it's a member? What does a member have? They probably have email addresses, they have an address, turns out that's not really important. Probably whether or not they've got too many books on loan or how much they owe the library. You know, actually you can't take out this loan, that's probably more important. Is it important, all those details are not important at this level. In other words, abstraction. Abstraction's about getting rid of the stuff we don't need. What is the thing's function in this situation? Expected behaviour. Now sometimes people use that, and they will say, right okay, I will retain the I prefix, but I will now describe a capability, so I can be borrowed, for example, or I can borrow. That's fine, I'm not wild about that naming convention, but it's in the right place, its heart is in the right place, because what it's trying to do is extract the role, and the role has a really interesting consequence. 
If we just use the conventional terms, borrower and loan item, the problem is that English is not very good at this one. Borrowable is just difficult to say. Loanable is not strictly speaking a word, so loan item, and I'm fully guilty of using the word item here, but treat it as a competition. There are lots of possibilities here, but definitely English is quite unambiguous on this term, borrower. Now what is interesting here is that when you look at it from this point of view, you realise those hierarchies were a distraction. It's about loaning. And it's the loaning, this is the relationship, this is a first-class citizen, it has a name. And in turn, to be a borrower, you don't have to have half the stuff of a member. So if you start from member and you extract interface, you probably bring an awful lot of baggage up. It turns out that borrower doesn't have most of the details of member. So the name has misled us, but I have seen equivalent examples in production code where people have effectively started from the concrete class, extracted the interface, realised they didn't need everything, taken away half of it, very good, but unfortunately left the name. And now it no longer seems to apply, here is a member that doesn't have membership details. What, that doesn't make sense. We've ended up calling it IMember. So there is this idea, I mean feel free to put in all the other stuff for reasons of convention, but I'm old school me, when I see the word interface in front of something, I'm pretty good at figuring out what it is. It's like, oh, you've got a whole language construct, a word devoted to that, I'm fine with that. However, the idea is you add all the seasoning afterwards. You cook the main dish, you get the meaning, then you add your prefixes and suffixes. Now it works the other way, sometimes people are sitting smugly, particularly if they work in other languages, and they say, oh look, oh no, no, I don't always do this. We don't do I, no, no, we're good. 
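One way to picture the role-based view is the following Python sketch. `Borrower`, `LoanItem`, `Loan`, `Member` and `BookCopy` come from the talk; the method names, the `LoanRefused` exception and the limit of five items are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Protocol


class Borrower(Protocol):
    """Role: the party taking something out on loan."""

    def can_borrow(self) -> bool: ...


class LoanItem(Protocol):
    """Role: the thing being lent."""

    def is_available(self) -> bool: ...


@dataclass(frozen=True)
class Loan:
    """The relationship itself, named as a first-class concept."""

    borrower: Borrower
    item: LoanItem


class LoanRefused(Exception):
    """The loan could not be made (hypothetical name)."""


@dataclass
class Member:
    name: str
    email: str  # membership detail the Borrower role never mentions
    items_on_loan: int = 0

    def can_borrow(self) -> bool:
        return self.items_on_loan < 5  # limit of 5 is an assumption


@dataclass
class BookCopy:
    title: str
    on_loan: bool = False

    def is_available(self) -> bool:
        return not self.on_loan


def lend(item: LoanItem, borrower: Borrower) -> Loan:
    """Create a loan using only what the two roles expose."""
    if not borrower.can_borrow() or not item.is_available():
        raise LoanRefused("this loan cannot be taken out")
    return Loan(borrower, item)
```

`lend` never sees an email address or a title, which is the sense in which the role carries far less baggage than an interface extracted from `Member`.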
So here's an example I ran as a workshop with a group, a data definition language, parser, and they would come up with something that read the data definition, they'd have a builder that created a parse tree, a tree that contained the data, brilliant. And then some people would come and try to factor it out. Let's decouple the builder make it a bit easier to test and mock and all the rest of it, brilliant, fine. So what do they end up with? BuilderImpl, same problem. It's the same problem, you're just being smug about the different parts of the hierarchy. It's actually the same problem. And the funny thing is, this is, it also becomes obvious that this has a forcing effect on the shape of our code. Because when you look at it, and you go, well hang on, what's really going on? Again, we've got that vertical alignment there. What's really going on is this is the relationship with the parser, it's not about builders, it's nothing to do with that. I'm a parser, I emit events. And so I'm gonna have a coherent interface that describes the things I can receive. Hey, guess what, this is an integer. We've just begun reading an array, we've just finished reading an array, here is a string, and so on. In other words, I'm gonna handle these events. There's lots of possibilities for that, what's the emphasis? The relationship is here, in fact a colleague of mine used to call this the outer face. You can't consider the parser on its own, it's a two-part deal, this is the misleading element of the name object orientation. It is sometimes people like to call it message oriented, but it is relationship oriented. And it's like, well actually, there's a parser and there's a thing that listens to it. Or there's an event handler or something like that, we can mess about with the names, but all of them suggest something very, very different to the shape builder. 
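The parser-and-listener relationship described here might look roughly like this in Python. `ParserListener` is the name used in the talk; the event method names, the toy token format and `TreeBuilder` are assumptions for illustration only, not the workshop's actual code.

```python
from typing import Iterable, Protocol


class ParserListener(Protocol):
    """What the parser expects of whoever is listening: its outer face."""

    def on_integer(self, value: int) -> None: ...
    def on_string(self, value: str) -> None: ...
    def on_array_begin(self) -> None: ...
    def on_array_end(self) -> None: ...


def parse(tokens: Iterable[str], listener: ParserListener) -> None:
    """Emit one event per token; the parser knows nothing about trees.

    Toy token format (an assumption): "[" and "]" delimit arrays,
    runs of ASCII digits are integers, anything else is a string.
    """
    for token in tokens:
        if token == "[":
            listener.on_array_begin()
        elif token == "]":
            listener.on_array_end()
        elif token.isdigit():
            listener.on_integer(int(token))
        else:
            listener.on_string(token)


class TreeBuilder:
    """One possible listener: collects events into nested lists."""

    def __init__(self) -> None:
        self.root: list = []
        self._open = [self.root]  # stack of arrays still being filled

    def on_integer(self, value: int) -> None:
        self._open[-1].append(value)

    def on_string(self, value: str) -> None:
        self._open[-1].append(value)

    def on_array_begin(self) -> None:
        nested: list = []
        self._open[-1].append(nested)
        self._open.append(nested)

    def on_array_end(self) -> None:
        self._open.pop()
```

The tree appears only in `TreeBuilder`; nothing in `parse` or `ParserListener` mentions it, so a listener that merely logs or counts events would fit just as well.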
Now that's interesting, 'cause if I'm a parser, and I'm looking at a ParserListener, and this is one of those things I learned years ago when drawing on white boards, I'd always put a little eye in, or a little stick figure, because that gives you a point of view on the code. I am here, what do I see? Actually, I don't see that at all. You're listening to me, I'm just giving you events, you're doing stuff, I have no idea, what's this tree nonsense you're talking about? That is not relevant, it turns out that shouldn't be there. That's a stray dependency, it turns out that good naming will change your dependency structure, it will change your method set, that's how good a good name should be. It's not merely a label, it's actually, here's how to think about and reason about this thing. Quite literally what we're after is code that is reasonable in the classic sense of the word. You can reason about it, and you can draw reasonable deductions. Oh, we like our prefixes and suffixes, factory and impls, I have this one, an example. It turns out it's a connection pool, the fact that it's a factory is really not interesting, but factory is one of those bolt-on parts we find so convenient. And it's like, it's a connection pool. Now that tells me much more about this, we recycle connections, okay that's useful, and that is a pooled connection. That is what makes it it. So I'm gonna finish with some consideration of the way we name our tests. Because our tests are also code, they deserve our attention and our respect. And what we've got so far is the idea that any good name should be an act of communication. It should convey some kind of meaning, we should need to be more specific. It's no good telling somebody, hey guess what, this is a meaningful name, or name your variables meaningfully. 
Many years ago in a galaxy far, far away, well Whiteladies Road to be precise, first job I had after university, one of the, not a contract I was involved in, but my boss was involved in a job that is now down where Templeback is, and he came back one day, we were having a bit of a discussion about meaningful variable names, and he said, oh yeah, yeah, I've just seen a guy who's got a really meaningful naming convention. Yeah, do tell. A is the first variable that he thought of. B is the second. You can see where this going. Now if you say, well that's not meaningful, you're wrong, 'cause it has meaning. C is the third, if you showed me D, I would tell you that's the fourth one he declared. It's not the fourth one he used, because apparently this guy was big on recycling variables. Makes it extra exciting. Just as you figured out what it means, it changes, ah no. So there's this idea that that phrasing, and we sometimes catch us, it's like, meaningful, yeah, that's not very good, that's not, you know, 'cause it does mean something. What is the meaning that we want? It's something to do with structure, structure, perhaps not, something to do with relationships. Ah yeah, that's a little closer. Intention, purpose, these are the things that are quite difficult, quite slippery concepts. And so what does that mean when we start talking about tests? Because tests often end up being second-class citizens, 'cause they're not part of the production code itself. And so there is this question, what do we really mean? So Nat Pryce and Steve Freeman, so they are basically the parents of jMock, after which every other mocking framework is patterned. Are your tests really driving your development? Well, everybody knows that TDD stands for test driven development. However, people too often concentrate on the words test and development and don't consider what the word driven really implies. 
For tests to drive development, they must do more than just test that code performs its required functionality, they must clearly express that required functionality to the reader. There is that idea, you can see this one, this very simple idea is echoed across many different sensibilities that have become far more normal than when they wrote this originally 10 years ago. We see, that is they must be clear specifications of the required functionality. That idea we see embodied in BDD, but it is not unique to BDD, it's something that we want for all our tests. There is that idea, test not, and there is the fact that I want to be able to code meaningfully. Tests not written with their role as specifications in mind can be very confusing to read. So there's this idea of how do we capture the intention. So let's talk about meaning and object. Here are a bunch of books that have the word object in them. They are books on psychotherapy, they're my mother's, which might explain an awful lot about me. But it turns out that psychotherapists have a deep, a deep obsession with objects and all kinds of stuff that really doesn't, is not supported by modern understanding of the brain, and neuroscience. However, it does give me a great excuse to talk about, it's a stack, brilliant, I'm gonna talk about stacks. So it's a stack of objects, a stack of T, so we've got a stack of T here. What do I got, I got some kind of a representation, I got a default constructor, I've got a property for the depth, I've got a property for the top. I can get the top element, peek if you like, but I'm gonna use nonstandard names here. I'm gonna have a push and as it were a pure pop, it just pops, it doesn't have a side effect of returning T. So yeah, there's a bunch of stuff there, there's a certain intuition that we might have about its behaviour, but how are we gonna test that it is correct? And this is kind of one of the lab rats of TDD, and it's quite interesting to see how people go about it. 
One of the most common ways of doing it, if I'm gonna do this in NUnit, I might just go, right, brilliant, I'll do a one for one alignment. It's a very common thing, and there's enough code generators out there that support doing the wrong thing. Hey, guess what, you've written this, brilliant, would you like stubs for these? Oh absolutely because that was actually the hardest bit about testing. I mean really I had no idea where to start, but thank you for rewriting these signatures. That's really not helpful. First of all, we have a noise word. Test, let's get rid of it. How do I know it's a test? Well, it's got an attribute that says test, there is no ambiguity there whatsoever. The point here is that we've got that. Now it doesn't look very exciting, but there's something else about this one. If we look closely, there's a brittleness to our naming convention. The brittleness is revealed by the fact that in most cases for this, you don't actually need a default constructor. I did use, here's an interesting question, maybe you're guilty of it, but it's one of those things that I was looking at the other day, and I thought why do so many people have default constructors when there's actually nothing interesting going on? Or why are they initialising their fields in their default constructors, when they could actually use field initializers, which somehow is a little more direct? I was kind of musing on this one, but it does lead to this interesting observation that your tests are surprisingly brittle if you just base them on these things. This still has default construction, it's just not provided by the programmer. It still has an initial state you care greatly about, but it's not defined by the programmer. So if you just basically piggyback off the names, you're gonna find it's a surprisingly brittle approach. 
So this is our instinct, the problem is we have a very deeply ingrained instinct that things should align, they should be symmetric, that's our get-set, but in many cases, these alignments are not valid. If I'm gonna use an object, then I use objects in a context, to use an object, to push on an object, I must have an object. So therefore, what is its initial state? And then when I pushed on it, what questions can I ask of it? What are its properties? I'm gonna need a much broader description of the thing. I need a narrative, and it turns out the narratives are not, they're quite tangled. You end up in order to be able to push something, you have to have constructed the thing. So therefore I'm gonna use a constructor. If I want to say anything about its post condition, I then have to query its top and its depth. So therefore I'm going to use other properties, so you're gonna end up using the object, well that's kind of good, because it shows it's cohesive. You have a narrative that cuts across its interface. Now if you think this is unique to an object model, it's certainly not. If you have a function, unless it's hard coded to return one value, you're gonna use that function in different ways. And so therefore there's more than one scenario of usage. Show me the scenarios, don't pack them all into one test method. The notion here is that we take a very different approach. If I drop out the language construct, although I don't really buy into the BDD naming philosophy of using the word should, I regard it as a noise word, but if that's the one that helps you, then I absolutely, I'm its greatest fan, but I do regard it as sort of training wheels, but I think it's a noise word, it doesn't actually convey much meaning. But nonetheless, there are certain ideas in it that are very useful. The idea of thinking of your set of tests as a spec is very useful as a way of approaching it. 
And the idea is you want to try and describe the constraints, the behaviours, the properties of the code. It turns out that with this philosophy many different testing schools are a little closer together than people think. And take advantage of the fact a test fixture is not there just to test a class. Stack spec, a new stack has no depth, so we can create something that is a valid textual description of the thing. An empty stack throws when queried for its top item. An empty stack throws when popped. An empty stack acquires depth by retaining a pushed item as its top. A non-empty stack becomes deeper by retaining a pushed item as its top. And a non-empty stack on popping reveals tops in reverse order of pushing. Now this is a very descriptive kind of structure, there's no ambiguity here. What is also interesting is if you actually drew up a state model for a stack, it's not gonna be the most exciting state model. Hey, I've got a new stack, and so there's the act of creation, and then it's empty or not empty. But nonetheless this actually relates to it very closely. And one of the simple narratives for example-based test cases is given when, then. You actually see that what we've done is we've grouped by givens. All the givens are together. Given an empty stack, given a non-empty stack, there's this. You can separate this out by given a non-empty stack, when an item is pushed then, you can do all of that, but I've discovered that if you try and put given when, then in the names of things you end up with much longer names, and it just looks like bad english sometimes. But if that helps you think in that direction, then absolutely go for it and cook it down. But similarly we can group things, maybe the initial state is not the most interesting thing, maybe the when is the most interesting thing, in which case, you organise the relationships. When pushing onto an empty stack, then something. When pushing onto a non-empty stack, then something else. 
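The stack spec read out above could be sketched in Python along these lines. The six proposition names are taken from the talk; the `Stack` implementation, the use of `IndexError`, and the tiny `propositions_that_hold` runner are assumptions made so the example is self-contained, and a real project would normally lean on a test framework instead.

```python
class Stack:
    """Stack with the talk's nonstandard names: depth, top, push, pure pop."""

    def __init__(self):
        self._items = []

    @property
    def depth(self):
        return len(self._items)

    @property
    def top(self):
        if not self._items:
            raise IndexError("an empty stack has no top")
        return self._items[-1]

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("an empty stack cannot be popped")
        self._items.pop()  # pure pop: the removed item is not returned


def throws(action):
    """Return True if calling action raises IndexError."""
    try:
        action()
    except IndexError:
        return True
    return False


def stack_of(*items):
    stack = Stack()
    for item in items:
        stack.push(item)
    return stack


class StackSpec:
    """Each method name is a proposition about stacks, grouped by given."""

    def a_new_stack_has_no_depth(self):
        assert Stack().depth == 0

    def an_empty_stack_throws_when_queried_for_its_top_item(self):
        assert throws(lambda: Stack().top)

    def an_empty_stack_throws_when_popped(self):
        assert throws(Stack().pop)

    def an_empty_stack_acquires_depth_by_retaining_a_pushed_item_as_its_top(self):
        stack = stack_of("a")
        assert (stack.depth, stack.top) == (1, "a")

    def a_non_empty_stack_becomes_deeper_by_retaining_a_pushed_item_as_its_top(self):
        stack = stack_of("a", "b")
        stack.push("c")
        assert (stack.depth, stack.top) == (3, "c")

    def a_non_empty_stack_on_popping_reveals_tops_in_reverse_order_of_pushing(self):
        stack = stack_of("a", "b", "c")
        tops = []
        while stack.depth:
            tops.append(stack.top)
            stack.pop()
        assert tops == ["c", "b", "a"]


def propositions_that_hold(spec):
    """Run every public method of spec; map each readable name to pass/fail."""
    results = {}
    for name in dir(spec):
        if name.startswith("_"):
            continue
        try:
            getattr(spec, name)()
            results[name.replace("_", " ")] = True
        except AssertionError:
            results[name.replace("_", " ")] = False
    return results
```

Calling `propositions_that_hold(StackSpec())` yields each sentence paired with whether it currently holds, so a failing entry reads directly as the negation of its name.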
Brilliant, okay, so there we have when popping from an empty stack, it throws, when popping from a non-empty stack, you get the most recently pushed item. So you can organise it by the whens. You can also organise things by the thens, but that's actually less interesting for stacks, but it's really good for testing things like pure functions with particular classifier functions. You group things by their thens, their conclusion, their classification. So therefore this idea of adding structure and meaning, some of it you can see is spatial, it's not simply just in the name, it's an arrangement of names and nesting and grouping. So that also tells us something that we can start thinking about and take back from tests back into our code, 'cause this captures intention, it captures a value. So here is a value of these things, what is interesting is I can actually state the value of this naming approach. When a test passes, it describes a property that I have in my code. So therefore when these pass, what can you tell me about stacks, well new stacks have no depth. Non-empty stacks become deeper by retaining a pushed item as their top. How do you know, because it's green, and the name is a proposition, so I used propositional naming. Whereas for side effect-based methods, I will use verb naming. For non side effect query methods, I don't use verb phrases like that. For classes, I will typically use noun phrases, and they fulfil a role. And here, when it's red, then it tells you a thing that you don't have. So if an empty stack throws when popped is red, then clearly an empty stack does not throw when popped. It's the negation of the proposition, I might not know why it does, but at least I know what. And so therefore, what I've done is I've said this is what I want from my tests, which takes us right back to the idea of actually this is what I want from all my code. I want the names to serve a purpose. I can be quite precise sometimes about this. 
So as a programmer, I want, what is it that I want so I can spend less time determining the meaning of the code and more time coding meaningfully? I want identifiers to communicat