
DocumentDB

Mark Allan speaking at Dot Net North in March, 2017

About this talk

DocumentDB is Microsoft's cloud-first database solution for schema-less data. With DocumentDB you can query any collection of JSON data with SQL and JavaScript, and because it's designed for the cloud, you get guaranteed low latency, global replication, unlimited scale and a usage-based pricing model based on storage and throughput. Mark will discuss the usage of DocumentDB and how querying, indexing, consistency and so on differ from a traditional relational database.


Transcript


I'm Mark, that's me. I'm actually from Manchester originally. I was brought up in Stockport, then Burnage, and then through a series of unfortunate events ended up in Northern Ireland. So I've been away since 1980-something, and it's nice to be back. Thanks for having me for a few hours, at least. Tonight I'll be talking mostly about DocumentDB and Azure Functions as parts one and two. There is a small elephant in the room, though, so I'm going to do a small part zero as well.

We can't get past Azure Functions without saying the word serverless. I'd love to find the person who thought of the word serverless and give them a really big hug, because it's an awful name. When Azure came along, it actually started as platform as a service. They got into VMs eventually, but they started as platform as a service: in other words, as well as handing off the operating system, you handed off the platform, so as long as the code you were writing was aimed at one of their supported platforms, you could just upload the code onto it and go. Anybody who's used Azure App Service will have seen that in action. The main problem with that is you still have to set up a number of instances, you have to size your instances, so it's still very obvious that you have a number of servers running, and you have to be able to increase the number of them and make sure they work. If you've got three of them, you need to make sure the app is designed properly so you can share state across them.

The new hotness recently is containers, so basically Docker. With the whole DevOps movement, there's been a big push towards containerising everything. In a container situation you're still effectively running a tiny, tiny VM, but it's so tiny that you can run hundreds and thousands of them on one machine. Again, you still have to run something like Kubernetes or Docker Swarm to actually orchestrate all of those.

The natural evolution from that is serverless, or, from the code side at least, functions as a service, where you're just throwing your code at the cloud and saying, please run my code. The essential aim is that the servers are completely abstracted away. You don't need to care about what's under there at all. You don't need to press any buttons to get more servers or fewer servers or to orchestrate them. You literally just say, here's a bit of code, run it. Or here's a bit of data, store it. It's still your code and your data. That's my personal feeling around serverless; it being a new term, there are lots of people thinking lots of different things. Some people think it's just the code, but essentially the way I'm looking at it is that your code and your data are what effectively make up your application, and you're just throwing them up to the cloud and letting the cloud handle everything else. So you've got, in theory, fully elastic scale, from not running anything at all to petabytes of data and zillions of hits on your website. At the same time, you should also be getting elastic pricing. So if you set up your startup on serverless, then when nobody's using it, you're not paying anything. When you expand and suddenly hit pay dirt, your traffic goes up and you start paying for it at that point, but you don't have to do anything.
So you end up with a very low admin cost; you end up having your cake and eating it, or at least having a very large amount of cake but only paying for the bit that you actually eat. I like the minimal admin. Personally, I'm a developer, not an ops person, so I like to keep the ops part of DevOps down as far as possible.

In terms of what's on Azure, like I said, the code bit is what most people think of as serverless. In fact, a lot of people think of serverless as Amazon Lambda, but Azure Functions is what allows you to take snippets of code and run them in the cloud. Azure Storage, does anybody use Azure Storage for anything at the moment? Show of hands. Yeah, a couple. So you've already got the idea of having data storage in the cloud where, again, you just pay for the amount that you're storing. If you've only got one gigabyte, you only pay for one gigabyte, but if you upload 3,000 gigabytes, 3,000 gigabytes are available to you. And DocumentDB is a more structured one, which I'll move on to shortly. Some people also bring in things like Logic Apps, which is drag-and-drop workflow to stick things together, and obviously there are things like Cognitive Services and Azure ML where you can put big processing loads out onto the cloud as well. But certainly the first two are what most people think of as serverless, and definitely the first one.

How many people here use SQL Server? Good, everybody. How many of you use Azure SQL? Not so many. And how many use any form of NoSQL database? Ah, good. So a lot of you will know roughly where we're going here. A document database, essentially, is just a key-value store, but the value that you're storing is a document, usually JSON, though one or two store XML. You're not getting relationships, so you don't get the relational part of a relational database, but what you do get is schema-free storage. For instance, say you were saving off order confirmations for your online shopping, and then at some point the format of your order changes so that you've got extra information on there, like you started storing an extra social media channel against the contact or something like that. With a document database, you can just add that extra bit of information to your JSON document and still store it away in your orders collection, table, whatever you want to call it. And because it's a full document database, unlike Azure Table Storage where you're just keyed on the key, DocumentDB will actually index what's in that JSON document. So if you want to pull out a customer ID or a name or something like that, you can do that, regardless of the fact that your schema might have changed halfway through. Because you don't have the relational side of things, it's very easy for them to make it extremely scalable, so you can get up to petabytes of data supposedly with DocumentDB if you want, or you could end up with no data in it at all. The thing that I really like about it is that you're storing objects, you're storing documents, so you will not need to touch Entity Framework at all. You will not need to touch any ORM. ORMs are evil, the spawn of the devil, I hate them. So for me, DocumentDB: perfect.
DocumentDB itself, like I said, there are various other document databases: there's the Amazon one, Apache have a thing called CouchDB, and the old granddaddy of them, MongoDB, which has recently changed to sit on a Postgres back end. But DocumentDB itself was built specifically by the Azure SQL team. They put a lot of work into SQL Server to add JSON column types and automatic indexing; those of you who are using Azure SQL might have got index recommendations through the portal saying, did you know if you add this index, you'll get faster query performance based on what you're putting through it. They took that work and built DocumentDB fully cloud first, so, apart from the emulator, there is no locally hosted DocumentDB. It is entirely a cloud, database-as-a-service, fully managed system. Because it's in the cloud, you get web scale (there's a joke there for anybody who knows MongoDB, which advertises itself as being web scale despite the fact that it keeps throwing away people's data). But you get as much storage and throughput as you want. There are actually some very aggressive SLAs for what latency and throughput you get with DocumentDB, so however large you build it, the performance will supposedly stay the same. It's all on SSD storage underneath, apparently, so you really are getting maximum performance. NoSQL is a bad description for DocumentDB because you query it using, well, a variant of SQL. There is an emulator which you can use to develop locally. It doesn't have things like the consistency levels and stuff that the full cloud has, but you can run a few collections locally, practise your API work, get it working, and then migrate it up to the cloud once you're ready.

So, if you're setting up a DocumentDB, you go to Databases here and then NoSQL (DocumentDB). Can anybody read the screen, by the way? - [Man] Can you make it bigger? - Yeah, I can make it bigger. I can make it smaller, bigger. Right, so, first thing to notice here, you've got a choice of API between DocumentDB and MongoDB. If you have something that's written for MongoDB, you can actually make DocumentDB wire compatible with it, so you can just change your Mongo connection string to point to DocumentDB and move your workload over just like that. Not an awful lot else going on there. It does take several minutes to actually create one, so here's one I made earlier, and I'll just walk you through a few of the more interesting bits of what you get in here. First thing to note: by default, you end up in just one region, but literally just by ticking one of these, you can make your database geo-replicated. Down here you can decide which is your main write region and which ones are going to be read-only or read-write regions. At the point at which you want planet scale, or just want a local database in the U.S. and another one in Europe and another one in Asia, you can just start ticking on there and hey presto. I'll go into a lot of these in more detail later, but you get a consistency level there. So you can decide, particularly when you're replicated across the planet, whether you want your writes to be immediately readable or whether you want eventual consistency, that sort of thing. You get some access keys. And then within your database, you'll have a pile of collections, and those are your actual document stores.
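Since the endpoint and access keys have just come up: from code, a connection is just the account URI plus one of those keys, and for local development the emulator publishes a fixed, well-known endpoint and key. A minimal sketch, assuming the DocumentDB .NET SDK (the Microsoft.Azure.DocumentDB package); nothing below is from the demo itself:

```csharp
using System;
using Microsoft.Azure.Documents.Client;

class ConnectDemo
{
    static void Main()
    {
        // The local emulator listens on https://localhost:8081 and ships with
        // a fixed, well-known key, so neither of these is a real secret.
        var endpoint = new Uri("https://localhost:8081");
        const string key =
            "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";

        // For a real account you'd swap in the account URI and one of the two
        // access keys from the portal instead.
        using (var client = new DocumentClient(endpoint, key))
        {
            Console.WriteLine("Connected to {0}", client.ServiceEndpoint);
        }
    }
}
```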
Now, this is where we slightly depart from serverless. At the moment, you can choose between a 10 gigabyte collection and a 250 gigabyte one, and you can get anything beyond that as long as you e-mail support and ask for it. Apparently, soon you should be able to set that yourself, just by going through the portal and pressing the buttons. Ideally, what I'd like to see them do, and I assume they're heading in that direction, is for this to become more elastic, so it will actually scale based on your throughput, but at the moment they say that because of the throughput guarantees, they have to know in advance how much traffic you're planning to put through it. So, for now, you reserve an amount of throughput, and that's what you get. I'm assuming they're aiming to get fully elastic at some point. And I'll come back to the difference between partitioned and unpartitioned later. Here I've reserved 400 RUs. An RU is a magic number called a request unit, and it's the amount of processing needed to read a one-kilobyte document. Basically all the throughput pricing, all the non-storage pricing, is in RUs, and I'll show you at the end how that works.

Settings: there's not actually going to be an awful lot in here. Indexing, which I'll come back to, is an interesting one. You've got a TTL: if you're, for instance, saving off log files but you want to get rid of them automatically after a month, you can set the TTL either on the whole collection or on an individual document, so that you don't have to come back and run a housekeeping job later on to clear up. And then down here you get your SSMS-type stuff. So here I've got my actual documents: effectively three rows in my table, three documents in my collection. It's just JSON; I could have three completely different JSON documents in there if I really wanted to. And here, again, I'll come back to this a bit more later, but I can say select star and I'll end up with all three of my results in there. And there are the usual metrics and stuff as well.

All right, so, if you're thinking about this from a SQL Server point of view, the account would be your server. That would be your database. Within the database, you've then got users, which are effectively collections of permissions. And then a collection is sort of halfway between a table and a database. It works the same as a table in that it's effectively just a collection of documents, or rows, but because you're not relational, that is your whole database, so it's actually the level at which you attach stored procedures, triggers, and functions.

The other thing we noticed in there was partitioning. Again, for those of you who've used Azure Storage, this will be somewhat familiar. If you've got a tiny collection, you just have a key and then the document. Now, there's a hard limit on a partition of 10 gigabytes of storage, and a hard limit of 10,000 RUs. If you want to go beyond that, then you'll have a partitioned collection. Each individual partition will have that same restriction, but beyond that there's no actual limit. Now, as we saw, there's a soft limit of 250 gigabytes at the moment, but if you contact support, you can get that raised as far as you want. Partitioning is one of the things you need to think about up front, because of the fact that there are limits per partition. Essentially, any path in your JSON document can be used as the partition key; there's a rough sketch of creating a collection like that just below.
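As a rough idea of what reserving throughput, setting a TTL and picking a partition key looks like from the .NET SDK; the database, collection and key path here are invented for illustration, and the throughput figure is only a plausible number, not the demo's:

```csharp
using System.Collections.ObjectModel;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class CollectionSetup
{
    public static async Task CreatePartitionedCollectionAsync(DocumentClient client)
    {
        // Database and collection names are invented for the example.
        await client.CreateDatabaseIfNotExistsAsync(new Database { Id = "telemetry" });

        var collection = new DocumentCollection
        {
            Id = "readings",
            // Any path in the JSON document can be the partition key; here the
            // hypothetical /deviceId from the IoT example.
            PartitionKey = new PartitionKeyDefinition
            {
                Paths = new Collection<string> { "/deviceId" }
            },
            // TTL in seconds on the whole collection; individual documents can
            // still override it, and you can leave it off entirely.
            DefaultTimeToLive = 60 * 60 * 24 * 30
        };

        await client.CreateDocumentCollectionIfNotExistsAsync(
            UriFactory.CreateDatabaseUri("telemetry"),
            collection,
            // Reserved throughput in request units per second; partitioned
            // collections enforce a higher minimum than single-partition ones.
            new RequestOptions { OfferThroughput = 2500 });
    }
}
```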
All the documents that share a partition key get physically stored together, so they'll all be on the same disc, in the same place, and will all be retrievable and searchable together by DocumentDB behind the scenes. For instance, if you're using DocumentDB with IoT and you've got each of your devices uploading its information, you probably want to use the device ID as your partition key. Even though there'll potentially be only one document per key, that's a perfectly legitimate way to do it. Alternatively, if you were building a multi-tenant system where each tenant was storing an amount of data within that limit, then you might use the tenant as your partition key.

Within the documents, you're going to be querying using a SQL-based query syntax; I think they call it an extended subset of SQL. In other words, what they've done is taken SQL, thrown away everything that isn't the select statement, thrown away anything that relies on the fact that you've got multiple tables and relations, and then added extra things because you're dealing with essentially untyped data. So you get extension functions like IS_DEFINED: is this even in my document? Is it an array? Is it a number? It also has geo support, so you've got things like ST_DISTANCE, so you can work out which documents are within a certain distance of a point and so on. You do still have stored procedures, and you still have functions and triggers, but they're written in JavaScript rather than SQL. As a side effect of that, that's how transactions are performed. Essentially, your transaction is a stored procedure: if your stored procedure runs through, your transaction gets committed; if your stored procedure fails, your transaction gets rolled back. There's no specific commit or rollback; you just run the stored procedure and it works or it doesn't. As far as using JavaScript goes, before anybody asks, no, you don't get NPM, so you can't run off to a web server in your stored procedure. You are sandboxed to what DocumentDB gives you. You get a context through which you can perform your SQL statements, and you do get some useful LINQ-style functions in there, so you can do your map-reduce type functions.

I just wanted you to have a look at the sort of things you can do. In this case they're using a food-based example, but you can see that the query there looks very, very similar to the SQL that you'd be writing every day. Filtering is very similar. Order by, very similar. Now, in this one we're getting to the stage where we notice that we're in a document: we're ordering by the weight there, but that's part way down the document, so we're not ordering by just dot weight in grammes. We're taking the food, taking the first of the serving information, then taking the weight in grammes and ordering by that. So we're starting to diverge slightly from normal SQL. The other interesting one is that you do actually get joins. Now, I know I said we're not relational and we only have the one table. What the join actually is is an intra-document join. So if you've got a parent and some children, your join is joining the parent document against the child documents; if you've got four children, you'll end up with four results. There are a few sketches of those query shapes just below.
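To make the flavour concrete, here's a hedged sketch of those queries fired through the .NET SDK. The food/servings shape mirrors the nutrition sample the slides appear to use, but the property names are guesses rather than the real dataset, and whether a given ORDER BY works depends on your indexing policy:

```csharp
using System;
using System.Linq;
using Microsoft.Azure.Documents.Client;

static class QueryFlavour
{
    public static void RunSamples(DocumentClient client, Uri collectionUri)
    {
        var options = new FeedOptions { EnableCrossPartitionQuery = true };

        // Ordering by a value nested inside the document (the first serving's
        // weight) rather than by a flat column.
        var byWeight = client.CreateDocumentQuery(collectionUri,
            "SELECT f.id, f.description FROM food f " +
            "ORDER BY f.servings[0].weightInGrams",
            options).ToList();

        // IS_DEFINED copes with documents that never had the property at all.
        var tagged = client.CreateDocumentQuery(collectionUri,
            "SELECT f.id FROM food f WHERE IS_DEFINED(f.tags)",
            options).ToList();

        // An intra-document join: each parent is joined to its own child array,
        // so a document with four servings produces four results.
        var perServing = client.CreateDocumentQuery(collectionUri,
            "SELECT f.id, s.description FROM food f JOIN s IN f.servings",
            options).ToList();

        Console.WriteLine("{0} / {1} / {2} results",
            byWeight.Count, tagged.Count, perServing.Count);
    }
}
```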
Beyond the join, there are also flattening functions, so you can decide whether you return all four of them in one result or as four separate results when they come back. I won't go too much into that, but the other useful thing here is that there's a nice little PDF which contains essentially the whole SQL syntax on one convenient page. So you can see ORDER BY, IN, all that sort of stuff. You've got your coalescing operators and other bits. You can use parameterised SQL, and there are the built-in functions: lots of spatial ones down here, and most of the rest are to do with the fact that you don't know ahead of time what you've got in your document or what the types are. But it's quite easy to get the hang of if you've done normal SQL, once you get your head around the whole joining-to-yourself thing.

Because you're potentially running huge amounts of data, you've got lots of partitions, you're spread across the planet, when you do a write you've got to start thinking about how quickly you want that write to be committed and how quickly you want to be able to see it. So you get a selection of what they call consistency levels. These are set at the account level, but you can then override them at any level below that using the APIs. Strong is what you'd expect it to be: if you write, you're strongly consistent, and nobody will be able to read until that's committed and they get the new value. The more interesting one is what's called bounded staleness. You can specify either an amount of time or a number of writes beyond which you don't want to go back, so you can say, I don't want anybody to get more than a 10-second-old result, or I don't want anybody to get anything that's more than three versions old. Again, this is the sort of thing you'll have to do a bit of planning for, to work out exactly where you want that, but it's an interesting level between full consistency and eventual consistency. Session, which is the default, basically means mostly we're going to be eventually consistent, but within my session I will always get my own writes back. So if I'm on a phone in an app and I make a write to my high-score table, I want to see my own high score back immediately, even if somebody else doesn't see my high score yet. And then eventual consistency just means do your best. It's useful for situations where, as long as the data is consistent within itself, it doesn't matter if you're getting something a little bit old.

Similarly on the indexing side (whoop, excuse me, I'm just going up from the bottom here), you can also index consistently or lazily. Normally you want a consistent index, so that as soon as you've written something it's in the index and you can pull it back. But if you're doing extremely high levels of writing, or if, for instance, you're bulk loading, you can say give me a lazy index, and that will just catch up when it's ready, so you can upload your millions of rows and the index will catch up. Obviously, if you've got strong consistency on the collection for reading but lazy indexing, then you're essentially going to end up eventually consistent anyway: if you fetch something it will be up to date, but you might not even know it's there yet. Roughly what choosing those options looks like from the SDK is sketched below.
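A minimal sketch, assuming the DocumentDB .NET SDK; the collection name is invented, and the comments spell out which choice each line makes:

```csharp
using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class ConsistencyAndIndexing
{
    public static DocumentClient CreateSessionClient(Uri endpoint, string key)
    {
        // The account has a default consistency level set in the portal; the SDK
        // lets a client ask for a weaker one (you can't go stronger than the account).
        return new DocumentClient(endpoint, key, null, ConsistencyLevel.Session);
    }

    public static DocumentCollection LazilyIndexedCollection()
    {
        // Collection name is made up for the example.
        var collection = new DocumentCollection { Id = "bulkLoadTarget" };

        // Lazy indexing: writes are cheaper and the index catches up afterwards,
        // which suits bulk loads. Consistent is the default.
        collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;

        return collection;
    }
}
```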
As for what gets indexed: by default, every single path in your documents is indexed. If you've got a string, that will be indexed; if you've got a number, that will be indexed; anything that DocumentDB recognises will get indexed. So if you've got GeoJSON, essentially a little JSON object that says type Point or type Polygon plus some lat-long values, and it sees one of those, it will index that spatially as well. Now, obviously, you're getting charged for storage in DocumentDB, and indexes are storage, so if there are particular paths that you don't want to index, or if there are only a couple of paths you do, you can either opt out particular paths or opt out the whole thing and just opt in certain paths. Your ID and your partition key will always get indexed, but anything else you can decide whether or not you want indexed.

Security: for the most part, if you're talking from a server, you'll just take the master security key and that will be that. There are two keys, a primary and a secondary; the idea of the secondary is that if your primary key gets compromised and you need to change it, you can switch to the secondary, change the primary key, and then switch back to the primary. But there's also user-level authentication if you want it. Now, I'm not going to endorse this, but suppose you have a Xamarin app, and this is what they try to push, that Xamarin can talk directly to DocumentDB, so you've got your games and you can store your data in the cloud and have your high-score tables and all that stuff. What you can do, as long as you set up a simple middle-man service, is have your app call the service and say, I would like a token for this user, please. The user will have specific access to either specific collections or individual documents within a collection. The middle-man service can then call off with its master key, go and fetch a token for that user, and return it to the app, and the app can then go and talk directly to DocumentDB with that token. By default that token will last an hour, but you can push it up to five hours if you want.

Moving on to how the API is used: the essential way you talk to DocumentDB is through a REST API, so everything that happens to DocumentDB goes through REST. If we go back here, you can see everything has a URI: slash dbs, slash collections, slash documents. Everything in DocumentDB has a URI, and you can use those URIs in the RESTful API to access all the resources. Now, obviously, not everybody wants to do that; in particular, the authentication on those sorts of services is a right pain to code up yourself. So they've wrapped that in a variety of .NET APIs, Xamarin, as I mentioned, and also .NET Core and traditional .NET, and they've also got wrappers for Node, Java, Python, and even C++ for Win32 apps. They all wrap the HTTP, but they give you extra stuff like retry logic, exception handling, all that sort of thing, so they'll translate it all into the paradigm that you're used to working with.
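Picking up the resource-token idea from a moment ago, here's a minimal sketch of what that middle-man service might do with the .NET SDK. The user and permission IDs are invented; the app only ever receives the token string, and the master key stays on the service:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class TokenBroker
{
    // Runs on the middle-man service, which is the only place the master key lives.
    public static async Task<string> GetReadTokenAsync(
        DocumentClient masterKeyClient,
        string databaseId,
        DocumentCollection collection,
        string userId)
    {
        // Make sure the DocumentDB user exists (IDs here are invented).
        await masterKeyClient.UpsertUserAsync(
            UriFactory.CreateDatabaseUri(databaseId),
            new User { Id = userId });

        // Grant read-only access scoped to this one collection.
        var permission = await masterKeyClient.UpsertPermissionAsync(
            UriFactory.CreateUserUri(databaseId, userId),
            new Permission
            {
                Id = userId + "-read-" + collection.Id,
                PermissionMode = PermissionMode.Read,
                ResourceLink = collection.SelfLink
            });

        // Hand this token back to the app; it then builds its own client with
        //   new DocumentClient(accountEndpoint, token)
        // and never sees the master key.
        return permission.Resource.Token;
    }
}
```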
So here's one I made earlier. I'm assuming, since we're at Dot Net North, we want to see the .NET C# version. Can you read roughly what's going on there? Yeah. All right, so, essentially, I've got a database called APIDemo and a collection called DemoCollection, and that's my endpoint there, so that's the information I need. In Secret.cs there, I've got my key as well, so that nobody's going to go and hack into my demo database. And down here, I've just got a thing that creates the database if it doesn't exist and creates the collection if it doesn't exist. It gets the URI of the collection and then, using that URI, shoves some documents in it and does a SQL query on them. It also uses a LINQ to DocumentDB provider; if you've used LINQ to SQL, it's a very similar thing. Then there's a simple example of doing the same sort of thing but with paging, so that you can fetch 10 results at a time, 100 results at a time, and then an example of a stored procedure.

So, terribly complicated: create database. You go and look for a database with the ID; if there is one, delete it; once you've done that, create a new one with that ID. Similar sort of thing for the collection: create one with that ID and give it an indexing policy. The precision of the index there just means index everything. If you're indexing a string, you can, if you want to save time and storage, ask to just index the first 10 characters; if you've got a number, you can say index to a particular byte precision or digit precision. Again, we're creating a collection there, a few more options there, total throughput. And then to dump a document in, we literally just say create document. Since we're in .NET, we're using a strongly typed one here so we can serialise it and deserialise it. In fact, that would work just as well if I took that out and made it an anonymous type, but when you pull it back in a strongly typed language, you tend to want to work with a non-dynamic object, so that's why we're using that there. That will then create two documents in my collection.

I can then create a query here using a SQL query spec and just use a bit of SQL. Here I'm using a parameterised query, so I'm just passing in a SQL parameter. Same as LINQ to SQL: as soon as you ToList that, you'll get a local copy of it, and then you can go and say how many you found. And a similar sort of thing here using LINQ, so we can query the collection, pull out documents based on the last name there, select just the name and ToList it. That will get converted into pretty much the same SQL query and pulled back. Useful, interesting thing: cross-partition queries. If you've got a partitioned set of documents but you're not aiming at a particular partition key, you can specify how much parallelism you want. So if you're searching, for instance, on a name but you don't know which partition that's in, you can tell DocumentDB that you actually want to fire off a query across all partitions at once. Or, if you want it slower but more consistent, you can tell it you want a serialised search. Under the hood, if you were using this through REST, you'd end up with what's called a continuation token: you'd get the first 10 results back and it would give you a continuation token, you'd then fire the continuation token back at the service and it would give you the next 10 results, and so on. In the API here, we've got that wrapped as HasMoreResults.

And then finally, this is a stored procedure here. This gets uploaded to DocumentDB as a stored procedure, but it's just a JavaScript function; it doesn't really matter what it's called. As part of your runtime you get this getContext method, which returns the DocumentDB context, and using that context you can go and pull out the collection you're in and start querying the documents using SQL. And down here, we can see we're throwing an error there.
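As a hedged sketch of that whole pattern (upload the JavaScript, execute it, and let a throw abort the implicit transaction), here's roughly what the SDK calls look like; the procedure name, query and shape of the result are made up rather than the demo's actual code:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class SprocDemo
{
    // The stored procedure body is just JavaScript uploaded as a string.
    private const string Body = @"
function countDocuments() {
    var collection = getContext().getCollection();

    var accepted = collection.queryDocuments(
        collection.getSelfLink(),
        'SELECT * FROM c',
        function (err, docs) {
            if (err) throw err;                              // aborts and rolls back
            if (docs.length === 0) throw new Error('No documents found');
            getContext().getResponse().setBody(docs.length); // the return value
        });

    if (!accepted) throw new Error('Query was not accepted');
}";

    public static async Task RunAsync(DocumentClient client, string db, string coll)
    {
        await client.UpsertStoredProcedureAsync(
            UriFactory.CreateDocumentCollectionUri(db, coll),
            new StoredProcedure { Id = "countDocuments", Body = Body });

        try
        {
            // Assumes a single-partition collection; a partitioned one needs a
            // PartitionKey passed in RequestOptions.
            var result = await client.ExecuteStoredProcedureAsync<int>(
                UriFactory.CreateStoredProcedureUri(db, coll, "countDocuments"));
            Console.WriteLine("Found {0} documents", result.Response);
        }
        catch (DocumentClientException ex)
        {
            // A throw inside the JavaScript surfaces here as a normal .NET exception.
            Console.WriteLine("Stored procedure failed: {0}", ex.Message);
        }
    }
}
```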
If we throw that error, that will roll back the transaction, and it will get transmitted to the .NET API as a normal exception, which you can then catch and handle as you wish. Let's get back to the code. So here we're reading that stored procedure, creating it, and then running it. And, yeah, you should then find we've got two families, and that's the result from the stored procedure's select star. Just worth noting here, when you get a document back, you get all the stuff that you asked for in the first place, plus these extra little bits and pieces: the _rid there is the internal resource ID, _self is the URI to this particular row, document, whatever you want to call it, and there's an ETag for consistency as well. Those get exposed to you in the API as things called SelfLink and Id and so on. But that's all stored, and if you go and query in the query explorer, you'll see it there as well.

Any questions on the API? Or anything else so far? - The SQL queries look very similar to HQL. - Is that the Hadoop query language? Hibernate query language, okay. As I say, I hate ORMs, I never use Hibernate, but, yes, it's probably been inspired by it. No point reinventing the wheel, I guess. - Does the eventual consistency approach guarantee ordering? - Not documented, I think, is the short answer to that. I believe I have heard that it does. Don't take my word for it, but I think it does. - What happens if one of your partitions exceeds its 10 gig? - Your writes will start to fail, basically. Yeah, so if you've got a multi-tenant setup and one of your tenants goes over 10 gig, then, yeah, they will run out of storage. So that's either your excuse to upsell them to something different, or your chance to go back to the beginning, rework your partition scheme to what it should have been in the first place, and then go and do that. That's why I was saying partitioning is the thing you need to consider up front. Nearly everything else is elastic and flexible, but partitioning you need to think about and get right, ideally, before you get too far down. - [Man] Does it matter if you partition it too small? - Large granularity can get you into trouble; small granularity is perfectly fine. Like I say, you could literally have your ID as the partition key and have all the tiny partitions. Obviously, if you do have related elements, then they may end up not being physically stored together, so things might slow down in that situation; that's one of the advantages of keeping things in the same partition. But if that works for you, then you can certainly do that.

All right, coming to the end of this bit now. Things it integrates with at the moment: obviously, Azure being Azure, they keep changing things, so everything I say tonight is probably changing as we speak. I have literally given a talk on Azure and had them announce new stuff while I was doing the talk. But when I wrote this slide, you could integrate it with Hadoop to get analysis over your data: you're potentially holding petabytes of data, you've got big data, and you may well want to feed it into Hadoop and do your map-reduce analysis type stuff on it. You might have seen, as I was whizzing through the portal, there's an enable Azure Search option, so you can use Azure Search to search through all the stuff in your documents. Backups: you do get some automatic backing up going on.
So you'll get regular backups made automatically, and they'll be stored for 90 days. If you want one of those backups back, you actually have to e-mail support and say, oh damn, I just deleted all my documents by mistake, can you restore a backup for me please? The alternative is Azure Data Factory, which you can use to export to blob storage and re-import from blob storage without having to go through that process, and to keep backups for more than 90 days, obviously. The data migration tool is very useful for getting your data in and out. As well as DocumentDB, that talks to SQL and Mongo and CSV, and if you've used SSIS, it's kind of a similar thing, but not tied to SQL Server. So you can use that to load up your data initially, or to pull it all out and move off to Mongo if you really want to. Power BI, obviously, you can use to do your analytics and graphs and charts and all your dashboards for the management. A useful little API they introduced not too long ago is the change feed API, so you can get push notifications when documents in the collection get updated. If you're into that event-driven, reactive style of doing things, you can push changes directly from the database all the way up.

Pricing, the fun bit. Storage is very easy: you pay a few cents per gigabyte per month, a bit more if you've got lots of indexes, and if you've got lots of regions, you'll pay more because you're storing in lots of regions. Throughput is the fun bit. At the moment, like I said, you have to reserve an amount of throughput, and you'll get throttled if you hit that level, so you'll get a rejection back. If you're using one of the APIs, it will do a certain amount of retrying, but essentially you'll get throttled to that throughput. And it's all done in this weird, mysterious thing called request units, which they define as the amount of processing or throughput required to select a one-kilobyte document. What they do give you, though, is a capacity planner and a pricing calculator. In the capacity planner, you can upload a sample document, tell it how many you're going to be storing, and roughly how many you'll be creating, reading, updating, and deleting. When you feed all of that in, it will come up with roughly how many request units per second it thinks you'll be using. If you've got a mix of large documents and small documents, you can upload different documents with different numbers of reads and writes on a per-collection basis and get the total. Once you've got that, you can go off to the pricing calculator, feed in how much storage you think you'll be using and how many RUs you came up with (there's a thing there so you can work it out in days, hours, and months), and it will tell you how much it thinks you're going to get charged per day or per month for that amount of throughput.

All right. Just some resources there for the things I was showing you. That's the end of part one. Any more questions? Yeah. - The transactions in stored procedures, are they limited to one collection partition? - No, they are fully ACID across whatever you end up updating. So, yeah, even if you update across regions, it will all get rolled back if you lose your transaction. Yep?
- Does it have out-of-the-box support for Hadoop or Spark or other things? - Well, there is out-of-the-box support for HDInsight, which is the Azure-hosted Hadoop, so, yeah, you get that. You wouldn't, as far as I know, get anything for Spark at the moment. There may well be something out there open source, but there's no first-party support for that.