Tuesday, November 19, 2019

How to Handle DateTime in Cosmos DB


     DateTime data type can be trouble. In SQL Server, there are many datetime datatypes. Some of them are in UTC format, some of them has time zone information, some of them is just basic date and time or just date. Depending on what you need, you can pick one of them and hopefully you pick the right one for what you need.

     What if you want to store datetime in a database and there is no datetime data type available? If you use NoSQL Database like Cosmos DB, you will have this problem. Data is saved as Json in Cosmos DB and other NoSQL Databases. JSON does not have a data type. Json supports only strings, numbers, Boolean, null, custom objects and arrays.

     Things can get tricky if you have queries that need to retrieve data by date. Since there is no date type, how can database engine compare date to date? I don’t work at Cosmos DB team, so I am not sure how Cosmos DB handles this problem internally but my guess the answer for this question is just like any database answer…. It depends 😊 

     I believe it depends on how you save the datetime information in the database. There are many ways to save date and time, the default way is; It will be saved as string but you can also save it as a number. Date format depends on how you serialize your data into JSON.

     Let’s say the date we want to save is 11/15/2019 01:58 PM and we are in Eastern Time Zone. Here are the most common serializer outputs. 

     .NET JavaScriptSerializer.                    "\"\\/Date(1573844334736)\\/\""
     .NET DataContractJsonSerializer.       "\"\\/Date(1573844334736-0500)\\/\""
     Built-in JSON object of JavaScript.     "2019-11-15T13:58:00.511Z"
     ISO 8601 Format                                    "2019-11-15T13:58:00-05:00"

     One of the most common and recommended versions of datetime in many systems is ISO 8601 format. This is the format Cosmos DB recommends too. Also; It is recommended to store all dates as UTC. It is recommended because it is readable, data sorts correctly, includes fractional seconds and it is endorsed by many groups including W3C.

     As you can see it does not matter which serializer you pick your datetime will be a string. In some way database engine needs to know that this string is datetime and use it as datetime to retrieve data. If this happens in SQL Server, you are looking at table scan because you need to convert all the data into the datetime data type then try to compare data. We are not even touching to potential Time Zone or elapsed seconds problems.

     You can save datetime as a number too. As you can see in the first two serializers, there is a number in the Date constructor. This number is known as Unix timestamp. It represents the total number of seconds that have elapsed since January 1, 1970. It does not count leap seconds! Cosmos DB’s Timestamp (_ts) follows this way.

     It’s easy to convert datetime into Unix timestamp in JavaScript. Since Cosmos DB stored procedures are in JavaScript too, you can convert the datetime coming from user into Unix timestamp then search CosmosDB by using a number rather than string. If you need to use SDK to find data, you can use UnixDateTimeConverter class to convert datetime to Unix timestamp in your code.

     To search by Unix timestamp from stored procedure or SDK; First you need to save datetime in Unix timestamp as number. I recommend saving the dates in string format and Unix timestamp format in the database. You can use Unix timestamp to search in database and you can use string format to display date in User Interface so you don’t need to do any conversion in client side or server side. You might need to pay little bit more for storage, but I think it's worth it. You can customize the Indexing policy and index only Unix timestamp.

     Here is an example, I have the date in ISO 8601 format and Unix Timestamp format in the following document.
     I have a container named Posts, It has 100.000 Json documents in it. Let’s say I need to find all the posts of a specific user after a specific date. Since I have the datetime in Unix timestamp. I will use CreationUnixDt property in my query, It should be faster to find a number than string in any database. Let’s test if this applies to Cosmos DB. First let’s search by CreationDate property which is a string.



     Now, Let’s use the same query but this time search by Unix Timestamp which is a number. First, we need to convert date (2008-07-01) to Unix Timestamp. There are many ways to do that. For example, In SQL Server you can use the following T-SQL

SELECT DATEDIFF(SECOND,{d '1970-01-01'}, '2008-07-1')

This returns 1214870400. Now let's use that in the query.


     There are interesting results here, first of all it looks like It costs more to search by a number. Index lookup time is significantly low when you search by a number and execution time is higher when we search by a number. I am surprised about these numbers. If this was done in SQL Server, there will be huge differences between these two queries. Results of these queries tell me that there is not that much difference between these two queries.

     You have one more option to save datetime in JSON object. You can create a custom object for it. This might cost you more for storage since you will have more properties to define a date and time. But you can search database however you like since it will give you most flexible solution, you store what it matters for you in this way. If date and time is crucial for your solution, this might help you to store datetime the way you want. You can store the location of user in this custom object too. Sometimes, it's not enough to know what time zone user was in when transaction occured. There are a lot of countries in the same time zone and current datetime data types do not include any information about the location of user.


4 comments:

  1. great information, the only article i found for cosmos datetime
    very less documentation for cosmos, and seems not much are using it....
    to find a small thing need to put lots of efforts

    ReplyDelete
  2. "Here is an example, I have the date in ISO 8601 format and Unix Timestamp format"
    I assume both are stored in local time, since date is not ending in Z?

    ReplyDelete
    Replies
    1. I'm exploring the option of storing both utc and local time to allow flexibility:
      "time":
      {
      "utc": "yyyy-MM-ddTHH:mm:ss.fffffffZ"
      "local": "yyyy-MM-ddTHH:mm:ss.fffffff"
      }
      Time in cosmos db is probably what is most problematic to get right, I wish Microsoft had enforced use of the reccomended format "yyyy-MM-ddTHH:mm:ss.fffffffZ". To much flexibility for such a critical part IMO.

      Delete