Saturday, April 7, 2012

Performing Range Queries on Amazon's DynamoDB

I’ve been working with Amazon’s DynamoDB quite a bit lately. Just this last week I found myself stumped by how to perform a range query on DynamoDB. I knew I needed the "BETWEEN" operator, but I couldn't sort out how to arrange the arrays correctly in the attribute value list. I scoured the official DynamoDB docs and the AWS SDK for PHP docs, but I couldn't find any examples of a range query, and the documentation for the nested array of options passed into the query left me a bit muddled.


At this point all I had left were my wits, so I kept banging at the code. Thankfully, in the end I figured it out.


The "AttributeValueList" is itself an array that holds two associative arrays. Each of these uses the type, string or number, as its key and the value to look for as its value. Since the "AttributeValueList" is a regular, ordered list, the first array it holds is the start of the range and the second is the end of the range.
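To make that nesting concrete, here is a minimal sketch of the same shape as a plain Python dict, mirroring the JSON request body rather than the PHP SDK call. The helper name between_condition, the table name "events" and the timestamps are made up for illustration; "S" and "N" are the type keys for string and number.

```python
import json


def between_condition(attr_type, start, end):
    """Build a BETWEEN range condition; attr_type is "S" (string) or "N" (number)."""
    if attr_type not in ("S", "N"):
        raise ValueError("attr_type must be 'S' or 'N'")
    return {
        "ComparisonOperator": "BETWEEN",
        "AttributeValueList": [
            {attr_type: str(start)},  # first entry: start of the range
            {attr_type: str(end)},    # second entry: end of the range
        ],
    }


# Hypothetical query: hash key 1, range key between two UNIX timestamps.
request = {
    "TableName": "events",        # assumed table name
    "HashKeyValue": {"N": "1"},   # numbers are passed as strings
    "RangeKeyCondition": between_condition("N", 1333756800, 1333843200),
}

print(json.dumps(request, indent=2))
```

The order of the two entries is what matters: swap them and the range is expressed backwards.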


This sample should be fully executable with the latest (1.5.3) AWS SDK for PHP. I also have samples of how to create a table, update items in the table and retrieve a single item from the table, though these are very nearly the same as AWS's examples; I made them primarily for demonstration. To use the example from start to end you'll want to put your AWS keys in the props.js file. Then run "createDatabase.php", "updateDatabase.php" and "betweenQuery.php" in that order. The between query should return 2 results.

One other important thing to note: as I currently understand it, range key operations can only be performed on a single hash key. Say you have the hash keys 1, 2 and 3 and a range key based on the UNIX epoch recording when an event happened. You'll have to do a range query on each key and merge the results on the application side to get a complete picture of all the updates to those keys during a given time range.
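The query-each-key-then-merge approach can be sketched roughly as follows. This is only an illustration: TABLE is an in-memory stand-in for a DynamoDB table, and query_range is a hypothetical placeholder for one BETWEEN query against a single hash key, not an SDK function.

```python
from heapq import merge

# Hypothetical stand-in for a table: hash key -> (timestamp, payload) pairs,
# already sorted by the range key (timestamp), as a range query returns them.
TABLE = {
    1: [(100, "a"), (250, "b")],
    2: [(120, "c"), (300, "d")],
    3: [(90, "e"), (200, "f")],
}


def query_range(hash_key, start, end):
    """Placeholder for one BETWEEN query on a single hash key (inclusive bounds)."""
    return [
        (ts, hash_key, payload)
        for ts, payload in TABLE.get(hash_key, [])
        if start <= ts <= end
    ]


def query_across_keys(hash_keys, start, end):
    """Run one range query per hash key, then merge the sorted results by timestamp."""
    per_key = [query_range(key, start, end) for key in hash_keys]
    return list(merge(*per_key))


events = query_across_keys([1, 2, 3], 100, 250)
for ts, hash_key, payload in events:
    print(ts, hash_key, payload)
```

Because each per-key result is already ordered by the range key, a k-way merge is enough; there is no need to re-sort everything.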


With this in mind, a friend of mine suggested that hash keys are rather like table names, while a table in DynamoDB is more like a database. Given that, as with most NoSQL databases, you can't do joins across tables, this comparison holds up rather well. The inability to query across multiple hash keys fits with the description of Dynamo in the original paper, which outlined how data is partitioned across servers based on an MD5 hash of the key. If range queries could be performed across all available hash keys, a single query could place considerable load on the whole cluster. Amazon, understanding this, appears to have put a hard restriction on queries to ensure they can reliably respond to requests within milliseconds.
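A toy sketch of why hashing spreads neighbouring keys around: the function below takes the MD5 of a key and maps it onto a small number of partitions with a simple modulo. That placement rule and the partition count of 4 are my own simplifications; the Dynamo paper describes a consistent hashing ring, and I don't know how DynamoDB places data internally.

```python
import hashlib


def partition_for(hash_key, partitions=4):
    """Map a key to a partition via MD5 (simplified modulo placement, an assumption)."""
    digest = hashlib.md5(str(hash_key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % partitions


# Adjacent keys need not land on adjacent (or the same) partitions.
for key in (1, 2, 3):
    print(key, "->", partition_for(key))
```

A query confined to one hash key only has to visit wherever that key lives, while a query over every hash key would have to visit them all.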
