Amazon SimpleDB Developer Guide: Scale your applications database on the cloud using Amazon SimpleDB




Before you use boto, you must of course set up your environment so that boto can find your AWS access key identifiers. Set up two environment variables to point to the two keys, as we have done before. We are not providing a location for the bucket, which means it will be created in the default US region. The FileKey attribute is the key to the file in S3. We could make the metadata expansive and add lots of other attributes, such as file extensions, custom header data, and so on, but we are going to keep this simple so the concepts are clear, and add just these two attributes.

The file key needs to be unique, as it is also the key to the actual stored file in S3. The simple way would be to use the song name, but that will not work if you have more than one song with the same title. We will use an MD5 hash generated from the name of the song, the name of the artist, and the year. This should be unique enough for our purpose, and it will be stored as part of the metadata for the song in SimpleDB. We will accomplish that by writing some Java code to go through each item in our domain, generating the hash for the key for the file located in this directory, and updating the attributes.
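The key generation described above can be sketched in Python with the standard hashlib module. The field order and the separator character used here are illustrative assumptions, not the book's exact convention:

```python
import hashlib

def make_file_key(song, artist, year):
    """Build a deterministic MD5 hex digest from song metadata.

    The digest serves as both the S3 object key and the SimpleDB
    FileKey attribute, so two songs with the same title but a
    different artist or year still get distinct keys.
    """
    # The "|" separator is an assumption; any unambiguous join works.
    raw = "{0}|{1}|{2}".format(song, artist, year)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

key = make_file_key("Imagine", "John Lennon", 1971)
print(key)  # a 32-character hex string
```

Because the hash is deterministic, re-running the script over the domain always regenerates the same key for the same song.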

We will do the actual uploading of the song file to S3 in the next section. This approach is useful if you want to store songs in different buckets. The bucket we created before is publicly accessible, and when we put in the song, we will also set its access to public read. The FileName for storing in S3 is the item name, a dot, and the song title. This approach only prevents a clash between two identical songs uploaded to the same bucket with the same item name. We will discuss the code in detail in the Uploading the songs to S3 section with PHP, as the approach is different from the Java and Python examples.

Creating additional metadata with Python

Python includes the hashlib module, which makes it easy to create MD5 hash digests. It also supports algorithms other than MD5 for the hashing, such as sha1, sha224, sha256, sha384, and sha512. In my case, the files are all named with the convention of the name of the song, so a file listing of the songs on my laptop looks like this: Now we need to add the additional attributes to our existing songs in the songs domain.

We will accomplish that by writing some Python code to go through each item in our domain, generating the hash for the key for the file located in the directory we created, and updating the attributes. This is one of the truly impressive things about SimpleDB: the ability to dynamically add new attributes to an existing item. No schemas to worry about, no upgrades, just a simple web services call! No database administrators looking over your shoulder about migrations!

Any time your use case changes, or you need to store additional information, it is extremely easy. The Java and Python examples will use a hashed filename as the key, while the PHP code will use a name composed of the item name and the MP3 filename. Now all we have to do is upload each corresponding file to the S3 bucket that we created earlier in this chapter. Two PHP programs are used: the first selects the parameters of the upload (the file, the bucket, and the item to upload to), and the second, using the parameters from the first, does the actual upload.

The steps involved in s3upload.php are as follows:

  1. Display the list of buckets.
  2. Display the list of items, including a link to the song if it has already been uploaded.
  3. The user selects an MP3 file to upload.
  4. The user selects a bucket.
  5. The user selects an item name.
  6. The file is uploaded to a temp area.
  7. The temp file is copied to S3, named with the FileName from step 3.

Add the attribute FileName with the value from step 3. If a previous FileName exists, replace it. If a previous FileKey exists, replace it. Let's walk through these one at a time, starting with the user interface. This is obviously a very basic user interface, but the focus of this sample is on illustrating uploading a file to S3, not good GUI design.

Let the user select a file to upload; the program will call s3uploader.php. List the available buckets: first create an instance of the S3 object and echo a list of radio buttons with the bucket names. The user selects (1) the file, (2) the bucket, and (3) the item, and then clicks (4) the upload button. The file is uploaded to a temporary location, and PHP provides several variables with information on the file.

The sample has a basic filter to permit only certain file types to be uploaded. Next is a check for the maximum file size. Again, this is not required, but it is protection against a user uploading gigantic files. Next, query the information on the file, which is now in S3. We will loop through each item, look for a file named using our convention, and upload it to S3 using the generated hash as the key.

Sounds much harder than it actually is, especially when using Python! That's it. Now we are all done. Our songs domain has all the metadata and the songs themselves are in S3!


Retrieving the files from S3

Downloading our files is quite simple, and we can use any of the libraries for downloading them. S3 supports two ways to access files: a file can be set with a permission of public (everyone can view it) or private (the access key and secret key are required for access).

Here is a quick way to download the files using Java. The song items that have a FileKey are listed; just click on the player to hear the song. Let's look at several examples of BoxUsage and understand how the way SimpleDB charges for usage can make your application not only faster but also cheaper to execute. In backup, a select statement is used to define what part of the domain is backed up to S3.

For the songs domain, here are the two extremes: first, select all 11 items in one call; second, select the items one at a time. This makes comparing the two usages easier. While the second scenario required 11 calls to SimpleDB, it consumed only just over twice the time. The lesson here seems to be that using NextToken is not very expensive. One of the most glaring examples of how different strategies can affect cost is using Select versus getAttributes for retrieving the attributes of one item. In the songs domain, there are two ways to retrieve all of the attributes for an item: getAttributes and Select. As Select is more flexible, the tendency is to use it, but what is the cost?

Both getAttributes and Select can retrieve an item by its item name, for example, to query all of the attributes for a single item.

Cost of Select

Certain capabilities of Select have the potential to be expensive. Let's look at several queries using the 11 items in songs. An equality comparison and a begins-with LIKE (for example, like 'abc%') are charged the same, but a LIKE that matches the string anywhere in the value (like '%abc%') costs more, as a complete scan is required to check every record for the substring.

This is on a very small domain, so the cost difference is not significant, although on a large domain it could be substantial.


Cost of creating a domain

Creating or deleting a domain is expensive; it consumes a large, fixed amount of BoxUsage (several million machine micro-hours). If you create a domain and it already exists, you are still charged the cost of creating the domain. A common coding practice is to verify that a table exists before writing to it. In SimpleDB, if you create a domain that already exists, there is no error and the domain remains unchanged.

But it is far cheaper in SimpleDB to check whether the domain exists rather than creating it by default. It is best to check whether the domain exists and only create it if it does not; checking for the domain's existence with domainMetadata consumes far less BoxUsage than createDomain. The article does not cover Select or batchPutAttributes, as they were announced later.

Cost of creating items

Multiple items can be created with putAttributes as well as batchPutAttributes. The first makes a REST call for each item; the second can create up to 25 items in one call.

If you are creating more than one item, my experience is that batchPutAttributes is cheaper: creating car1, car2, and car3 in a single batch reports less total BoxUsage than three separate putAttributes calls.


The BoxUsage value is returned as a Unicode string. SimpleDB BoxUsage values seem to vary only for Select operations, and are quite consistent for the domain-related operations. Let us try the other operations, such as getting a domain's metadata. You can print out either the cumulative BoxUsage value or a dollar value for the BoxUsage. There is always the possibility that the size limitation of 10 GB per domain will be a limiting factor when your dataset needs to be larger. In such cases, SimpleDB gives you the ability to create multiple domains.

This will of course mean that the data will need to be partitioned among the multiple domains so that each dataset in a domain is under the 10 GB size limit. For example, let us assume that our songs domain is hitting the limit of 10 GB. We could partition our domain into multiple domains, each dedicated to a specific genre. However, this will add complexity to our application, as currently SimpleDB queries cannot be made across domains.

Any partitioning of the data means that we will need to make select queries against each domain and then aggregate the results in the application layer. Performance is the most common reason to partition your data: multiple domains increase the potential throughput. Each domain is limited to a modest number of puts per second (on the order of tens), so splitting a domain into four roughly quadruples the potential put throughput. This is the key for applications that must scale to large sizes.
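A minimal sketch of such partitioning follows, assuming hypothetical domain names songs_0 through songs_3. Hashing the item name picks a domain deterministically, so reads and writes for a given item always go to the same domain:

```python
import hashlib

DOMAIN_COUNT = 4
# Illustrative domain names; the real names would be whatever you created.
DOMAINS = ["songs_%d" % i for i in range(DOMAIN_COUNT)]

def domain_for_item(item_name):
    """Map an item name to one of the partition domains.

    Hashing spreads items roughly evenly, and the mapping is stable:
    the same item name always lands in the same domain.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    return DOMAINS[int(digest, 16) % DOMAIN_COUNT]
```

A select across the whole dataset would then issue one query per domain and merge the result sets in the application layer.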

Another reason for partitioning is that our queries start hitting the timeout limit due to the large dataset. In this case, we can make queries against the smaller multiple domains and aggregate the queries. Summary In this chapter, we discussed the BoxUsage of different SimpleDB queries and the usage costs, along with viewing the usage activity reports. In the next chapter, we are going to look at using caching along with SimpleDB. In this chapter, we will consider one simple strategy for avoiding excessive SimpleDB requests: using a cache to store the data locally.

The cache that we will be using to accomplish this is called memcached. It stores the cached data in memory.

Caching

Caching can help alleviate both the issue of making extra requests to SimpleDB and the issue of eventual consistency. We discussed the principle of eventual consistency in the earlier chapters; it is one of the main principles behind the design of SimpleDB. However, the possibility that things may not be consistent immediately after you make some change to your SimpleDB data can throw things out of whack for your own application.

If you are aware that this can happen, you can take it into consideration when designing your SimpleDB-based application and leverage caching to help alleviate it. Memcached The most popular solution used for caching these days is called memcached. It is an open source project originally developed by Danga Interactive and used for their LiveJournal website. Since then, it has been used all over the world to improve the performance and scalability characteristics of applications and web applications. Memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

At the most basic level, memcached can be considered a simple memory cache that can be deployed anywhere and accessed from anywhere over a network. The beauty of memcached is that it is great for storing data using a key and then retrieving that data back using the key. Memcached utilizes highly efficient, non-blocking networking libraries and as a result is very, very fast and high performing. It is highly recommended that you only run memcached inside your firewall where you can restrict access to the server that runs it.

If you are storing sensitive information, you can encrypt it before storing it; this will increase the processing times, as you will have to encrypt data before you store it and then decrypt it on retrieval every time. However, the additional processing overhead is well worth it to secure the information. As memcached stores data under plain, unauthenticated keys, anyone who gains access to the server can query it for data by guessing the keys. If you utilize keys that are generated using a hash algorithm on some simple strings, it will be next to impossible for anyone to just guess a key.

This will again add a little overhead due to the key generation, but depending on your application, it may be well worth the effort. Most Linux distributions provide memcached through their package manager, and installing it that way will usually get you the latest compatible version for that distribution. However, memcached is a fast-evolving project, and the developers are constantly improving it or adding security and bug fixes. It is also a very widely used project, so the community is constantly providing patches to memcached. The best way to ensure that you are using the latest stable version is to get the source yourself and compile and install it.

This is a pretty straightforward process and follows standard Linux installation procedures; at the time of writing this chapter, the latest version of memcached was a 1.x release. Note that there is a limit on how much memory you may lock. Increasing the memory page size can reduce the number of TLB misses and improve performance; in order to get large pages from the OS, memcached will allocate the total item cache in one large chunk.

This delimiter character is used for per-prefix stats reporting; the default is ":" (colon). If this option is specified, stats collection is turned on automatically; if not, it may be turned on by sending the "stats detail on" command to the server. Another option adjusts the maximum item size (default: 1 MB, minimum: 1 KB). Now you have memcached installed.

Please take a look at all of the different options that are available for configuring memcached. Now we are going to run memcached as a daemon listening on its default port (11211), so we can connect to it and start using it for caching our data; you may want to change this port to a different number if you like. The most current version of the Java client library at the time of writing this chapter was a 2.x release.

Unzip the distribution to the directory of your choice; you are now ready to start using this library to interact with our memcached server. For Python, at the time of writing this chapter, the version of python-memcached was a 1.x release. Download and install the package for your version of Python, then open up a Python console session and import the memcache module; there should not be any import errors.

Storing and retrieving data from memcached

It is quite simple to use the memcached client.

In this section we will store and retrieve data from the memcached server and get a feel for the API.

Storing and retrieving data from memcached in Java

The first thing we need to do before retrieving data is to actually create a connection to the memcached server. A static pool of servers is used by the client; you can of course specify more than one memcached server for the pool. Once we have a connection object, it can be used for all of the interaction with the server, and for setting and retrieving keys and their associated values. The Java class lives in the simpledbbook package and imports MemCachedClient and SockIOPool from the java_memcached client library. The cache is consulted first: if the results are there, they are returned; if not, the actual Select against the SimpleDB database is done, and the results are stored in memcached as well as returned. A value stored with an expiry of 30 seconds will be automatically purged from the cache by the memcached server after that time.

In the next section, we will start looking at how we can utilize memcached to alleviate the burden on SimpleDB and speed up our data retrieval process by leveraging the cache. When you need to retrieve data from SimpleDB, first query the memcached server to see if the data is currently available in the cache.

If the data is in the cache, then simply return it, and do not make a request to SimpleDB. If the data is not in the cache, retrieve it from SimpleDB, and store it in the cache before returning the results, so it is available the next time you need it. If you are updating the data, all you have to do is update SimpleDB and also delete the data from the cache. This will ensure that the next request for the data will get the latest information from SimpleDB and not outdated information from the cache.
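The read-through and invalidate-on-write flow just described can be sketched in Python. Plain dicts stand in for both the memcached client and SimpleDB here; the class and method names are illustrative, not the book's code:

```python
class CacheAsideStore:
    """Sketch of the cache-aside pattern: read through the cache,
    invalidate on write. `backend` stands in for SimpleDB and the
    `cache` dict for a memcached client."""

    def __init__(self, backend):
        self.backend = backend   # e.g. {"item1": {"Artist": "..."}}
        self.cache = {}
        self.backend_reads = 0   # counts simulated SimpleDB calls

    def get(self, item_name):
        if item_name in self.cache:          # cache hit: no SimpleDB call
            return self.cache[item_name]
        self.backend_reads += 1              # cache miss: fetch and fill
        value = self.backend.get(item_name)
        if value is not None:
            self.cache[item_name] = value
        return value

    def put(self, item_name, attrs):
        self.backend[item_name] = attrs      # write to SimpleDB first
        self.cache.pop(item_name, None)      # then invalidate the cache
```

The invalidation in put is what keeps a subsequent get from serving stale attributes.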

You can also just update the cache with the latest data that you have, thus alleviating any issue with eventual consistency returning stale data when you turn around and make a request immediately. If you have data that automatically goes stale after a fixed period of time, you can always have a background process or job that periodically clears the cache and puts the latest information into it.

You can also tell the memcached server to expire data and remove it from the cache automatically by specifying, when storing it, the amount of time the data needs to stay in the cache.

Using memcached with SimpleDB in Java

The usage of the memcached client is quite simple, as we have seen in the previous section. Now we are going to integrate the client into a simple class that interacts with SimpleDB, so you can see the advantages brought to the table by memcached. The class listed next is just an example; it always queries the memcached server for data before it goes and retrieves it from SimpleDB.

You can use any string value as the key for storage. We will use the name of the item as the key for this example.


The class imports the Typica SimpleDB classes (Domain, ItemAttribute, QueryWithAttributesResult, SDBException, and SimpleDB) along with standard Java classes such as BigInteger, MessageDigest, NoSuchAlgorithmException, ArrayList, Date, and List. If your data needs to be larger than the memcached item-size limit, you should look at the various command-line options that can be provided when starting your server. The first PHP example is cachetest.php. This program takes an ID and calls the cache to try fetching it; if the fetch fails, a value is stored in the cache.

If the clearcache checkbox is checked, the cache is first cleared of that value so that the actual value will need to be fetched. The next file is selectcachetest.php. In the first part, the user interface is set up, asking for input for the variables. If Clearcache is true, then the cache is cleared. Then we try retrieving the data from the cache, using the select statement as the key. The data returned from the SDB API is an array, so serialize and unserialize are used to convert between an array and a string. If the data is found in the cache, it is converted back to an array with unserialize.

If the data is not in the cache, SimpleDB is called, and before returning the data array, a copy is serialized and stored in the cache. As these values are not returned in the array, they would also not have been stored in the cache. In a real implementation, the cache would be cleared when a put was performed on a record.

Caching would be most useful with getAttributes, as it would be easy to control: the key is the item name, and if the item is updated with a put, the cache is cleared using the same key. The key to performance would be to integrate the cache into the getAttributes call to store the original XML string from SimpleDB, rather than serializing and unserializing the results array.

The XML approach would incur none of this overhead. Rich Helms: "I added notes in the source code to the putAttributes and getAttributes functions on where hooks into caching would be added". In Python, the client is created with memcache.Client, and we will create a class that encapsulates the logic for using memcached with SimpleDB. This class is intentionally simple to clarify the concepts involved without being bogged down with too much detail.

Check if the data exists in the memcached server first; if it does not, we retrieve it from SimpleDB, or else we return it immediately from our cache. Let us now run a query against SimpleDB and see our cache in action. If you run this same query again within 10 minutes, which is the expiry time that we have set for the cache, you should get it from the cache without another call to SimpleDB.
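Memcached's per-key expiry can be made concrete with a small TTL cache. This is a stand-in for illustration, not the python-memcached API; the class name and layout are assumptions:

```python
import time

class TTLCache:
    """Minimal in-process stand-in for memcached's per-key expiry."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl):
        # Record the value together with its absolute expiry time.
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() >= expires:   # expired: drop the entry, report a miss
            del self._store[key]
            return None
        return value
```

With a real memcached client the same effect comes from passing an expiry time to set; the server then purges the value on its own.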

You can layer this with your specific requirements and build up a more complex caching strategy for use with SimpleDB. You can experiment with other strategies and find one that matches your application requirements. In the next chapter, we are going to look at another way to speed up retrieval from SimpleDB by using parallel queries and multi-threading.

The data that we have been inserting has not really needed anything else. However, one of the things that really sets SimpleDB apart is its support for concurrency and parallel operations. This support is what truly makes it a scalable database.


In this chapter, we will explore how to run parallel operations against SimpleDB using boto. As you can imagine, making one SimpleDB call for each item can cause major performance issues if you have updated several items and their attributes. This may not seem like such a big deal, but once the number of items to be updated grows large, it becomes very significant. We will address that by using the support for BatchPutAttributes provided by Typica.

The following code runs a query against SimpleDB and prints the results from the query.

Then we update attributes for both of these items, and run another query and print the results to show that the update was completed successfully. The first part of the console output shows the attribute values before the change, and the later part of the output displays the changed attribute values resulting from our call to BatchPutAttributes, illustrating eventual consistency. To add another item, the item name is set, the array is cleared, the new values are added, and then another call is made.

    Then make one call with batchPutAttributes. Note that the maximum number of items you can send in one call is 25. This operation can be invoked on a boto domain object by specifying a dictionary of items to be updated. You can also specify whether the existing values are to be replaced or whether these attribute values are to be added to the existing ones. The default option in boto is to replace the existing attribute values with the provided values, whereas the default in the underlying SimpleDB API is to not replace the existing attribute values, but to add a new set of attributes with the provided values.

    Here is a sample of how we would use this method to update the attributes for multiple items with a single call to SimpleDB. The BatchPutAttributes operation succeeds or fails in its entirety; there are no partial puts. This will work just fine as long as you only need to update up to 25 items at a time. What if you need to update hundreds, or even thousands, of items? You can certainly use the simple batch operation, but making the requests one after the other, serially, will seriously degrade your performance and slow your application down.
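Splitting a large update into 25-item batches can be sketched as a simple generator. The helper name and the dict-of-dicts layout (item name mapping to attributes, as boto's batch call expects) are illustrative:

```python
def chunk_items(items, batch_size=25):
    """Split {item_name: attributes} into batches small enough for
    BatchPutAttributes, which accepts at most 25 items per call."""
    names = list(items)
    for start in range(0, len(names), batch_size):
        yield {name: items[name] for name in names[start:start + batch_size]}

# Each yielded dict would be passed to one batch_put_attributes call.
```

Each batch would then be sent in its own call, serially in this chapter's baseline, and in parallel in the threaded versions that follow.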

    Here is a simple Python script that updates items by making three different calls to SimpleDB, but in a serial fashion, that is, one call after another. This gives us a baseline to compare against when we convert this same script to use parallel operations. Running this simple update serially took a few seconds.

    Now let's see if we can speed that up. There are several different ways of parallelizing our requests, and we will look at approaches in both Java and Python. In this section, we will use the java.util.concurrent package. Recent versions of Java provide a ThreadPoolExecutor class that has all of the functionality that we need for our purpose. We will first instantiate a ThreadPoolExecutor and provide a variety of configuration options to it, such as the minimum size of the pool, a handler class that is executed in case the task being performed is rejected, and so on.

    Anytime that we need to run something in a different thread, we invoke the execute method on our ThreadPoolExecutor object and provide it an object that implements the Runnable interface and performs the actual task. That's all there is to it. In this example, we will create a simple SongsWorker class that implements the Runnable interface and does the actual updating of an item's attributes.

    The example is a bit contrived and deliberately kept simple so that the concept is clear. Here is a sample Java class that performs the update of attributes by using threads; it imports the Typica Item class, java.util classes such as Iterator and Map, and the java.util.concurrent classes ArrayBlockingQueue, RejectedExecutionHandler, ThreadPoolExecutor, and TimeUnit.

    The rejection handler simply logs the rejected task and notes that you can retry the operation if you like. Here is a simple Python script that uses the threading module. When the thread actually runs, it will use the domain object created in the constructor and use the batch operation to update the items. You can start as many threads as there are items to be updated. Each item within this array will be processed within one thread.

    Each element is of course a dictionary containing the items to be updated. Keep in mind that the 25-item limit for the batch operation means that each of the dictionaries in this array can contain a maximum of 25 items. SimpleDB makes multiple copies of your data and uses an eventual-consistency update model: an immediate Get or Select request issued after a Put or Delete request might not return the updated data, and some items might be updated before others, despite the fact that the operation never partially succeeds.

    Here is the output when we run the simple threading sample. You can see such an improvement even when using a small set for our tests; a large dataset will show even greater advantages of using parallel operations to optimize SimpleDB workloads. The nicer way to use threads in Python is to use queues and threading together. Here is the same code sample rewritten to use a queue, with a QueuedBatchPut class that subclasses Thread. We first create an instance of the Queue class. We then create a pool of threads, each an instance of the new QueuedBatchPut class, which does the actual update of values in SimpleDB.

    We put all our work data onto the queue. The pool of threads picks up items off the queue and processes them. Each thread picks up one item, that is, one unit of work or one batch of items to be updated, and updates the values appropriately on SimpleDB. Once the piece of work is done, a signal is sent to the queue notifying it that the task is complete. The main program waits till the queue is completely empty and then exits. The timing difference could also be due to some network latency, but the clarity gained by using queues and the decreased programming complexity are well worth it.
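The queue-and-threads flow just described can be sketched with the standard queue and threading modules. Here batch_put is a stand-in for the real batch call to SimpleDB, and the function names are illustrative:

```python
import queue
import threading
import time

def batch_put(batch):
    # Stand-in for domain.batch_put_attributes(batch): sleep briefly
    # to mimic a network round trip, then report how many items went.
    time.sleep(0.01)
    return len(batch)

def worker(q, results, lock):
    while True:
        batch = q.get()
        if batch is None:            # sentinel: no more work for this thread
            q.task_done()
            return
        n = batch_put(batch)
        with lock:
            results.append(n)
        q.task_done()                # signal the queue this task is complete

def run_parallel(batches, thread_count=4):
    q = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for _ in range(thread_count)]
    for t in threads:
        t.start()
    for batch in batches:            # put all the work data onto the queue
        q.put(batch)
    for _ in threads:
        q.put(None)                  # one sentinel per thread
    q.join()                         # wait until every task is marked done
    return sum(results)
```

The main thread blocks on q.join() until every batch has been processed, which mirrors the "wait till the queue is completely empty" step above.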

    Threading with workerpool

    There is an open source project named workerpool that encapsulates the thread-pool pattern, and makes it easy to work with threading and jobs in Python. In this section, we will use workerpool to rewrite our sample. It is a bit cleaner and simpler than using the queues directly within your code. These are two similar ways of using threading, and you can use whichever is better suited to your programming style.

    Concurrency and SimpleDB

    The power of SimpleDB becomes truly apparent when you start taking advantage of the support for concurrency by writing multithreaded programs for interacting with it.

    You can choose the SimpleDB operation that you like, whether inserting, deleting, or updating items, and easily scale it up using boto and the parallelization techniques that we looked at in this chapter.


    SimpleDB may occasionally fail a request when it is overloaded; your application must be aware of this fact and handle it by retrying the request with an exponential back-off. This reduced latency, in combination with parallelization, will give your SimpleDB application a real boost. There have also been some reports that Amazon enforces a limit on the number of BatchPut operations per minute in order to ensure a good quality of service for all customers. This is a great reason for utilizing partitioning for your domains, even in cases when a domain does not exceed the 10 GB limit. You can apply the same technique to other SimpleDB operations, such as insert and delete.
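A minimal sketch of the exponential back-off just mentioned follows. A generic callable and RuntimeError stand in for the real SimpleDB client and its service-unavailable error; the parameter values are illustrative:

```python
import random
import time

def with_backoff(operation, max_retries=5, base_delay=0.05):
    """Call `operation`, retrying on failure with exponential back-off.

    Each failed attempt doubles the wait (base_delay * 2**attempt)
    and adds a little random jitter so parallel workers do not all
    retry in lockstep. The last failure is re-raised to the caller.
    """
    for attempt in range(max_retries):
        try:
            return operation()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt)
                       + random.uniform(0, base_delay))
```

In a real application the wrapped call would be the SimpleDB request, and the caught exception would be the client library's throttling or service-unavailable error.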

    Summary

    In this chapter, we discussed utilizing multiple threads for running parallel operations against SimpleDB in Java, PHP, and Python, in order to speed up processing times by taking advantage of the excellent support for concurrency in SimpleDB. Applications require a database that can adapt as the user community grows. SimpleDB can support this in a cost-effective way, as long as the developer is willing to learn a new database paradigm. As developers, we dream of creating software that catches the public's fancy and goes viral. Ten users today, 50,000 tomorrow. What a wonderful problem to have!

    Our books and publications share the experiences of your fellow IT professionals in adapting and customizing today's systems, applications, and frameworks. Our solution-based books give you the knowledge and power to customize the software and technologies you're using to get the job done. Packt books are more specific and less general than the IT books you have seen in the past.

    Our unique business model allows us to bring you more focused information, giving you more of what you need to know, and less of what you don't. Packt is a modern, yet unique publishing company, which focuses on producing quality, cutting-edge books for communities of developers, administrators, and newbies alike. For more information, please visit our website: www. This book is part of the Packt Enterprise brand, home to books published on enterprise software — software created by major vendors, including but not limited to IBM, Microsoft and Oracle, often for use in other corporations.

    Its titles will offer information relevant to a range of users of this software, including administrators, developers, architects, and end users. Writing for Packt We welcome all inquiries from people who are interested in authoring. We're not just looking for published authors; if you have strong technical skills but no writing experience, our experienced editors can help you develop a writing career, or simply get some additional reward for your expertise.


Finding a person by his or her phone number is easy. The design is simple, but because the name data is repeated, it would require care to keep the data in sync. Searching for phone numbers by name would be ugly if the names got out of sync. To improve the design, we can rationalize the data. One approach would be to create multiple phone number fields such as the following. While this is a simple solution, it limits each contact to three phone numbers. Add e-mail and Twitter, and the table becomes wider and wider; in addition, searching by phone number now involves three separate index searches.

This approach has the advantages of no data repetition, easy maintenance, compactness, and extensibility, but the only way to find a record by phone number is with a substring search. This type of SQL forces a complete table scan. Do this with a small table and no one will notice, but try it on a large database with millions of records, and the performance of the database will suffer. Normalization for relational databases results in splitting up your data into separate tables that are related to one another by keys. A join is an operation that allows you to easily retrieve the data back across the multiple tables.
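To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module (the table and column names are illustrative, not from the book), showing a normalized person/phone design and the join required to read a contact back:

```python
import sqlite3

# In-memory relational database with a normalized person/phone design.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE phone (person_id INTEGER, number TEXT)")

conn.execute("INSERT INTO person VALUES (1, 'Alice')")
conn.executemany("INSERT INTO phone VALUES (?, ?)",
                 [(1, '555-1111'), (1, '555-2222'), (1, '555-3333')])

# A join is required to retrieve the contact together with all phone numbers.
rows = conn.execute(
    "SELECT person.name, phone.number "
    "FROM person JOIN phone ON person.id = phone.person_id "
    "ORDER BY phone.number").fetchall()
print(rows)  # [('Alice', '555-1111'), ('Alice', '555-2222'), ('Alice', '555-3333')]
```

Each phone number is stored exactly once, but every read of a full contact pays the cost of the join.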

The table structure is clean, and other than the ID primary key, no data is duplicated. While this is an efficient relational model, there is no join command in SimpleDB; using two tables would force two selects to retrieve the complete contact information. Instead, SimpleDB provides you with the ability to store multiple values for an attribute, thus avoiding the necessity to perform a join to retrieve all the values.
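The same contact modeled the SimpleDB way can be sketched with plain Python dictionaries standing in for SimpleDB items (names are illustrative): each attribute may hold multiple values, so one item carries all the phone numbers and no join is needed.

```python
# A SimpleDB item is a set of attribute name -> value(s) pairs;
# an attribute may hold multiple values, so a single item can carry
# every phone number without a second table or a join.
contact = {
    "name": {"Alice"},
    "phone": {"555-1111", "555-2222", "555-3333"},  # multi-valued attribute
}

# One lookup returns all phone numbers for the contact.
phones = sorted(contact["phone"])
print(phones)  # ['555-1111', '555-2222', '555-3333']
```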


Unlike a delimited list field, SimpleDB indexes all values, enabling an efficient search on each value. There are no schemas anywhere in sight in SimpleDB. This is yet another thing that is difficult to grasp for some people coming from a traditional relational database world, but this flexibility is one of the keys to the power of scaling offered by SimpleDB. You can store any attribute-value data you like in any way you want.

In the relational database, it is necessary either to add e-mail to the phone table with a type-of-contact field, or to add another table. Using a traditional relational database approach, we join the three tables to extract the requested data in one call. We ignored the issue of join versus left outer join, which is really what should be used here unless all contacts have both a phone number and an e-mail address. In SimpleDB, there is no concept of a column in a table. The spreadsheet view of the SimpleDB data was done for ease of readability, not because it reflects the data structure.

The proper representation of the SimpleDB data is a set of attribute name-value pairs for each item.

Structured Query Language (SQL) is a standard language that is widely used for accessing and manipulating the data stored in a relational database. SQL has evolved over the years into a highly complex language that can do a vast variety of things to your database.

SimpleDB does not support the complete SQL language; instead, it lets you perform your data retrieval using a much smaller and simpler subset of an SQL-like query language, which simplifies the whole process of querying your data. All data in SimpleDB is stored as text, which makes it easy for SimpleDB to automatically index your data and give you the ability to retrieve it very quickly. If you need to store and retrieve other kinds of data types, such as numbers and dates, you must encode these data types into strings whose lexicographical ordering will be the same as your intended ordering of the data.
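A common encoding sketch for non-negative integers is to zero-pad them to a fixed width so that string order matches numeric order (the width and helper name here are ours, not from the book; negative numbers additionally require an offset). ISO 8601 date strings already sort correctly:

```python
def encode_int(n, width=10):
    """Zero-pad a non-negative integer so lexicographic order == numeric order.

    Note: negative numbers would also need an offset added before padding.
    """
    return str(n).zfill(width)

# Without padding, '123' < '45' lexicographically; with padding, order is numeric.
encoded = sorted(encode_int(n) for n in [7, 123, 45])
print(encoded)  # ['0000000007', '0000000045', '0000000123']

# ISO 8601 date strings sort lexicographically in chronological order.
dates = sorted(["2010-01-15", "2009-12-31", "2010-01-02"])
print(dates)  # ['2009-12-31', '2010-01-02', '2010-01-15']
```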

Dates present an easier problem, as they can be stored in ISO 8601 format to enable sorting as well as predictable searching.

In a PutAttributes request, each attribute is specified by a pair of indexed Attribute.X.Name and Attribute.X.Value parameters: the client specifies the first attribute with one Name/Value pair, the second attribute with the next, and so on. However, a request cannot contain two attribute instances where both the Attribute.X.Name and the Attribute.X.Value are the same. Optionally, the requestor can supply the Replace parameter for each individual attribute. Setting this value to true causes the new attribute value to replace the existing attribute value(s). Because Amazon SimpleDB makes multiple copies of client data and uses an eventual consistency update model, a GetAttributes or Select operation (read) performed immediately after a PutAttributes or DeleteAttributes operation (write) might not return the updated data.
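The effect of the Replace parameter on a multi-valued attribute can be sketched with a small in-memory model (this is not the AWS client; the store and function names are ours):

```python
store = {}  # item name -> {attribute name -> set of values}

def put_attributes(item, name, value, replace=False):
    """Toy model of PutAttributes: add a value, or replace all existing values."""
    attrs = store.setdefault(item, {})
    if replace:
        attrs[name] = {value}                      # Replace=true: discard old values
    else:
        attrs.setdefault(name, set()).add(value)   # default: add another value

put_attributes("contact1", "phone", "555-1111")
put_attributes("contact1", "phone", "555-2222")                # now two values
put_attributes("contact1", "phone", "555-9999", replace=True)  # overwrites both
print(sorted(store["contact1"]["phone"]))  # ['555-9999']
```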

The following limitations are enforced for this operation:

- 256 total attribute name-value pairs per item
- One billion attributes per domain
- 10 GB of total user data storage per domain

The DomainMetadata operation returns information about the domain, including when the domain was created, the number of items and attributes in the domain, and the size of the attribute names and values.


The GetAttributes operation returns all of the attributes associated with the specified item. Optionally, the attributes returned can be limited to one or more attributes by specifying an attribute name parameter. If the item does not exist on the replica that was accessed for this operation, an empty set is returned; the system does not return an error, as it cannot guarantee that the item does not exist on other replicas. NOTE: If GetAttributes is called without being passed any attribute names, all the attributes for the item are returned.

The ListDomains operation lists the domains associated with your Access Key ID, returning domain names up to the limit set by MaxNumberOfDomains. Calling ListDomains successive times with the NextToken provided by the operation returns up to MaxNumberOfDomains more domain names with each successive call. The Select operation returns a set of attributes for ItemNames that match the select expression. The total size of the response cannot exceed 1 MB, and Amazon SimpleDB automatically adjusts the number of items returned per page to enforce this limit. For example, if each individual item is 10 KB in size, the system returns only as many items as fit within the 1 MB limit, along with an appropriate NextToken so the client can access the next page of results.
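The paging contract can be sketched with a toy server and client loop (the names and the scaled-down page limit standing in for the 1 MB cap are illustrative): the server returns as many items as fit plus a token, and the client keeps calling until the token is exhausted.

```python
ITEMS = [f"item{i:03d}" for i in range(25)]
PAGE_LIMIT = 10  # stand-in for the 1 MB response size cap

def select_page(next_token=0):
    """Return up to PAGE_LIMIT items and a token for the next page (None at end)."""
    page = ITEMS[next_token:next_token + PAGE_LIMIT]
    token = next_token + len(page)
    return page, (token if token < len(ITEMS) else None)

# Client loop: keep calling with the returned NextToken until it is None.
results, token = [], 0
while token is not None:
    page, token = select_page(token)
    results.extend(page)
print(len(results))  # 25
```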

    The DeleteDomain operation deletes a domain. Any items and their attributes in the domain are deleted as well. The DeleteDomain operation might take 10 or more seconds to complete. NOTE: Running DeleteDomain on a domain that does not exist or running the function multiple times using the same domain name will not result in an error response. The CreateDomain operation creates a new domain.
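That idempotent delete behavior can be modeled as a dictionary pop that ignores missing keys (an illustrative sketch, not the AWS client):

```python
domains = {"songs": {}}  # domain name -> items

def delete_domain(name):
    # Deleting a nonexistent domain is not an error, matching DeleteDomain semantics.
    domains.pop(name, None)

delete_domain("songs")
delete_domain("songs")  # second call on the same name: still no error
print("songs" in domains)  # False
```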

    The domain name should be unique among the domains associated with the Access Key ID provided in the request. The CreateDomain operation may take 10 or more seconds to complete.