Updated: October 2011
Learn more about RBA Consulting.
Windows Azure services are accessed by using a set of REST APIs that allow the services to work well with the HTTP and HTTPS protocols. By relying on the REST architecture, Windows Azure removes platform dependencies. Any platform or language that can work with the HTTP or HTTPS protocol can issue calls into the storage service.
REST, or Representational State Transfer, is an architectural style that is most commonly used for the Internet. In REST, every entity or resource is addressable. When REST is used in conjunction with the HTTP protocol, the HTTP methods GET, PUT, POST, and DELETE represent the actions that you can perform. Because everything in Windows Azure is represented with URIs, all storage services are accessible through REST APIs.
For example, the blob service exposes both the container and the blobs as resources. Each blob must be part of a container, and a container is part of the storage account. When you address blob resources, you can implement a hierarchical pattern for each blob. For example, if you store books, the paths of the URIs to two of them could be /books/book1 and /books/book2. The hierarchy is only a naming convention because Windows Azure stores blobs flatly, not hierarchically. With Windows Azure storage, "/books/book1" is a string value that represents the name of the blob. One example of how to use the REST API with the blob service is to create a container. Here is an example of a URI for a new container.
To insert the new container into the storage account for your blob service, issue the HTTP request with the PUT HTTP method. Similarly, to insert a new block blob into the newly created container, call the Put Blob API by issuing a PUT request to the following URI. Here is an example.
In this example, the block blob must be 64 MB or smaller for a single PUT request to succeed. If the block blob is larger than 64 MB, you must break up the blob and use the Put Block and Put Block List APIs. For more information, see "Blob Service API" at http://msdn.microsoft.com/en-us/library/dd135733.aspx.
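The blob requests described above can be sketched in Python. This is a minimal illustration rather than a complete client: the account name `myaccount`, the helper names, and the 4 MB block size are assumptions made for the example, and the authentication and `x-ms-*` headers that a real request needs are omitted. The code only builds the requests; it does not send them.

```python
import base64
import math
from urllib.parse import quote
from urllib.request import Request

ACCOUNT = "myaccount"  # hypothetical storage account name
BLOB_ENDPOINT = f"https://{ACCOUNT}.blob.core.windows.net"
SINGLE_PUT_LIMIT = 64 * 1024 * 1024  # 64 MB limit for a single Put Blob (from the text)
BLOCK_SIZE = 4 * 1024 * 1024         # assumed block size for Put Block


def create_container_request(container: str) -> Request:
    """Build (but do not send) the PUT request that creates a container."""
    return Request(f"{BLOB_ENDPOINT}/{container}?restype=container", method="PUT")


def plan_blob_upload(container: str, blob_name: str, size: int) -> list:
    """Return the (method, url) pairs needed to upload a block blob of `size` bytes."""
    blob_url = f"{BLOB_ENDPOINT}/{container}/{blob_name}"
    if size <= SINGLE_PUT_LIMIT:
        # Small enough for a single Put Blob call.
        return [("PUT", blob_url)]
    steps = []
    for index in range(math.ceil(size / BLOCK_SIZE)):
        # Block IDs are Base64 strings; fixed-width numbering keeps them equal in length.
        block_id = base64.b64encode(f"{index:08d}".encode("ascii")).decode("ascii")
        steps.append(("PUT", f"{blob_url}?comp=block&blockid={quote(block_id, safe='')}"))
    # Put Block List commits the uploaded blocks as one blob.
    steps.append(("PUT", f"{blob_url}?comp=blocklist"))
    return steps
```

A blob name such as `books/book1` can be passed as `blob_name` unchanged; the slash is part of the name, not a directory.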
Similarly, the REST API exposes queues and messages as resources. Here is an example of a queue URI.
Issuing a PUT to that URI creates a new queue that is named mynewqueue. You can create an unlimited number of queues within each storage account, as long as the queue names are unique among all your existing queues. To put messages into the queue, you must reference the <queue>/messages path and issue a POST to it. Here is the URI.
To specify the message itself, include it in the HTTP request body. The message body must be in a format that can be included in an XML message. Here is an example of an acceptable message body.
<QueueMessage> <MessageText>This is my message</MessageText> </QueueMessage>
For more information, see "Queue Service API" at http://msdn.microsoft.com/en-us/library/dd179363.aspx.
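As a rough sketch of the queue calls described above, the following Python builds the PUT request that creates a queue and the POST request that adds a message. The account name `myaccount` and the function names are assumptions for illustration, and authentication headers are left out. Building the body with an XML library escapes characters such as `<` so that the message text stays valid inside the XML envelope.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

ACCOUNT = "myaccount"  # hypothetical storage account name
QUEUE_ENDPOINT = f"https://{ACCOUNT}.queue.core.windows.net"


def create_queue_request(queue: str) -> Request:
    """Build (but do not send) the PUT request that creates a queue."""
    return Request(f"{QUEUE_ENDPOINT}/{queue}", method="PUT")


def put_message_request(queue: str, text: str) -> Request:
    """Build the POST request that adds one message to <queue>/messages."""
    root = ET.Element("QueueMessage")
    ET.SubElement(root, "MessageText").text = text  # ElementTree escapes &, <, >
    body = ET.tostring(root, encoding="unicode").encode("utf-8")
    return Request(f"{QUEUE_ENDPOINT}/{queue}/messages", data=body, method="POST")
```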
For the table service, both a generic resource named Tables and the tables themselves are exposed as resources. Unlike the other services, the REST API for the table service is compliant with the OData web protocol. This compliance allows you to use the WCF Data Services (formerly known as ADO.NET Data Services) client libraries. See the Client Libraries section of this article for more information. To create a new table, perform a POST to the following URI.
You must include a body in the request that conforms to the WCF entity set. (It also conforms to the Atom feed format.) The following example shows part of the request body.
<entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom"> <title /> <updated>2011-05-10T00:00:00-07:00</updated> <author> <name /> </author> <id /> <content type="application/xml"> <m:properties> <d:TableName>mynewtable</d:TableName> </m:properties> </content> </entry>
After the table is created, you can insert entities by using POSTs to the following URI.
Just as you did when you created the table, you must also supply a WCF entity set. Here is an example.
<entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom"> <title /> <updated>2011-05-10T00:00:00-07:00</updated> <author> <name /> </author> <id /> <content type="application/xml"> <m:properties> <d:CountryCode m:type="Edm.Int32">1</d:CountryCode> <d:MyOwnTags>tag 3</d:MyOwnTags> <d:PartitionKey>Address 2</d:PartitionKey> <d:RowKey>AddressEntity2</d:RowKey> <d:Timestamp m:type="Edm.DateTime">0001-01-01T00:00:00</d:Timestamp> </m:properties> </content> </entry>
For more information, see "Table Service API" at http://msdn.microsoft.com/en-us/library/dd179423.aspx.
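Hand-written Atom entries are easy to get wrong (a missing quote or a mismatched closing tag makes the request body invalid), so one option is to generate them. The sketch below builds the same kind of entry with Python's standard `xml.etree.ElementTree` module. The function name `build_entry` and the shape of its `properties` argument are assumptions made for this example; the namespaces are the ones shown in the request bodies above.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
D = "http://schemas.microsoft.com/ado/2007/08/dataservices"
M = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

# Serialize with the same prefixes as the hand-written examples.
ET.register_namespace("", ATOM)
ET.register_namespace("d", D)
ET.register_namespace("m", M)


def build_entry(properties: dict, updated: str) -> str:
    """Build an Atom <entry>; `properties` maps name -> (value, edm_type or None)."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title")
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    author = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name")
    ET.SubElement(entry, f"{{{ATOM}}}id")
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "application/xml"})
    props = ET.SubElement(content, f"{{{M}}}properties")
    for name, (value, edm_type) in properties.items():
        # Typed properties carry an m:type attribute; untyped ones are sent as strings.
        attrs = {f"{{{M}}}type": edm_type} if edm_type else {}
        ET.SubElement(props, f"{{{D}}}{name}", attrs).text = str(value)
    return ET.tostring(entry, encoding="unicode")


# Body for creating a table, then for inserting an entity into it.
create_table_body = build_entry({"TableName": ("mynewtable", None)},
                                "2011-05-10T00:00:00-07:00")
insert_entity_body = build_entry(
    {
        "CountryCode": (1, "Edm.Int32"),
        "PartitionKey": ("Address 2", None),
        "RowKey": ("AddressEntity2", None),
    },
    "2011-05-10T00:00:00-07:00",
)
```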
The Windows Azure SDK provides several libraries that wrap the REST APIs into meaningful, structured classes. This section focuses on the managed library that contains classes for a subset of the REST APIs. Here is the complete list of namespaces in the managed library.
Note: The SDK does not include a complete set of managed APIs for the service management REST APIs. These are provided by the Microsoft sample at http://code.msdn.microsoft.com/Windows-Azure-CSManage-e3f1882c.
The managed library contains classes that represent the different components that make up Windows Azure Storage. For example, the storage account is represented by the CloudStorageAccount class, and credentials are represented by the StorageCredentials class. For a complete list of all the classes, start with the "Windows Azure Managed Library Reference" at http://msdn.microsoft.com/en-us/library/dd179380.aspx. You can navigate to the desired namespace from there.
Windows Azure uses the PartitionKey value to group entities into partitions. Partitions are important because they enable scalability. The following figure, from http://blogs.msdn.com/b/windowsazurestorage/archive/2010/12/30/windows-azure-storage-architecture-overview.aspx, illustrates the storage service architecture.
The three layers of the storage service are:
Front-End (FE) layer – The FE layer is a collection of servers that accept incoming requests for some entity. The FE layer performs authentication and authorization, and it routes requests to the next layer, which is the partition layer.
Partition layer – The partition layer consists of partition servers and partition masters. Each partition server is assigned a set of partitions to serve, while the partition masters monitor the load of each partition server. The partition layer load balances requests by distributing the partitions across the partition servers.
Distributed File System (DFS) layer – The DFS layer is made up of many storage nodes that persist units of storage that are named extents. This is the layer where the entities are actually stored. Any data from the DFS layer is available to any partition server. The DFS layer load balances I/O operations.
Partitions are an integral part of the storage service because they allow data to be logically separated into groups. The primary job of each partition server in the Windows Azure storage system is to serve a set of partitions.
One of the most important attributes of partitions is that they allow Windows Azure to load balance the storage system when a particular storage node can no longer sustain the target throughput (this is known as a server "getting too hot").
Data is replicated multiple times within the DFS layer to ensure its durability. The unit of storage is called an extent. Each extent ranges in size from 100 MB to 1 GB. It is possible for a single entity to span multiple extents. Copies of each extent are stored randomly in one or more of the hundreds of storage nodes that serve the DFS. At least three copies are created. Because larger entities are broken up into multiple extents, they are written to multiple disk drives. Any of the replicated data can be read.
After the request traverses the FE and partition layers, it is written into the storage system at the DFS layer. Each extent in the DFS is assigned a primary server and multiple secondary servers. All writes are handled by the primary server for the extent. From the primary server, the write is delegated to the secondary servers for the actual write operations. The primary server does not return a successful write status message to the client until two secondary disks/servers have successfully written the data into three fault and upgrade domains.
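The acknowledgement rule can be illustrated with a toy model. This is not how the DFS layer is implemented; the class, the function, and the `required_secondaries` parameter are hypothetical names chosen for the sketch, and the model reduces each replica to an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class ExtentReplica:
    """Toy stand-in for one server that holds a copy of an extent."""
    fault_domain: int
    healthy: bool = True
    data: list = field(default_factory=list)

    def write(self, payload: bytes) -> bool:
        # A healthy replica stores the payload; an unhealthy one reports failure.
        if self.healthy:
            self.data.append(payload)
        return self.healthy


def replicated_write(primary, secondaries, payload, required_secondaries=2):
    """Report success only after enough secondaries confirm the write."""
    if not primary.write(payload):
        return False
    acks = sum(1 for replica in secondaries if replica.write(payload))
    return acks >= required_secondaries


primary = ExtentReplica(fault_domain=0)
secondaries = [ExtentReplica(fault_domain=1), ExtentReplica(fault_domain=2)]
ok = replicated_write(primary, secondaries, b"entity-bytes")  # True: both secondaries wrote
```

If one of the two secondaries is marked unhealthy, `replicated_write` returns `False`, which mirrors the rule that the client is not told the write succeeded until the replicas are in place.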
Fault and upgrade domains are logical groupings of servers that extend vertically across each layer. In other words, the set of servers for any given layer is distributed across fault domains and across upgrade domains. Fault domains provide availability when hardware fails. Upgrade domains provide the same availability during service upgrades.
Data recovery occurs when hardware, such as servers and disk drives, fails at the DFS layer. When a failure occurs, all of the extents that were stored by the hardware are lost. Requests for data that belonged in those extents are handled by other replicated extents. However, because of the failure, the number of replicas for the lost extents is lower than it should be. Therefore, available network and storage resources will be used to replicate the extent. The number of replicas for the lost extents is returned to an acceptable number within minutes.
In addition to providing durability at the regional level, Windows Azure storage will provide a replication process called Geo-Replication that allows data to be replicated to other regions. This will add redundancy at a geographic level. The goal of this process is to provide disaster recovery from large-scale events such as earthquakes.
The scalability of each of the storage services is primarily related to the partitions that those services produce. The target scalability for any given partition is approximately 500 entities per second. See http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx to learn more about scalability targets. Load balancing occurs when the partitions in the storage service become too hot, that is, when they encounter too many requests. For optimum scalability, aim for many small partitions where possible. Smaller partitions usually mean that there are more partitions. More partitions mean that the partition masters can do a better job of load balancing the requests for the partition layer, because the smaller partitions can be reassigned to other partition servers. The following table shows how partitions are created for each service.
Blob service: blob name
Queue service: queue name
Table service: PartitionKey property of table (developer defined)
The blob service uses the blob name as the PartitionKey. This means that the blob service is already optimized to be fully scalable because each blob name is unique for each container.
The queue service uses the name of the queue as the partition key. All messages for any given queue are always served from a single partition. This means that each queue already has its scalability target defined, because it is the same as the scalability target for a single partition. If you find that your queue needs to sustain more than 500 entities per second, you may need to spread the load of that busy queue across multiple queues.
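One way to spread a busy workload across several queues is sketched below. The 500-per-second figure is the per-partition target quoted in the text; the function names, the `-N` naming scheme, and the use of CRC32 to pick a queue are assumptions made for this example, not part of the storage service.

```python
import math
import zlib

PER_QUEUE_TARGET = 500  # approximate per-partition target from the text (entities/second)


def queues_needed(expected_rate: float) -> int:
    """Estimate how many queues are needed to stay under the per-queue target."""
    return max(1, math.ceil(expected_rate / PER_QUEUE_TARGET))


def queue_names(base: str, count: int) -> list:
    """Derive the shard names, for example orders-0, orders-1, and so on."""
    return [f"{base}-{i}" for i in range(count)]


def pick_queue(names: list, message_key: str) -> str:
    """Map a message key to one queue deterministically."""
    return names[zlib.crc32(message_key.encode("utf-8")) % len(names)]


names = queue_names("orders", queues_needed(1800))  # 1800/500 rounds up to 4 queues
target = pick_queue(names, "order-12345")
```

Consumers then need to poll every queue in `names`. Ordering across queues is not preserved, which is the trade-off for the extra throughput.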
The table service is worth exploring in more detail because the developer has more control over its scalability. The rule of thumb for large-scale applications is to create more, smaller partitions. To do this, use finer-grained values for the PartitionKey. Coarser-grained values cause more entities to be grouped together into fewer, larger partitions. See http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx for more information.
Be aware that ascending or descending PartitionKey values can cause bottlenecks during write operations. These patterns are often called the append-only pattern (ascending values) and the prepend-only pattern (descending values). A table that receives inserts in an append-only pattern always has its entities inserted into the last partition of the table, so all of those requests go to a single server, which limits throughput to around 500 entities per second. The same effect applies to the prepend-only pattern. If you instead insert entities across the whole partition range, the service can use many partition servers to process the requests, which allows thousands of entities to be inserted per second. For more information on how to scale your table using an effective partitioning strategy, read the "Real World: Designing a Scalable Partitioning Strategy for Windows Azure Table Storage" article.
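A common way to avoid the append-only pattern is to prefix the natural key with a small hash-derived bucket, so that consecutive keys land in different parts of the partition range. The sketch below is one possible approach, not a technique prescribed by the service; the function name, the bucket count of 16, and the `NN_` prefix format are assumptions made for illustration.

```python
import hashlib


def spread_partition_key(natural_key: str, buckets: int = 16) -> str:
    """Prefix the key with a stable bucket number derived from its hash."""
    digest = hashlib.sha256(natural_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}_{natural_key}"


# Sequential keys (for example, increasing timestamps or IDs) no longer sort together.
keys = [spread_partition_key(f"2011-10-01T00:00:{second:02d}") for second in range(60)]
```

The trade-off is that a range query over the natural key must now be issued once per bucket and the results merged by the caller.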
Load balancing also occurs at the DFS layer. While load balancing occurs at the partition layer because the partitions become "hot" from too many requests, load balancing at the DFS layer handles the I/O load to the disks. If there are too many read or write I/O operations, the storage system will use other DFS servers that have the replicated data. In other words, the reads or writes to any particular disk are load balanced across other disks/servers.
The Windows Azure Storage service provides developers with services that are always available, durable and scalable. After you have a subscription and a storage account, you can manage your entire collection of storage services.
The blob service allows you to store binary and text files. The service supports authenticated access and anonymous access. A blob can either be a block blob or a page blob. These options allow you to optimize how your blobs are stored and delivered.
The queue service allows you to provide a durable messaging infrastructure.
The table service allows you to store structured data. Although they are comparable to relational database tables, Windows Azure tables do not maintain a schema. Instead, the schemas are defined by each entity that is stored in the table. Tables have PartitionKey and RowKey properties that define the clustered index for the table.
The architecture of the Windows Azure storage system uses three layers to handle incoming requests, provide availability, and load balance requests. The FE layer handles the request first, before sending it to the partition layer. The partition layer has partition servers that are each assigned a set of partitions. Finally, the DFS layer stores the data across many storage nodes.