Monday, April 3, 2017

Elastic Storage in AWS Cloud

What is Elastic EBS
Elastic EBS Use Cases
Elastic EBS Testing
Elastic EBS Limitations

What is Elastic EBS

Public clouds are known for the elasticity of their compute resources. For example, an AWS Auto Scaling Group (ASG) can dynamically scale up a compute farm (VMs or cloud instances) to service higher load, or scale down to save cost during off hours. What was missing was dynamic scaling of block storage, such as online volume expansion, storage tiering, or temporarily boosting a volume's IOPS and throughput. The recent announcement of the Elastic EBS feature addresses some of these shortcomings. SAN features like online modification of volume attributes (size, IOPS and type), without detaching and reattaching the volume or restarting the instance, are now possible in the public cloud. No downtime is required!

EBS (Elastic Block Store) is the AWS network storage offering that allows attaching block storage to a running cloud instance in a given Availability Zone (AZ). EBS volumes come in two families: solid state disks (SSD), called io1 and gp2, for improved random IOPS and lower latencies, and magnetic disks (HDD), called st1 and sc1, for higher sequential throughput. The Elastic EBS feature makes it possible to mix and match EBS volume capabilities to achieve the best storage performance for a given workload.

Elastic EBS Use Cases

Elastic EBS fits well for the use cases listed below:
Performance Boost:
  • Temporarily improve IOPS performance during nightly batch or quarter-end processing.
  • Improve sequential IO throughput for analytics and DSS queries: io1 or gp2 volumes can be modified to st1 volumes, which offer higher throughput than io1 and gp2 volumes.
Storage Tiers to Save Cost:
  • Modify volume types to save cost. One can build an online storage tier, without impacting performance, using the various EBS types available: io1, gp2, st1 and sc1.
On Demand Storage Capacity:
  • Instead of over-provisioning storage, start with fixed-size EBS volumes for all instances in your cluster. Monitor EBS volume usage and raise the capacity when it reaches 80-90%. This may save cost, as the capacity increase is performed on an as-needed basis.
Reduced Administration Overhead:
  • EBS storage saves a great deal of administration work compared to ephemeral storage (direct-attached storage). EBS is persistent storage and thus does not require additional data protection steps on instance launch, such as copying or refreshing data, as ephemeral storage does.
  • The Elastic EBS feature makes EBS more attractive because it takes size out of the capacity planning decision. Both the volume and the file system can be expanded online without an instance reboot.
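
The on-demand capacity idea above can be sketched in a few lines of Python. This is only an illustration: the function names, the 80% threshold and the 100 GiB growth step are assumptions chosen for the example, not values prescribed by AWS or used in the tests described later.

```python
import shutil


def used_fraction(mount_point):
    """Return the fraction (0.0 to 1.0) of the file system at mount_point in use."""
    usage = shutil.disk_usage(mount_point)
    return usage.used / usage.total


def next_size_gib(current_gib, fraction_used, threshold=0.8, step_gib=100):
    """Return the volume size to request.

    The size is unchanged below the threshold and grown by step_gib at or
    above it. threshold and step_gib are illustrative assumptions.
    """
    if fraction_used >= threshold:
        return current_gib + step_gib
    return current_gib
```

A monitoring job could call next_size_gib(500, used_fraction("/data")) periodically (the 500 GiB size and the /data mount point are hypothetical) and issue a volume modification only when the returned size differs from the current one.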

Elastic EBS Testing

Two new EBS API calls, ModifyVolume and DescribeVolumesModifications, were introduced to support the Elastic Volumes feature. The AWS console, the AWS CLI, or direct API calls can be used to modify a volume. Completion status can then be tracked via CloudWatch Events or polled via the DescribeVolumesModifications API to trigger some action. For example, the file system can be expanded online once the volume change completion notification is received.
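
As a minimal sketch of the modify-then-poll pattern, the Python function below wraps the two API calls. It assumes the caller passes in a boto3 EC2 client (boto3.client("ec2")); the function name, the polling interval and the timeout are hypothetical choices for illustration.

```python
import time


def modify_and_wait(ec2, volume_id, poll_seconds=15, timeout_seconds=3600, **changes):
    """Request a volume change, then poll until it has taken effect or failed.

    ec2 is expected to be a boto3 EC2 client. changes are ModifyVolume
    parameters such as Size, VolumeType or Iops.
    Returns the last ModificationState string seen.
    """
    ec2.modify_volume(VolumeId=volume_id, **changes)
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = ec2.describe_volumes_modifications(VolumeIds=[volume_id])
        state = response["VolumesModifications"][0]["ModificationState"]
        # A resized volume can already be extended at the file system level
        # in the "optimizing" state; callers that need the final performance
        # level may prefer to keep polling until "completed".
        if state in ("optimizing", "completed", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{volume_id} still in state {state!r}")
        time.sleep(poll_seconds)
```

For example, modify_and_wait(boto3.client("ec2"), "vol-0123456789abcdef0", Iops=4000) would request 4000 IOPS on a (hypothetical) io1 volume and return once the change is in effect or has failed.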

Modify Volume --iops Test

Provisioned IOPS on an io1 volume can be modified (increased or decreased) dynamically. A modification request takes a few minutes to complete. The volume continues to perform at its original IOPS level until the modification is completed.

To test the feature, I created and attached a 500 GB EBS volume of type io1. The DescribeVolumesModifications API was used to poll for volume changes. IOPS were increased by 2000 in every iteration. The fio benchmark tool ran in the background, performing reads via the direct IO path, while I measured storage IOPS, throughput and latency. See the bash script used for testing below.
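
As a rough illustration of the loop described above (this is not the author's bash script), the Python sketch below computes the IOPS value to request in each iteration and builds a fio command line for a direct IO read load. The starting IOPS, the 20000 IOPS ceiling, the random-read pattern, the 16k block size and the queue depth are all assumptions made for the example.

```python
def iops_targets(start_iops, step=2000, max_iops=20000):
    """Return the IOPS values to request, one per test iteration.

    max_iops is an assumed ceiling for the test, not a documented limit.
    """
    return list(range(start_iops + step, max_iops + 1, step))


def fio_read_argv(device, runtime_seconds=300):
    """Build a fio command line for direct IO random reads against device.

    Block size, queue depth and the random-read pattern are assumptions.
    """
    return [
        "fio", "--name=readtest", f"--filename={device}",
        "--rw=randread", "--direct=1", "--bs=16k",
        "--ioengine=libaio", "--iodepth=32",
        "--time_based", f"--runtime={runtime_seconds}",
    ]
```

Each value from iops_targets would be passed as the Iops parameter of one ModifyVolume request, with fio started in the background (for example via subprocess.Popen) before the first request is issued.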

Modify Volume --size Test
I created and attached a 500 GB EBS gp2 volume. The DescribeVolumesModifications API was used to poll the volume modification status. The gp2 volume was expanded by an additional 200 GB in every test iteration, and xfs_growfs was invoked to grow the xfs file system into the additional space. A fio read IO load on the file system ran in an endless loop while I measured storage IOPS, throughput and latency. See the bash script used for testing below.
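
The file system step of this test can be sketched as follows. This is a hedged Python illustration rather than the script used in the test: the function names are hypothetical, and the ext4 branch is included only because ext4 also supports online growth. It should be run only after the volume modification has taken effect.

```python
import subprocess


def grow_argv(fs_type, device, mount_point):
    """Return the command that grows a mounted file system to fill its device."""
    if fs_type == "xfs":
        # xfs_growfs operates on the mount point of a mounted file system.
        return ["xfs_growfs", mount_point]
    if fs_type == "ext4":
        # resize2fs takes the block device and, with no size argument,
        # grows the file system to the size of the device.
        return ["resize2fs", device]
    raise ValueError(f"unsupported file system type: {fs_type}")


def grow_filesystem(fs_type, device, mount_point):
    """Run the grow command; needs root and raises if the command fails."""
    subprocess.run(grow_argv(fs_type, device, mount_point), check=True)
```

With a 500 GB starting size and 200 GB steps, iteration n would request a volume size of 500 + 200 * n GB and then call grow_filesystem("xfs", "/dev/xvdf", "/data"), where the device and mount point names are placeholders.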

Modify Volume --type Test
I created and attached a 3 TB EBS volume of type sc1. The DescribeVolumesModifications API was used to poll the modification status. The 3 TB volume was changed dynamically from one EBS volume type to another in a loop:
sc1 -> st1 | st1 -> gp2 | gp2 -> io1 | io1 -> gp2 | gp2 -> st1 | st1 -> sc1
A fio read IO load via direct IO ran in an endless loop while I measured storage IOPS, throughput and latency. See the bash script used for testing below.
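
The type cycle above can be expressed compactly in Python. This sketch is illustrative only (it is not the author's script): it derives the transition pairs from the sequence and builds the keyword arguments for each ModifyVolume request. The io1 IOPS value is an assumed placeholder, since the value used in the original test is not stated.

```python
TYPE_SEQUENCE = ["sc1", "st1", "gp2", "io1", "gp2", "st1", "sc1"]


def transitions(sequence):
    """Pair each volume type with the one that follows it in the sequence."""
    return list(zip(sequence, sequence[1:]))


def modify_kwargs(volume_id, target_type, io1_iops=3000):
    """Build ModifyVolume keyword arguments for one step of the cycle."""
    kwargs = {"VolumeId": volume_id, "VolumeType": target_type}
    if target_type == "io1":
        # io1 is the only type in this cycle with a provisioned IOPS setting.
        kwargs["Iops"] = io1_iops  # assumed value, not from the original test
    return kwargs
```

Iterating over transitions(TYPE_SEQUENCE) and passing modify_kwargs(volume_id, target) to a boto3 EC2 client's modify_volume call, waiting for each modification to finish before the next, would reproduce the loop.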

Elastic EBS Limitations

  • The Elastic EBS feature is supported only on current-generation AWS instances.
  • There is a limit of one modification request per volume every 6 hours.
  • A volume modification request takes from a few seconds to tens of minutes to complete.
  • The minimum charge is for 6 hours once a volume modification is completed.
  • The modify volume option "iops" applies to "io1" (provisioned IOPS) EBS volumes only.
  • The modify volume option "size" can only be used to increase the volume capacity. Shrinking a volume is not supported!
  • Both ext4 and xfs support growing a file system online.
    • File systems cannot be grown online if the EBS device is under Linux MD and the RAID volume type is RAID-0. Linux MD RAID supports growing for RAID-1 (mirror) and RAID-5 (parity) volumes only.
    • File systems can be grown online if the EBS device is under LVM (Linux Logical Volume Manager) control.
    • File systems can be grown online on a drbd device as long as the drbd device is backed by LVM.
  • The modify options "iops" and "type" (unlike "size") are transparent to the instance and thus can be applied to all types of configurations (raw, block, MD, LVM, drbd, etc.).
  • The AWS IAM instance profile needs additional permissions to run the ModifyVolume and DescribeVolumesModifications APIs from the instance to modify a volume online.
    • One can use the scripts provided in the script section of this document to modify a volume's iops, size and type.
    • One can use the AWS Lambda function provided by AWS to set a "maintenance" tag on all volumes requiring size changes, and then a script that runs on the instance to resize all targeted volumes.
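
To make the IAM requirement above concrete, the following Python sketch builds a policy document granting the two API actions. It is a minimal example under assumptions: the function names are hypothetical, and the wildcard resource keeps the sketch short where a production policy would likely scope ModifyVolume to specific volumes.

```python
import json


def volume_modify_policy():
    """Return an IAM policy document allowing the two Elastic Volumes calls."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:ModifyVolume",
                    "ec2:DescribeVolumesModifications",
                ],
                # "*" keeps the sketch short; a real policy would likely
                # narrow ModifyVolume to specific volumes.
                "Resource": "*",
            }
        ],
    }


def policy_json():
    """Serialize the policy for attaching to the instance profile's role."""
    return json.dumps(volume_modify_policy(), indent=2)
```

The JSON returned by policy_json() could be attached as an inline policy to the role behind the instance profile.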

EBS continues to evolve to match SAN-like features in the public cloud. We expect more SAN features to be added to the growing feature list in the future, such as thin provisioning, deduplication, LUN-level snapshots or mirroring, and multi-attach volumes for clustered applications like Oracle RAC or clustered file systems.

NOTE: EBS does support a snapshot feature, but snapshots are saved to S3. When a snapshot is restored, data is copied to the EBS volume as it is accessed (lazy copy). Due to the lazy copy from S3 to EBS, IO latencies jump from 2-5 ms to 100 ms until all blocks are copied to EBS from S3. A LUN-level snapshot copies the data directly to the destination LUN.
