AWS S3 plugin - Cost optimization for medical image storage?

Hello,

I’m using the Orthanc PACS with the AWS S3 plugin to store medical images in the cloud. I was wondering how people are managing their costs when using S3 as their storage solution. Do you have any tips or best practices to share for cost optimization? I would appreciate any advice on this matter.

Thank you in advance.

Hello Ibrahim,

I believe there are two options open to you:

  1. Object lifecycle management

  2. Using S3 Intelligent-Tiering

Option 1. Using Lifecycle Management

To keep your stored images (i.e., objects) cost-effective throughout their lifecycle, you will need to configure an Amazon S3 Lifecycle. An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. There are essentially two types of actions:

  • Transition actions – These actions define when objects transition from one storage class to another. For example, you might choose to transition objects from S3 Standard to the S3 Standard-IA (Infrequent Access) storage class 30 days after creating them, then to an archive class such as S3 Glacier Flexible Retrieval 120 days after creating them, or to S3 Glacier Deep Archive one year after creating them. Please note that there are costs associated with lifecycle transition requests.

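  • Expiration actions – These actions define when objects expire. Amazon S3 deletes expired objects on your behalf based on your settings. The timing can also cost you: storage classes such as S3 Standard-IA, S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive have a minimum storage duration, and objects deleted before it has elapsed are still charged for the remainder.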
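
As a rough illustration of the transition example above, here is a minimal Python sketch that builds a lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the function names and the bucket name are made up for illustration, and the day thresholds are simply the ones from the example; adjust them to your own retention policy before applying anything to a real bucket.

```python
def build_lifecycle_config(prefix="", expire_after_days=None):
    """Build a lifecycle configuration dict (hypothetical helper).

    Transitions follow the example above: Standard-IA after 30 days,
    Glacier Flexible Retrieval after 120 days, Deep Archive after 365 days.
    """
    rule = {
        "ID": "archive-medical-images",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},    # empty prefix = whole bucket
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 120, "StorageClass": "GLACIER"},
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }
    # Expiration is optional; medical images often must be retained,
    # so nothing is expired unless explicitly requested.
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


def apply_lifecycle_config(bucket, config):
    """Send the configuration to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Calling `apply_lifecycle_config("my-orthanc-bucket", build_lifecycle_config())` would apply the rule to a bucket of that (assumed) name. Keep in mind that objects moved to the Glacier classes must be restored before they can be read again.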

Option 2. Using S3 Intelligent-Tiering

This may be your best bet since you do not know how frequently your generated and stored medical images will be accessed. The Amazon S3 Intelligent-Tiering storage class is designed to optimize storage costs by automatically moving data to the most cost-effective access tier when access patterns change.

S3 Intelligent-Tiering automatically stores objects in three access tiers: the first tier is optimized for frequent access, then a lower-cost tier optimized for infrequent access, and a very low-cost tier optimized for rarely accessed data.

For a small monthly object monitoring and automation charge, S3 Intelligent-Tiering moves objects that have not been accessed for 30 consecutive days to the Infrequent Access tier for up to 39% savings. After 90 days of no access, they are moved to the Archive Instant Access tier with up to 66% savings. If the objects are accessed at any time in the future, S3 Intelligent-Tiering moves them back to the Frequent Access tier.

For greater savings on rarely accessed medical images (objects), check out the opt-in asynchronous Archive Access and Deep Archive Access tiers in S3 Intelligent-Tiering.
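
If you want to try the opt-in archive tiers, the sketch below builds a configuration in the shape accepted by boto3's `put_bucket_intelligent_tiering_configuration`. The configuration ID, function names and bucket name are assumptions for illustration. The 90-day and 180-day values are the minimums S3 accepts for the Archive Access and Deep Archive Access tiers. Objects in these two tiers have to be restored before they can be read, so they are not instantly available to Orthanc.

```python
def build_archive_tiering_config(archive_days=90, deep_archive_days=180):
    """Build an Intelligent-Tiering opt-in archive config (hypothetical helper)."""
    if archive_days < 90:
        raise ValueError("ARCHIVE_ACCESS requires at least 90 days")
    if deep_archive_days < 180:
        raise ValueError("DEEP_ARCHIVE_ACCESS requires at least 180 days")
    return {
        "Id": "orthanc-archive-tiers",  # hypothetical configuration name
        "Status": "Enabled",
        "Tierings": [
            {"Days": archive_days, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": deep_archive_days, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    }


def apply_archive_tiering_config(bucket, config):
    """Send the configuration to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_intelligent_tiering_configuration(
        Bucket=bucket,
        Id=config["Id"],
        IntelligentTieringConfiguration=config,
    )
```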

Because you already pay the object monitoring and automation fee, S3 Intelligent-Tiering has no retrieval charges. In contrast, with lifecycle management there are retrieval charges when objects that were moved to a colder storage class are accessed again.

Note that S3 Intelligent-Tiering has no minimum eligible object size, but objects smaller than 128 KB are not eligible for auto-tiering. These smaller objects may be stored, but they’ll always be charged at the Frequent Access tier rates and don’t incur the monitoring and automation charge.
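
To make the automatic tiering rules described above concrete, here is a small Python sketch that returns the access tier an object would sit in, given its size and the number of days since it was last accessed. It only models the three automatic tiers (not the opt-in archive tiers), and the function and constant names are made up for illustration.

```python
MIN_AUTO_TIER_BYTES = 128 * 1024  # objects below 128 KB are never auto-tiered


def intelligent_tiering_tier(size_bytes, days_since_last_access):
    """Return the automatic access tier for an object (illustrative model)."""
    if size_bytes < MIN_AUTO_TIER_BYTES:
        # Small objects always stay (and are billed) in Frequent Access.
        return "Frequent Access"
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"


# A 5 MB image untouched for 45 days, and a 20 KB object untouched for a year:
print(intelligent_tiering_tier(5 * 1024 * 1024, 45))
print(intelligent_tiering_tier(20 * 1024, 365))
```

This matters for DICOM archives with many small files: they stay at Frequent Access rates regardless of how rarely they are read.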

Sorry, this was quite long. Hope this was helpful.

More reference: https://docs.aws.amazon.com/AmazonS3/latest/userguide/intelligent-tiering.html

Regards,

Nuhu M

Hello Ibrahim,

There is also this option:

S3 One Zone-Infrequent Access (S3 One Zone-IA): This storage class trades resilience for cost. While AWS advertises the same 11 9’s of durability for this class, it adds the following footnote: “Because S3 One Zone-IA stores data in a single AWS Availability Zone, data stored in this storage class will be lost in the event of Availability Zone destruction.” It costs less per GB than S3 Standard-IA while still allowing millisecond access to the data. It should only be used for reproducible data. Good use cases include:

  • Regional replication from other S3 buckets. If you are using cross-region replication to back up S3 buckets, this is a good storage class, as all of the data should be reproducible.
  • Copies of on-premises backups. For folks who need to implement an off-site backup for their on-premises solution, this is a good solution.

reference:
https://towardsaws.com/s3-storage-tiers-use-cases-and-best-practices-f19973be1bd5

Regards,

Nuhu M

Hello Nuhu,

Thank you for your email regarding the options available for managing stored medical images in AWS S3. Your detailed explanation of the two options, object lifecycle management and S3 Intelligent-Tiering, was very helpful.

After carefully considering both options, I believe that S3 Intelligent-Tiering is the best choice for our needs. Since we do not know how frequently these medical images will be accessed, the automatic movement of data to the most cost-effective access tier is very appealing.

Also, the fact that S3 Intelligent-Tiering has no retrieval charges, and that there are opportunities for cost savings with Infrequent Access and Archive Instant Access tiers, makes this option even more attractive. I appreciate your recommendation, and the reference provided for further reading.

Thank you again for your assistance.

Best regards, Ibrahim

Hello Ibrahim,

I am glad I was able to help.

Wish you the best.

Regards,

Nuhu M

Please be aware that Orthanc version 1.11.2 was not able to retrieve studies from S3 Glacier Deep Archive storage.
I am not sure whether this was resolved in version 1.11.3.

Hello,

Statutory compliance is usually one of the main reasons why data sets and medical images need to be archived. Because previous medical images may be needed for quick medical decision-making, it is very important to have a storage and archive strategy that keeps relevant medical data relatively easy to access.

If you have archived medical images in S3 Glacier Flexible Retrieval, a standard retrieval typically takes 3-5 hours. If you archive the data in S3 Glacier Deep Archive, a standard retrieval typically completes within 12 hours. Even with a regular on-premises tape archive (tape libraries, etc.), retrieval is never instant and takes some time.

If S3 Glacier Deep Archive is part of the medical organization’s general data and business protection strategy (including the archive), I believe retrievals from it need to be planned ahead of the actual medical encounter with the patient, based on how long the retrieval takes.
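
As a rough sketch of that planning, the Python snippet below works out the latest moment a restore should be started so that archived images are available before an appointment. The retrieval durations are the typical standard-retrieval figures mentioned above (using the upper end of the 3-5 hour range), the two-hour safety margin is an arbitrary assumption, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta

# Typical standard-retrieval times quoted above (upper bounds).
TYPICAL_RETRIEVAL = {
    "GLACIER": timedelta(hours=5),        # S3 Glacier Flexible Retrieval
    "DEEP_ARCHIVE": timedelta(hours=12),  # S3 Glacier Deep Archive
}


def latest_restore_start(appointment, storage_class,
                         safety_margin=timedelta(hours=2)):
    """Return the latest time to request a restore (illustrative helper)."""
    return appointment - TYPICAL_RETRIEVAL[storage_class] - safety_margin


appointment = datetime(2024, 1, 10, 9, 0)
print(latest_restore_start(appointment, "DEEP_ARCHIVE"))

# The restore itself could then be requested with boto3, for example:
#   s3.restore_object(
#       Bucket="my-orthanc-bucket",   # assumed bucket name
#       Key=object_key,               # one call per archived object
#       RestoreRequest={"Days": 2,
#                       "GlacierJobParameters": {"Tier": "Standard"}},
#   )
```

A restore has to be requested for every S3 object that belongs to the study; how to map a study to its object keys is left out here.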

Warm Regards,

Nuhu M