AWS S3 cache

Hi folks, I have Orthanc 1.7 installed on VMware, and I'm using the S3 plugin (the one not from Osimis). Users use OsiriX; is there a way to manage a cache so that when they retrieve studies from OsiriX, it is a little faster? What is the best approach?

Note: at the moment I can't have Orthanc installed on EC2.

Thanks a lot!

Hello,

Well, S3 is obviously limited by the bandwidth of your network connection.

You could consider setting the “StorageCompression” configuration option to “true” in order to divide the size of the stored DICOM instances by roughly a factor of 2, which should in turn speed up network transfers:
https://book.orthanc-server.com/users/configuration.html
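
If you want to check the effect of the option, the attachment sub-resources of the REST API report both figures for each instance. Here is a minimal Python sketch; the URL and credentials are placeholders, and keep in mind that only instances stored after the option was enabled are compressed:

import requests

# Placeholder URL and credentials - adapt them to your own Orthanc server
ORTHANC = "http://localhost:8042"
AUTH = ("orthanc", "orthanc")

# Pick any stored instance and compare the original DICOM size with the
# size that was actually written to the storage area
instance_id = requests.get(f"{ORTHANC}/instances", auth=AUTH).json()[0]

size = int(requests.get(
    f"{ORTHANC}/instances/{instance_id}/attachments/dicom/size", auth=AUTH).text)
stored = int(requests.get(
    f"{ORTHANC}/instances/{instance_id}/attachments/dicom/compressed-size", auth=AUTH).text)

print(f"Original: {size} bytes, stored: {stored} bytes ({stored / size:.0%} of original)")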

Also, give a try to the two S3 plugins that are currently available, as they might perform differently:
https://github.com/radpointhq/orthanc-s3-storage

https://book.orthanc-server.com/plugins/object-storage.html

More elaborate solutions could be envisioned, such as using 2 Orthanc servers: the radiology workstations would connect to an Orthanc server “A” that stores its data on a local drive and acts as the cache. This first Orthanc server would be fed by a second Orthanc server “B” that stores the full database on S3. The capacity of the cache Orthanc server “A” would be limited, e.g. by the “MaximumStorageSize” configuration option.

The prefetching strategy that would bring DICOM instances from Orthanc “B” to Orthanc “A” would be implemented by an automated script and/or by a Web application that uses the REST API of Orthanc (this strategy obviously depends heavily on your specific workflow):
https://book.orthanc-server.com/users/rest.html
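
As an illustration, here is a minimal Python sketch of such a prefetching script, using only standard REST calls. The hostnames, credentials, and accession number below are placeholders, and the trigger that decides which study to prefetch is up to your workflow:

import requests

# Placeholder endpoints and credentials - adapt them to your own deployment
ORTHANC_B = "http://orthanc-b.example.com:8042"   # archive server using S3
ORTHANC_A = "http://orthanc-a.example.com:8042"   # cache server on a local drive
AUTH_B = ("orthanc", "orthanc")
AUTH_A = ("orthanc", "orthanc")

def prefetch_study(accession_number):
    """Copy one study from the archive "B" to the cache "A" through the REST API."""
    # Locate the study on the archive server
    found = requests.post(
        f"{ORTHANC_B}/tools/find",
        json={"Level": "Study", "Query": {"AccessionNumber": accession_number}},
        auth=AUTH_B)
    found.raise_for_status()

    for study_id in found.json():
        # List the instances of the study, then transfer them one by one
        instances = requests.get(
            f"{ORTHANC_B}/studies/{study_id}/instances", auth=AUTH_B)
        instances.raise_for_status()

        for instance in instances.json():
            dicom = requests.get(
                f"{ORTHANC_B}/instances/{instance['ID']}/file", auth=AUTH_B)
            dicom.raise_for_status()
            requests.post(f"{ORTHANC_A}/instances", data=dicom.content,
                          auth=AUTH_A).raise_for_status()

prefetch_study("A10029316")  # placeholder accession number

The recycling of old studies on the cache server “A” is then handled automatically by Orthanc thanks to the “MaximumStorageSize” option.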

HTH,
Sébastien-

This is not completely related, but just a general question about using the S3 plugins and Amazon S3 storage. I signed up for a free account just to try the S3 plugins, and I also found the pricing calculator:

https://calculator.aws/#/

Just as an educated guess, I put in 200 GB per month of data to store, 200 GB inbound per month, and 2 TB outbound per month, and came up with an estimate of about USD 2,300 per year.

I’ve never used S3 services before, but does that seem about correct if we generate 200 GB of data per month and expect each study to be accessed about 10 times, such that we’d have 2.4 TB in total by the end of the year? I take it that Amazon pretty much guarantees not losing any data?

I’d actually like to hear how people implement that. It seems like it would be nice to be able to just archive everything on an S3 service (as a mirror) and then also have a working server, sort of like Sébastien mentions, that can be periodically trimmed, with a feature to fetch studies from the S3 archive when needed. I take it you can have 2 instances of Orthanc running on the same local server on the intranet and automatically route studies from Server A to Server B when they are stored, so that Server B can be a permanent long-term archive? It does start to get a little pricey, since you can get a pretty decent rack server for $5,000.

As I am not a sales representative from Amazon, and as the Orthanc project is not actively supported by Amazon, I won’t personally answer such questions.

Sébastien-

Whatever storage backend you use, whether it’s a filesystem, database, S3 or something else, there are a couple of things to consider when calculating the total cost, which you touched on:

Firstly, how critical is the data to you? If you’re hosting locally you may want to use RAID or similar. If you’re using a cloud provider, you can usually select the degree of redundancy, or “durability”. If this is not the source of truth, but only used as a convenience, and you can easily access backups elsewhere, you can elect to use a storage class/medium with less redundancy.

The second question is what your data access patterns are. You may well have a large amount of data, but if only a low proportion of studies are required (e.g. only patients who require further treatment), you can often apply “lifecycle” rules which can dramatically reduce the costs further.

With respect to S3, you can do both of these things very easily. For example, rules can be applied to automatically reclassify data as they age or based on how they are accessed.
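
For instance, an age-based transition rule can be created in a few lines with the boto3 SDK. The bucket name and the thresholds below are just made-up examples, not a recommendation:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-orthanc-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-old-studies",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object in the bucket
            "Transitions": [
                # Example thresholds: Infrequent Access after 90 days,
                # then Glacier after one year
                {"Days": 90, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }]
    })

Just keep in mind that reading objects back from the archival classes such as Glacier is slower and carries a retrieval fee, so they are only suitable for studies you rarely expect to access again.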

Nick