Orthanc storage performance issue

We are using Orthanc as our mini-PACS and data storage system, but as our data keeps growing we have run into severe performance problems.

1. Importing DICOM data: ~10 MB/s I/O throughput, so importing 2 TB of data takes about 3 days (~200,000 s).
2. Extracting data from Orthanc: in most of our cases we read DICOM data by series rather than by instance, but the original Orthanc database abstraction and its SQLite/PostgreSQL implementations organize data by instance, which leads to random-access I/O.

We currently have 40 TB of medical image data, and this will grow roughly 10x by 2018. Our users complain about read and write performance every day.

I plan to do the following; please help me validate this plan and tell me how to do it correctly.

Convert random access to series access: since our read requests come by series rather than by instance, we can convert 512 KB random accesses into ~20 MB sequential reads.
Q: Can I achieve this by implementing my own DatabasePlugin and IndexPlugin?

I am not an Orthanc developer, but I have faced quite similar problems. I am afraid that writing your own plugin would not help you index 400 TB with the current Orthanc database API (check issue 41). If you suffer from slow PostgreSQL read queries, look at issue 47. As for the import speed, I have no idea how to improve it (do you really need it to be higher just for populating Orthanc with data? it is a one-time operation); maybe you can try several Orthanc instances connected to one PostgreSQL database.
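
For what it's worth, here is a rough, untested sketch of what "several Orthanc instances connected to one PostgreSQL database" could look like on the import side: DICOM files are pushed in parallel through the standard REST API (POST /instances), spread round-robin over the instances. The URLs, credentials, directory and worker count below are placeholders to adapt.

# Hypothetical parallel import through the Orthanc REST API.
# Assumes several Orthanc instances sharing one PostgreSQL index and a
# local tree of .dcm files; adjust URLs, credentials and paths.
import pathlib
from concurrent.futures import ThreadPoolExecutor

import requests

ORTHANC_URLS = ["http://localhost:8042", "http://localhost:8043"]
AUTH = ("orthanc", "orthanc")

def upload(job):
    i, path = job
    url = ORTHANC_URLS[i % len(ORTHANC_URLS)] + "/instances"
    with open(path, "rb") as f:
        r = requests.post(url, data=f.read(), auth=AUTH,
                          headers={"Content-Type": "application/dicom"})
    r.raise_for_status()
    return path

files = sorted(pathlib.Path("/data/incoming").rglob("*.dcm"))
with ThreadPoolExecutor(max_workers=8) as pool:
    for done in pool.map(upload, enumerate(files)):
        print("imported", done)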

Hello,

2. Extracting data from Orthanc: in most of our cases we read DICOM
data by series rather than by instance, but the original Orthanc
database abstraction and its SQLite/PostgreSQL implementations
organize data by instance, which leads to random-access I/O.

Just to make sure there is no confusion: AFAIK the default SQLite
backend does not support storage of DICOM data, only indexing it.
Instead, the default Orthanc Storage backend is the filesystem backend.

So, there are actually two backends to consider here:
- Filesystem backend
- Postgres plugin with "EnableStorage" set to true.

Are you saying you experienced the same issues with both, or just one
of them?

Convert random access to series access: since our read requests come
by series rather than by instance, we can convert 512 KB random
accesses into ~20 MB sequential reads.

Indeed that would be ideal.

Q: Can I achieve this by implementing my own DatabasePlugin and
IndexPlugin?

Most likely, although unfortunately I don't believe the plugin API
would directly provide you with the appropriate signals to do so (i.e.
each request to each instance is isolated). This means that to do what
you want, you'd need preloaded cache semantics with a filling process
based on heuristics (e.g. "if someone requests the first instance of a
series, it is likely they will want the rest, so fetch it from
secondary storage before they ask for it").

If you think this is the issue, then a quick idea in the case of the
filesystem backend would be to hint the kernel with prefetching
facilities like readahead(2) (Linux) or posix_fadvise(2) (POSIX) so
that the instances of the series end up in the kernel page cache
before they are requested. This way you don't have to implement a
cache yourself.
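
As a small illustration of the fadvise approach, assuming the filesystem storage backend and that you can resolve the instances of a series to their files under the storage directory (the path resolution itself is not shown):

# Hint the kernel to read the given files into the page cache
# (Linux/POSIX; os.posix_fadvise is available in Python 3.3+).
import os

def prefetch_files(paths):
    for path in paths:
        fd = os.open(path, os.O_RDONLY)
        try:
            # offset=0, length=0 means "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        finally:
            os.close(fd)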

Alternatively, you may of course implement a plugin yourself where you
manage such a cache with full control (and with the accompanying
complexity).

In the case of the PG plugin, I recommend seeking some guidance in the
PG community.

If you think this can be helped with changes in the storage backend
API, feel free to submit a proposal on the issue tracker.

Lastly, I would recommend double-checking everything around Orthanc
before starting any such endeavor just to make sure system-level or
hardware-level issues are excluded. For example, do some I/O benchmarks
on the machine and another one (to get a baseline), if possible at all
levels (direct or physical volume/partition, logical volume if any,
filesystem) and do some Orthanc benchmarks again on the machine and
another one; then compare the actual numbers. It could be that by
discovering and addressing another issue, you would get good-enough
performance without any particular optimizations. In particular, I
don't feel like write performance should be constrained by Orthanc's
data model, at least in theory (unless instances are very small and
numerous and the allocation process becomes a bottleneck).
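
As an example of the kind of baseline to collect (dedicated tools such as fio or dd are preferable for serious measurements), a trivial sequential-read timing on the storage volume can already be compared against the throughput you observe through Orthanc. The file path below is a placeholder and should point to a file larger than RAM, otherwise the page cache will skew the result.

# Rough sequential-read throughput check, in MB/s.
import time

def sequential_read_mb_per_s(path, block_size=1024 * 1024):
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total / (1024 * 1024) / elapsed

print(sequential_read_mb_per_s("/path/to/large/test/file"))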

Hope this helps,

Hello,

First of all, make sure that you are using versions of the Orthanc core and of the PostgreSQL plugin with debug assertions turned off:
http://book.orthanc-server.com/faq/troubleshooting.html#performance-issues

Secondly, as mentioned previously by Thibault, if performance is important to you, make sure to store the raw DICOM files on a RAID filesystem that is properly backed up, and not in the PostgreSQL database. In other words, make sure to set the option “EnableStorage” to “false” in the configuration file:
http://book.orthanc-server.com/plugins/postgresql.html

Thirdly, database optimizations (in particular issues 41 and 47, but also “batch querying” when many queries are executed successively) are planned to be carried out in the context of the implementation of the MySQL plugin:
https://www.indiegogo.com/projects/mysql-plugin-for-orthanc-software#/

So, please consider supporting this crowdfunding campaign by Osimis in order to make these optimizations possible!

Regards,
Sébastien-