Optimizing orthanc with large database

Hello,
What is the currently recommended solution for using orthanc as "near-enterprise" PACS?
Currently I'm dealing with 2 TB of data with postgres as database and plain files storage (db-v6/??/??) on ext4, no compression.
As I'm running out of space I considered migrating to something different - for example storing instances in postgres database.

Does the size overhead is really so significant?
Maybe mongodb would be a good solution?

Mysql is not available yet.

Potentially using postgres I could optimize many things, for example I could use remote postgres database.

What do You recommend?

Hello,

I personally have no stats about the overhead of storing DICOM files inside a PostgreSQL database with the dedicated “OrthancPostgreSQLStorage” plugin. Please contribute: We’d be delighted to hear your feedback about this topic.

Regarding MongoDB, a third-party plugin is available, please share your insight about it:
https://github.com/Doc-Cirrus/orthanc-mongodb

Regarding MySQL and database optimization, Osimis launched a crowdfunding campaign a few months ago, but unfortunately the objective was not reached:
https://www.indiegogo.com/projects/mysql-plugin-for-orthanc-software#/

Sébastien-

Concerning the overhead of storing DICOM files inside PostgreSQL, this is mentioned in the doc here: http://book.orthanc-server.com/plugins/postgresql.html#postgresql

Remark: When using the Storage PostgreSQL plugin, the DICOM files are stored as large objects in the database. This actually consumes more space than the DICOM file itself (around 40% more). When dealing with large number of files, it is recommended to continue using the default filesystem storage for DICOM files and enable the Index PostgreSQL plugin.

However, we’re really not postgreSQL expert and that’s just something we’ve observed but could not explain. Indeed, if someone understand’s what’s going on there, we would appreciate to understand !

So, file system is the way to go. With the help of LVM you should be able to have one logical volume spawned over multiple HD so you basically have no limits. If Orthanc is running in a VM, the hyperviser might handle it too.

Well, that’s why I asked, I was a bit afraid about the overhead mentioned here and I hoped to get some real use case statistics.

In fact, I’ve just tried to setup orthanc with postgres Storage and Index plugin on virtual machine.

Surprisingly - the real database size took a bit less then files on harddisk (reported by linux du command) but - there is WAL (Write-Ahead Logging) log file in postgres directory, which boosts reliability, can be used to rollback database, etc - and the file itself took like the same amount of space as the database itself, which is reasonable as the whole database was uploaded in a short amount of time. (I uploaded 2 exams - second right after the first one).

Good news is WAL logging is configurable, and You can limit the log:

  • set the maximal size

  • set the time of rollback

(and others)

Bad news: As the postgres will allocate space for the WAL log file You won’t get it back. Even after expiration of the log harddisk space will still be allocated.

Note: I used Postgres 10.0.

In postgres 9.4 the WAL files were stored in pg_xlog directory, now it is called pg_wal.

So, some real data:

Debian testing fresh install, postgres 10.0, orthanc 1.3.0 from debian repo, postgres plugin from debian repo

currently: empty database, 3 exams were uploaded, removed, 2 of them uploaded, removed, simply I’ve played a bit:

After I’ve noticed wal filesize growing I’ve set checkpoint_timeout = 5min in postgresql.conf, I left other settings untouched.

root@orthanc-postgres:/var/lib/postgresql# du -h
945M ./10/main/pg_wal
56M ./10/main/base
1001M ./10

uploaded two studies: 575M in 1129 files and 337M in 662 files

root@orthanc-postgres:/var/lib/postgresql# du -h
209M ./10/main/pg_wal
820M ./10/main/base
1,1G ./10

820M/912M = 90 % (!) postgres compression?

Surprisingly - the wal file was shrinking during the upload!

I’ve uploaded the third one: 2318 instances, 350M (files exported from Ginkgo CADx)

97M ./10/main/pg_wal
1280M ./10/main/base
1378M ./10/main

1280M/1262M = 101%

(Technically i should use ./10/main/base/16385 for calculations which took 1258M, still <100%)

Drawbacks? Database fragmentation. After removal of the largest study the database havent shrinked at all. But here comes the VACUUM function.
Let’s leave our server for a few minutes - Size was still the same.

Uploaded a study - database size increased by the size of a study.

Invoked a manual VACUUM FULL on the table - filesize remained constant.

Uploaded another large study - no change in filesize (!)

Removed two studies - database size remained constant.

pg_wal stayed at around 100 MB all the time.

To sum up:

The overhead is not so large. WAL is there for reasons, and can be limited.

Don’t expect postgres to free the harddrive space after removal of instances, postgres keeps it allocated for incoming data. It might eventually be freed.

Let’s assume postgres knows what it is doing. ; )

I hope to try it with much larger dataset this week.

Hi Oskar,

thanks for this report ! I’ll add a link to this discussion in the Orthanc book.

Regards,