File UUID and NFS back-end storage

Sigh. I had a long post describing a use case we have for Orthanc that ran into trouble, but I fat fingered some keystroke that made the browser lose the post.

What are the chances that the File UUID scheme and Orthanc’s on-disk storage would behave badly with an NFS-mounted disk?

I am not criticizing Orthanc here. I love Orthanc and the team.

I’m wondering if there are particular configurations to be avoided.

Referencing an old post, I find myself in the described scenario: the database (PostgreSQL) for a very large (15+ TB, 30M DICOM) Orthanc repository was corrupted, and I am in the position of having to re-ingest the DICOM into a new Orthanc. The process has been going for 3.5 months (not the weeks that the original poster estimated).

I suspect that the issue has to do with Orthanc’s File UUIDs and how they result in a lot of random reads/writes on the back-end storage. If I understand correctly, the File UUIDs are generated randomly per DICOM (or other back-end file) and are used to determine the directory structure: 256 top-level folders, each with 256 sub-folders, for 65,536 leaf folders overall. Any given Study may find its individual DICOM files spread randomly over a portion of those 65,536 sub-folders, and each folder will contain a random selection of files from different studies.
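To make my mental model concrete, here is a sketch of how I understand the UUID-to-path mapping (my own illustration, not Orthanc's actual code): the first two hex characters pick the top-level folder, the next two pick the sub-folder.

```python
import uuid

def storage_path(file_uuid: str) -> str:
    """Map a File UUID to a two-level directory layout: first two hex
    chars = top-level folder, next two = sub-folder. This is a sketch
    of my understanding, not Orthanc's actual implementation."""
    return f"{file_uuid[0:2]}/{file_uuid[2:4]}/{file_uuid}"

# Each stored file gets a fresh random UUID, so consecutive writes land
# in effectively random leaf folders among the 256 * 256 = 65536.
print(storage_path(str(uuid.uuid4())))
```

Because each file draws a fresh random UUID, two instances of the same Study end up in unrelated leaf folders, which is exactly the random-access pattern I am worried about.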

My hypothesis is that this random access does not play nicely with NFS. I see 1% to 5% of the performance I get on my Orthanc servers with local storage.

Again, no complaint about Orthanc here. I’m just wondering whether my intuition is right that the way Orthanc writes files does not play well with an NFS back end, especially when trying to re-ingest 30+ million files.

Thanks,
John.

Hi John,

First of all, I must admit that I have no experience at all with NFS or any other similar protocols.

All I can say is:

  • when you use NFS, it is seen by Orthanc as local storage.
  • I guess each read/write operation comes with a latency, and these latencies may pile up quite quickly.
  • We have made optimizations in the object-storage plugins to minimize the cost of that latency by parallelizing operations like modifications and ZIP creation (the ZipLoaderThreads and JobsEngineThreadsCount configuration options). Maybe they can also help for NFS? Maybe you can also ingest files by using multiple HTTP or DICOM clients?
  • To me, a file system should not penalize you for spreading files over random folders - this is actually a common way to balance the load across multiple physical disks, and it should improve performance.
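For reference, those two options sit at the top level of the Orthanc configuration file; the values below are purely illustrative, not recommendations:

```json
{
  "ZipLoaderThreads": 4,
  "JobsEngineThreadsCount": {
    "ResourceModification": 4
  }
}
```

You would have to experiment to find values that suit your storage latency.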

Best,

Alain

Thanks, Alain.

By local storage, I mean the disk is hosted on the same machine hosting Orthanc. In that setup, I see great performance. This problem is with the only Orthanc I have set up with remote storage.

I’m still tracking down the sluggishness. Given that the remote disk is slow even from a command prompt when accessing a folder that hasn’t been visited for a while, I figure the problem is some poor NFS configuration. Once accessed, subsequent accesses to that remote folder are rapid until, after some period of inactivity, a renewed access is slow again.

It’s there that I think the randomness of Orthanc’s back-end storage conflicts with the suboptimal remote disk configuration.

I currently have 8 separate Python clients running in parallel to try to speed up DICOM insertion. The rate waxes and wanes from under 1 to sometimes 3-4 DICOM per second across all 8 clients. Interestingly, Orthanc and PostgreSQL never exceed a small fraction of the CPU on the host machine, so it does not appear to be a CPU limitation.
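For what it’s worth, each of my clients is essentially doing something like the following (a minimal sketch; the URL, folder, and file extension are placeholders for my actual setup):

```python
import pathlib
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ORTHANC_URL = "http://localhost:8042"  # placeholder for the Orthanc REST API

def upload(path: pathlib.Path) -> int:
    """POST one raw DICOM file to Orthanc's /instances endpoint."""
    req = urllib.request.Request(
        f"{ORTHANC_URL}/instances",
        data=path.read_bytes(),
        headers={"Content-Type": "application/dicom"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def ingest_folder(folder: str, workers: int = 8) -> None:
    # Walk the folder and upload every DICOM file through a thread pool.
    files = list(pathlib.Path(folder).rglob("*.dcm"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, status in zip(files, pool.map(upload, files)):
            if status != 200:
                print(f"failed: {path} ({status})")
```

Adding more worker threads than this has not helped, which is part of why I suspect the bottleneck is the storage rather than the ingestion pipeline.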

Nor is it a bandwidth issue. I checked that. I also tried simply writing large files to disk and those write quickly. A fairer comparison would be to test writing to random locations on the remote disk, but I haven’t tried that.
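If I do get around to that fairer comparison, I’d expect it to look something like this rough benchmark, which mimics Orthanc’s two-level layout by writing files into random sub-folders (file count and size here are made up):

```python
import os
import random
import time
import uuid

def random_write_benchmark(root: str, n_files: int = 1000,
                           size: int = 512 * 1024) -> float:
    """Write n_files of `size` bytes into random two-level sub-folders
    (mimicking Orthanc's 256 x 256 layout) and return files/second."""
    payload = os.urandom(size)
    start = time.monotonic()
    for _ in range(n_files):
        # Pick a random aa/bb leaf folder, as a random File UUID would.
        sub = f"{random.randrange(256):02x}/{random.randrange(256):02x}"
        folder = os.path.join(root, sub)
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, str(uuid.uuid4())), "wb") as f:
            f.write(payload)
    return n_files / (time.monotonic() - start)
```

Running that against the NFS mount versus a local disk should show whether scattered small writes, rather than raw throughput, are the real cost.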

I tried the command-line dcmtk storescu to push one of the sub-folders of DICOM, but encountered the same ingestion rate - slower than running storescu against an Orthanc with local storage.

Another layer of complexity: I’m running Orthanc in a Docker container, but I haven’t used any of the Docker mechanisms to limit performance. The same setup run in Docker on a local disk is fast.

I’m even exploring whether my own institution throttles the performance. The remote disk is hosted by our institution in an offsite location.

Again, no issues with Orthanc here - just a weird use case to report.

John.