Hi,
This is probably a simple question, but I just wanted to get some clarifications about how the indexing process works for the combination of the advanced storage and PostgreSQL as the indexing database.
Currently, I have Orthanc running with the AdvancedStorage plugin which is indexing several mounted directories.
I am wondering what the practical implications are for Orthanc and the PostgreSQL database when:
- Files are added/deleted from one of the indexed directories, or
- Entire directories are removed from the AdvancedStorage configuration file
Is Orthanc immediately aware of added/deleted files within a directory and this is reflected in the Orthanc GUI?
I know that when a directory is added to the configuration file and Orthanc is restarted, it is indexed and included. When the directory is removed from the configuration, are the entries removed from the database and thus not shown in the GUI?
Thanks for your help!
Hi,
There is an “indexer” thread that cycles through all folders and subfolders and checks whether a file/folder is new or has been deleted since the last run. So, if you have a huge number of files, cycling through all of them might take a long time, and therefore there might be some large delays …
HTH,
Alain
Thanks for the response.
So, if some files are deleted from an indexed directory or an entire indexed directory is removed from the AdvancedStorage config, once detected, the relevant files will be removed from the SQL index database?
Out of curiosity, while using the AdvancedStorage plugin in Index mode only, if you delete a study from within OE2, does it only delete the study from PostgreSQL? I presume it doesn’t affect the source files on disk.
How does Orthanc know not to re-index that study while it is monitoring the folder where the deleted study still lives?
The indexer keeps track of all files that it has indexed so, as long as the file remains untouched on the disk, it will not be processed again.
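As an illustration only (this is not the plugin's actual code), the idea of "remember what was indexed and skip untouched files" could be sketched in Python as below. The names `file_signature` and `needs_processing` are hypothetical, and using size plus modification time as the "untouched" test is an assumption.

```python
import os


def file_signature(path):
    """Cheap fingerprint of a file: (size in bytes, mtime in nanoseconds)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def needs_processing(path, seen):
    """Return True if `path` is new or changed since it was last recorded.

    `seen` maps path -> signature and is updated in place, so a second call
    on an untouched file returns False.
    """
    signature = file_signature(path)
    if seen.get(path) == signature:
        return False
    seen[path] = signature
    return True
```

With this kind of bookkeeping, deleting a study from the index does not cause a re-index as long as the file's recorded signature stays the same.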
@alainmazy I just wanted to follow up on this point. I have a pretty large number of files (~25M), so it seems to take quite a while for any change in which directories are indexed to be reflected in Orthanc. This is expected based on your comment.
My question is, if Orthanc is restarted, does it start the cycling from the “beginning” or does it start from where it left off? I’m not sure if the indexer has a “memory” in this regard.
I removed one entire directory (containing ~1M or so files) from the AdvancedStorage config (`Folders`) and started Orthanc. I think the Orthanc instance was running for about 12 days before I had to restart Orthanc for a necessary reason. After 12 days, it had still not yet removed the ~1 million files. I was hesitant to stop and start Orthanc for fear of needing another few weeks for the changes to be reflected, but maybe this fear is unwarranted if the indexer picks up where it left off.
Any insights into how the indexer handles such a situation would be great. Thanks.
Sorry to bother, but any insights that can be shared @alainmazy or @jodogne?
Perhaps there is something I need to configure to provide more computational resources to the indexer to try and speed things up? Our Orthanc instance has been up for about 10 days and still these patients/studies remain.
I know our number of instances is fairly large, but I’m hoping there might be configurations so that it won’t always take several weeks for changes to be reflected in the database and OE2.
Hi @mattwrkntn
Every time it is started, the indexer lists all files and starts from the beginning.
The only configuration you can tune wrt performance is AdvancedStorage.Indexer.ThrottleDelayMs which already defaults to 0ms.
I actually have no idea how fast it is supposed to work with 25M files and, unfortunately, it is not very verbose.
However, I have just added a new configuration option, AdvancedStorage.Indexer.EnableVerboseLogs, to make the indexer more verbose (but you’ll get at least one log line for each file that is analyzed, which means 25M in your case). Hopefully, you’ll have a better view of what’s going on in your setup.
This should be available in the mainline binaries on Monday.
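For readers unsure where these dotted names live in the configuration, here is a small Python sketch that sets both options on an already loaded Orthanc configuration. The helper name `with_verbose_indexer` is hypothetical, and treating `EnableVerboseLogs` as a boolean is an assumption based on its name.

```python
import copy


def with_verbose_indexer(config, throttle_ms=0):
    """Return a copy of `config` with the two indexer options set.

    Assumption: "AdvancedStorage.Indexer.X" maps to nested JSON keys
    AdvancedStorage -> Indexer -> X, and EnableVerboseLogs is a boolean.
    """
    updated = copy.deepcopy(config)
    indexer = updated.setdefault("AdvancedStorage", {}).setdefault("Indexer", {})
    indexer["ThrottleDelayMs"] = throttle_ms
    indexer["EnableVerboseLogs"] = True
    return updated
```

The result can be written back to the configuration file with `json.dump` before restarting Orthanc.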
Hope this helps,
Alain.
Thanks for adding that new feature, @alainmazy.
Are the folder paths processed in the order they are listed in the AdvancedStorage JSON configuration? I am planning to add a new folder of data to be indexed. Does placing it first in the `Folders` array get it indexed first?
{
  "AdvancedStorage": {
    "Enable": true,
    "Indexer": {
      "Enable": true,
      "Folders": [
        "/path/to/dataset1",
        "/path/to/dataset2",
        "/path/to/dataset3"
      ],
      "TakeOwnership": false
    }
  }
}
Out of curiosity, does the Indexer write the array of Folders paths to the database (potentially with metadata)? It makes me wonder if, on restart, the Indexer could first diff the config folders array and database folders array and start scanning new or deleted paths first before scanning other paths?
A heuristic like the following might give a sense of “responsiveness”:
- Deleted paths are handled first (path removed from array)
- New paths are handled second (path added to array)
- Obviously changed paths handled third (path “last modified date” or size changed)
- Non-obviously changed paths are handled fourth (everything else)
But I don’t know how the indexer works so maybe this suggestion doesn’t work.
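To make the ordering above concrete, here is a rough Python sketch. It does not describe how the plugin works: `plan_scan_order`, the `recorded` mapping (folder path to the modification time stored at the previous scan) and the `current_mtimes` mapping are all hypothetical.

```python
from enum import IntEnum


class Priority(IntEnum):
    DELETED = 0    # folder was indexed before but is gone from the config
    NEW = 1        # folder is in the config but was never indexed
    CHANGED = 2    # folder's modification time differs from the recorded one
    UNCHANGED = 3  # everything else


def plan_scan_order(configured, recorded, current_mtimes):
    """Order folders so that the most obvious changes are handled first.

    configured:     list of folder paths from the config (order preserved)
    recorded:       dict path -> mtime remembered from the previous scan
    current_mtimes: dict path -> mtime observed now
    Caveat: a directory's mtime only reflects changes to its direct entries,
    so changes deep in a subtree can still land in UNCHANGED.
    """
    plan = [(Priority.DELETED, p) for p in recorded if p not in configured]
    for path in configured:
        if path not in recorded:
            plan.append((Priority.NEW, path))
        elif current_mtimes.get(path) != recorded[path]:
            plan.append((Priority.CHANGED, path))
        else:
            plan.append((Priority.UNCHANGED, path))
    # sorted() is stable, so config order is kept within each priority.
    return sorted(plan, key=lambda item: item[0])
```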
yes
A lot can be done to improve the plugin given that it was not originally designed to index millions of files. If you have specific needs (and funding), do not hesitate to get in touch with https://orthanc.team/.
And don’t forget that, as suggested a while ago in another thread, you can write your own indexing logic in Python and call the REST API to adopt/abandon instances. That is very likely the best approach.
Best,
Alain
Thanks again for the response. Could you please point me to the thread that discussed writing custom index logic in Python? I definitely need to read into this some more.
Note: I do have good intentions to contribute funding, when possible. I have been discussing with colleagues how we can build in such contributions to open-source maintainers in our grant budgets. My current role as a postdoc does not provide much ability to donate, at the moment.
I could not immediately find it but, anyway, there was not much info in it.
You can actually write your own logic in Python to:
- parse existing directories for new or deleted files and possibly remember what you have processed already
- as soon as a new file is detected, call the plugins/advanced-storage/adopt-instance route
- as soon as a file is deleted, call the plugins/advanced-storage/abandon-instance route
Since you would write your own Python code to monitor files, you could probably use system events instead of cycling through all the files, and you could have multiple worker threads processing the new/deleted files in parallel and calling the Orthanc REST API …
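A minimal sketch of such a script is shown below, using a plain directory walk rather than system events. Several things here are assumptions to check against the plugin documentation: the Orthanc URL, the JSON payloads (`Path`, `TakeOwnership`), and the absence of authentication. The helper names (`list_files`, `post_json`, `sync`) are hypothetical.

```python
import json
import os
import urllib.request

ORTHANC_URL = "http://localhost:8042"  # assumption: adjust to your server


def list_files(folders):
    """Return the set of all file paths found under the given folders."""
    found = set()
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                found.add(os.path.join(root, name))
    return found


def post_json(route, payload):
    """POST a JSON payload to an Orthanc route and return the raw response."""
    request = urllib.request.Request(
        ORTHANC_URL + route,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def sync(known, folders, post=post_json):
    """Adopt files that appeared and abandon files that disappeared.

    `known` is the set of paths already adopted; it is updated in place.
    Returns the sorted lists of added and removed paths.
    """
    current = list_files(folders)
    added = sorted(current - known)
    removed = sorted(known - current)
    for path in added:
        # Assumed payload shape; TakeOwnership=False leaves the file in place.
        post("/plugins/advanced-storage/adopt-instance",
             {"Path": path, "TakeOwnership": False})
        known.add(path)
    for path in removed:
        # Assumed payload shape for abandoning a file that no longer exists.
        post("/plugins/advanced-storage/abandon-instance", {"Path": path})
        known.discard(path)
    return added, removed
```

In practice the `known` set would need to be persisted between runs (for example in a small database), and the loops over `added` and `removed` are the natural place to hand work to several worker threads.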

I am going to look into this some more. It sounds like building some custom Python logic would work well in my use case (i.e., files are almost NEVER modified; usually files are either added or deleted when whole directories are added or deleted).
What route(s) should be called if a file is modified, rather than added or deleted? Does it get abandoned and then adopted to re-index it?