Postgress index effectively disabled when searching for Greek names

thazir · April 12, 2023, 6:54am

Dear all

I have installed Orthanc version: 1.11.3 on Windows 10 along with the PostgreSQLplugin used for indexing only. I have sucessfully imported thousands of our prior images and studies using the dcmsend utility from dcmtk. Some of out patient names are Greek and so I have changed then “DefaultEncoding” to “Greek” and the C-Find results come back ok. The problem I am facing is that when I search, through the Orthanc explorer for example, for names staring with Greek characters, the delay is huge. There is no such problem with Latin names.

I have run Orthan with a verbove log and observed that when I search for latin names the database indexing seems to work fine, as indicated by the two following lines:

I0409 10:39:40.947163 ServerContext.cpp:1445] Number of candidate resources after fast DB filtering on main DICOM tags: 8
I0409 10:39:40.962692 ServerContext.cpp:1593] Number of matching resources: 8

But when I search for Greek names the db filtering seems to be turned off, as indicated by the following two lines:

I I0409 10:29:08.881887 ServerContext.cpp:1445] Number of candidate resources after fast DB filtering on main DICOM tags: 81092
I0409 10:30:39.915161 ServerContext.cpp:1593] Number of matching resources: 14

Furthermore I turned on the PostgreSQL query log on, to monitor what queries does Orthanc perform during these searches. It seems that during the Latin name searches a LIKE were part is included in the query to filter only the names of interest, as you can see from the following postgresql log lines, where I search for CHAZIRAKIS*:

2023-04-09 10:45:48.874 EEST [5788] LOG: execute 8ba40901-4b98-4017-93a9-2e0008835b15: SELECT studies.publicId, MIN(instances.publicId) FROM (SELECT studies.publicId, studies.internalId FROM Resources AS studies INNER JOIN DicomIdentifiers t0 ON t0.id = studies.internalId AND t0.tagGroup = 16 AND t0.tagElement = 16 WHERE studies.resourceType = 1 AND t0.value LIKE $1 ESCAPE '') studies INNER JOIN Resources series ON series.parentId = studies.internalId INNER JOIN Resources instances ON instances.parentId = series.internalId GROUP BY studies.publicId
2023-04-09 10:45:48.874 EEST [5788] DETAIL: parameters: $1 = ‘CHAZIRAKIS%’

But when I search for a Greek name, no matter what I input in the Orthanc explorer form, the “t0.value LIKE $1” vanishes and there is no relevant parameter set, as you can see from the following postgresql log lines:

2023-04-09 10:47:24.809 EEST [13968] LOG: execute ea722c26-ae6b-4ee9-acb3-4876b24ef15c: SELECT studies.publicId, MIN(instances.publicId) FROM (SELECT studies.publicId, studies.internalId FROM Resources AS studies INNER JOIN DicomIdentifiers t0 ON t0.id = studies.internalId AND t0.tagGroup = 16 AND t0.tagElement = 16 WHERE studies.resourceType = 1) studies INNER JOIN Resources series ON series.parentId = studies.internalId INNER JOIN Resources instances ON instances.parentId = series.internalId GROUP BY studies.publicId

After this query the whole list of 8000 results is examined one by one to find a match for the Greek name. The following repeated log lines indicate so:

2023-04-09 10:47:27.794 EEST [12348] LOG: execute d6d931a7-621c-48a6-a9cf-e3f6e2b4dfe0: SELECT internalId, resourceType FROM Resources WHERE publicId=$1
2023-04-09 10:47:27.794 EEST [12348] DETAIL: parameters: $1 = ‘387f9198-16c91180-3ced7ce1-cce95cae-e9437a4b’
2023-04-09 10:47:27.796 EEST [12348] LOG: execute e1d16d25-9c76-43cc-8329-4c7af04b3ee8: SELECT * FROM MainDicomTags WHERE id=$1
2023-04-09 10:47:27.796 EEST [12348] DETAIL: parameters: $1 = ‘112214’
2023-04-09 10:47:27.797 EEST [12348] LOG: execute 7e9165e8-54ff-4ab4-9bde-b89dba102a90: SELECT parentId FROM Resources WHERE internalId=$1
2023-04-09 10:47:27.797 EEST [12348] DETAIL: parameters: $1 = ‘112214’
2023-04-09 10:47:27.797 EEST [12348] LOG: execute e1d16d25-9c76-43cc-8329-4c7af04b3ee8: SELECT * FROM MainDicomTags WHERE id=$1
2023-04-09 10:47:27.797 EEST [12348] DETAIL: parameters: $1 = ‘112212’
2023-04-09 10:47:27.798 EEST [12348] LOG: execute 7e9165e8-54ff-4ab4-9bde-b89dba102a90: SELECT parentId FROM Resources WHERE internalId=$1
2023-04-09 10:47:27.798 EEST [12348] DETAIL: parameters: $1 = ‘112212’
2023-04-09 10:47:27.798 EEST [12348] LOG: execute e1d16d25-9c76-43cc-8329-4c7af04b3ee8: SELECT * FROM MainDicomTags WHERE id=$1
2023-04-09 10:47:27.798 EEST [12348] DETAIL: parameters: $1 = ‘112211’

The whole process takes minutes to complete and I do get the correct results in Orthanc explorer, but this unnatural delay in searches is too big and also breaks communications with dicom viewers. Do you have any ideas or suggestions on how to solve this problem?

Best Regards
Tony Chazirakis

alainmazy · April 13, 2023, 7:16pm

Hi Tony,

Sorry but I do not have many options to suggest besides playing with the DefaultEncdoing and maybe fine tuning the C-Find requests wrt SpecificCharacterSets.

Could you share a 2-3 sample files with Greek names for us to reproduce the issue. Please also share the details of the C-Find queries.

Please also note that we might need some time before we have time to investigate the issue.

Best regards,

Alain.

thazir · April 15, 2023, 5:30pm

Dear Alain

I have setup a small test case. I am sending you three dicom files. Two of them have Greek names and one has Latin name. I have performed a default installation of Orthanc, with the MySQL database. I am using “DefaultEncoding” : “Greek” but I don’t think this is the issue. I have tested the searches with C-Find’s using the proper, I think, specific character sets. Still I think I should not send you the details of my searches as they will compicate things. I have exactly the same behaviour from the Orthanc explorer, searching for patient name at http://localhost:8042/app/explorer.html . I noticed that the DB filtering seems not to work for French characters as well. I made the three searches below and the DB filtering is not proper for all non Latin names. The Orthanc explorer results are ok though, just the fast DB filtering is wrong. So this is a problem for large number of dicoms only.

Search for Patient Name: MY* (Latin Characters)
Expected Result: 1 study
Log File:
I0415 09:48:24.445925 ServerContext.cpp:1445] Number of candidate resources after fast DB filtering on main DICOM tags: 1
I0415 09:48:24.447615 ServerContext.cpp:1593] Number of matching resources: 1

Search for Patient Name: ΧΑ* (Greek Characters)
Expected Result: 1 study
Log File:
I0415 09:50:01.431316 ServerContext.cpp:1445] Number of candidate resources after fast DB filtering on main DICOM tags: 3
I0415 09:50:01.432313 ServerContext.cpp:1593] Number of matching resources: 2

Search for Patient Name: à* (French Characters)
Expected Result: 0 studies
Log File:
I0415 09:51:40.901450 ServerContext.cpp:1445] Number of candidate resources after fast DB filtering on main DICOM tags: 3
I0415 09:51:40.901450 ServerContext.cpp:1593] Number of matching resources: 0

Is the above adequate and enough for further investigation? Should I still proceed and send you details of C-Find requests? I understand it might take time to check. I will try to compile Orthanc myself, if I can, to make some checks as well.

Best
Tony

PS: I am attaching the Google Drive link to the Dicom files as I am not allowed to upload files here yet.

thazir · April 17, 2023, 3:10am

Dear Alain

I have build Orthanc and made a small investigation. I believe I found the source of the behaviour I am refering to.

I see that the search is performed in void ServerContext::Apply(…) located in ServerContext.cpp near line 1426. The fast DB filtering is probably performed by GetIndex().ApplyLookupResources(…) near line 1452 (inside ServerContext::Apply(…)). This is a call to StatelessDatabaseOperations::ApplyLookupResources(…) which calls StatelessDatabaseOperations::NormalizeLookup(…) which calls DicomTagConstraint::ConvertToDatabaseConstraint(…) which calls std::string ServerToolBox::NormalizeIdentifier(const std::string &value). This last NormalizeIdentifier removes all non ascii characters from the search string and thus disables the fast db filtering for non latin names.

The non fast db filtering still works though. I see that the filtering is made near line 1536 at ServerContext.cpp (if (fastLookup->IsMatch(dicom)) {…}). This calls the bool DatabaseLookup::IsMatch(…) function, which calls the bool DicomTagConstraint::IsMatch(const DicomMap& value) function which calls the bool DicomTagConstraint::IsMatch(const std::string& value) function which uses an DicomTagConstraint::NormalizedString instance which calls std::string Toolbox::ToUpperCaseWithAccents(const std::string& source) which makes some std::wstring conversion inside with boost.

So, the non Latin filtering works, but only in the slow way. It would be very usefull if this behaviour could be improved, as it poses a major problem for archives with non latin patient names.

Best
Tony

sdscotti · April 17, 2023, 6:45am

I don’t know if the is the issue, but I looked at that on my dev server and noticed that the body of the request is different for the Legacy Explorer vs. Explorer 2: “PatientName”:“ΧΑ*”} vs. “PatientName”:“*ΧΑ”}, the * is transposed ??

Searching for PatientName: *ΧΑ"

Explorer:

{“Level”:“Study”,“Expand”:true,“Limit”:101,“Query”:{“StudyDate”:“”,“PatientName”:“ΧΑ*”},“Full”:true}

Explorer 2:

{“Level”:“Study”,“Limit”:100,“Query”:{“PatientName”:“*ΧΑ”},“RequestedTags”:[“ModalitiesInStudy”],“Expand”:true}

If I search for *ΧΑ in Explorer 2, the body of the request is actually:

{“Level”:“Study”,“Limit”:100,“Query”:{“PatientName”:“ΧΑ*”},“RequestedTags”:[“ModalitiesInStudy”],“Expand”:true} and I think I get back the expected results ??

I0417 00:53:40.811641 HttpServer.cpp:1262] (http) POST /tools/find
I0417 00:53:40.816061 ServerContext.cpp:1445] Number of candidate resources after fast DB filtering on main DICOM tags: 4
I0417 00:53:40.823629 ServerContext.cpp:1593] Number of matching resources: 2

thazir · April 17, 2023, 9:46am

Dear Stephen

I have not build the explorer v2 plugin so I cannot debug the requests like you do. But I have tried the installed explorer v2 on my computer with the --verbose flag of Orthanc to check the fast DB filtering behaviour. As far as I can tell the problem persists. I get:

I0417 12:37:14.980286 ServerContext.cpp:1445] Number of candidate resources after fast DB filtering on main DICOM tags: 3
I0417 12:37:14.981287 ServerContext.cpp:1593] Number of matching resources: 2

But the fast DB filtering should have returned 2 results, not 3 (which is the whole study set of my archive). I believe you are facing the same issue because you are getting 4 candidate resources from fast DB filtering, while only 2 actually match.

Do you think my observations are reasonable, or should I try to build the explorer v2 plugin to investigate further?

Best
Tony

alainmazy · April 17, 2023, 11:21am

Hi Tony,

Your issue is in no way related to Orthanc Explorer 2.
Stephen: yes, OE2 adds wildcards before and after the text. This is a feature.

Tony, thanks to the info you’ve provided in previous messages, I should be able to investigate further but I can’t tell you when.

Best regards,

Alain.

thazir · April 17, 2023, 3:31pm

Thank you very much Alain

Be fine
Tony

alainmazy · April 24, 2023, 4:09pm

Hi Tony,

I have made some tests with your files (thanks for sharing them !).
I was actually able to patch the code to solve your issue but, unfortunately, I’m not able to push it in the Orthanc core because it would break the searches in all existing systems.

I keep it for further refactoring. We have some ideas to re-implement the “fast lookup part” that is actually becoming slow on very large DB. And, only at that time we’ll be able to use that new implementation. Unfortunately, this will very likely take 4-6 months before we can work on that.

For memory, here is the commit with the proposed patch.

Best regards,

Alain.

thazir · April 26, 2023, 12:55am

Very nice Alain

I am glad the issue has been solved. Thank’s a lot you for your support.

Best
Tony

alainmazy · April 26, 2023, 6:15am

Note that it’s not “solved”. It has been identified and we have a solution but we can not enable it (yet).