Hello,
Quite often, I run into a case where pulling up the Web viewer
page and clicking on "All patients" pulls up the standard patient
browser, but with NO patients listed, even when I know there are
patients on that Orthanc. I've generally interpreted this as some
sort of timeout/caching issue.
This is a good hunch (we've encountered timeout issues at the reverse
proxy level multiple times, more information below). Could you confirm
it first?
* Use your browser developer tools to check the exact response
(including status code and headers) from the Apache reverse-proxy, or
use another HTTP client with more diagnostics (e.g. a CLI client like
cURL). Perhaps share the results if you need help.
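For instance, a minimal check with cURL might look like this (the URL is the one from your own example; the credentials are a placeholder to replace with yours, and the flags are just a suggestion):

```shell
# Show the full request/response exchange (-v), including the
# status line and headers, and allow a generous 5-minute window
# before cURL itself gives up.
curl -v --max-time 300 -u user:password https://myorthanc/patients
```

Comparing the status code you get through the proxy with the one you get when querying Orthanc directly (bypassing Apache) is usually the quickest way to tell the two apart.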
* Check the logs of the Apache reverse-proxy, and possibly increase the
Apache log level to get the details you want. In your case, you're
looking for messages that Apache timed out waiting for a response from
Orthanc (I'm not sure about the terminology in Apache actually, in
nginx for example they call the target hosts "upstreams"). Again, feel
free to share the results if you want.
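As a sketch, on Apache 2.4 you can raise the log level for the proxy modules only, which avoids flooding the error log with unrelated messages (directive names are stock Apache 2.4; adapt to your configuration layout):

```apache
# Keep the global level at "warn", but trace proxy activity in detail.
LogLevel warn proxy:trace2 proxy_http:trace2
```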
* If you can find a setting or module in Apache to capture responses
from the target host (i.e. Orthanc), enable it (though only on 4xx and
5xx responses, for example, lest you hit your storage heavily).
These are the easy checks you can do immediately. If you have trouble
reproducing, try bigger and bigger studies; maybe produce a synthetic
one if needed.
If you want to go the extra mile or simply can't find much, consider:
* Capturing network traces with tcpdump/windump or with Wireshark
directly. This is especially useful if you still can't reproduce:
tcpdump (and, I assume, WinDump) has options for a "rolling capture" of
a fixed total size, meaning you can keep it running indefinitely while
allocating it, say, a few gigabytes. When the problem occurs, you can
interrupt the capture at your leisure and inspect it later. This
records all network interactions on the server side and helps pinpoint
the exact origin of the failure.
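For reference, a rolling capture with tcpdump might look like the following (the interface name, port, and sizes are assumptions to adapt; 8042 is Orthanc's default HTTP port):

```shell
# Rotate across 20 files of ~100 MB each (about 2 GB total),
# keeping only traffic to/from Orthanc's HTTP port.
tcpdump -i eth0 -w orthanc-capture -C 100 -W 20 'port 8042'
```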
* Looking at network throughput charts, especially per-connection (TCP)
charts if you can get them. Be on the lookout for "roller coasters".
* Looking at storage I/O. Look for: either "roller coasters" or
evidence of long periods of saturation.
Some context on your hunch: many reverse proxies reset ("debounce")
their read-timeout counters upon observing bytes arriving from the
peer. However, just because the peer (i.e. Orthanc) isn't sending data
yet doesn't mean it isn't hard at work preparing the response. So you
might sometimes see Orthanc working on a response for a long while, the
reverse proxy abandoning the request and telling the client "I think it
failed" (often via a 502 "Bad Gateway" or 504 "Gateway Timeout"
response), and then, when Orthanc is finally ready to send its answer,
nobody is listening anymore.
Normally we see this with the preparation of very large responses, but
depending on hardware resources and contention on the host there's no
reason it can't happen in other contexts. Examples: (1) the viewer
generates many small requests and the host is simply overloaded in some
way, so a few threads are starved of resources and cannot make
progress; (2) there are many patients and they are being sorted, which
blocks the pipeline while the list is prepared.
If you end up increasing the read timeouts of your reverse proxy to
whatever seems appropriate in your scenario, please consider reporting
it, as there might be opportunities to improve performance for that
scenario. In general, we don't consider it normal for end users to have
to increase arbitrary timeouts like that: it's usually a sign of
something scaling badly (e.g. "the more patients there are, the longer
it takes for the first byte of the response to arrive, and the higher I
need to set my read timeout").
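For the record, with Apache's mod_proxy the read timeout can be raised either globally or per mapping; a sketch (the path, target address, and value are assumptions for illustration):

```apache
# Wait up to 10 minutes for the backend to start answering.
ProxyTimeout 600
# Or per mapping, using the "timeout" connection parameter:
ProxyPass /orthanc/ http://127.0.0.1:8042/ timeout=600
```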
I can force a "refresh" by temporarily calling the API directly
(e.g. https://myorthanc/patients). That seems to do the trick so that
when you return to the patient browser menu, all the patients
magically reappear.
I would imagine the frontend issues the same request, so the fact that
it sometimes works and sometimes doesn't is indeed indicative of a
hot/cold cache scenario, as you suggested. It could simply be the
system cache.
Hope this helps,