docker postgresql fail and shutdown after enable housekeeper plugin

Dkorthosurgery · May 14, 2022, 1:55pm

I tried to use the new housekeeper plugin,

i have this message in the logs,

W0514 14:58:19.291770 ServerContext.cpp:2424] W001: Accessing Dicom tags from storage when accessing study : 0010,0021

Ubuntu docker, postgresql storage plugin, osimis docker image

after several hours i found that the database fail and the result was

that could search the database but i was not able to view the images, ( orthanc stone viewer )

i have never such issue before,

unfortunately i lost postgresql logs, but there were administration issues and database fail

all the containers where up,

Restarting the computer solved the problem,

I will give another try with housekeeper plugin

any advice how i can investigate more this issue ?

alainmazy · May 16, 2022, 2:20pm

Hi,

Sorry but a single line of log does not give any information about what’s going on. So, only advices I can give for next time are:

document all operations you have performed on Orthanc configuration
collect all logs
investigate
share (ideally a full reproducible setup)

Best regards,

Alain.

Dkorthosurgery · May 16, 2022, 4:41pm

my setup
osimis image 22.5.2 two instances one write, one read
postgresql 13.6

the only change is that i added the housekeeper.json

{
  "Housekeeper": {

    // Enables/disables the plugin
    "Enable": true,

    // the Global Prooperty ID in which the plugin progress
    // is stored.  Must be > 1024 and must not be used by
    // another plugin
    "GlobalPropertyId": 1025,

    // Forces execution even if the plugin did not detect
    // any changes in configuration
    "Force": false,

    // Delay (in seconds) between reconstruction of 2 studies
    // This avoids overloading Orthanc with the housekeeping
    // process and leaves room for other operations.
    "ThrottleDelay": 5,

    // Runs the plugin only at certain period of time.
    // If not specified, the plugin runs all the time
    // Examples:
    // to run between 0AM and 6AM everyday + every night
    // from 8PM to 12PM and 24h a day on the weekend:
    // "Schedule": {
    //   "Monday": ["0-6", "20-24"],
    //   "Tuesday": ["0-6", "20-24"],
    //   "Wednesday": ["0-6", "20-24"],
    //   "Thursday": ["0-6", "20-24"],
    //   "Friday": ["0-6", "20-24"],
    //   "Saturday": ["0-24"],
    //   "Sunday": ["0-24"]
    // },

    // configure events that can trigger a housekeeping processing
    "Triggers" : {
      "StorageCompressionChange": true,
      "MainDicomTagsChange": true,
      "UnnecessaryDicomAsJsonFiles": true,
      "IngestTranscodingChange": true
    }
  }
}

the problem is in the database and not in orthanc and that is the reason it is difficult to investigate it,

the postgresql logs

022-05-16 07:27:28.015 UTC [1] LOG: listening on Unix socket “/var/run/postgresql/.s.PGSQL.5432”
2022-05-16 07:27:28.024 UTC [26] LOG: database system was shut down at 2022-05-16 07:27:25 UTC
2022-05-16 07:27:28.036 UTC [1] LOG: database system is ready to accept connections
2022-05-16 07:41:34.699 UTC [1] LOG: received fast shutdown request
2022-05-16 07:41:34.705 UTC [1] LOG: aborting any active transactions
2022-05-16 07:41:34.706 UTC [52] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.706 UTC [54] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.706 UTC [46] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.706 UTC [45] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.707 UTC [36] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.708 UTC [37] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.708 UTC [1] LOG: background worker “logical replication launcher” (PID 32) exited with exit code 1
2022-05-16 07:41:34.708 UTC [50] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.708 UTC [49] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.708 UTC [34] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.709 UTC [47] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.710 UTC [44] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.712 UTC [43] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.714 UTC [39] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.715 UTC [41] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.717 UTC [51] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.723 UTC [42] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.726 UTC [38] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.726 UTC [53] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.730 UTC [33] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.735 UTC [35] FATAL: terminating connection due to administrator command
2022-05-16 07:41:34.742 UTC [27] LOG: shutting down
2022-05-16 07:41:34.820 UTC [1] LOG: database system is shut down

PostgreSQL Database directory appears to contain a database; Skipping initialization

2022-05-16 07:44:18.296 UTC [1] LOG: starting PostgreSQL 13.6 (Debian 13.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2022-05-16 07:44:18.298 UTC [1] LOG: listening on IPv4 address “0.0.0.0”, port 5432
2022-05-16 07:44:18.298 UTC [1] LOG: listening on IPv6 address “::”, port 5432
2022-05-16 07:44:18.512 UTC [1] LOG: listening on Unix socket “/var/run/postgresql/.s.PGSQL.5432”
2022-05-16 07:44:18.643 UTC [27] LOG: database system was shut down at 2022-05-16 07:41:34 UTC
2022-05-16 07:44:19.057 UTC [1] LOG: database system is ready to accept connections
2022-05-16 07:47:11.560 UTC [1] LOG: received fast shutdown request
2022-05-16 07:47:11.563 UTC [1] LOG: aborting any active transactions
2022-05-16 07:47:11.564 UTC [40] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.564 UTC [41] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.564 UTC [43] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.566 UTC [34] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.566 UTC [35] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.566 UTC [36] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.567 UTC [1] LOG: background worker “logical replication launcher” (PID 33) exited with exit code 1
2022-05-16 07:47:11.568 UTC [37] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.568 UTC [42] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.572 UTC [38] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.576 UTC [39] FATAL: terminating connection due to administrator command
2022-05-16 07:47:11.584 UTC [28] LOG: shutting down
2022-05-16 07:47:11.628 UTC [1] LOG: database system is shut down

PostgreSQL Database directory appears to contain a database; Skipping initialization

2022-05-16 07:47:13.363 UTC [1] LOG: starting PostgreSQL 13.6 (Debian 13.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2022-05-16 07:47:13.364 UTC [1] LOG: listening on IPv4 address “0.0.0.0”, port 5432
2022-05-16 07:47:13.364 UTC [1] LOG: listening on IPv6 address “::”, port 5432
2022-05-16 07:47:13.367 UTC [1] LOG: listening on Unix socket “/var/run/postgresql/.s.PGSQL.5432”
2022-05-16 07:47:13.375 UTC [26] LOG: database system was shut down at 2022-05-16 07:47:11 UTC
2022-05-16 07:47:13.384 UTC [1] LOG: database system is ready to accept connections
2022-05-16 07:48:08.201 UTC [1] LOG: received fast shutdown request
2022-05-16 07:48:08.205 UTC [1] LOG: aborting any active transactions
2022-05-16 07:48:08.207 UTC [34] FATAL: terminating connection due to administrator command
2022-05-16 07:48:08.207 UTC [35] FATAL: terminating connection due to administrator command
2022-05-16 07:48:08.207 UTC [1] LOG: background worker “logical replication launcher” (PID 32) exited with exit code 1
2022-05-16 07:48:08.208 UTC [33] FATAL: terminating connection due to administrator command
2022-05-16 07:48:08.217 UTC [36] FATAL: terminating connection due to administrator command
2022-05-16 07:48:08.225 UTC [37] FATAL: terminating connection due to administrator command
2022-05-16 07:48:08.240 UTC [27] LOG: shutting down
2022-05-16 07:48:08.336 UTC [1] LOG: database system is shut down

PostgreSQL Database directory appears to contain a database; Skipping initialization

2022-05-16 07:49:17.880 UTC [1] LOG: starting PostgreSQL 13.6 (Debian 13.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2022-05-16 07:49:17.881 UTC [1] LOG: listening on IPv4 address “0.0.0.0”, port 5432
2022-05-16 07:49:17.881 UTC [1] LOG: listening on IPv6 address “::”, port 5432
2022-05-16 07:49:17.919 UTC [1] LOG: listening on Unix socket “/var/run/postgresql/.s.PGSQL.5432”
2022-05-16 07:49:17.955 UTC [26] LOG: database system was shut down at 2022-05-16 07:48:08 UTC
2022-05-16 07:49:18.040 UTC [1] LOG: database system is ready to accept connections
2022-05-16 07:54:27.707 UTC [38] ERROR: could not serialize access due to read/write dependencies among transactions
2022-05-16 07:54:27.707 UTC [38] DETAIL: Reason code: Canceled on conflict out to pivot 9110570, during read.
2022-05-16 07:54:27.707 UTC [38] HINT: The transaction might succeed if retried.
2022-05-16 07:54:27.707 UTC [38] CONTEXT: SQL statement “UPDATE GlobalIntegers SET value = value + 1 WHERE key = new.resourceType + 2”
PL/pgSQL function countresourcestrackerfunc() line 4 at SQL statement
SQL statement “INSERT INTO Resources VALUES (DEFAULT, 3, instance, seriesKey) RETURNING internalId”
PL/pgSQL function createinstance(text,text,text,text) line 66 at SQL statement
2022-05-16 07:54:27.707 UTC [38] STATEMENT: SELECT * FROM CreateInstance($1, $2, $3, $4)
2022-05-16 12:44:35.393 UTC [42] ERROR: could not serialize access due to read/write dependencies among transactions
2022-05-16 12:44:35.393 UTC [42] DETAIL: Reason code: Canceled on identification as a pivot, during conflict in checking.
2022-05-16 12:44:35.393 UTC [42] HINT: The transaction might succeed if retried.
2022-05-16 12:44:35.393 UTC [42] STATEMENT: INSERT INTO ServerProperties VALUES ($1, $2, $3)
2022-05-16 14:16:59.668 UTC [34] ERROR: could not serialize access due to read/write dependencies among transactions
2022-05-16 14:16:59.668 UTC [34] DETAIL: Reason code: Canceled on identification as a pivot, during conflict in checking.
2022-05-16 14:16:59.668 UTC [34] HINT: The transaction might succeed if retried.
2022-05-16 14:16:59.668 UTC [34] STATEMENT: INSERT INTO ServerProperties VALUES ($1, $2, $3)
2022-05-16 15:49:44.412 UTC [39] ERROR: could not serialize access due to read/write dependencies among transactions
2022-05-16 15:49:44.412 UTC [39] DETAIL: Reason code: Canceled on identification as a pivot, during conflict in checking.
2022-05-16 15:49:44.412 UTC [39] HINT: The transaction might succeed if retried.
2022-05-16 15:49:44.412 UTC [39] STATEMENT: INSERT INTO ServerProperties VALUES ($1, $2, $3)

Also appears several hours later ( two nights with housekeeper enabled i had the database error )

I will give another try next weeked and look at /housekeeper/status

many thanks, kyriakos

Στις Δευ 16 Μαΐ 2022 στις 5:21 μ.μ., ο/η Alain Mazy am@orthanc.team έγραψε:

alainmazy · May 18, 2022, 6:42am

These errors in postgres logs are very common. They just warn about conflicting transactions that are retried right after. I ran a 3 hour test yesterday with 10 HTTP clients calling the /studies/…/reconstruct routes and there were plenty of these errors although Orthanc was still running fine.

On another hand, I must admit that we have one system in production that is not using the Housekeeper but where the /studies/…/reconstruct route is called regularly. On this system, we observe Orthanc being stuck every few days and the last entry in the logs is alway a call to /reconstruct. I’ve been trying to reproduce the issue in order to debug it but with no success yet (hence my test with 10 parallel HTTP clients yesterday). I’ll keep on working on this. Our workaround till now is to add a health check and restart the container when Orthanc is not responding

Best regards,

Alain.

user14 · July 4, 2022, 9:22pm

i still have the same problem with osimis/orthanc 22.6.4 orthanc and housekeeper plugin

in my case restart the container does not resolve the problem, i am doing a full restart of the pc that is running ubuntu linux
what is health check ?
i clone the server and the files and i try to run it in docker with marjano linux in another pc and in the main pc to run housekeeper plugin three hours during the night to see what happens.
many thanks,
kyriacos

Στις Τετάρτη 18 Μαΐου 2022 στις 9:42:43 π.μ. UTC+3, ο χρήστης a…@orthanc.team έγραψε:

alainmazy · July 5, 2022, 6:37am

Hi,

Here’s a sample health-check setup: https://bitbucket.org/osimis/orthanc-setup-samples/src/master/docker/health-check/

HTH

Alain.