This is an attempt to add some simple monitoring so if the Pagure task queue starts backing up, we know about it.
---
commit f207778a0e5aceba6c18885be7478317aaeab4b0 Author: Ricky Elrod codeblock@fedoraproject.org Date: Fri May 26 23:09:02 2017 +0000
add simple monitoring for pagure's celery redis queue
Signed-off-by: Ricky Elrod codeblock@fedoraproject.org
diff --git a/roles/nagios_client/files/scripts/check_redis_queue.sh b/roles/nagios_client/files/scripts/check_redis_queue.sh new file mode 100644 index 0000000..ca1f186 --- /dev/null +++ b/roles/nagios_client/files/scripts/check_redis_queue.sh @@ -0,0 +1,23 @@ +#!/usr/bin/env bash + +. /usr/lib64/nagios/plugins/utils.sh + +if [[ "$#" -ne 3 ]]; then + echo "Arguments: key warn crit" + exit $STATE_UNKNOWN +fi + +tasks="$(redis-cli llen "$1" | awk '{print $1}')" + +check_range $tasks $2:$3 +status=$? + +if [[ "$status" == "$STATE_OK" ]]; then + echo "OK: $1 queue has $tasks tasks" +elif [[ "$status" == "$STATE_WARNING" ]]; then + echo "WARNING: $1 queue has $tasks tasks" +elif [[ "$status" == "$STATE_CRITICAL" ]]; then + echo "CRITICAL: $1 queue has $tasks tasks" +fi + +exit $status diff --git a/roles/nagios_client/tasks/main.yml b/roles/nagios_client/tasks/main.yml index 714be36..22ddd4f 100644 --- a/roles/nagios_client/tasks/main.yml +++ b/roles/nagios_client/tasks/main.yml @@ -73,6 +73,7 @@ - check_osbs_builds.py - check_osbs_api.py - check_ipa_replication + - check_redis_queue.sh when: not inventory_hostname.startswith('noc') tags: - nagios_client @@ -146,6 +147,7 @@ - check_koschei_watcher_proc.cfg - check_testcloud.cfg - check_mirrorlist_docker_proxy.cfg + - check_celery_redis_queue.cfg notify: - restart nrpe tags: diff --git a/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 b/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 new file mode 100644 index 0000000..56279f3 --- /dev/null +++ b/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 @@ -0,0 +1 @@ +command[check_celery_redis_queue]=/usr/lib64/nagios/plugins/check_redis_queue.sh celery 5 10 diff --git a/roles/nagios_server/files/nagios/services/pagure_redis.cfg b/roles/nagios_server/files/nagios/services/pagure_redis.cfg new file mode 100644 index 0000000..d5387d0 --- /dev/null +++ b/roles/nagios_server/files/nagios/services/pagure_redis.cfg @@ -0,0 +1,6 @@ +define service { + host_name pagure01.fedoraproject.org + service_description Redis/celery queue + check_command check_by_nrpe!check_celery_redis_queue + use defaulttemplate +}
+1
On May 26, 2017 7:12 PM, "Ricky Elrod" codeblock@fedoraproject.org wrote:
This is an attempt to add some simple monitoring so if the Pagure task queue starts backing up, we know about it.
commit f207778a0e5aceba6c18885be7478317aaeab4b0 Author: Ricky Elrod codeblock@fedoraproject.org Date: Fri May 26 23:09:02 2017 +0000
add simple monitoring for pagure's celery redis queue Signed-off-by: Ricky Elrod <codeblock@fedoraproject.org>
diff --git a/roles/nagios_client/files/scripts/check_redis_queue.sh b/roles/nagios_client/files/scripts/check_redis_queue.sh new file mode 100644 index 0000000..ca1f186 --- /dev/null +++ b/roles/nagios_client/files/scripts/check_redis_queue.sh @@ -0,0 +1,23 @@ +#!/usr/bin/env bash
+. /usr/lib64/nagios/plugins/utils.sh
+if [[ "$#" -ne 3 ]]; then
- echo "Arguments: key warn crit"
- exit $STATE_UNKNOWN
+fi
+tasks="$(redis-cli llen "$1" | awk '{print $1}')"
+check_range $tasks $2:$3 +status=$?
+if [[ "$status" == "$STATE_OK" ]]; then
- echo "OK: $1 queue has $tasks tasks"
+elif [[ "$status" == "$STATE_WARNING" ]]; then
- echo "WARNING: $1 queue has $tasks tasks"
+elif [[ "$status" == "$STATE_CRITICAL" ]]; then
- echo "CRITICAL: $1 queue has $tasks tasks"
+fi
+exit $status diff --git a/roles/nagios_client/tasks/main.yml b/roles/nagios_client/tasks/main.yml index 714be36..22ddd4f 100644 --- a/roles/nagios_client/tasks/main.yml +++ b/roles/nagios_client/tasks/main.yml @@ -73,6 +73,7 @@
- check_osbs_builds.py
- check_osbs_api.py
- check_ipa_replication
when: not inventory_hostname.startswith('noc') tags:
- check_redis_queue.sh
- nagios_client
@@ -146,6 +147,7 @@
- check_koschei_watcher_proc.cfg
- check_testcloud.cfg
- check_mirrorlist_docker_proxy.cfg
notify:
- check_celery_redis_queue.cfg
tags:
- restart nrpe
diff --git a/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 b/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 new file mode 100644 index 0000000..56279f3 --- /dev/null +++ b/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 @@ -0,0 +1 @@ +command[check_celery_redis_queue]=/usr/lib64/nagios/ plugins/check_redis_queue.sh celery 5 10 diff --git a/roles/nagios_server/files/nagios/services/pagure_redis.cfg b/roles/nagios_server/files/nagios/services/pagure_redis.cfg new file mode 100644 index 0000000..d5387d0 --- /dev/null +++ b/roles/nagios_server/files/nagios/services/pagure_redis.cfg @@ -0,0 +1,6 @@ +define service {
- host_name pagure01.fedoraproject.org
- service_description Redis/celery queue
- check_command check_by_nrpe!check_celery_redis_queue
- use defaulttemplate
+} _______________________________________________ infrastructure mailing list -- infrastructure@lists.fedoraproject.org To unsubscribe send an email to infrastructure-leave@lists. fedoraproject.org
On Fri, May 26, 2017 at 07:12:11PM -0400, Ricky Elrod wrote:
This is an attempt to add some simple monitoring so if the Pagure task queue starts backing up, we know about it.
+1 for me
Pierre
commit f207778a0e5aceba6c18885be7478317aaeab4b0 Author: Ricky Elrod codeblock@fedoraproject.org Date: Fri May 26 23:09:02 2017 +0000
add simple monitoring for pagure's celery redis queue Signed-off-by: Ricky Elrod <codeblock@fedoraproject.org>
diff --git a/roles/nagios_client/files/scripts/check_redis_queue.sh b/roles/nagios_client/files/scripts/check_redis_queue.sh new file mode 100644 index 0000000..ca1f186 --- /dev/null +++ b/roles/nagios_client/files/scripts/check_redis_queue.sh @@ -0,0 +1,23 @@ +#!/usr/bin/env bash
+. /usr/lib64/nagios/plugins/utils.sh
+if [[ "$#" -ne 3 ]]; then
- echo "Arguments: key warn crit"
- exit $STATE_UNKNOWN
+fi
+tasks="$(redis-cli llen "$1" | awk '{print $1}')"
+check_range $tasks $2:$3 +status=$?
+if [[ "$status" == "$STATE_OK" ]]; then
- echo "OK: $1 queue has $tasks tasks"
+elif [[ "$status" == "$STATE_WARNING" ]]; then
- echo "WARNING: $1 queue has $tasks tasks"
+elif [[ "$status" == "$STATE_CRITICAL" ]]; then
- echo "CRITICAL: $1 queue has $tasks tasks"
+fi
+exit $status diff --git a/roles/nagios_client/tasks/main.yml b/roles/nagios_client/tasks/main.yml index 714be36..22ddd4f 100644 --- a/roles/nagios_client/tasks/main.yml +++ b/roles/nagios_client/tasks/main.yml @@ -73,6 +73,7 @@
- check_osbs_builds.py
- check_osbs_api.py
- check_ipa_replication
when: not inventory_hostname.startswith('noc') tags:
- check_redis_queue.sh
- nagios_client
@@ -146,6 +147,7 @@
- check_koschei_watcher_proc.cfg
- check_testcloud.cfg
- check_mirrorlist_docker_proxy.cfg
notify:
- check_celery_redis_queue.cfg
tags:
- restart nrpe
diff --git a/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 b/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 new file mode 100644 index 0000000..56279f3 --- /dev/null +++ b/roles/nagios_client/templates/check_celery_redis_queue.cfg.j2 @@ -0,0 +1 @@ +command[check_celery_redis_queue]=/usr/lib64/nagios/plugins/check_redis_queue.sh celery 5 10 diff --git a/roles/nagios_server/files/nagios/services/pagure_redis.cfg b/roles/nagios_server/files/nagios/services/pagure_redis.cfg new file mode 100644 index 0000000..d5387d0 --- /dev/null +++ b/roles/nagios_server/files/nagios/services/pagure_redis.cfg @@ -0,0 +1,6 @@ +define service {
- host_name pagure01.fedoraproject.org
- service_description Redis/celery queue
- check_command check_by_nrpe!check_celery_redis_queue
- use defaulttemplate
+} _______________________________________________ infrastructure mailing list -- infrastructure@lists.fedoraproject.org To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
infrastructure@lists.fedoraproject.org