Tollef Fog Heen's blog

tfheen Thu, 27 Jun 2013 - Getting rid of NSCA using Python and Chef

NSCA is a tool used to submit passive check results to Nagios. Unfortunately, an incompatibility was recently introduced between wheezy clients and old servers. Since I don't want to upgrade my server, this caused some problems, and I decided to just get rid of NSCA completely.

The server side of NSCA is pretty trivial: it basically just adds a timestamp and a command name to the data sent by the client, then changes tabs into semicolons and stuffs all of that down Nagios' command pipe.

The script I came up with was:

#! /usr/bin/python
# -*- coding: utf-8 -*-

import time
import sys

# format is:
# [TIMESTAMP] COMMAND_NAME;argument1;argument2;…;argumentN
#
# For passive checks, we want PROCESS_SERVICE_CHECK_RESULT with the
# format:
#
# PROCESS_SERVICE_CHECK_RESULT;<host_name>;<service_description>;<return_code>;<plugin_output>
#
# return code is 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
#
# Read lines from stdin with the format:
# $HOSTNAME\t$SERVICE_NAME\t$RETURN_CODE\t$TEXT_OUTPUT

if len(sys.argv) != 2:
    print("Usage: {0} HOSTNAME".format(sys.argv[0]))
    sys.exit(1)
# The hostname comes from the forced command, not from the client's input
HOSTNAME = sys.argv[1]

timestamp = int(time.time())
# open() rather than the Python 2-only file() builtin
nagios_cmd = open("/var/lib/nagios3/rw/nagios.cmd", "w")
for line in sys.stdin:
    # Strip the trailing newline so no empty line follows each command
    (_, service, return_code, text) = line.rstrip("\n").split("\t", 3)
    nagios_cmd.write("[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{hostname};{service};{return_code};{text}\n".format
                     (timestamp = timestamp,
                      hostname = HOSTNAME,
                      service = service,
                      return_code = return_code,
                      text = text))
nagios_cmd.close()

The reason for the hostname in the line (even though it's overridden) is to be compatible with send_nsca's input format.
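
To make the per-line transformation concrete, here is a minimal sketch of it as a standalone function. The name format_check_result and the sample values are made up for illustration and are not part of the script above; the point is only that the first field sent by the client is thrown away and the hostname from the command line is used instead.

```python
def format_check_result(line, hostname, timestamp):
    """Turn one send_nsca-style input line into a Nagios external command.

    Hypothetical helper for illustration only. The host name the client
    put in the first field is ignored; `hostname` is used instead.
    """
    _, service, return_code, text = line.rstrip("\n").split("\t", 3)
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}".format(
        timestamp, hostname, service, return_code, text)


# The client claims to be "spoofed", but the result is filed under "web1".
example = format_check_result("spoofed\tdisk\t1\t/ is 91% full\n", "web1", 1372320540)
print(example)
```

Because the split is limited to three tabs, any further tabs in the plugin output stay in the text field rather than breaking the parse.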

Machines submit check results over SSH, using its excellent forced-command capabilities (the command= option in authorized_keys). The Chef template for the authorized_keys file looks like:

<% for host in @nodes %>
command="/usr/local/lib/nagios/nagios-passive-check-result <%= host[:hostname] %>",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa <%= host[:keys][:ssh][:host_rsa_public] %> <%= host[:hostname] %>
<% end %>

The actual Chef recipe looks like:

nodes = []
search(:node, "*:*") do |n|
  # Ignore not-yet-configured nodes
  next unless n[:hostname]
  next unless n[:nagios]
  next if n[:nagios].has_key?(:ignore)
  nodes << n
end
nodes.sort! { |a,b| a[:hostname] <=> b[:hostname] }
print nodes

template "/etc/ssh/userkeys/nagios" do
  source "authorized_keys.erb"
  mode 0400
  variables({
              :nodes => nodes
            })
end

cookbook_file "/usr/local/lib/nagios/nagios-passive-check-result" do
  mode 0555
end

user "nagios" do
  action :manage
  shell "/bin/sh"
end

To submit a check, hosts do:

printf '%s\t%s\t%s\t%s\n' "$HOSTNAME" "$SERVICE_NAME" "$RET" "$TEXT" | ssh -i /etc/ssh/ssh_host_rsa_key -o BatchMode=yes -o StrictHostKeyChecking=no -T "nagios@$NAGIOS_SERVER"
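
If a check is itself written in Python, the same submission could be done without going through the shell. What follows is only a sketch under assumptions: the function names build_check_line and submit_check are hypothetical, and the SSH options are simply copied from the one-liner above.

```python
import subprocess


def build_check_line(hostname, service, return_code, text):
    """Build one tab-separated line in send_nsca's input format."""
    # Tabs or newlines inside a field would break the framing, so flatten them.
    fields = [hostname, service, str(return_code), text]
    cleaned = [f.replace("\t", " ").replace("\n", " ") for f in fields]
    return "\t".join(cleaned) + "\n"


def submit_check(server, hostname, service, return_code, text):
    """Pipe one result to the Nagios server over SSH; returns ssh's exit code."""
    cmd = ["ssh", "-i", "/etc/ssh/ssh_host_rsa_key",
           "-o", "BatchMode=yes", "-o", "StrictHostKeyChecking=no",
           "-T", "nagios@" + server]
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    line = build_check_line(hostname, service, return_code, text)
    proc.communicate(line.encode("utf-8"))
    return proc.returncode
```

A call would look like submit_check("nagios.example.com", "web1", "disk", 1, "/ is 91% full"), where the server name is a placeholder.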
[10:09] | tech | Getting rid of NSCA using Python and Chef

Tollef Fog Heen <tfheen@err.no>