How to properly create my own custom items and triggers for Zabbix 4 - PostgreSQL

I have Zabbix 4.4.1 installed on Ubuntu 19.10.
I have a postgresql plugin configured and working properly so it checks my database metrics.
I have a table in which I want to check the timestamp column of the last inserted row. The column name is insert_time.
If the last inserted row has an insert time older than 5 minutes, it should produce a warning, and older than 10 minutes, an error.
I'm new to Zabbix. Everything I've done so far came from googling, and I'm not sure that's the way to go. It's probably not, because it's not working :)
OK, so the first thing I did was create a bash script at /etc/zabbix/mytools called get-last-insert-time.sh.
I perform the query and send the output to zabbix_sender with the following template:
#!/bin/bash
PGPASSWORD=<PASSWORD> psql -U <USER> <DB> -t -c "<RELEVANT QUERY>" | awk '{$1=$1};1' | tr -d "\n" | xargs -I {} /usr/bin/zabbix_sender -z $ZABBIXSERVER -p $ZABBIXPORT -s $ZABBIXAGENT -k "my.pgsql.cdr.last_insert_time" -o {}
Is there a way to test this step? How can I make sure that zabbix_sender's data actually reaches the server? Is there some kind of zabbix_sender sniffer of sorts? :)
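There is no dedicated sniffer, but for reference, one way to test this step is to run zabbix_sender by hand with the -vv verbose flag and read the processed/failed counters in its response (this reuses the variables from the script above; the sample value is arbitrary):
/usr/bin/zabbix_sender -vv -z $ZABBIXSERVER -p $ZABBIXPORT -s $ZABBIXAGENT \
  -k "my.pgsql.cdr.last_insert_time" -o 1574111464
# a healthy response ends with something like: processed: 1; failed: 0; total: 1
# "failed: 1" usually means the host name or the trapper item key doesn't match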
Next, I created a configuration file at /etc/zabbix/zabbix_agentd.d called get-last-insert-time.conf with the following content:
UserParameter=my_pgsql_cdr_last_insert_time,/etc/zabbix/mytools/get-last-insert-time.sh;echo $?
Here the key is my_pgsql_cdr_last_insert_time, while the key in zabbix_sender is my.pgsql.cdr.last_insert_time. As far as I understand, these should be two different keys.
Why?!
Then I created a template, attached it to the relevant host, and created 2 items for it:
an item for the insert time with the key my.pgsql.cdr.last_insert_time, of type Zabbix trapper
a Zabbix agent item with the key my_pgsql_cdr_last_insert_time, Type of information: Text.
Is that the right type of information for a timestamp?
Now on Overview -> Latest data I see:
CDR last insert time with no data
and Run my database trappers insert time, whose text is... disabled? It's in gray, and there is also no data.
So before I begin to create an alert: what did I do wrong?
Any information regarding this issue would be greatly appreciated.
Update
Thanks Jan Garaj for this valuable information.
I was expecting that creating such a trigger should be easier than what I found on Google; glad to see I was correct.
I edited my bash script to return seconds since the epoch. Since it comes from PostgreSQL it returns a float, so I configured the items as Numeric (float). I do see in Latest data that the items receive the proper values.
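For reference, a sketch of the kind of query involved (the table name is a placeholder; insert_time is the column from the question):
SELECT extract(epoch FROM max(insert_time)) FROM <TABLE>;
-- returns seconds since the epoch as a float, e.g. 1574111464.123456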
I created triggers, and I made sure that the warning trigger depends on the critical trigger so they won't both fire at the same time.
For example, I created this trigger: {cdrs:pgsql.cdr.last_insert_time.fuzzytime(300)}=0, so that if the last insert time is older than 5 minutes it returns a critical error. The problem is that it returns a critical error... always! Even when it shouldn't. I couldn't find a way to debug this. So besides actually getting the triggers to work properly, everything else is well configured.
Any ideas?
Update 2
When I configured the script to return a timestamp, I had converted it to a different timezone instead of leaving it as it is, so the trigger was actually comparing the data with the current time + 2 hours in the future :)
I found that out by going to Latest data, checking the timestamp, and converting it to the actual time. So everything works now, thanks a lot!

It looks overcomplicated, because you are mixing the sender approach with the agent approach. A simpler approach is agent only:
UserParameter=pgsql.cdr.last_insert_time,/etc/zabbix/mytools/get-last-insert-time.sh
The script /etc/zabbix/mytools/get-last-insert-time.sh returns the last insert Unix timestamp only, e.g. 1574111464 (no newline, and don't use zabbix_sender in the script). Keep in mind that the Zabbix agent usually runs as the zabbix user, so you need to configure proper script (exec) permissions and possibly environment variables.
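A minimal sketch of such a script, keeping the placeholders from the question:
#!/bin/bash
# print the Unix timestamp of the newest row, with no trailing newline
PGPASSWORD=<PASSWORD> psql -U <USER> -d <DB> -t -A -c \
  "SELECT floor(extract(epoch FROM max(insert_time)))::bigint FROM <TABLE>;" | tr -d '\n'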
Test it with zabbix_get from the Zabbix server, e.g.:
zabbix_get -s <HOST IP> -p 10050 -k "pgsql.cdr.last_insert_time"
For any issue on the agent side: increase the agent log level and watch the agent logs.
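A minimal sketch (the log path and service name are the Ubuntu package defaults and may differ on your system):
# in /etc/zabbix/zabbix_agentd.conf: raise the log verbosity (0-5, 4 = debug)
DebugLevel=4
# then restart the agent and watch its log while you run zabbix_get
sudo systemctl restart zabbix-agent
tail -f /var/log/zabbix/zabbix_agentd.log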
When you have sorted out the agent part, create a template with the item key pgsql.cdr.last_insert_time and the Numeric (unsigned) type. The trigger can use the fuzzytime(60) function.
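For example, in Zabbix 4.x trigger syntax (the host name cdrs is taken from the question's update; the thresholds match the question's 5/10-minute requirement):
{cdrs:pgsql.cdr.last_insert_time.fuzzytime(600)}=0   (critical: no insert for over 10 minutes)
{cdrs:pgsql.cdr.last_insert_time.fuzzytime(300)}=0   (warning: no insert for over 5 minutes; set it to depend on the critical trigger)
fuzzytime(N) returns 1 while the item value is within N seconds of the Zabbix server clock and 0 once it drifts further, so =0 fires when the last insert is older than the threshold.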

Related

pg_restore -F d -j {n} internal steps and estimation of time

I'm running a looooong pg_restore process of a database with 70 tables and 800 GB. The process has been running for 5 days now. I'm monitoring some aspects of the process to evaluate how long it will take, but some things are missing, and this is why I'm asking.
I ran pg_dump with the parameters -F d -j 10; the dump took about 12 hours. I noticed each one of the 10 threads took responsibility for a single table from start to end. After finishing a table, the same process (pid) started on another table not taken by another process.
Running pg_restore is taking much longer (5 days and still working). The main reason is that I'm restoring to a NAS external drive mounted using NFS, and that drive is very slow compared to a local hard drive. This is NOT a problem: I'll migrate the information back from the NAS to the original hard drive once I format the hard drive again and install the new operating system.
I'm doing two things to monitor progress:
In a separate terminal I launch du -sh /var/lib/pgsql and evaluate the disk space consumed in the new installation. It has to reach, more or less, the same space the original database was using.
In a separate terminal I launch ps -fu postgres and I see several pg_restore processes running. Each one of them is linked to another process of this shape: postgres: postgres {dbname} [local] {command}, where {dbname} is the database name and {command} varies. Initially there was the COPY command, which I think was used to restore the table contents. I also saw some CREATE INDEX commands re-creating the indexes of each table, and now I see ALTER TABLE commands, though I don't know exactly what for.
At this time, all processes are just doing ALTER TABLE, and the overall used space almost matches the initial space, but the process does not end (and it has been 5 days now).
So I'm asking if someone with more experience can tell me what pg_restore is doing with the ALTER TABLE commands, and if there is any other mechanism to estimate how long it will take.
Thanks!
Ignacio
The ALTER TABLE statements at the end of a pg_restore create primary and unique keys as well as foreign key constraints. They could also be attaching partitions, but that is normally very fast.
Look into pg_stat_progress_create_index if you have a recent enough PostgreSQL version (you didn't say); with it you can monitor the progress of the primary and unique key indexes being created.
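For example (the view exists in PostgreSQL 12 and later):
-- one row per backend currently running CREATE INDEX / ALTER TABLE ... ADD constraint index builds
SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index;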

Is it possible to automatically run a query after postgres restarts?

I would like to run a small query every time postgres is restarted. Is this possible?
I have found that it is possible to do that every time psql is launched, using .psqlrc, but that does not address my need.
Thanks.
For those coming here looking for a solution: as of today (2020, PostgreSQL 12) it is not possible to configure Postgres to always run a given script/query.
You can of course use your own launch script, but you cannot, it seems, prevent other people from restarting it their own way if they have the right permissions. A sketch of the launch-script approach is below.
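A sketch using a systemd drop-in (the unit name postgresql@12-main.service follows the Debian/Ubuntu layout, and mydb / my_startup_query() are placeholders):
# created via: sudo systemctl edit postgresql@12-main.service
[Service]
# runs after the service starts, as the unit's user (postgres)
ExecStartPost=/usr/bin/psql -d mydb -c "SELECT my_startup_query();"
As noted above, this only covers restarts that go through this unit; anyone restarting Postgres another way bypasses it.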

new Sphinx version attempts a non-existing connection

I recently upgraded sphinx to version 2.2.11 on Ubuntu.
Then I started getting daily emails where a process attempts to connect and generates this error:
ERROR: index 'test1stemmed': sql_connect: Access denied for user 'test'@'localhost'
ERROR: index 'test1': sql_connect: Access denied for user 'test'@'localhost'
The email has a subject line which I assume is the info regarding the root of the problem:
. /etc/default/sphinxsearch && if [ "$START" = "yes" ] && [ -x /usr/bin/indexer ]; then /usr/bin/indexer --quiet --rotate --all; fi
So /etc/default/sphinxsearch does have the START variable set to yes.
But the /usr/bin/indexer part is total gibberish to me.
Such a user never existed on the system AFAIK.
It would be interesting to know how this process got generated, but more importantly
How can this process be safely stopped?
I've seen that happen; it comes from the Sphinx install 'package'. Whoever set up that package created a cron task that runs that indexer --all command, which just tries to reindex every index (once a day IIRC). The package maintainer thought they were being helpful :)
From https://packages.ubuntu.com/bionic/ppc64el/sphinxsearch/filelist
it looks like it might be in
/etc/cron.d/sphinxsearch
You could remove that cron task if you don't want it.
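For example, either of these stops the daily reindex (the path is from the package listing above):
sudo rm /etc/cron.d/sphinxsearch   # remove the cron job entirely
# or edit that file and comment out the indexer line to keep it around for later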
Presumably you already have some other process for updating your actual real 'live' indexes (either dedicated cron tasks, or maybe RT indexes or whatever).
Also, it seems you still have these 'test' indexes in your sphinx.conf, maybe left over from the initial installation. I don't think installing a new package would overwrite sphinx.conf to add them later.
You may want to clear them out of your sphinx.conf if you don't use them; it could simplify the file.
(Although you possibly still want to get rid of the --all cron, which just blindly reindexes everything daily!)

Scheduling a stored procedure in PostgreSQL 9.2.8

I have a simple stored procedure which calculates values from one table and inserts them into another. I want to schedule it to run once a day.
I came across pg_cron but it looks like it will only work for version 9.5 and above.
How can I schedule this stored procedure, or its select statement select * from stored_procedure_name(), in Postgres?
As mentioned by @AlexM, I started looking into cron and found a few useful links for doing this outside of PostgreSQL.
"Crontab in Linux with 20 useful examples" helped me understand the structure for creating a new entry in the crontab.
I edited the crontab file and added the following entry to it. As it's on the same server, there is no need to pass credentials for PostgreSQL:
00 00 * * * psql -c "select query here;"
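A slightly fuller sketch of the same entry (the database name mydb and the log path are placeholders; depending on your pg_hba.conf you may also need -U and a ~/.pgpass file):
# run daily at midnight, calling the stored procedure from the question and keeping a log
00 00 * * * psql -d mydb -c "SELECT * FROM stored_procedure_name();" >> /var/log/sp_daily.log 2>&1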
Unfortunately, the previous comments are correct. There is no scheduler within PostgreSQL nor in any of the supplied utilities. Your only option is to use an external scheduler.

missing chunk number 0 for toast value 37946637 in pg_toast_2619

Main Issue:
Getting "ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619" while selecting from tables.
Steps that led to the issue:
- Used pg_basebackup from a Primary db and tried to restore it onto a Dev host.
- Did a pg_resetxlog -f /${datadir} and started up the Dev db.
- After starting up the Dev db, when I query a varchar column, I keep getting:
psql> select text_col_name from big_table;
ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619
This seems to be happening for most varchar columns in the restored db.
Has anyone else seen it?
Does anyone have ideas of why it happens and how to fix it?
pg_resetxlog is a bit of a last-resort utility which you should prefer not to use. The easiest way to make a fully working backup is to use pg_basebackup with the -X s option (that is an uppercase X). What this does is make pg_basebackup open two connections: one to copy all the data files, and one to receive all of the WAL that is written during the backup. This way you cannot run into the problem that parts of the WAL you need have already been deleted.
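For example (host, user, and target directory are placeholders):
# -X stream (same as -X s) streams the WAL on a second connection; -P shows progress
pg_basebackup -h primary.example.com -U repl_user -D /var/lib/pgsql/data -X stream -P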
I tried a few things since my original question. I can confirm that the source of my error "ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619" was doing a pg_resetxlog during the restore process.
I re-did the restore today, but this time I applied the pg_xlog files from the Primary using recovery.conf. The restored db started up fine and all queries are running as expected. For reference, a sketch of the recovery.conf involved is below.
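A minimal recovery.conf sketch for replaying archived WAL during such a restore (the archive path is a placeholder):
# PostgreSQL calls this command for every WAL segment it needs until recovery finishes
restore_command = 'cp /path/to/wal_archive/%f "%p"'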