I have TimescaleDB version 2.2.1 configured on a few servers (all single-node deployments) and it has been working great for the most part.
On one of the servers, however, which is a relatively less powerful machine with a large 100 TB mounted NAS drive, the compression jobs I scheduled seemed to stop working once I set them up for a large DB.
It did work for the smaller databases earlier, but when I created the hypertables on the largest DB (total size 13 TB, with one table at 9.7 TB by itself) and set up the appropriate compression policy, it just never triggered, even after I manually altered the job with the alter_job command. The same thing happened to the other TimescaleDB-enabled DBs: the scheduled jobs stopped working on them too around the same time (29th Sept is the last successful finish date, i.e. 20 days ago).
I have tried manually calling the job, but it only compresses one chunk at a time, so for now I am compressing the chunks manually as a quick fix.
Can anyone please help? I cannot seem to find any resources regarding this.
SELECT compress_chunk(i, if_not_compressed => true)
FROM show_chunks('oa_odds_historic', older_than => INTERVAL '10 day') i;
SELECT alter_job(job_id, next_start => now())
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression';
Server spec:
Timescaledb jobs:
Thanks Jonatasdp.
Unfortunately, it was due to not having enough background workers: timescaledb.max_background_workers, which I increased to accommodate the total number of jobs (= the number of hypertables with compression policies enabled) on the server.
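For reference, the change was roughly the following (16 is only an example value; the setting can also be put in postgresql.conf, and it only takes effect after a PostgreSQL restart):

-- How many background jobs are scheduled in this database
SELECT count(*) FROM timescaledb_information.jobs;

-- Example value: the worker budget needs to cover the scheduled jobs on the
-- whole instance; requires a restart to take effect.
ALTER SYSTEM SET timescaledb.max_background_workers = 16;

-- max_worker_processes must also be large enough to leave room for these
-- workers (plus parallel query workers), otherwise they still cannot start.
SHOW max_worker_processes;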
My setup
Postgres 11 on an AWS EC2 t4g.xlarge instance (4 vCPU, 16 GB RAM) running Amazon Linux.
Set up to take a nightly disk snapshot (my workload doesn't require high reliability).
Database has table xtc_table_1 with ~6.3 million rows, about 3.2GB.
Scenario
To test some new data processing code, I created a new test AWS instance from the nightly snapshot of my production instance.
I create a new UNLOGGED table, and populate it with INSERT INTO holding_table_1 SELECT * FROM xtc_table_1;
It takes around 2 min 24 sec for the CREATE statement to execute.
I truncate holding_table_1 and run the CREATE statement again, and it completes in 30 sec. The ~30 second timing is consistent for successive truncates and creates of the table.
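For reference, the test sequence is roughly this (simplified; the exact table definition and statements differ slightly):

-- create the unlogged copy target (approximate; the real definition may differ)
CREATE UNLOGGED TABLE holding_table_1 (LIKE xtc_table_1 INCLUDING ALL);

-- first population after restoring from the snapshot: ~2 min 24 sec
INSERT INTO holding_table_1 SELECT * FROM xtc_table_1;

-- subsequent runs: truncate and repopulate, consistently ~30 sec
TRUNCATE holding_table_1;
INSERT INTO holding_table_1 SELECT * FROM xtc_table_1;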
I think this may be because of some caching of data. I tried restarting the Postgres service, then rebooting the AWS instance (after stopping Postgres with sudo service postgresql stop), then stopping and starting the AWS instance. However, it still takes ~30 sec to create the table.
If I rebuild a new instance from the snapshot, the first time I run the CREATE statement it's back to the ~2m+ time.
Similar behavior for other tables xtc_table_2, xtc_table_3.
Hypothesis
After researching and finding this answer, I wonder if what's happening is that the disk snapshot contains some WAL data that is being replayed the first time I do anything with xtc_table_n, and that subsequently, because Postgres was shut down "nicely", there is no WAL to play back.
Does this sound plausible?
I don't know enough about Postgres internals to be sure. I would have imagined that any WAL playback would happen on starting up postgres, but maybe it happens at the individual table level the first time a table is touched?
Knowing the reason is more than just theoretical; I'm using the test instance to do some tuning on some processing code, and need to be confident in having a consistent baseline to measure from.
Let me know if more information is needed about my setup or what I'm doing.
@jellycsc's suggestion was correct; adding more info here in case it's helpful to anyone else.
The problem I was encountering was not a Postgres issue at all, but a consequence of the way AWS handles volumes and snapshots.
From this page:
For volumes that were created from snapshots, the storage blocks must be pulled down from Amazon S3 and written to the volume before you can access them. This preliminary action takes time and can cause a significant increase in the latency of I/O operations the first time each block is accessed. Volume performance is achieved after all blocks have been downloaded and written to the volume.
I used the fio utility as described in the linked AWS page to initialize the restored volume, and first-time performance was consistent with subsequent query times.
On my primary I ran a VACUUM then an ANALYZE on all databases; when I check pg_stat_user_tables afterwards, the last_analyze column shows a current timestamp, which is great.
When I check my replication instance, there are no values in the last_analyze column. I was assuming this timestamp would eventually populate there as well. Is this known behaviour?
The reason I ask is that after that VACUUM/ANALYZE on the primary, I'm running into some extremely slow queries on the replication instance. I ran EXPLAIN on a query prior to the VACUUM/ANALYZE and it ran in 5 seconds; now it's taking 65 seconds, and the EXPLAIN output shows it's not using a lot of indexes that it should be.
PostgreSQL has two different stats systems. One records data about the distribution of values in the columns; this one is transactional and propagates to the replica via the WAL.
The other records data about turnover on the tables and about when the last vacuum/analyze was done. This system is used to decide when to schedule a new vacuum/analyze (to keep the first system from getting too far out of date). It is not transactional and does not propagate to the replica.
So the replica has the latest column-value distribution statistics (as soon as the WAL replays, anyway), but it doesn't know how recent they are.
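A quick way to see the two systems side by side (a sketch; substitute your own table name for 'my_table'):

-- Maintenance/activity stats: kept per server, not WAL-logged,
-- so a standby only ever shows what it has collected itself.
SELECT relname, last_vacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'my_table';

-- Planner stats: stored in the pg_statistic catalog (viewed via pg_stats),
-- written transactionally, so they do reach the replica through the WAL.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'my_table';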
I have a Scala job with a small set of CDC records (10K records) which, as part of a merge job, does member matching against another Hive table with 4 million members, provider matching against another Hive table with 1 million providers, and a bunch of other stuff (recycle logic/rejection logic); it usually takes around 30 minutes to complete.
With the same volume, I added an action (count(*)) after every module to understand which one takes more time; with those actions in place, the job completed in 6 minutes. Best practice says we should not use actions frequently, so I don't understand what is making the job run faster. Any link with an explanation would be helpful.
Is it that, because resources get released after every module's execution due to the action, the overall job runs faster?
My cluster has 6 nodes, each with 13 cores and 64 GB memory, but other processes are also running on it, so it is usually over-utilized.
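For context, the per-module instrumentation looks roughly like this (a sketch with made-up table and column names, not the real job):

import org.apache.spark.sql.SparkSession

// Sketch only: placeholder tables and join key. After a module, a count()
// action forces that stage to execute so its wall-clock time can be measured.
val spark = SparkSession.builder().appName("cdc-merge-timing-sketch").getOrCreate()
val cdc = spark.table("cdc_records")       // placeholder: ~10K CDC rows
val members = spark.table("members")       // placeholder: ~4M member rows

val t0 = System.nanoTime()
val matched = cdc.join(members, Seq("member_id"))   // placeholder join key
println(s"member matching: ${matched.count()} rows in ${(System.nanoTime() - t0) / 1e9} s")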
We're experiencing some constant outages in our back end that seem to correlate with peaks of high CPU usage on our Cloud SQL Postgres instance (v9.6).
Taking a look at cloudsql.googleapis.com/postgres.log, those high CPU peaks also seem to correlate with times when the database is running an automatic vacuum of the table cloudsqladmin.public.heartbeat.
We haven't found any documentation on what this table is or why autovacuum runs on it so often (our own tables don't seem to be affected by it).
Is this normal? Should we tune the values for the autovacuum? Thanks in advance.
Looking at your graphs, there is no correlation between the CPU spikes and the cloudsqladmin.public.heartbeat autovacuum.
Let's start with what the cloudsqladmin.public.heartbeat table is: it is a table used by the Cloud SQL high-availability process, which is better explained here:
Each second, the primary instance writes to a system database as a heartbeat signal.
So the table is used internally to keep track of your instance's health. The autovacuum is triggered based on the doc David shared.
Now, if the vacuum process were generating the CPU spikes, you would see a spike every minute/second.
So, straight answers to your questions:
Is this normal?: Yes, the autovacuum and the cloudsqladmin.public.heartbeat table are completely normal from a Cloud SQL internal perspective; they should not impact the instance in any way.
Should we tune the values for the autovacuum?: No need for that. As mentioned, this process is not the one putting load on the instance's CPU; you can hide the log entries containing "cloudsqladmin.public.heartbeat" and analyze the ones left around the time the spike occurred.
It is also worth looking at any backup processes that were triggered (there could be one running at the same time), under Cloud SQL > Instance Details > Backups, but of course that's a different topic than the one described here :).
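If it helps, here is a quick sketch for checking which of your own tables autovacuum has actually been working on around the time of a spike (the heartbeat table lives in the separate cloudsqladmin database, so it will not show up here):

-- Recent autovacuum/autoanalyze activity on the tables in your own database
SELECT relname, last_autovacuum, autovacuum_count, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autovacuum DESC NULLS LAST
LIMIT 20;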
Here's a recommendation that seems very relevant to your situation: https://www.netiq.com/documentation/cloud-manager-2-5/ncm-install/data/vacuum.html
Some follow-ups are at the bottom.
I have a test installation of Spark and Cassandra where I have 6 nodes with 128GiB of RAM and 16 CPU cores each. Each node runs Spark and Cassandra. I set up my keyspace with the SimpleStrategy and a replication factor of 3 (i.e., fairly standard).
My table is very simple, like this:
create table if not exists mykeyspace.values (
    channel_id timeuuid,
    day int,
    time bigint,
    value double,
    primary key ((channel_id, day), time)
) with clustering order by (time asc)
time is simply a Unix timestamp in nanoseconds (the measuring devices the values come from are that precise, and this precision is wanted); day is the same timestamp expressed in days (i.e., days since 1970-01-01).
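In other words (a small Scala sketch; timeNs is a placeholder for one measurement's timestamp):

// day is just the nanosecond timestamp truncated to whole days since the epoch
val timeNs: Long = 1456789012345678901L        // placeholder value, ns since 1970-01-01
val nsPerDay: Long = 86400L * 1000000000L      // 86,400 seconds per day, in nanoseconds
val day: Int = (timeNs / nsPerDay).toInt       // days since 1970-01-01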
I then inserted about 200 GiB of values for about 400 channels and tested something very simple: calculating the 10-minute average of every channel:
sc.cassandraTable("mykeyspace", "values")
  // pull out (time, channel_id, value) from each row
  .map(r => (r.getLong("time"), r.getUUID("channel_id"), r.getDouble("value")))
  // key by (10-minute bucket, channel); the value carries a running (sum, count)
  .map(t => (t._1 / 600L / 1000000000L, t._2) -> (t._3, 1.0))
  // combine the partial sums and counts per (bucket, channel)
  .reduceByKey((a, b) => (a._1 + b._1) -> (a._2 + b._2))
  // turn the bucket index back into a timestamp in ns and compute the average
  .map(t => (t._1._1 * 600L * 1000000000L, t._1._2, t._2._1 / t._2._2))
When I now do this calculation, even without saving the result (i.e., just finishing with a simple count()), it takes a VERY long time and I get very bad read performance.
When I do top on the nodes, Cassandra's java process takes about 800% CPU, which is OK because this is about half the load the node can take; the other half is taken by Spark.
However, I noticed a strange thing:
When I run iotop I expect to see a lot of disk read, but I see a lot of disk WRITE instead, all of which comes from kworker.
When I do iostat -x -t 10, I also see a lot of writes going on.
Swap is disabled.
When I run a similar calculation directly on the CSV files the data came from, which are stored in HDFS and loaded via sc.newAPIHadoopFile with a custom input format, the process finishes much faster (the calculation takes about an hour with Cassandra but about 5 minutes with files from HDFS).
So where can I start troubleshooting and tuning?
Followup 1
With the help of RussS (see comments) I discovered that logging was set to DEBUG. I disabled this, set logging to ERROR, and also disabled GC logging, but this did not change anything at all.
I also tried keyBy as the very same user pointed out, but this also did not change anything.
I also tried running the calculation locally, once from .NET and once from Scala, and there the database is accessed as expected, i.e., there are no writes.
Followup 2
I think I got it. For one thing, I didn't see the forest for the trees: the hour I stated earlier for 200 GiB still works out to about 56 MiB/s of throughput. Since the hardware I run my installation on is far from optimal (it is a high-performance server running Microsoft Hyper-V, which in turn runs the nodes as VMs, and the hard disks of this machine are quite slow), this is indeed a throughput I should expect. Since the host is just one machine with one RAID array, and the disks of the nodes are virtual HDDs on that array, I can't expect the performance to magically go through the roof.
I also tried running Spark standalone, which improves the performance a bit (I now get about 75 MiB/s); the constant writes are also gone with this - I only get occasional spikes, which I expect are caused by shuffling.
As for the CSV files being much faster: the raw data in CSV is only about 50 GiB, my custom FileInputFormat reads it line by line, and it uses a very fast string-to-double parser which only understands the US number format but is faster than Java's parseDouble or Scala's toDouble. With this special tweaking I get about 170 MiB/s in YARN mode.
So I suppose I should, for a start, improve my CQL queries to limit the data that gets read, and try to tweak some YARN settings.
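For the first of those, a sketch of what limiting the read could look like (select() and where() are spark-cassandra-connector features; which predicates can actually be pushed down depends on the key layout, so this is only a starting point):

import com.datastax.spark.connector._   // provides cassandraTable plus select/where

// Only ship the columns the aggregation needs instead of whole rows;
// a where() on the partition/clustering keys could narrow it down further.
val pruned = sc.cassandraTable("mykeyspace", "values")
  .select("time", "channel_id", "value")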