Loading tables with partitions in ora2pg - PostgreSQL

I am having issues bringing in selective data. I have a table with 32 partitions and I only want to import data for 2 partitions at a time. I used the directive below in my *.conf file, but when I execute ora2pg it brings in all the partitions. I also tried the -t PARTITION option with -e to exclude the ones I don't need, but then it doesn't bring in any of them.
REPLACE_QUERY PH*** [SELECT * FROM PH** WHERE TIMECREATED between '2021-03-01' and '2021-06-01']
$ ora2pg -t COPY -o data.sql -b ./data -c $HOME/o2p/oracle_service_name/config/ora2pg.conf -l data_ext.log -t PARTITION -e 'PARTITION[INVOICES_Q1 NVOICES_Q10 INVOICES_Q11 INVOICES_Q12 INVOICES_Q13 INVOICES_Q14 INVOICES_Q15 INVOICES_Q16 INVOICES_Q17 INVOICES_Q18 INVOICES_Q19 INVOICES_Q2 INVOICES_Q20 INVOICES_Q21 INVOICES_Q22 INVOICES_Q23 INVOICES_Q26 INVOICES_Q27 INVOICES_Q28 INVOICES_Q29 INVOICES_Q3 INVOICES_Q30 INVOICES_Q31 INVOICES_Q32 INVOICES_Q4 INVOICES_Q5 INVOICES_Q6 INVOICES_Q7 INVOICES_Q8]'
[========================>] 0/0 partitions (100.0%) end of output.
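If the intent is to load only the two partitions missing from that exclude list (INVOICES_Q24 and INVOICES_Q25), one possible approach is to whitelist them instead of excluding the other thirty. This is only a sketch, assuming ora2pg's ALLOW directive accepts the same type-qualified bracket syntax that -e uses on the command line:
# In ora2pg.conf: export only these two partitions
ALLOW           PARTITION[INVOICES_Q24 INVOICES_Q25]
# REPLACE_QUERY can still restrict rows by date
REPLACE_QUERY   PH*** [SELECT * FROM PH** WHERE TIMECREATED BETWEEN '2021-03-01' AND '2021-06-01']
Note also that the command above passes -t twice (COPY and PARTITION); depending on how ora2pg parses its options, one of the two types may be silently dropped, which could explain the 0/0 partitions result.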

Related

Kafkacat consume between timestamp giving wrong results when counting records

I want to count the number of messages in a given Kafka topic between two timestamps. I tried doing this using kafkacat, using the following command:
# START_DATE = 01.04.2022 02:00:00Z
# END_DATE = 01.04.2022 02:05:00Z
$ kafkacat -C -b broker:9092 -t mytopic -o s#1648778400000 -o e#1648778700000 -p 0 -f '[ts %T] [partition %p] [offset %o] %k\n' -e -c 1
In fact, this is the same approach that is listed as the answer in a very similar question.
According to kafkacat --help:
Consumer options:
  -o <offset>        Offset to start consuming from:
                     beginning | end | stored |
                     <value>  (absolute offset) |
                     -<value> (relative offset from end)
                     s#<value> (timestamp in ms to start at)
                     e#<value> (timestamp in ms to stop at (not included))
Correspondingly, I would expect the above command to give me the first record that has a timestamp greater than s#<value> and smaller than e#<value>. However, it instead gives me a record that has a timestamp prior to s#<value> (in fact, it just gives me the first record in partition 0):
# output of above command
[ts 1648692486141] [partition 0] [offset 2] 643b0013-b3e1-47a5-a9d3-7478c0e91ca4
Am I misunderstanding the consumer options s#<value> and e#<value>?
Kafkacat version:
Version 1.5.0 (JSON, librdkafka 1.2.1 builtin.features=gzip,snappy,ssl,sasl,regex,lz4,sasl_gssapi,sasl_plain,sasl_scram,plugins,sasl_oauthbearer)
Additionally, I'm seeing some odd behaviour even with just s#<value>. For example:
kafkacat -C -b broker:9092 -t mytopic -o s#1648778400000 -p 0 -f '[ts %T] [partition %p] [offset %o] %k\n' -e -c 1
should, as I understand it, output the first record with record.timestamp ≥ 1648778400000. The actual output is different:
[ts 1648692486141] [partition 0] [offset 2] 643b0013-b3e1-47a5-a9d3-7478c0e91ca4
and contains a timestamp prior to the one I set (31.03.2022 02:08:06Z vs. 01.04.2022 02:00:00Z).
The output is the same when I test with docker run edenhill/kcat:1.7.1 (the output above came from an Ubuntu-packaged kafkacat).
I don't think you can provide -o multiple times, so only one of the two bounds is applied. Your options therefore include:
-o e#1648778700000 -p 0 -c 1
to read one message from partition 0 whose timestamp is less than 1648778700000.
To properly consume between two timestamps, find the offsets for the start timestamp, commit them to a consumer group, then start a consumer in that group with your end timestamp.
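A simpler alternative, sketched below on the assumption that your kcat build supports the -Q query mode (offset lookup by timestamp); the broker, topic, and timestamps are taken from the question, and the awk parsing assumes the offset is the last field of -Q's output:
# Resolve each timestamp to an offset
START_OFFSET=$(kcat -Q -b broker:9092 -t mytopic:0:1648778400000 | awk '{print $NF}')
END_OFFSET=$(kcat -Q -b broker:9092 -t mytopic:0:1648778700000 | awk '{print $NF}')

# Number of records in partition 0 with start <= timestamp < end
echo $((END_OFFSET - START_OFFSET))

# Consume exactly those records
kcat -C -b broker:9092 -t mytopic -p 0 -o "$START_OFFSET" -c $((END_OFFSET - START_OFFSET)) \
  -f '[ts %T] [partition %p] [offset %o] %k\n'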

how to do pg_dump for a main partition table without related partition tables

I am trying to pg_dump just the main (parent) partitioned table, but the dump includes all of the existing partitions.
pg_dump -T 'public."2021"' -s -O -x master_database -h host -U user > master.sql
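One possible approach, sketched here with a hypothetical parent table name public.master_table: select only the parent with -t instead of excluding each partition with -T, since pg_dump's -t pattern does not automatically pull in the partitions (they are separate tables with their own names):
# Schema-only dump of just the parent partitioned table (hypothetical name)
pg_dump -s -O -x -t 'public.master_table' -h host -U user master_database > master.sql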

How do I prevent PSQL from outputting the number of rows?

Currently I can almost get a CSV from psql by simply running
psql -A -t -F'\t' -c "SELECT ...;" > myfile.csv
However it returns the number of rows at the end of the file. I can fix this with head -n -1:
psql -A -t -F'\t' -c "SELECT ...;" | head -n -1 > myfile.csv
But with very large files that seems like overkill. Is there a flag in psql that turns off the row count at the end?
There are a number of common ways to get a CSV from PostgreSQL (see e.g. this question). However, not all of them are appropriate when working with Redshift, partly because Amazon Redshift is based on Postgres 8.0.2.
You can use the --pset="footer=off" option to keep psql from printing the number of rows. Also see the 8.0.26 psql documentation.
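A minimal sketch of that suggestion, reusing the placeholder query from the question:
# Unaligned (-A), tuples only (-t), tab-separated (-F'\t'), with the "(N rows)" footer explicitly turned off
psql -A -t -F'\t' --pset="footer=off" -c "SELECT ...;" > myfile.csv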

wget --warc-file --recursive, prevent writing individual files

I run wget to create a warc archive as follows:
$ wget --warc-file=/tmp/epfl --recursive --level=1 http://www.epfl.ch/
$ l -h /tmp/epfl.warc.gz
-rw-r--r-- 1 david wheel 657K Sep 2 15:18 /tmp/epfl.warc.gz
$ find .
./www.epfl.ch/index.html
./www.epfl.ch/public/hp2013/css/homepage.70a623197f74.css
[...]
I only need the epfl.warc.gz file. How do I prevent wget from creating all the individual files?
I tried as follows:
$ wget --warc-file=/tmp/epfl --recursive --level=1 --output-document=/dev/null http://www.epfl.ch/
ERROR: -k or -r can be used together with -O only if outputting to a regular file.
tl;dr Add the options --delete-after and --no-directories.
Option --delete-after instructs wget to delete each downloaded file immediately after its download is complete. As a consequence, the maximum disk usage during execution will be the size of the WARC file plus the size of the single largest downloaded file.
Option --no-directories prevents wget from leaving behind a useless tree of empty directories. By default wget creates a directory tree that mirrors the one on the host, and downloads each file into the appropriate directory of the mirrored tree. wget does this even when the downloaded file is temporary due to --delete-after. To prevent that, use option --no-directories.
The below demonstrates the result, using your given example (slightly altered).
$ cd $(mktemp -d)
$ wget --delete-after --no-directories \
--warc-file=epfl --recursive --level=1 http://www.epfl.ch/
...
Total wall clock time: 12s
Downloaded: 22 files, 1.4M in 5.9s (239 KB/s)
$ ls -lhA
-rw-rw-r--. 1 chadv chadv 1.5M Aug 31 07:55 epfl.warc
If you forget to use --no-directories, you can easily clean up the tree of empty directories with find -type d -delete.
For individual files (without --recursive) the option -O /dev/null will keep wget from creating a file for the output. For recursive fetches /dev/null is not accepted (I don't know why). But why not just write all the output, concatenated, into a single file via -O tmpfile and delete that file afterwards?
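A minimal sketch of that workaround, reusing the command from the question (the scratch path /tmp/epfl-scratch is a made-up name):
# All page contents are concatenated into one throwaway file, so no per-file tree is created
$ wget --warc-file=/tmp/epfl --recursive --level=1 -O /tmp/epfl-scratch http://www.epfl.ch/
$ rm /tmp/epfl-scratch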

How to dump a postgres db excluding one specific table?

I'd like to use pg_dump to backup postgres database content. I only want to ignore one specific table containing cached data of several hundred GB.
How could I achieve this with pg_dump?
According to the docs, there is an --exclude-table option which excludes tables from the dump by matching on a pattern (i.e. it allows wildcards):
-T table
--exclude-table=table
    Do not dump any tables matching the table pattern. The pattern is interpreted
    according to the same rules as for -t. -T can be given more than once to exclude
    tables matching any of several patterns.

    When both -t and -T are given, the behavior is to dump just the tables that match
    at least one -t switch but no -T switches. If -T appears without -t, then tables
    matching -T are excluded from what is otherwise a normal dump.
There are a few examples here.
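For instance, a minimal sketch assuming the large cache table is named public.cached_data (a made-up name):
# Dump everything except the cache table itself
pg_dump -T public.cached_data mydb > mydb_no_cache.sql
# Or keep the table's definition but skip its (huge) contents
pg_dump --exclude-table-data=public.cached_data mydb > mydb_with_empty_cache.sql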
You can also do the same from a backup script:
#!/bin/bash
##### Variables
BACKUPNUM=3                                # prune old backups once more than this many exist
BACKUPDIR=/home/utrade/dbbackup
DBBACKUP_FILENAME=Database_dump.sql
TARFILE=Database_dump_$(date +%d%h%y).tgz  # e.g. Database_dump_01Apr22.tgz
DBUSER=mutrade
DBPASSWD=utrade123
DBNAME=mutradedb

cd "$BACKUPDIR" || exit 1
export PGPASSWORD=$DBPASSWD

# Dump the database, keeping the definition but skipping the data of the appmaster.ohlc_* tables
/usr/pgsql-11/bin/pg_dump -f "$DBBACKUP_FILENAME" "$DBNAME" \
    --exclude-table-data='appmaster.ohlc_*' -U "$DBUSER"

# Compress the dump and remove the plain-text file
tar czf "$TARFILE" "$DBBACKUP_FILENAME"
rm -f "$DBBACKUP_FILENAME"

# Removing old/extra backups: once more than $BACKUPNUM archives exist, delete files older than 30 days
backups_count=$(ls -1 "$BACKUPDIR" | wc -l)
if [[ $backups_count -gt $BACKUPNUM ]]
then
    find "$BACKUPDIR" -mtime +30 -type f -delete
fi