Is there anything similar to MySQL triggers in IBM Netezza?

As the title says, I want to know whether there is anything similar to MySQL's trigger function. What I actually want to do is import data from an IBM Netezza database using Sqoop's incremental mode. Below is the sqoop script I'm going to use.
sqoop job --create dhjob01 -- import --connect jdbc:netezza://10.100.3.236:5480/TEST \
--username admin --password password \
--table testm \
--incremental lastmodified \
--check-column 'modifiedtime' --last-value '1995-07-18' \
--target-dir /user/dhlee/nz_sqoop_test \
-m 1
As the official Sqoop documentation says, I can gather data from RDBMSs in incremental mode by creating a sqoop import job and executing it repeatedly.
Anyway, the point is that I need something like a MySQL trigger so that the modified date is updated whenever tables in Netezza are updated. And if you have any good idea for gathering the data incrementally, please tell me. Thank you.

Unfortunately there isn't anything similar to triggers available. I would recommend modifying the relevant UPDATE commands to include setting a column to CURRENT_TIMESTAMP.
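A minimal sketch of that suggestion, reusing the table and check column from the question (the nzsql client, the id predicate, and the other SET column are assumptions for illustration; any SQL client works, the point is the extra assignment):
# Every UPDATE issued against the table also refreshes the check column.
nzsql -d TEST -u admin -pw password -c \
  "UPDATE testm SET some_col = 'new value', modifiedtime = CURRENT_TIMESTAMP WHERE id = 42;"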

In Netezza you have something even better:
- Deleted records are still visible: http://dwgeek.com/netezza-recover-deleted-rows.html/
- the INSERT and DELETE transaction IDs (TXIDs) are ever-increasing numbers (and visible on all records, as described in the link above)
- updates are really a delete plus an insert
Can you follow me?
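Concretely, something like this shows those transaction IDs (a sketch, assuming the nzsql client and Netezza's hidden createxid/deletexid columns; adjust connection details to your environment):
# Show logically deleted rows together with the TXIDs that created/deleted them.
nzsql -d TEST -u admin -pw password <<'SQL'
set show_deleted_records = true;
select createxid, deletexid, * from testm;
SQL
Rows with deletexid <> 0 have been deleted (until the table is groomed), and because the TXIDs only grow, createxid could also serve as the incremental check column for the sqoop job instead of a modified-time column.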

This is the screenshot I got after I inserted and deleted some rows.

Related

How to run SQL statements sequentially?

I am running a set of SQL statements sequentially in the AWS Redshift Query editor.
sql-1
sql-2
sql-3
....
sql-N
However, the Redshift Query editor cannot run multiple SQL statements at once, so currently I am running the SQL statements one by one manually.
What is the alternative approach for me? It looks like I could use DBeaver.
Is there a more programmatic approach, i.e. just a simple bash script?
If you have a Linux instance that can access the cluster you can use the psql command line tool. For example:
yum install postgresql
psql -h my-cluster.cjmul6ivnpa4.us-east-2.redshift.amazonaws.com \
-p 5439 \
-d my_db \
-f my_sql_script.sql
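If the statements live in separate files (sql-1 … sql-N), a small bash loop over psql is one way to run them strictly in order; a sketch, with the file list and connection details as placeholders:
#!/usr/bin/env bash
# Run each SQL file in order and stop at the first failure.
set -euo pipefail
for f in sql-1.sql sql-2.sql sql-3.sql; do
  psql -h my-cluster.cjmul6ivnpa4.us-east-2.redshift.amazonaws.com \
       -p 5439 \
       -d my_db \
       -v ON_ERROR_STOP=1 \
       -f "$f"
done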
We recently announced a way to schedule queries: https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-redshift-supports-scheduling-sql-queries-by-integrating-with-amazon-eventbridge/
Even more recently, we published a blog post that walks you through all the steps using the Console or the CLI: https://aws.amazon.com/blogs/big-data/scheduling-sql-queries-on-your-amazon-redshift-data-warehouse/
Hope these links help.

Sqoop import from redshift

Just as the title says, I'm trying to move some data from Redshift to S3 via Sqoop:
sqoop-import -Dmapreduce.job.user.classpath.first=true --connect "jdbc:redshift://redshiftinstance.us-east-1.redshift.amazonaws.com:9999/stuffprd;database=ourDB;user=username;password=password;" --table ourtable -m 1 --as-avrodatafile --target-dir s3n://bucket/folder/folder1/
All drivers are in the proper folders, however the error being thrown is:
ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: No manager for connect string:
Not sure if you have already got the answer to this, but you need to add the following to your sqoop command:
--driver com.amazon.redshift.jdbc42.Driver
--connection-manager org.apache.sqoop.manager.GenericJdbcManager
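Put together with the original command, it would look roughly like this (same connection string and table as above; the exact driver class name depends on which Redshift JDBC jar is actually on your classpath):
sqoop-import -Dmapreduce.job.user.classpath.first=true \
  --driver com.amazon.redshift.jdbc42.Driver \
  --connection-manager org.apache.sqoop.manager.GenericJdbcManager \
  --connect "jdbc:redshift://redshiftinstance.us-east-1.redshift.amazonaws.com:9999/stuffprd;database=ourDB;user=username;password=password;" \
  --table ourtable -m 1 --as-avrodatafile \
  --target-dir s3n://bucket/folder/folder1/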
I can't help with the error, but I recommend you not do it this way. Sqoop will try to retrieve the table as SELECT *, and all results will have to pass through the leader node. This will be much slower than using UNLOAD to export the data directly to S3 in parallel. You can then convert the unloaded text files to Avro using Sqoop.
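A rough sketch of that UNLOAD alternative (the IAM role ARN and the S3 prefix are placeholders; any SQL client connected to the cluster can run the statement):
# UNLOAD writes the result set to S3 from the compute nodes in parallel.
psql -h redshiftinstance.us-east-1.redshift.amazonaws.com -p 9999 -d ourDB -U username -c "
  UNLOAD ('SELECT * FROM ourtable')
  TO 's3://bucket/folder/folder1/ourtable_'
  IAM_ROLE 'arn:aws:iam::111122223333:role/my-redshift-unload-role'
  DELIMITER '\t' GZIP ALLOWOVERWRITE;
"
Because each slice writes its own part of the data, this avoids funneling 24 million rows through the leader node.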

Restore only triggers postgres

I made a backup of a postgres database; it is not the first time I have done it. I used this command:
pg_dump db -f /backup/agosto_31.sql
And I do the restore with this:
psql -d agosto_31 -f agosto_31.sql
But this time it did not import any triggers, and there are many. I checked in the file agosto_31.sql and they are there. How could I import them again? Only the triggers.
Thanks everyone, greetings!
There is no way to import only the triggers. But because your dump (backup file) is in SQL format (plain text), you can cut the trigger definitions out manually in any editor. For this case it is practical to dump data and schema to separate files.
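Since the dump is plain SQL, a rough way to locate the trigger definitions so you can copy them into their own file (triggers_only.sql is just an example name; pg_dump normally writes the CREATE TRIGGER statements near the end of the dump, and the trigger functions must already exist in the target database):
# Find where the trigger definitions start in the dump file.
grep -n 'CREATE TRIGGER' /backup/agosto_31.sql
# Copy those statements into a separate file in an editor, then replay only that file:
psql -d agosto_31 -f triggers_only.sql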
It is also possible that the import (restore from backup) failed, probably due to broken dependencies. Check the psql output for errors. psql has a nice option to stop on the first error:
psql -v ON_ERROR_STOP=1 -d agosto_31 -f agosto_31.sql
Use this option. Without knowing the specific error it is not possible to help more.

Sqoop export duplicates

Will sqoop export create duplicates when the number of mappers is higher than the number of blocks in the source hdfs location?
My source hdfs directory has 24 million records and when I do a sqoop export to Postgres table, it somehow creates duplicate records. I have set the number of mappers as 24. There are 12 blocks in the source location.
Any idea why sqoop is creating duplicates?
Sqoop Version: 1.4.5.2.2.9.2-1
Hadoop Version: Hadoop 2.6.0.2.2.9.2-1
Sqoop Command Used-
sqoop export -Dmapred.job.queue.name=queuename \
--connect jdbc:postgresql://ServerName/database_name \
--username USER --password PWD \
--table Tablename \
--input-fields-terminated-by "\001" --input-null-string "\\\\N" --input-null-non-string "\\\\N" \
--num-mappers 24 -m 24 \
--export-dir $3/penet_baseline.txt -- --schema public;
bagavathi, you mentioned that duplicate rows were seen in the target table, that adding a PK constraint failed due to a PK violation, and that the source does not have duplicate rows. One possible scenario is that your target table already contained records, perhaps from a previous incomplete sqoop job. Please check whether the target table has keys that are also present in the source.
One workaround for this scenario is to use the parameter "--update-mode allowinsert". In your command, add these parameters: --update-key <key column> --update-mode allowinsert. This ensures that if the key is already present in the table the record gets updated, and if the key is not present sqoop will do an insert.
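A sketch of the full export with those parameters added, as suggested above (the key column name "id" is a placeholder; use your table's real primary or unique key):
sqoop export -Dmapred.job.queue.name=queuename \
  --connect jdbc:postgresql://ServerName/database_name \
  --username USER --password PWD \
  --table Tablename \
  --input-fields-terminated-by "\001" --input-null-string "\\\\N" --input-null-non-string "\\\\N" \
  --update-key id \
  --update-mode allowinsert \
  -m 24 \
  --export-dir $3/penet_baseline.txt -- --schema public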
No, sqoop does not export records twice, and it has nothing to do with the number of mappers or the number of blocks.
Look at the pg_bulkload connector of sqoop for faster data transfer between HDFS and postgres.
The pg_bulkload connector is a direct connector for exporting data into PostgreSQL. It uses pg_bulkload, so users benefit from its functionality, such as fast exports that bypass shared buffers and WAL, flexible error record handling, and ETL features with filter functions.
By default, sqoop-export appends new rows to a table; each input record is transformed into an INSERT statement that adds a row to the target database table. If your table has constraints (e.g., a primary key column whose values must be unique) and already contains data, you must take care to avoid inserting records that violate these constraints. The export process will fail if an INSERT statement fails. This mode is primarily intended for exporting records to a new, empty table intended to receive these results.
If you have used sqoop incremental mode then there may be some duplicate records on HDFS. Before running the export to postgres, collect all unique records, based on max(date or timestamp column), into one table and then do the export.
I think that has to work.
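For example, if the HDFS data is also registered as a Hive table, a de-duplication pass before the export could look like this (table and column names are made up for illustration; the idea is to keep only the newest row per key):
hive -e "
INSERT OVERWRITE TABLE penet_baseline_dedup
SELECT id, col1, col2, modifiedtime
FROM (
  SELECT t.*,
         row_number() OVER (PARTITION BY t.id ORDER BY t.modifiedtime DESC) AS rn
  FROM penet_baseline t
) ranked
WHERE rn = 1;
"
Then point --export-dir at the HDFS location of the de-duplicated table instead of the raw directory.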

How to import data from PostgreSQL to Hadoop?

I'm just a beginner in Hadoop and one of my colleagues asked me for help migrating some PostgreSQL tables to Hadoop. Since I don't have much experience with PostgreSQL (I know databases, though), I am not sure what the best way for this migration would be. One of my ideas was to export the tables as JSON data (via Gson) and then process them from Hadoop, as in this example: http://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform. Are there better ways to import data (tables & databases) from PostgreSQL to Hadoop?
Sqoop (http://sqoop.apache.org/) is a tool made precisely for this. Go through the documentation; sqoop provides the best and easiest way to transfer your data.
Use the below command. It is working for me.
sqoop import --driver=org.postgresql.Driver --connect jdbc:postgresql://localhost/your_db --username you_user --password your_password --table employees --target-dir /sqoop_data -m 1
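If the goal is to move several tables (or a whole database) rather than a single one, sqoop also has an import-all-tables mode. A sketch along the same lines (connection details are placeholders; tables without a single-column primary key generally need -m 1 or a similar workaround):
sqoop import-all-tables --driver=org.postgresql.Driver \
  --connect jdbc:postgresql://localhost/your_db \
  --username your_user --password your_password \
  --warehouse-dir /sqoop_data \
  -m 1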