Binary avro type mapping to Postgres? - postgresql

I have the following Avro definition for my NiFi flow, where I'm reading from a BLOB database column. I'm mapping the 'xxPZPVSTREAM' column as a 'bytes' type in my Avro definition:
{
  "namespace": "a.b.c",
  "name": "pc_history",
  "type": "record",
  "fields": [
    {"name": "COMMITDATETIME", "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}]},
    ....
    {"name": "xxPZPVSTREAM", "type": ["bytes", "null"]},
    {"name": "xxx", "type": ["string", "null"]}
  ]
}
When I attempt to write the mapped data to a Postgres database I get this error:
org.postgresql.util.PSQLException: Can't infer the SQL type to use for an instance of [Ljava.lang.Byte;. Use setObject() with an explicit Types value to specify the type to use.
Can I add extra meta information to the Avro definition to allow the NiFi processor to correctly map this binary column?

You didn't say which processor you're using, but this should be supported by PutDatabaseRecord. That processor is what you'd want to use for this as it should map byte array fields to a blob. If it doesn't, then please join the nifi-dev mailing list and let us know.

Related

Azure Data Factory DataFlow Error: Key partitioning does not allow computed columns

We have a generic dataflow that works for many tables; the schema is detected at runtime.
We are trying to add a partition column for the Ingestion or Sink portion of the delta.
We are getting this error:
Azure Data Factory DataFlow Error: Key partitioning does not allow computed columns
Job failed due to reason: at Source 'Ingestion'(Line 7/Col 0): Key partitioning does not allow computed columns
Can we pass the partition column as a parameter to a generic dataflow?
I tried your scenario and got a similar error.
The limitation of the key partition method is that we cannot apply any calculation to the partition column while declaring it. Instead, the column must be created in advance, either using a derived column or read in from the source.
To resolve this, you can try the following steps:
First, I created a pipeline parameter with data type string and gave the column name as its value.
Then click on the Dataflow >> go to Parameters >> in the value of the parameter select Pipeline expression >> and pass the parameter created above.
OUTPUT:
It takes the parameter as the partition key column and partitions the data accordingly.
Reference: How To Use Data Flow Partitions To Optimize Spark Performance In Data Factory

How will I map data in Data Factory (source SQL DW, destination Blob)?

My source is SQL DB.
My sink is Blob storage.
The SQL table has its own column names.
The target file I am creating in Blob initially has no header. The customer has given some predefined names, so the data from the SQL columns should be mapped to those fields.
In the Copy activity's mapping I need to map with the proper data type and the name the customer has given.
By default the mapping comes through as-is, but I need to map it as I stated.
How will I resolve this? Can someone help me?
You can simply edit the sink header names, since it's a TSV anyway.
For addressing data type mapping, see Data type mapping:
Currently such data type conversion is supported when copying between tabular data. Hierarchical sources/sinks are not supported, which means there is no system-defined data type conversion between source and sink interim types.

Gorm Jsonb type stored as bytea

I'm using a locally hosted Postgres DB to test queries against a Postgres DB in production. The production database has an info field of type jsonb, and I'm trying to mimic this schema locally when using gorm's AutoMigrate. The model I've defined is below:
import "github.com/jinzhu/gorm/dialects/postgres"
type Event struct {
...
Info postgres.Jsonb
...
}
But when I query JSON attributes, e.g. stmt.Where("info->>'attr' = value"), I get the following error:
...
Message:"operator does not exist: bytea ->> unknown", Detail:"", Hint:"No operator matches the given name and argument type(s). You might need to add explicit type casts.",
...
This query works in the production environment, however. It seems that the Info field is being stored as bytea instead of jsonb. I'm aware that I can do stmt.Where("encode(info, 'escape')::jsonb->>'attr' = value"), but I'd prefer to mimic the production environment more closely (if possible) rather than change the query to support these unit tests.
I've tried using type tags in the model (e.g. gorm:"type=jsonb") as well as defining my own JSON type implementing the valuer, scanner, and GormDataTypeInterface as suggested here. None of these approaches has automigrated the type as jsonb.
Is there any way to ensure AutoMigrate creates a table with type jsonb? Thanks!
I was facing the same problem: the JsonB type was automigrated to bytea. I solved it by adding the tag gorm:"type:jsonb". It's also mentioned in your question, but you're using gorm:"type=jsonb", which is not correct.
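For reference, here is a minimal sketch of the model from the question with the corrected tag (the package name is illustrative, and db below stands for your *gorm.DB handle):

package models

import "github.com/jinzhu/gorm/dialects/postgres"

// Event matches the model from the question; the gorm tag uses a colon
// (type:jsonb), not an equals sign, so AutoMigrate creates Info as jsonb.
type Event struct {
    Info postgres.Jsonb `gorm:"type:jsonb"`
}

With that tag in place, db.AutoMigrate(&Event{}) should create the info column as jsonb, so the info->>'attr' queries in the unit tests should no longer hit the bytea operator error.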

Issue with UUID datatype when loading a Postgres table using Apache NiFi

Database
Postgres 9.6
Contains several tables that have a UUID column (containing the ID of each record)
NiFi
Latest release (1.7.1)
Uses Avro 1.8.1 (as far as I know)
Problem description
When scheduling the tables using the ExecuteSQL processor, the following error message occurs:
ExecuteSQL[id=09033e32-e840-1aed-3062-6e8cbc5551ba] ExecuteSQL[id=09033e32-e840-1aed-3062-6e8cbc5551ba] failed to process session due to createSchema: Unknown SQL type 1111 / uuid (table: country, column: id) cannot be converted to Avro type; Processor Administratively Yielded for 1 sec: java.lang.IllegalArgumentException: createSchema: Unknown SQL type 1111 / uuid (table: country, column: id) cannot be converted to Avro type
Note that the flowfiles aren't removed from the incoming queue, nor sent to the 'failure' relationship, resulting in an endless loop of failing attempts.
Attempts to fix issue
I tried enabling the Use Avro Logical Types property of the ExecuteSQL processor, but the same error occurred.
Possible but not preferred solutions
I currently perform a SELECT * from each table. A possible solution (I think) would be to specify each column, and have the query cast the uuid to a string. Though this could work, I'd strongly prefer not having to list every column separately.
A last note
I did find this Jira ticket: https://issues.apache.org/jira/browse/AVRO-1962
However, I'm not sure how to interpret this. Is it implemented or not? Should it work or not?
I believe UUID is not a standard JDBC type and is specific to Postgres.
The JDBC types class shows that SQL type 1111 is "OTHER":
/**
 * The constant in the Java programming language that indicates
 * that the SQL type is database-specific and
 * gets mapped to a Java object that can be accessed via
 * the methods <code>getObject</code> and <code>setObject</code>.
 */
public final static int OTHER = 1111;
So I'm not sure how NiFi could know what to do here because it could be anything depending on the type of DB.
Have you tried creating a view where you define the column as ::text?
SELECT
"v"."UUID_COLUMN"::text AS UUID_COLUMN
FROM
...
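For reference, a minimal sketch of such a view using the table and column named in the error message (country, id); the view name is a placeholder and any remaining columns would be listed unchanged:

CREATE VIEW country_for_nifi AS
SELECT
    "id"::text AS id
    -- , remaining columns of country listed here unchanged
FROM
    country;

Pointing ExecuteSQL at this view instead of the table keeps a SELECT * style query while the uuid arrives as text, which the Avro conversion can handle.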

Datatype conversion of Parquet using Spark SQL - dynamically, without specifying a column name explicitly

I am looking for a way to handle the data type conversion dynamically with Spark DataFrames. I am loading the data into a DataFrame using a Hive SQL query and then writing it to a Parquet file. Hive is unable to read some of the data types, and I want to convert the decimal data types to Double. Instead of specifying each column name separately, is there any way to handle the data types dynamically? Let's say my DataFrame has 50 columns, 8 of which are decimals, and I need to convert all 8 of them to the Double data type without specifying each column name. Can we do that directly?
There is no direct way to convert the data types; here are some options:
Either you cast those columns in the Hive query,
or
you create/use a case class with the data types you require, populate the data, and use it to generate the Parquet file,
or
you read the data types from the Hive query metadata and use dynamic code to achieve option one or option two.
There are two options:
1. Use the schema from the DataFrame and dynamically generate the query statement.
2. Use the create table...select * option with Spark SQL.
This is already answered; this post has the details, with code.
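As a rough sketch of the first option in Scala, assuming the spark-shell SparkSession named spark, a placeholder Hive table my_table, and a placeholder output path, the DataFrame schema can be folded over to cast every DecimalType column to Double without listing column names:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DecimalType, DoubleType}

// "my_table" stands in for the Hive query from the question
val df = spark.sql("SELECT * FROM my_table")

// Fold over the schema: cast every DecimalType column to Double,
// leave all other columns untouched
val converted = df.schema.fields.foldLeft(df) { (acc, field) =>
  field.dataType match {
    case _: DecimalType => acc.withColumn(field.name, col(field.name).cast(DoubleType))
    case _              => acc
  }
}

// Write the result to Parquet (output path is a placeholder)
converted.write.mode("overwrite").parquet("/tmp/my_table_double")

The same fold handles the 8 decimal columns out of 50 from the question without naming any of them.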