Only saving files without null values in NiFi - PostgreSQL

An absolute newbie here, trying out NiFi and PostgreSQL on Docker Compose.
I have a sample CSV file with 4 columns.
I want to split this CSV file into two based on whether a row contains a null value or not.
Grade,BreedNm,Gender,Price
C++,beef_cattle,Female,10094
C++,milk_cow,Female,null
null,beef_cattle,Male,12704
B++,milk_cow,Female,16942
For example, the table above should be split into two tables, one containing rows 1 and 4 (no nulls) and the other containing rows 2 and 3 (with nulls), and each of them saved into a PostgreSQL table.
Below is what I have tried so far. I was trying to:
split the flowfile into two, keeping rows without null values on one side and rows with null values on the other;
write each of them into a table, named 'valid' and 'invalid' respectively.
But I do not know how to split the CSV file and save the results as PostgreSQL tables through NiFi.
Can anyone help?

What you could do is use a RouteOnContent processor with the "Content Must Contain Match" option, with the match being null. Anything that matches null would be routed one way, and anything not matching null would be routed a different way. Not sure if it's possible the way you're doing it, but that is one possibility. The match could be something like (.*?)null

I used a QueryRecord processor with two SQL statements, one selecting the rows with a null value and the other the rows without, and it worked as intended!
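For reference, the two QueryRecord queries (added as dynamic properties, e.g. named valid and invalid, which become the processor's outgoing relationships) could look something like the sketch below. It assumes the column names from the sample CSV; note that if the record reader brings the literal text null through as a string rather than a real null, you would compare against the string 'null' instead.

-- valid: rows with no null values
SELECT * FROM FLOWFILE
WHERE Grade IS NOT NULL AND BreedNm IS NOT NULL
  AND Gender IS NOT NULL AND Price IS NOT NULL

-- invalid: rows with at least one null value
SELECT * FROM FLOWFILE
WHERE Grade IS NULL OR BreedNm IS NULL
  OR Gender IS NULL OR Price IS NULL

Each relationship can then be routed to its own PutDatabaseRecord processor writing to the 'valid' and 'invalid' tables.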

Related

ADF Lookup/Join Returning Longest String

I am running into an odd issue where ADF only returns matches for the longest string in a column when joining or looking up.
Example: the left table has multiple values of "len" and "length". The right table has ID 1 for "len" and ID 2 for "length". However, no matter how I set up the join or lookup data flow activity, it only returns ID 2 for "length" and NULL instead of ID 1 for "len".
If all values are of the same string length (e.g. "len", "pen", "abc", etc.) it finds a match for all records.
Any ideas?
Well, I'm not sure if this qualifies as "user error", but I used dummy files to test whether it was ADF or my data causing the issue, and I was able to successfully look up all data with the dummy files.
This prompted me to troubleshoot why ADF wouldn't match against my SQL tables.
I ended up modifying the source lookup table by trimming the string column (even though the table does not have spaces or tabs in the column), and ADF found all matches.
A bit underwhelming, but putting this here in case anyone else runs across the same issue.
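The trim was applied on the source side; a minimal sketch of that kind of fix (the table and column names here are hypothetical):

-- read the lookup source through a query that strips stray whitespace
SELECT id, TRIM(name) AS name
FROM lookup_table;

Trimming at the source means the data flow compares clean values without any change to the join/lookup settings.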

Postgres COALESCE inside nullif for 2 different fields

I am new to SQL and Postgres and had a quick question. Right now I have 2 different tables, one with car info and one with partial car info, and I would like to sort on car.vin OR partial_car.partial_vin, depending on which exists, sending all nulls/empty strings to the end of the sort. Currently my ORDER BY statement looks like:
ORDER BY nullif(coalesce(car.vin, partial_car.partial_vin), '') asc nulls last limit 50 offset 0
My expectation is that coalesce will take the first non-null value and use it for sorting, or it will return null and that row will be sent to the end. So far I haven't been able to make sense of my results: there are null values placed in between actual values, etc. If I change the expression to coalesce(car.vin, ''), I see it work properly. Does anyone have ideas as to why this is the behavior? Let me know if you need anything more from me.
It was human error on my end. The object being sent to the client was not being populated properly with the partial data, so the sorting was correct, but I was seeing blanks because those values were not present.
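For anyone puzzled by the expression itself, it does behave as expected: coalesce picks the first non-null vin, and nullif then turns an empty string into NULL so that NULLS LAST can push it to the end. A minimal standalone demonstration (the values here are made up):

SELECT NULLIF(COALESCE(v.vin, v.partial_vin), '') AS sort_key
FROM (VALUES ('1ABC', NULL),
             (NULL, '2DEF'),
             (NULL, ''),
             (NULL, NULL)) AS v(vin, partial_vin)
ORDER BY sort_key ASC NULLS LAST;
-- returns 1ABC, 2DEF, then the two NULL rows last
-- (the empty string has been converted to NULL by NULLIF)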

Getting Redshift error 1214 during copy

I have the following table in redshift:
Column | Type
-------+-------------
id     | integer
value  | varchar(255)
I'm trying to copy in (using the Data Pipeline's RedshiftCopyActivity), and the data has the line 1,maybe as the entry to be added, but I get back the error 1214: Delimiter not found, with a raw_field_data value of maybe. Is there something I'm missing in the copy parameters?
The entire CSV is three lines:
1,maybe
2,no
3,yes
You may want to take a look at the similar question Redshift COPY command delimiter not found.
Make sure your RedshiftCopyActivity configuration includes FORMAT AS CSV from https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-csv.
Be sure your input data has your configured delimiter between every field, even in the case of nulls.
Be sure you do not have any trailing blank lines.
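Under the hood, the activity issues a COPY statement; with CSV format configured it would look roughly like the sketch below (the table, bucket, and role names are placeholders):

COPY my_table
FROM 's3://my-bucket/my.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV;

Without FORMAT AS CSV, COPY defaults to pipe-delimited text input, which would explain a "Delimiter not found" error on comma-separated data.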
You can run the following SQL (from the linked question) to see more specific details of what row is causing the problem.
SELECT le.starttime,
d.query,
d.line_number,
d.colname,
d.value,
le.raw_line,
le.err_reason
FROM stl_loaderror_detail d
JOIN stl_load_errors le
ON d.query = le.query
ORDER BY le.starttime DESC;

SSIS, load file into table with '' (nothing) to get NULL

I have a file with 150 columns, and most of them are empty, represented as 2 consecutive delimiters (100,,,,,,200); in this case the comma is the delimiter, so Column0=100, Column1='', Column2='', etc...
What is the fastest (dare I say mass) way to put this into the target table? The target table has custom DDL I cannot change, and I have to load it correctly, with NULL for dates, not 1900-1-1.
I have a series of these files. I could go with SSIS and put isNothing ? NULL : Column1 for each of the 150 columns, but maybe there is a better way?
I tried to load this file into a NEW table and got an intermediate table with the same empty values, so now I can compose SQL with CASE'ing too, as sketched below.
Thanks for your help and for sharing your knowledge.
M
Did you try checking the "Retain null values" box in the Flat File Source?
Hope this helps,
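For the staging-table route mentioned in the question, the follow-up SQL could look something like this sketch (the table and column names are hypothetical, and only a couple of the 150 columns are shown):

-- move rows from the intermediate table into the fixed-DDL target,
-- turning empty strings into NULLs along the way
INSERT INTO target_table (column0, column1, some_date)
SELECT column0,
       NULLIF(column1, ''),
       CASE WHEN some_date = '' THEN NULL
            ELSE CAST(some_date AS date) END
FROM staging_table;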

db2 import csv with null date

I run this
db2 "IMPORT FROM C:\my.csv OF DEL MODIFIED BY COLDEL, LOBSINFILE DATEFORMAT=\"D/MM/YYYY\" SKIPCOUNT 1 REPLACE INTO scratch.table_name"
However, some of my rows have an empty date field, so I get this error:
SQL3191N which begins with """" does not match the user specified DATEFORMAT, TIMEFORMAT, or TIMESTAMPFORMAT. The row will be rejected.
My CSV file looks like this
"XX","25/10/1985"
"YY",""
"ZZ","25/10/1985"
I realise that if I inserted a character instead of a blank string I could use the NULL INDICATORS parameter.
However, I do not have access to change the CSV file. Is there a way to import a blank string as a null?
This is an error in your input file. DB2 differentiates between a NULL and a zero-length string. If you need to have NULL dates, a NULL would have no quotes at all, like:
"AA",
If you can't change the format of the input file, you have 2 options:
Insert your data into a staging table (changing the DATE column to a char) and then using SQL to populate the ultimate target table
Write a program to parse ("fix") the input file and then import the resulting fixed data. You can often do this without having to write the entire file out to disk – your program could write to a named pipe, and the DB2 IMPORT (and LOAD) utility is capable of reading from named pipes.
I'm not aware of anything. Yes, ideally that date field should be null.
Probably the best thing to do would be to load the data into a scratch/temp table where that isn't a date column - just leave it as character data (it looks like you're already using a scratch table anyway). It should be trivial after that to use a CASE statement to transform a blank value into a null date when doing your INSERT into the real table.
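That transform could look something like the following sketch (the staging and target table names and columns are hypothetical; TO_DATE is the DB2 scalar function, a synonym for TIMESTAMP_FORMAT):

-- populate the real table from the staging table,
-- mapping blank date strings to NULL
INSERT INTO scratch.target_table (name, some_date)
SELECT name,
       CASE WHEN date_str = '' THEN NULL
            ELSE DATE(TO_DATE(date_str, 'DD/MM/YYYY')) END
FROM scratch.staging_table;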