ADF Lookup/Join Returning Longest String - azure-data-factory

I am running into an odd issue where ADF is only returning matches for the longest string in a column when joining or looking up.
Example: The left table has multiple values of "len" and "length". The right table has ID 1 for "len" and ID 2 for "length". However, no matter how I set up the join or lookup data flow activity, it only returns ID 2 for "length" and NULL instead of ID 1 for "len".
If all values are of the same string length (e.g. "len","pen","abc", etc.) it will find a match for all records.
Any ideas?

Well, I'm not sure if this qualifies as "user error", but I used dummy files to test whether ADF or my data was causing the issue, and I was able to look up all the data successfully with the dummy files.
This prompted me to troubleshoot why ADF wouldn't match against my SQL tables.
I ended up modifying the source lookup table by trimming the string column (even though the table does not have spaces or tabs in the column) and ADF found all matches.
A bit underwhelming, but putting this here in case anyone else runs across the same issue.
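One plausible explanation, offered only as an assumption since the table definition isn't shown: if the SQL column is a fixed-width type such as CHAR(6), shorter values come back padded with trailing spaces, so only the longest value matches exactly. A minimal Python sketch of that effect, using made-up rows and hypothetical helper names:

```python
# Hypothetical lookup rows as they might arrive from a CHAR(6) column:
# values shorter than 6 characters carry trailing-space padding (assumption).
lookup_rows = [("len".ljust(6), 1), ("length".ljust(6), 2)]
left_values = ["len", "length", "len"]

def build_lookup(rows, trim):
    """Map each key to its ID, optionally trimming trailing whitespace first."""
    return {(key.rstrip() if trim else key): id_ for key, id_ in rows}

def join(values, lookup):
    """Left-join style lookup: unmatched values yield None (NULL)."""
    return [lookup.get(value) for value in values]

untrimmed = join(left_values, build_lookup(lookup_rows, trim=False))
trimmed = join(left_values, build_lookup(lookup_rows, trim=True))

print(untrimmed)  # only the longest string finds a match
print(trimmed)    # every value matches once the keys are trimmed
```

If this is the cause, trimming in the source query or in a derived column before the join should behave the same way as trimming the table itself.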

Related

Only saving files without null value on Nifi

I'm an absolute newbie trying out NiFi and PostgreSQL on Docker Compose.
I have a sample CSV file with 4 columns.
I want to split this CSV file into two, based on whether a row contains a null value or not.
Grade ,BreedNm ,Gender ,Price
C++ ,beef_cattle ,Female ,10094
C++ ,milk_cow ,Female ,null
null ,beef_cattle ,Male ,12704
B++ ,milk_cow ,Female ,16942
For example, the table above should be split into two tables, one containing rows 1 and 4 and the other containing rows 2 and 3,
and each of them should be saved into a PostgreSQL table.
Below is what I have tried so far.
I was trying to:
split the flowfile in two, keeping rows without null values on the left side and rows with null values on the right side;
write each of them into a table, named 'valid' and 'invalid' respectively.
However, I do not know how to split the CSV file and save the parts as PostgreSQL tables through NiFi.
Can anyone help?
What you could do is use a RouteOnContent with the "Content Must Contain Match" factor, with the match being null. Therefore, anything that matches null would be routed that way, and anything not matching null would be routed a different way. Not sure if it's possible the way you're doing it, but that is 1 possibility. The match could be something like (.*?)null
I used the QueryRecord processor with two SQL statements, one selecting the rows with a null value and the other selecting the rows without one, and it worked as intended!
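The split itself, independent of NiFi, can be sketched in plain Python. This is not NiFi configuration, only an illustration of the routing rule; it assumes that "null" appears as a literal string in the file and that fields carry trailing spaces, as in the sample above. The function name is hypothetical.

```python
import csv
import io

# Sample data from the question; "null" is a literal string in the file.
SAMPLE = """Grade ,BreedNm ,Gender ,Price
C++ ,beef_cattle ,Female ,10094
C++ ,milk_cow ,Female ,null
null ,beef_cattle ,Male ,12704
B++ ,milk_cow ,Female ,16942
"""

def split_rows(text):
    """Return (valid, invalid); a row is invalid if any field is 'null' or empty."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    valid, invalid = [], []
    for raw in reader:
        # Strip the padding around each field before testing it.
        row = dict(zip(header, (field.strip() for field in raw)))
        has_null = any(value in ("", "null") for value in row.values())
        (invalid if has_null else valid).append(row)
    return valid, invalid

valid, invalid = split_rows(SAMPLE)
print([row["Price"] for row in valid])
print([row["Price"] for row in invalid])
```

In QueryRecord the same rule would be expressed as two SQL queries with opposite WHERE conditions, one per outgoing relationship.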

Power Query - Appending two tables but the other table might be empty depending on the situation - throws an error in that case

I am working on a solution that involves merging two queries in Power Query to retrieve a single data table back to Excel. The first query is always populated but the other query comes from an ERP and might be empty (empty table) from time to time.
Appending the two queries involves making the header names the same in the two queries before the appending takes place. As the second query sometimes results in an empty table, the error arises in the steps when Power Query is modifying the header names in the second table (it cannot modify the header names as there are no headers).
"Error message: Expression.Error: The column 'PartMtl_Company' of the table wasn't found.
Details: PartMtl_Company" where the PartMtl_Company is the leftmost column in my table.
I am kind of thinking that I would need to evaluate whether the second table is empty and skip the renaming steps if that is the case. I assume merging the populated first table with an empty table would cause no problem and would only result in the first table. I have tried to look around for a suitable M-code but have not come across such.
I'm thinking you might be able to use Table.RowCount to solve this. Something along the lines of:
= if Table.RowCount(Table2) > 0 then...
You would modify the headers only if there is data in the second table. Same goes for the appending of the tables: you would only append if there is data in the second table, since you won't have renamed any headers otherwise.
Thank you Marc! That did the trick.
In the end, I wrote something along the lines of
= if Table.RowCount(Table2) > 0 then... (code that works on a non-empty table) ...else Table2
This returns the empty table if it is empty to begin with. Appending the second table to the first did not throw an error and returned only the first table, as planned.

Trying different functions

In the first function, I'm making the JOB column lowercase and then searching through it, but it's not finding any data. Why? Thanks. Just FYI, since you don't have the database: all records in the JOB column are uppercase (that's why it isn't returning anything), but that's also why I'm making it lowercase first.
In the second function, I'm trying to concat only ENAME values that meet specific criteria: anything that has an r in the ENAME column (there are multiple records with an r in it). It isn't working (no data found). Why? How do I get it done? Thanks.
SELECT LOWER(JOB) FROM EMP
WHERE JOB = LOWER('MANAGER');
SELECT CONCAT('My name is ',ename)
FROM EMP
WHERE ENAME LIKE '%r%';
I tested both of your SQL statements and they work fine for me. Are you sure the records are in db? Are you sure the names of the rows are correct?
EDIT: OK, so the name of the column is in lower case, but in your WHERE it's in uppercase. That's all :)
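A sketch of one reading of the mismatch, assuming (as the question states) that the stored values are uppercase: applying LOWER in the SELECT list does not affect the WHERE clause, so the comparison has to lowercase the column as well. The sample rows are made up, and SQLite is used only because it ships with Python; other databases differ in case sensitivity (SQLite's LIKE, for instance, ignores ASCII case by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENAME TEXT, JOB TEXT)")
# Made-up rows; all values are uppercase, as described in the question.
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?)",
    [("CLARK", "MANAGER"), ("SMITH", "CLERK"), ("TURNER", "SALESMAN")],
)

# Original filter: uppercase JOB is compared against lowercase 'manager'.
original = conn.execute(
    "SELECT LOWER(JOB) FROM EMP WHERE JOB = LOWER('MANAGER')"
).fetchall()

# Lowercasing the column inside WHERE makes both sides comparable.
fixed = conn.execute(
    "SELECT LOWER(JOB) FROM EMP WHERE LOWER(JOB) = LOWER('MANAGER')"
).fetchall()

# Same idea for the pattern match; || is SQLite's concatenation operator.
names = conn.execute(
    "SELECT 'My name is ' || ENAME FROM EMP "
    "WHERE LOWER(ENAME) LIKE '%r%' ORDER BY ENAME"
).fetchall()

print(original)
print(fixed)
print(names)
```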

Add contents of one column into another without overwriting

I can copy the contents of one column to another using the sql UPDATE easily. But I need to do it without deleting the content already there, so in essence I want to append a column to another without overwriting the other's original content.
I have a column called notes then for some unknown reason after several months I added another column called product_notes and after 2 days realised that I have two sets of notes I urgently need to merge.
Usually when making a note we just add to any note already there with a form. I need to put these two columns like that, keeping any note in the first column eg
Column notes = Out of stock Pete 040618--- ordered 200 units Jade 050618 --- 200 units received Lila 080618
and
Column product_notes = 5 units left Dave 120618 --- unit 10724 unacceptable quality noted in list Dave 130618
I need to put them together with our spacer of --- without losing the first column's content so the result needs to be like this for my test case:
Column notes = Out of stock Pete 040618--- ordered 200 units Jade 050618 --- 200 units received Lila 080618 --- 5 units left Dave 120618 --- unit 10724 unacceptable quality noted in list Dave 130618
It's simple -
update table1 set notes = notes || '---' || product_notes;
The solution provided by @MaheshHViraktamath is fine, but the problem with simple string concatenation is that if any of the items being concatenated is NULL, the whole result becomes NULL.
Another potential issue is if either field is empty. In that case you might get a result of field a--- or ---field b.
To guard against the first scenario (without putting checks in the WHERE clause) you can use CONCAT_WS like so: CONCAT_WS('---', notes, product_notes). This will combine the two fields (or however many you put in there) using the first parameter, i.e. '---', as the separator. If either of those two fields is NULL, the separator won't be used, so you won't get a result with the separator prepended or appended.
There are two issues with the above: if both fields are NULL, the result isn't NULL but an empty string. To handle this case just put it in a NULLIF: NULLIF(CONCAT_WS('---', notes, product_notes), '') so that NULL is returned if both fields are NULL.
The other issue is if either field is empty, the separator will still be used. To guard against this scenario (and only you will know whether it's a scenario worth guarding against, or if this is even desired, based on your data), put each field in a NULLIF as well: NULLIF(CONCAT_WS('---', NULLIF(notes, ''), NULLIF(product_notes, '')), '')
As a result you get: UPDATE your_table SET notes = NULLIF(CONCAT_WS('---', NULLIF(notes, ''), NULLIF(product_notes, '')), '');
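To make the edge cases concrete, here is a small Python sketch that mimics the behaviour described above for NULLIF and CONCAT_WS (with None standing in for NULL). The helper functions are illustrative stand-ins, not a database API.

```python
def nullif(value, sentinel):
    """Mimic SQL NULLIF: return None when value equals sentinel."""
    return None if value == sentinel else value

def concat_ws(separator, *parts):
    """Mimic CONCAT_WS: skip None parts and join the rest ('' if none remain)."""
    return separator.join(part for part in parts if part is not None)

def merge_notes(notes, product_notes, separator="---"):
    """Python analogue of
    NULLIF(CONCAT_WS('---', NULLIF(notes, ''), NULLIF(product_notes, '')), '')."""
    joined = concat_ws(separator, nullif(notes, ""), nullif(product_notes, ""))
    return nullif(joined, "")

print(merge_notes("Out of stock", "5 units left"))  # both present
print(merge_notes("Out of stock", None))            # second is NULL
print(merge_notes("", "5 units left"))              # first is empty
print(merge_notes(None, None))                      # both NULL
```

Only the first call produces a separator; the last call yields None rather than an empty string.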

Getting Redshift error 1214 during copy

I have the following table in redshift:
Column | Type
id integer
value varchar(255)
I'm trying to copy data in (using Data Pipeline's RedshiftCopyActivity). The data contains the line 1,maybe as one of the entries to be added, but I get back error 1214: Delimiter not found, and the raw_field_data value is maybe. Is there something I'm missing in the copy parameters?
The entire CSV is three lines:
1,maybe
2,no
3,yes
You may want to take a look at the similar question Redshift COPY command delimiter not found.
Make sure your RedshiftCopyActivity configuration includes FORMAT AS CSV (see https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-csv).
Be sure your input data has your configured delimiter between every field, even in the case of nulls.
Be sure you do not have any trailing blank lines.
You can run the following SQL (from the linked question) to see more specific details of what row is causing the problem.
SELECT le.starttime,
d.query,
d.line_number,
d.colname,
d.value,
le.raw_line,
le.err_reason
FROM stl_loaderror_detail d
JOIN stl_load_errors le
ON d.query = le.query
ORDER BY le.starttime DESC;
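The two input checks mentioned above (a delimiter between every field, and no trailing blank lines) can also be run on the file before the COPY. The following is a rough sketch with a hypothetical helper name; it counts delimiters naively and does not understand quoted fields that contain commas.

```python
def check_delimited(text, delimiter=",", expected_fields=2):
    """Return (line_number, problem) tuples for lines that look malformed."""
    problems = []
    lines = text.split("\n")
    # A single trailing newline is normal; drop the empty piece it produces.
    if lines and lines[-1] == "":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append((number, "blank line"))
        elif line.count(delimiter) != expected_fields - 1:
            problems.append((number, "unexpected delimiter count"))
    return problems

good = "1,maybe\n2,no\n3,yes\n"
bad = "1,maybe\n2\n3,yes\n\n"

print(check_delimited(good))
print(check_delimited(bad))
```

An empty result for the three-line file in the question would suggest the problem lies in the copy options rather than in the data.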