Talend split fields from one to double

Talend split fields from one to double - talend

I have mysql table with text field "email", which can contain "user#example.com" and "user1#example.com;user2#example.com;user3#example.com".
| Name | Email |
| user | user#example.com |
| user1 | user1#example.com;user2#example.com;user3#example.com |
How can i do output with Talend such this:
| Name | Email |
| user | user#example.com |
| user1 | user1#example.com |
| user1 | user2#example.com |
| user1 | user3#example.com

The tNormalize component does exactly this. You can provide a character for separation, in your case ; and get rows as result afterwards.
EDIT
AxelH pointed out that it is possible as well to use a String for separation, this is not a Character.

Related

Fill empty rows of set with non empty value

Note: I've already gone over related questions like following that don't address my query
SQL: how to pick one row for each set of rows with duplicate value in one column?
Fill missing values with first non-null following value in Redshift
I have a sparse, unclean dataset like this
| id | operation | title | channel_type | mode |
|-----|-----------|----------|--------------|------|
| abc | Start | | | |
| abc | Start | recovery | | Link |
| abc | Start | recovery | SMS | |
| abc | Set | | Email | |
| abc | Verify | | Email | |
| pqr | Start | | | OTP |
| pqr | Verfiy | sign_in | Push | |
| pqr | Verify | | | |
| xyz | Start | sign_up | | Link |
and I need to fill up empty rows of each id with non-empty data available from other rows
| id | operation | title | channel_type | mode |
|-----|-----------|----------|--------------|------|
| abc | Start | recovery | SMS | Link |
| abc | Start | recovery | SMS | Link |
| abc | Start | recovery | SMS | Link |
| abc | Set | recovery | Email | Link |
| abc | Verify | recovery | Email | Link |
| pqr | Start | sign_in | Push | OTP |
| pqr | Verfiy | sign_in | Push | OTP |
| pqr | Verify | sign_in | Push | OTP |
| xyz | Start | sign_up | | Link |
notes
some ids can have a certain field as empty in all rows
and while most ids will have same non-empty values for each field, edge cases could have different values. For such groups, filling up any non-empty value in all rows is acceptable. [this is too rare in my dataset and can be ignored]
another extra bit of pattern is that certain fields are mostly only present only against rows of certain operations, for e.g. mode is only present against operation='Start' rows
I've tried grouping rows by id while performing listagg over title, channel_type and mode columns, followed by coalesce, something along the lines of this:
WITH my_data AS (
SELECT
id,
operation,
title,
channel_type,
mode
FROM
my_db.my_table
),
list_aggregated_data AS (
SELECT
id,
listagg(title) AS titles,
listagg(channel_type) AS channel_types,
listagg(mode) AS modes
FROM
my_data
GROUP BY
id
),
coalesced_data AS (
SELECT DISTINCT
id,
coalesce(titles) AS title,
coalesce(channel_types) AS channel_type,
coalesce(modes) AS mode
FROM
list_aggregated_data
),
joined_data AS (
SELECT
md.id,
md.operation,
cd.title,
cd.channel_type,
cd.mode
FROM
my_data AS md
LEFT JOIN
coalesced_data AS cd ON cd.id = md.id
)
SELECT
*
FROM
joined_data
ORDER BY
id,
operation
But for some reason this is resulting in concatenation of values (presumably from coalesce operation), where I get
| id | operation | title | channel_type | mode |
|-----|-----------|------------------|--------------|------|
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Set | recoveryrecovery | Email | Link |
| abc | Verify | recoveryrecovery | Email | Link |
| pqr | Start | sign_in | Push | OTP |
| pqr | Verfiy | sign_in | Push | OTP |
| pqr | Verify | sign_in | Push | OTP |
| xyz | Start | sign_up | | Link |
What's the correct way to approach this problem?

I'd start with the first_value() window function with the ignore nulls option. You will partition by the first 2 columns and will need to work out the edge cases with some data massaging, likely in the order by clause of the window function.

Weird ghost records in PostgreSQL - what are they?

I have a very weird issue on our postgresql DB. I have a table called "statement" which has some strange records in it.
Using the command line console psql, I query select * from customer.statement where type in ('QUOTE'); and get 12 rows back. 7 rows look normal, 5 are missing all data except a single column which is a nullable column but seems to hold real values entered by the user. psql tells me that 7 rows were returned even though there are 12. Most of the other columns are not nullable. The weird records look like this:
select * from customer.statement where type = 'QUOTE';
id | issuer_id | recipient_id | recipient_name | recipient_reference | source_statement_id | catalogue_id | reference | issue_date | due_date | description | total | currency | type | tax_level | rounding_mode | status | recall_requested | time_created | time_updated | time_paid
------------------+------------------+------------------+----------------+---------------------+---------------------+--------------+-----------+------------+------------+------------------------------------------------------------------+-----------+----------+-------+-----------+---------------+-----------+------------------+----------------------------+----------------------------+-----------
... 7 valid records removed ...
| | | | | | | | | | Build bulkheads and sheet with plasterboard. +| | | | | | | | | |
| | | | | | | | | | Patch all patches. +| | | | | | | | | |
| | | | | | | | | | Set and sand all joints ready for painting. +| | | | | | | | | |
| | | | | | | | | | Use wall angle on bulkhead in main bedroom. +| | | | | | | | | |
| | | | | | | | | | Build nib and sheet and set in entrance | | | | | | | | | |
(7 rows)
If I run the same query using pgAdmin, I don't see those weird records.
Anyone know what these are?

The plus sign before the separator (+|) indicates a newline character in the displayed string value in psql. So no additional rows, just the same row continued with line breaks. The final line of output in your quote confirms as much: (7 rows).
In pgAdmin you don't see the extra lines as long as you don't increase the height of the field (or copy / paste the content somewhere), but there are multiple lines as well.
Try in psql and in pgAdmin:
test=# SELECT E'This\nis\na\ntest.' AS multi_line, 'foo' AS single_line;
multi_line | single_line
--------------+-------------
This +| foo
is +|
a +|
test. |
(1 row)
The manual about psql:
linestyle
Sets the border line drawing style to one of ascii, old-ascii, or unicode. [...] The default setting is ascii. [...]
ascii style uses plain ASCII characters. Newlines in data are shown using a + symbol in the right-hand margin. [...]

PostgreSQL Order by first 3 letter of field string

I have a column table like this in my table PostgreSQL database
| Product|
|--------|
| A10A |
| A13 |
| A12B |
and i want to order it by the first like this:
| Product|
|--------|
| A10A |
| A12B |
| A13 |
If i use the normal order by im not getting the exact same result that i want

MondoDB, export to CSV with a "join" on a value

So I know the command to export via Mongo shells is
mongoexport --host localhost --db dbname --collection name --csv > test.csv
but I also need to do a "join" (in quotes because I know Mongo doesn't actually do joins so to speak". I haven't found documentation on doing this in mongo shell.
I have collections called users and another called details.
I want to export: users (name,email) and details (city,country) where users (place [object ID]) = details (_id [object ID]).
Tables look like this:
users
+----+------+---------------+-----------------------+
| id | name | email | place |
+----+------+---------------+-----------------------+
| 1 | bob | bob#bob.com | ObjectId("123456abc") |
| 2 | mark | mark#mark.com | ObjectId("654321abc") |
| 3 | dave | dave#dave.com | ObjectId("987655abc") |
+----+------+---------------+-----------------------+
details
+----+-------+---------------+-----------------------+
| id | city | country | _id |
+----+-------+---------------+-----------------------+
| 1 | Perth | Australia | ObjectId("123456abc") |
| 2 | Tokyo | Japan | ObjectId("654321abc") |
| 3 | NY | United States | ObjectId("987655abc") |
+----+-------+---------------+-----------------------+
And the result I'm trying to achieve is this:
+----+------+---------------+-------+---------------+
| id | name | email | city | country |
+----+------+---------------+-------+---------------+
| 1 | bob | bob#bob.com | Perth | Australia |
| 2 | mark | mark#mark.com | Tokyo | Japan |
| 3 | dave | dave#dave.com | NY | United States |
+----+------+---------------+-------+---------------+

Using named fields to determine ranges with vsum in Emacs org-table-mode, impossible?

I have been trying to simplify a semi-complex table that I have by adding named fields, without a problem, until I get to the vsum operator. I had the formula set to $M=vsum($3..#-4) which works, however I am continuously having to add and remove items from those fields, which changes the column numbering. This results in me having to change the field specifications of the vsum range after every update/change. I thus tried naming the top field and bottom fields with the thought of supplying the named variables to vsum, giving me a table similar to the following:
| / | <> | <> |
|---+--------+---------|
| | Title1 | Title 2 |
|---+--------+---------|
| _ | | START |
| | name | 1000 |
| | name | 3456 |
| | name | 123 |
| ^ | | END |
|---+--------+---------|
| _ | | MT |
| # | Total | #ERROR |
| # | | |
|---+--------+---------|
#+TBLFM: $MT=vsum($START..$END)
This is the debug formula output from the above table:
Substitution history of formula
Orig: vsum($START..$END)
$xyz-> vsum((1000)..(123))
#r$c-> vsum((1000)..(123))
$1-> vsum((1000)..(123))
-----------^
Error: Expected `)'
I have tried embrasing the named field variables in parenthesis, and several other ways but have thus far not been able to get this to work. I am hoping I am just missing something and being blind, but perhaps this is not possible to do?
I have also tried the sum-up function with no success as well. Thank you in advance for your assistance.

The following solution works by using #II and #III to refer to all entries between the second and third hline.
| / | <> | <> |
|---+--------+---------|
| | Title1 | Title 2 |
|---+--------+---------|
| | name | 1000 |
| | name | 3456 |
| | name | 123 |
|---+--------+---------|
| _ | | MT |
| # | Total | 4579 |
| # | | |
|---+--------+---------|
#+TBLFM: $MT=vsum(#II..#III)
Documentation: http://orgmode.org/manual/References.html#References

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Talend split fields from one to double - talend

The tNormalize component does exactly this. You can provide a character for separation, in your case ; and get rows as result afterwards. EDIT AxelH pointed out that it is possible as well to use a String for separation, this is not a Character.

Related

Fill empty rows of set with non empty value

Weird ghost records in PostgreSQL - what are they?

PostgreSQL Order by first 3 letter of field string

MondoDB, export to CSV with a "join" on a value

Using named fields to determine ranges with vsum in Emacs org-table-mode, impossible?

Categories

Resources