String concatenation in AWS Glue Athena?

I need a string concatenation function in AWS Glue Athena that works inside GROUP BY. So far I have tried
SELECT CONCAT('wo', 'rd');
but what I actually need is concatenation across grouped rows, written in pseudo-SQL:
SELECT CONCAT(field_word WITH SEPARATOR '>' ORDER BY Order1) FROM myData GROUP BY ID;
SELECT STRING_AGG(field_word WITH SEPARATOR '>' ORDER BY Order1) FROM myData GROUP BY ID;
Neither of these works.
How can I concatenate strings inside AWS Glue Athena?

You can combine array_agg with array_join:
presto> SELECT array_join(array_agg(e), ',', 'NULL')
-> FROM (VALUES 'wo', 'rd') t(e);
_col0
-------
wo,rd
(1 row)
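Applied to the grouped case from the question, a minimal sketch might look like the following (the myData, field_word, Order1, and ID names are taken from the question; note that ORDER BY inside array_agg requires a sufficiently recent Athena engine version, so treat the ordering as an assumption to verify):

SELECT ID,
       array_join(array_agg(field_word ORDER BY Order1), '>') AS words
FROM myData
GROUP BY ID;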

Related

Redshift how to split a stringified array into separate parts

Say I have a varchar column, let's call it religions, that looks like this: ["Christianity", "Buddhism", "Judaism"] (yes, it has brackets in the string), and I want the string (not an array) split into multiple rows like "Christianity", "Buddhism", "Judaism" so it can be used in a WHERE clause.
Eventually I want to use the results of the query in a where clause like this:
SELECT ...
FROM religions
WHERE name in
(
<this subquery>
)
How can one do this?
You can use the function JSON_PARSE to convert the varchar string into an array. Then you can use the strategy described in Convert varchar array to rows in redshift - Stack Overflow to convert the array to separate rows.
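A minimal sketch of that approach, assuming a Redshift version with SUPER and PartiQL support (the sample value is the one from the question):

WITH parsed AS (
    SELECT json_parse('["Christianity", "Buddhism", "Judaism"]') AS arr
)
SELECT elem::varchar AS religion  -- unnest the SUPER array into one row per element
FROM parsed p, p.arr AS elem;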
You can do the following:
Create a temporary table with a sequence of numbers.
Using that sequence and the split_part function available in Redshift, split the values by cross joining against the numbers generated in the temporary table.
To strip the double quotes and square brackets, use the regexp_replace function in Redshift.
create temp table seq as
with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)
select * from numbers;
select regexp_replace(split_part(val,',',seq.number),'[]["]','') as value
from
(select '["christianity","Buddhism","Judaism"]' as val) t -- You can select the actual column from the table here.
cross join
seq
where seq.number <= regexp_count(val,'[,]')+1;
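To plug this into the WHERE clause from the question, one hedged possibility looks like the following (religions_column and source_table are hypothetical stand-ins for wherever the stringified array actually lives):

SELECT ...
FROM religions
WHERE name in
(
    select regexp_replace(split_part(val,',',seq.number),'[]["]','')
    from (select religions_column as val from source_table) t -- hypothetical source
    cross join seq
    where seq.number <= regexp_count(val,'[,]')+1
);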

postgresql : regexp_substr - get sub string between occurrence of delimiters

I have these strings:
[{"Name":"id","Value":"Window_Ex_kebklipecbcegiocpa_widget_open"
[{"Name":"id","Value":"Window_Ex_kebklipecbcegiocpa_widget_close"
[{"Name":"id","Value":"Window_Ex_kebklipecbcegiocpa_widget_mid_value"
and I'm trying to extract only the part after the third _, up to the end of the string (which always ends with "):
widget_open
widget_close
widget_mid_value
I'm using PostgreSQL and wanted to use the regexp_substr syntax to extract it.
Thanks!
regexp_replace(data::text,'^([^_]+_){3}','')
You can try
select regexp_replace(data::text,'^([^_]+_){3}','')
from (
select 'one_two_three_four s'::text as data
union select 'a_bb_ccc_dddd_eeee_ffff'
) data
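Applied to the strings from the question, which always end with a double quote, the trailing quote can be stripped in the same pass (a sketch; the sample value is taken from the question):

select regexp_replace(data, '^([^_]+_){3}|"$', '', 'g') as extracted
from (
    select '[{"Name":"id","Value":"Window_Ex_kebklipecbcegiocpa_widget_open"'::text as data
) t;
-- extracted: widget_open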

Concatenate a column from multiple rows into a single formatted string

I have rows like so:
roll_no
---------
0690543
0005331
0760745
0005271
And I want string like this :
"0690543.pdf" "0005331.pdf" "0760745.pdf" "0005271.pdf"
I have tried concat but was unable to produce this.
You can use an aggregate function like string_agg, after first adding the quotes and the .pdf extension to your column data. Use a space as your delimiter:
SELECT string_agg('"'||roll_no||'.pdf"', ' ') from myTable
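A quick way to check the output against the sample rows (a sketch using an inline VALUES list; assumes PostgreSQL 9.0+ for string_agg):

SELECT string_agg('"' || roll_no || '.pdf"', ' ') AS file_list
FROM (VALUES ('0690543'), ('0005331'), ('0760745'), ('0005271')) t(roll_no);
-- "0690543.pdf" "0005331.pdf" "0760745.pdf" "0005271.pdf"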

How to group by more than 64 keys in BigQuery

Using Google-BigQuery, I created a query with almost 100 fields, grouping by 96 of them:
SELECT
field1,field2,(...),MAX(field100) as max100
FROM dataset.table1
GROUP BY field1,field2,(...),field96
and I got this error
Error: Maximum number of keys in GROUP BY clause is 64, query has 96 GROUP BY keys.
so it seems there is no way to group by more than 64 fields in Google BigQuery. Any suggestions?
If some of these fields are strings, and there is a character which cannot appear in them (say, ':'), then you could concatenate them together and group by concatenation, i.e.
SELECT CONCAT(field1, ':', field2, ':', field3) as composite_field, ...
FROM dataset.table
GROUP BY 1, 2, ..., 64
In order to recover the original fields later, you could use
SELECT
regexp_extract(composite_field, r'([^:]*):') field1,
regexp_extract(composite_field, r'[^:]*:([^:]*)') field2,
regexp_extract(composite_field, r'[^:]*:[^:]*:(.*)') field3,
...
FROM (...)
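As a toy round trip with three fields standing in for the 96 (a sketch in BigQuery Standard SQL; the inline data is made up):

WITH dataset AS (
  SELECT 'a' AS field1, 'b' AS field2, 'c' AS field3, 10 AS field100
  UNION ALL SELECT 'a', 'b', 'c', 20
)
SELECT
  REGEXP_EXTRACT(composite_field, r'([^:]*):') AS field1,
  REGEXP_EXTRACT(composite_field, r'[^:]*:([^:]*)') AS field2,
  REGEXP_EXTRACT(composite_field, r'[^:]*:[^:]*:(.*)') AS field3,
  max100
FROM (
  SELECT CONCAT(field1, ':', field2, ':', field3) AS composite_field,
         MAX(field100) AS max100
  FROM dataset
  GROUP BY composite_field
);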
It seems that this is an internal, undocumented limit.
Another solution I have developed is similar to Mosha's.
You can add an extra column called, for example, hashref. That new column is computed from all the columns you want to group by, concatenated with a separator such as a pipe, with md5 or sha256 applied to the resulting line.
Then you can group by the new hashref column, and for the other columns you simply apply the min() function, which is also an aggregate.
line = name + "|" + surname + "|" + age
hashref = md5(line)
... and then ...
SELECT hashref, min(name), min(surname)
FROM mytable
GROUP BY hashref
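In BigQuery Standard SQL the same idea might be written as follows (a sketch; TO_HEX(MD5(...)) is used because MD5 returns BYTES, and name, surname, and age are the hypothetical columns from the pseudocode above):

SELECT
  TO_HEX(MD5(CONCAT(name, '|', surname, '|', CAST(age AS STRING)))) AS hashref,
  MIN(name) AS name,
  MIN(surname) AS surname,
  MIN(age) AS age
FROM mytable
GROUP BY hashref;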

Postgres query: array_to_string with empty values

I am trying to combine rows and concatenate two columns (name, vorname) in a Postgres query.
This works well, like this:
SELECT nummer,
array_to_string(array_agg(name|| ', ' ||vorname), '\n') as name
FROM (
SELECT DISTINCT
nummer, name, vorname
FROM myTable
) AS m
GROUP BY nummer
ORDER BY nummer;
Unfortunately, if vorname is empty (NULL) I get no result for that entry, although name has a value.
Is it possible to get this working:
array_to_string(array_agg(name|| ', ' ||vorname), '\n') as name
even if one column is empty?
Use coalesce to convert NULL values to something that you can concatenate:
array_to_string(array_agg(name|| ', ' ||coalesce(vorname, '<missing>')), '\n')
Also, you can concatenate strings directly without collecting them to an array by using the string_agg function.
If you are on 9.1, you can use the third parameter of array_to_string, the null string:
select array_to_string(array_agg(name), ',', '<missing>') from bbb;
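Putting this together with the original query, a sketch (assuming PostgreSQL 9.0+ for string_agg; note the E'\n' escape so the separator is an actual newline rather than a literal backslash-n):

SELECT nummer,
       string_agg(name || ', ' || coalesce(vorname, ''), E'\n') AS name
FROM (
    SELECT DISTINCT nummer, name, vorname
    FROM myTable
) AS m
GROUP BY nummer
ORDER BY nummer;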