SPLIT_PART with a negative value [Postgres 9.5] - postgresql

I need to use the split_part function in this query:
CREATE TABLE client_group_by_group_test AS
SELECT *,
       SPLIT_PART(groupe, ',', 1) AS group1,
       SPLIT_PART(SPLIT_PART(groupe, ',', 2), ',', -1) AS group2,
       SPLIT_PART(SPLIT_PART(groupe, ',', 3), ',', -3) AS group3,
       SPLIT_PART(groupe, ',', -4) AS group4
FROM planification_client
but it gives me the following error:
ERROR: field position must be greater than zero
So, how can I deal with negative values here?
Can this kind of statement work: reverse(split_part(reverse(col_A), '_'::text, 1))? I'm referring to that question.
EDIT: I'm completely stuck with this query.
More details: I have one column with the server name and another with its different groups separated by commas.
server name| group                        |
-----------+------------------------------+
XPTERTBIEP9|GRNW_SPO_S_F_H, GRNW_SPO_S_I_J|
The output I need: if the server has multiple groups, they need to go into separate columns like group1, group2...
server name| group                        |group1        |group2
-----------+------------------------------+--------------+--------------
XPTERTBIEP9|GRNW_SPO_S_F_H, GRNW_SPO_S_I_J|GRNW_SPO_S_F_H|GRNW_SPO_S_I_J

If the negative number is supposed to indicate the offset from the end, a two-step approach might be better:
CREATE TABLE client_group_by_group_test
AS
SELECT ...,
       agroups[1] AS group1,
       agroups2[cardinality(agroups2) - 1] AS group2,
       agroups3[cardinality(agroups3) - 3] AS group3,
       agroups[cardinality(agroups) - 4] AS group4
FROM (
    SELECT *,
           string_to_array(groupe, ',') AS agroups,
           string_to_array(split_part(groupe, ',', 2), ',') AS agroups2,
           string_to_array(split_part(groupe, ',', 3), ',') AS agroups3
    FROM planification_client
) t
Note that you need to list the desired columns explicitly in the outermost SELECT to exclude the intermediate agroups columns.
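For reuse, the reverse() trick from the question does work for grabbing the last element, and the array logic above can be wrapped in a small helper. A minimal sketch, assuming a hypothetical helper name split_part_neg (not a built-in on 9.5):
-- Hypothetical helper: emulates SPLIT_PART with a negative position,
-- so split_part_neg('a,b,c', ',', 1) returns 'c' (the last element).
CREATE OR REPLACE FUNCTION split_part_neg(str text, delim text, pos int)
RETURNS text AS $$
    SELECT (string_to_array(str, delim))
           [cardinality(string_to_array(str, delim)) - pos + 1];
$$ LANGUAGE sql IMMUTABLE;
For what it's worth, much later versions (PostgreSQL 14+) accept negative positions in split_part natively, but on 9.5 an emulation like this is needed.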

Related

Replacing null values by average of values grouped by concatenated categories in Teradata

Suppose that I have a lot of NULL values (missing values) in a column named 'score'. I want to replace them not with the average of all values of the 'score' column, but with a specific average per group, where the groups are built as a cross-category from two concatenated categories:
This kind of query works for getting averages by groups:
SELECT
category1 || ' > ' || category2 AS crosscategory,
ROUND(CAST(AVG(score) AS FLOAT), 2) AS score_avg
FROM DatabaseName.TableName
GROUP BY crosscategory
ORDER BY score_avg;
This one works to replace NULL values by a constant:
SELECT
NVL(score, 0) AS score_without_missing_values
FROM DatabaseName.TableName
The problem that I cannot solve now is how to combine the two: replacing the NULL values not with a constant, but with the averages computed via AVG and GROUP BY.
Thank you very much for your help!
Seems you want a Group Average:
SELECT
t.*,
coalesce(score, AVG(score) OVER (PARTITION BY category1, category2)) AS score_avg
FROM DatabaseName.TableName AS t
I removed the ROUND/CAST, because AVG returns FLOAT by default and ROUND is probably not needed (if you need it, you had better cast to a DECIMAL).
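If rounding is still wanted on top of the group average, a minimal sketch of how that cast might look (the DECIMAL(10,2) precision is an assumption, not from the question):
SELECT
    t.*,
    CAST(COALESCE(score,
                  AVG(score) OVER (PARTITION BY category1, category2))
         AS DECIMAL(10,2)) AS score_avg  -- precision DECIMAL(10,2) chosen arbitrarily
FROM DatabaseName.TableName AS t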

Need help in parsing column value based on value in other column

I have two columns, COL1 and COL2. COL1 has a value like 'Birds sitting on $1 and enjoying' and COL2 has a value like 'the.location_value[/tree,\building]'.
I need to update third column COL3 with values like 'Birds sitting on /tree and enjoying'
i.e. $1 in the 1st column is replaced with /tree, which is the 1st word in the list of comma-separated words within square brackets [] in COL2, i.e. [/tree,\building].
I wanted to know the most suitable combination of string functions in PostgreSQL to achieve this.
You need to extract the first element from the comma-separated list; for that you can use split_part(). But first you need to extract the actual list of values, which can be done using substring() with a regular expression:
substring(col2 from '\[(.*)\]')
will return /tree,\building
So the complete query would be:
select replace(col1, '$1', split_part(substring(col2 from '\[(.*)\]'), ',', 1))
from the_table;
Online example: http://rextester.com/CMFZMP1728
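As a quick sanity check with the literal values from the question (assuming standard_conforming_strings is on, its default, so the backslashes reach the regex engine untouched):
-- returns: Birds sitting on /tree and enjoying
select replace('Birds sitting on $1 and enjoying', '$1',
               split_part(substring('the.location_value[/tree,\building]' from '\[(.*)\]'), ',', 1));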
This one should work with any (int) number after $:
select t.*, c.col3
from t,
     lateral (select string_agg(case
                         when o = 1 then s
                         else (string_to_array((select regexp_matches(t.col2, '\[(.*)\]'))[1], ','))
                                  [(select regexp_matches(s, '^\$(\d+)'))[1]::int]
                              || substring(s from '^\$\d+(.*)')
                      end, '' order by o) as col3
              from regexp_split_to_table(t.col1, '(?=\$\d+)') with ordinality s(s, o)) c
http://rextester.com/OKZAG54145
Note: it is not the most efficient though. It splits col2's value (in the square brackets) each time a $N is replaced.
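To see what the lookahead split feeding string_agg produces, a small standalone check using the sample string from the question:
select s, o
from regexp_split_to_table('Birds sitting on $1 and enjoying', '(?=\$\d+)')
     with ordinality t(s, o);
-- row 1: 'Birds sitting on '   (o = 1)
-- row 2: '$1 and enjoying'     (o = 2)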
Update: LATERAL and WITH ORDINALITY are not supported in older versions, but you could try a correlated subquery instead:
select t.*,
       (select array_to_string(array_agg(case
                   when s ~ E'^\\$(\\d+)'
                   then (string_to_array((select regexp_matches(t.col2, E'\\[(.*)\\]'))[1], ','))
                            [(select regexp_matches(s, E'^\\$(\\d+)'))[1]::int]
                        || substring(s from E'^\\$\\d+(.*)')
                   else s
                end), '')
        from regexp_split_to_table(t.col1, E'(?=\\$\\d+)') s) col3
from t

How to group by more than 64 keys in BigQuery

Using Google-BigQuery, I created a query with almost 100 fields, grouping by 96 of them:
SELECT
field1,field2,(...),MAX(field100) as max100
FROM dataset.table1
GROUP BY field1,field2,(...),field96
and I got this error
Error: Maximum number of keys in GROUP BY clause is 64, query has 96 GROUP BY keys.
So, there is no chance to group by more than 64 fields using google-bigquery. Any suggestions?
If some of these fields are strings, and there is a character which cannot appear in them (say, ':'), then you could concatenate them together and group by concatenation, i.e.
SELECT CONCAT(field1, ':', field2, ':', field3) as composite_field, ...
FROM dataset.table
GROUP BY 1, 2, ..., 64
In order to recover the original fields later, you could use
SELECT
regexp_extract(composite_field, r'([^:]*):') field1,
regexp_extract(composite_field, r'[^:]*:([^:]*)') field2,
regexp_extract(composite_field, r'[^:]*:[^:]*:(.*)') field3,
...
FROM (...)
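If the table is queried with standard SQL (which postdates this question), the fields could also be recovered with SPLIT instead of regular expressions; a minimal sketch:
SELECT
  SPLIT(composite_field, ':')[OFFSET(0)] AS field1,
  SPLIT(composite_field, ':')[OFFSET(1)] AS field2,
  SPLIT(composite_field, ':')[OFFSET(2)] AS field3
FROM (...)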
It seems that this is an internal limit; it is not documented.
Another solution that I have developed is similar to Mosha's solution.
You can add an extra column called, for example, hashref. That new column is computed from all the columns that you would like to group by, separated with a pipe for example, with md5 or sha256 applied to the resulting line.
Then you can group by the new hashref, and for the other columns you just apply the min() function, which is also an aggregator.
line = name + "|" + surname + "|" + age
hashref = md5(line)
... and then ...
SELECT hashref, min(name), min(surname)
FROM mytable
GROUP BY hashref
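In SQL form, a minimal sketch of the same idea, assuming BigQuery standard SQL (the question used legacy SQL; the table and column names come from the pseudocode above):
SELECT
  -- CONCAT returns NULL if any input is NULL; wrap inputs in IFNULL if needed
  TO_HEX(MD5(CONCAT(name, '|', surname, '|', CAST(age AS STRING)))) AS hashref,
  MIN(name) AS name,
  MIN(surname) AS surname,
  MIN(age) AS age
FROM mytable
GROUP BY hashref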

Selecting count of values in multiple columns using two tables

I'm still new to T-SQL and trying to figure out how to build this query.
I have two tables. One, called mirror, has an official list of all campuses and is used to populate a drop-down list of campuses for users on a web form. They then have 5 choices they can select, which populate another table (Request) with their request when they submit the form, i.e. CampusChoice1, CampusChoice2, etc.
I am trying to build a page to display the end results of all the collected data. After some reading I'm thinking I might need to use PIVOT to make this happen, but I can't get my head around the query.
I can make a rudimentary query for each choice 1-5, but I kind of wanted them all together, with nulls or zeros where some campuses were not chosen.
Something like
--Simple count on single col
SELECT CampusChoice1, COUNT(*) as '#'
FROM Request
Group By CampusChoice1
Or
--But this doesn't give the results I want, since it does not account for all the POSSIBLE choices.
SELECT CampusChoice1, COUNT(*) as '#',
CampusChoice2, COUNT(*) as '#',
CampusChoice3, COUNT(*) as '#',
CampusChoice4, COUNT(*) as '#',
CampusChoice5, COUNT(*) as '#'
FROM Operations.dbo.TransferRequest
Group By CampusChoice1, CampusChoice2, CampusChoice3, CampusChoice4, CampusChoice5
Any ideas how I could show this? Am I on the right track at least with the PIVOT table?
Not sure if I understood your question correctly, but assuming that you have this:
CampusChoice | Other data ...
------------------------------
CampusChoice1 | ...
CampusChoice2 | ...
CampusChoice1 | ...
Then for the example above with only 3 rows you want this end result:
CampusChoice1 | 2 | CampusChoice2 | 1 | CampusChoice3 | 0 | ...
The T-SQL to achieve this is:
select
'CampusChoice1',
sum( case when CampusChoice = 'CampusChoice1' then 1 else 0 end ) '#',
'CampusChoice2',
sum( case when CampusChoice = 'CampusChoice2' then 1 else 0 end ) '#',
'CampusChoice3',
sum( case when CampusChoice = 'CampusChoice3' then 1 else 0 end ) '#',
...
from
...
Use SUM combined with CASE to add 1 for each row where CampusChoice is 'CampusChoice1' and 0 for each row where it is not, repeating this for each CampusChoiceN.
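If the table really has five CampusChoiceN columns, as the question describes, a hedged sketch of an unpivot-and-count variant that also yields zeros for never-chosen campuses (the Campus column name in mirror is an assumption):
SELECT m.Campus, COUNT(v.Campus) AS '#'   -- COUNT ignores NULLs, so unmatched campuses show 0
FROM mirror AS m
LEFT JOIN (
    SELECT c.Campus
    FROM Request
    CROSS APPLY (VALUES (CampusChoice1), (CampusChoice2), (CampusChoice3),
                        (CampusChoice4), (CampusChoice5)) AS c(Campus)
) AS v ON v.Campus = m.Campus
GROUP BY m.Campus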

SQL basic full-text search

I have not worked much with TSQL or the full-text search feature of SQL Server so bear with me.
I have a table with an nvarchar column (Col) like this:
Col ... more columns
Row 1: '1'
Row 2: '1|2'
Row 3: '2|40'
I want to do a search to match similar users. So if I have a user that has a Col value of '1' I would expect the search to return the first two rows. If I had a user with a Col value of '1|2' I would expect to get Row 2 returned first and then Row 1. If I try to match users with a Col value of '4' I wouldn't get any results. I thought of doing a 'contains' by splitting the value I am using to query but it wouldn't work since '2|40' contains 4...
I looked up the documentation on using the 'FREETEXT' keyword but I don't think that would work for me since I essentially need to break up the Col values into words using the '|' as a break.
Thanks,
John
You should not store values like '1|2' in one field to hold 2 values. If you have a maximum of 2 values, you should use 2 fields to store them. If you can have 0-many values, you should store them in a new table with a foreign key pointing to the primary key of your table.
If you only have max 2 values in your column, you can find your data like this:
DECLARE @s VARCHAR(3) = '1'
SELECT *
FROM <table>
WHERE @s IN(
PARSENAME(REPLACE(col, '|', '.'), 1),
PARSENAME(REPLACE(col, '|', '.'), 2)
--,PARSENAME(REPLACE(col, '|', '.'), 3) -- if col can contain 3
--,PARSENAME(REPLACE(col, '|', '.'), 4) -- or 4 values this can be used
)
PARSENAME can handle max 4 values. If col can contain more than 4 values, use this:
DECLARE @s VARCHAR(3) = '1'
SELECT *
FROM <table>
WHERE '|' + col + '|' like '%|' + @s + '|%'
You need to mix this in with a CASE for when there is no |, but this returns the left and right hand sides:
select left('2|10', CHARINDEX('|', '2|10') - 1)
select right('2|10', LEN('2|10') - CHARINDEX('|', '2|10'))
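A minimal sketch of that CASE, with the column name col assumed as in the earlier answer:
SELECT
    CASE WHEN CHARINDEX('|', col) > 0
         THEN LEFT(col, CHARINDEX('|', col) - 1)
         ELSE col END AS left_part,
    CASE WHEN CHARINDEX('|', col) > 0
         THEN RIGHT(col, LEN(col) - CHARINDEX('|', col))
         ELSE NULL END AS right_part
FROM <table>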