CHARINDEX Function is not supported by Tableau

col1=LEFT([col2], CHARINDEX('_', [col2]) - 1)
I am trying to join two tables: col1 should match col2, except that for col2 only the characters before the delimiter '_' should be compared.
What can I use in the join condition instead of CHARINDEX, which Tableau does not support?
col1    col2
abc     abc_123
dcb     dcb_123

You can define a calculated field to use in your joins. The SPLIT() function is available and is designed for exactly this purpose: you can use it to obtain the substring before the underscore, which you can then use as the join key.
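For example, a calculated field along these lines should return the part before the first underscore (Tableau's SPLIT takes the string, the delimiter, and a 1-based token number):

SPLIT([col2], '_', 1)

For 'abc_123' this yields 'abc', which matches col1.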

Related

Casting rows to arrays in PostgreSQL

I need to query a table as in
SELECT *
FROM table_schema.table_name
only each row needs to be a TEXT[], with array values corresponding to the column values cast to TEXT, coming in the same order as in SELECT *. So, assuming the table has columns a, b and c, I need the result to look like
SELECT ARRAY[a::TEXT, b::TEXT, c::TEXT]
FROM table_schema.table_name
only it shouldn't explicitly list the columns by name. Ideally it should look like
SELECT as_text_array(a)
FROM table_schema.table_name AS a
The best I came up with looks ugly and relies on the "hstore" extension:
WITH columnz AS ( -- get ordered column name array
    SELECT array_agg(attname::TEXT ORDER BY attnum) AS column_name_array
    FROM pg_attribute
    WHERE attrelid = 'table_schema.table_name'::regclass
      AND attnum > 0
      AND NOT attisdropped
)
SELECT hstore(a)->(SELECT column_name_array FROM columnz)
FROM table_schema.table_name AS a
I have a feeling there must be a simpler way to achieve that.
UPDATE 1
Another query that achieves the same result, but is arguably as ugly and inefficient as the first one, was inspired by the answer by @bspates. It may be even less efficient, but it doesn't rely on extensions:
SELECT r.text_array
FROM table_schema.table_name AS a
INNER JOIN LATERAL ( -- parse the ROW::TEXT representation of the row
    SELECT array_agg(COALESCE(replace(val[1], '""', '"'), NULLIF(val[2], ''))) AS text_array
    FROM regexp_matches(a::text, -- match double-quoted and plain values separated by commas
                        '(?<=\A\(|,) (?: "( (?:[^"]|"")* )" | ([^,"]*) ) (?=,|\)\Z)', 'xg') AS t(val)
) AS r ON TRUE
It is still far from ideal.
UPDATE 2
I tested all three options available at the moment:
1. Using JSON. It doesn't rely on any extensions, it is short to write, easy to understand, and the speed is OK.
2. Using hstore. This alternative is the fastest (>10 times faster than the JSON approach on a 100K dataset) but requires an extension. hstore is in general a very handy extension to have, though.
3. Using a regex to parse the TEXT representation of a ROW. This option is really slow.
A somewhat ugly hack is to convert the row to a JSON value, then unnest the values and aggregate it back to an array:
select array(select (json_each_text(to_json(t))).value) as row_value
from some_table t
This is to some extent the same as your hstore hack.
If the order of the columns is important, then using json and with ordinality can be used to keep that:
select array(select val
             from json_each_text(to_json(t)) with ordinality as t(k, val, idx)
             order by idx)
from the_table t
The easiest (read: hackiest) way I can think of is to convert the row to a string first and then parse that string into an array, like so:
SELECT string_to_array(table_name::text, ',') FROM table_name
BUT depending on the size and type of the data in the table, this could perform very badly.
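It is also fragile: the text form of a row quotes values that contain commas or quotes, which string_to_array does not understand. A quick sketch of the failure mode, using the text that a row like (1, 'a,b') would produce:

SELECT string_to_array('(1,"a,b")', ',');
-- yields three elements: '(1', '"a', 'b")' -- the value a,b is torn apart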

PostgreSQL select uniques from three different columns

I have one large table (100m+ rows) and two smaller ones (2m rows each). All three tables have a column of company names that need to be sent out to an API for matching. I want to select the strings from each column and then combine them into a single column of unique strings.
I'm using a version of this response, but unsurprisingly the performance is very slow: Combined 2 columns into one column SQL
SELECT DISTINCT
unnest(string_to_array(upper(t.buyer) || '#' || upper(a.aw_supplier_name) || '#' || upper(b.supplier_source_string), '#'))
FROM
tenders t,
awards a,
banking b
;
Any ideas on a more performant way to achieve this?
Update: the banking table is the largest table with 100m rows.
Assuming PostgreSQL 9.6 and borrowing the SELECT from rd_nielsen's answer, the following should give you a comma-delimited string of the distinct names. (Note that your original query is slow because the comma-separated FROM list builds a Cartesian product of the three tables before anything is unnested.)
WITH cte AS (
    SELECT UPPER(t.buyer) AS names
    FROM tenders t
    UNION
    SELECT UPPER(a.aw_supplier_name)
    FROM awards a
    UNION
    SELECT UPPER(b.supplier_source_string)
    FROM banking b
)
SELECT array_to_string(array_agg(cte.names), ',')
FROM cte
To get just a list of the combined names from all three tables, you could instead union together the selections from each table, like so:
select upper(t.buyer) from tenders t
union
select upper(a.aw_supplier_name) from awards a
union
select upper(b.supplier_source_string) from banking b;

Error while using regexp_split_to_table (Amazon Redshift)

I have the same question as this:
Splitting a comma-separated field in Postgresql and doing a UNION ALL on all the resulting tables
Except that my 'fruits' column is delimited by '|'. When I try:
SELECT
yourTable.ID,
regexp_split_to_table(yourTable.fruits, E'|') AS split_fruits
FROM yourTable
I get the following:
ERROR: type "e" does not exist
Q1. What does the E do? I saw some examples where E is not used. The official docs don't explain it in their "quick brown fox..." example.
Q2. How do I use '|' as the delimiter for my query?
Edit: I am using PostgreSQL 8.0.2. Neither unnest() nor regexp_split_to_table() is supported.
A1
E is a prefix for Posix-style escape strings. You don't normally need it in modern Postgres. Only prepend it if you want special characters in the string to be interpreted, like E'\n' for a newline character. Details and links to documentation:
Insert text with single quotes in PostgreSQL
SQL select where column begins with \
The E is pointless noise in your query, and it is not the real problem. The error comes from Redshift's PostgreSQL 8.0 roots: the E'...' syntax was only added in PostgreSQL 8.1, so E'|' is parsed as the type name e applied to a string literal, hence ERROR: type "e" does not exist. The answer you are linking to is not very good, I am afraid.
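To illustrate what the prefix does in modern Postgres (assuming the default setting standard_conforming_strings = on):

SELECT E'line1\nline2';  -- escape string: \n is interpreted as a newline
SELECT 'line1\nline2';   -- standard string: the backslash stays a literal backslash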
A2
The pipe | is a special character in regular expressions (it means alternation), so it has to be escaped in the pattern; left unescaped, the pattern matches the empty string and splits the text into single characters. No E prefix is needed:
SELECT id, regexp_split_to_table(fruits, '\|') AS split_fruits
FROM tbl;
For simple delimiters, you don't need expensive regular expressions anyway. string_to_array() treats the delimiter as a plain string, so no escaping is needed, and this is typically faster:
SELECT id, unnest(string_to_array(fruits, '|')) AS split_fruits
FROM tbl;
In Postgres 9.3+ you'd rather use a LATERAL join for set-returning functions (the LEFT JOIN ... ON true keeps rows where fruits IS NULL in the result):
SELECT t.id, f.split_fruits
FROM tbl t
LEFT JOIN LATERAL unnest(string_to_array(fruits, '|')) AS f(split_fruits)
ON true;
Details:
What is the difference between LATERAL and a subquery in PostgreSQL?
PostgreSQL unnest() with element number
Amazon Redshift is not Postgres
It implements only a reduced set of features, as documented in its manual. In particular, there are no table functions, including the essential unnest(), generate_series(), and regexp_split_to_table(), when working with its compute nodes (i.e., when accessing any tables).
You should go with a normalized table layout to begin with (an extra table with one fruit per row).
Or here are some options to create a set of rows in Redshift:
How to select multiple rows filled with constants in Amazon Redshift?
This workaround should do it:
Create a table of numbers with at least as many rows as there can be fruits in your column; temporary, or permanent if you'll keep using it. Say we never have more than 9:
CREATE TEMP TABLE nr9(i int);
INSERT INTO nr9(i) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9);
Join to the number table and use split_part(), which is actually implemented in Redshift:
SELECT *, split_part(t.fruits, '|', n.i) AS fruit
FROM nr9 n
JOIN tbl t ON split_part(t.fruits, '|', n.i) <> ''
Voilà.
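For illustration, assuming a tbl row with id = 1 and fruits = 'apple|banana', the workaround returns one row per fruit (hypothetical output):

 i | id |    fruits    | fruit
---+----+--------------+--------
 1 |  1 | apple|banana | apple
 2 |  1 | apple|banana | banana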

SELECT * except nth column

Is it possible to SELECT * but without n-th column, for example 2nd?
I have some views that have 4 or 5 columns (each has different column names, except for the 2nd column), but I do not want to show the second column.
SELECT * -- how to prevent 2nd column to be selected?
FROM view4
WHERE col2 = 'foo';
SELECT * -- how to prevent 2nd column to be selected?
FROM view5
WHERE col2 = 'foo';
without having to list all the columns (since they all have different column name).
The real answer is that you practically just cannot (see LINK). This has been a requested feature for decades, and the developers refuse to implement it. The best practice is to list the column names instead of *. Using * is in itself a source of performance penalties, though.
However, in case you really need it, you can select the columns directly from the schema (see LINK), or, as in the example below, use two PostgreSQL built-in functions: ARRAY and ARRAY_TO_STRING. The first one transforms a query result into an array, and the second one concatenates the array components into a string. The list separator can be specified with the second parameter of ARRAY_TO_STRING:
SELECT 'SELECT ' ||
       ARRAY_TO_STRING(ARRAY(SELECT COLUMN_NAME::VARCHAR(50)
                             FROM INFORMATION_SCHEMA.COLUMNS
                             WHERE TABLE_NAME = 'view4'
                               AND COLUMN_NAME NOT IN ('col2')
                             ORDER BY ORDINAL_POSITION
                            ), ', ') || ' FROM view4';
where strings are concatenated with the standard || operator. The COLUMN_NAME data type is information_schema.sql_identifier, which requires an explicit cast to CHAR/VARCHAR. Note that this query only builds the statement text; you still have to execute the generated string yourself.
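Assuming view4 has columns col1 through col4 (as in the question), the query above produces a single text value like:

SELECT col1, col3, col4 FROM view4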
But that is not recommended either. What if you add more columns in the long run, but they are not required for that query?
You would start pulling more columns than you need.
What if the SELECT is part of an INSERT, as in
INSERT INTO tableA (col1, col2, col3 .. coln) SELECT <everything but 2 columns> FROM tableB
The column match will be wrong and your INSERT will fail.
It's possible, but I still recommend writing out every needed column for every SELECT, even if nearly every column is required.
Conclusion:
Since you are already using a VIEW, the simplest and most reliable way is to alter your view and list the column names, excluding your 2nd column.
-- my target table with 2 rows: id plus 4 value columns
DROP TABLE IF EXISTS t_target_table;
CREATE TEMP TABLE t_target_table as
SELECT 1 as id, 1 as v1, 2 as v2, 3 as v3, 4 as v4
UNION ALL
SELECT 2 as id, 5 as v1, -6 as v2, 7 as v3, 8 as v4
;
-- my computation and stuff that I have to measure; any logic could be done here!
DROP TABLE IF EXISTS t_processing;
CREATE TEMP TABLE t_processing as
SELECT *,
       md5(t_target_table::text) as row_hash,
       case when v2 < 0 THEN true else false end as has_negative_value_in_v2
FROM t_target_table
;
-- now we want to insert that stuff into the t_target_table
-- this is standard
-- INSERT INTO t_target_table (id, v1, v2, v3, v4) SELECT id, v1, v2, v3, v4 FROM t_processing;
-- this is advanced ;-)
INSERT INTO t_target_table
-- the following selects only the columns that are present in the target table and ignores the others
SELECT r.*
FROM (SELECT to_jsonb(t_processing) AS d FROM t_processing) t
JOIN LATERAL jsonb_populate_record(NULL::t_target_table, t.d) AS r ON TRUE
;
-- WARNING: you need an object that represents the target structure; excluding just a single column this way is not possible
For columns col1, col2, col3 and col4, you will need to request
SELECT col1, col3, col4 FROM...
to omit the second column. Requesting
SELECT *
will get you all the columns.

SqlAlchemy: count of distinct over multiple columns

I can't do:
>>> session.query(
...     func.count(distinct(Hit.ip_address, Hit.user_agent))
... ).first()
TypeError: distinct() takes exactly 1 argument (2 given)
I can do:
session.query(
    func.count(distinct(func.concat(Hit.ip_address, Hit.user_agent)))
).first()
Which is fine (a count of unique users in a 'pageload' db table).
This isn't correct in the general case though; e.g., it will give a count of 1 instead of 2 for the following table:
col_a | col_b
------+------
xx    | yy
xxy   | y
Is there any way to generate the following SQL (which is valid in postgresql at least)?
SELECT count(distinct (col_a, col_b)) FROM my_table;
distinct() accepts more than one argument when applied to the query object:
session.query(Hit).distinct(Hit.ip_address, Hit.user_agent).count()
It should generate something like:
SELECT count(*) AS count_1
FROM (SELECT DISTINCT ON (hit.ip_address, hit.user_agent)
hit.ip_address AS hit_ip_address, hit.user_agent AS hit_user_agent
FROM hit) AS anon_1
which is even a bit closer to what you wanted.
The exact query can be produced using the tuple_() construct:
session.query(
func.count(distinct(tuple_(Hit.ip_address, Hit.user_agent)))).scalar()
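For completeness, here is a minimal self-contained sketch of that last variant, with the imports these snippets assume (Hit being your mapped model):

from sqlalchemy import distinct, func, tuple_

# count distinct (ip_address, user_agent) pairs using a row-value comparison
session.query(
    func.count(distinct(tuple_(Hit.ip_address, Hit.user_agent)))
).scalar()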
Looks like SQLAlchemy's distinct() accepts only one column or expression.
Another way around this is to use group_by and count. This should be more efficient than using concat of two columns: with GROUP BY, the database can use indexes if they exist:
session.query(Hit.ip_address, Hit.user_agent).\
group_by(Hit.ip_address, Hit.user_agent).count()
The generated query would still look different from what you asked about:
SELECT count(*) AS count_1
FROM (SELECT hittable.user_agent AS hittable_user_agent, hittable.ip_address AS hittable_ip_address
FROM hittable GROUP BY hittable.user_agent, hittable.ip_address) AS anon_1
You can add a separator character in the concat() call so that adjacent values cannot run into each other. Taking your example as reference, it would be:
session.query(
    func.count(distinct(func.concat(Hit.ip_address, "-", Hit.user_agent)))
).first()