sql query to break down count of every combination - postgresql

I need a PostgreSQL query that returns the count of every combination of values that occurs in a table.
For example, I have a table T with columns A, B, C, D, E and other columns that are not of importance:
Table T
--------------
A | B | C | D | E
The query should return a table R with the values from columns A, B, C, D, and a count for how many times each configuration occurs with the specified E value.
Table R
---------------
A | B | C | D | count
When all of the counts for each record are added together, it should equal the total number of records in the original table.
It seems like a very simple problem, but due to my lack of SQL knowledge, I cannot figure out how to do this.
The only solution I can think of is this:
select a, b, c, d, count(*)
from T
where e = 'abc'
group by a, b, c, d
But when I add up the counts from this query, the sum is far larger than the row count of the original table. It seems like count(*) shouldn't be used, or I'm just going about this the wrong way entirely. I'd really appreciate any advice as to how I should go about this. Thank you all.

NULL values couldn't possibly fool you. Consider this demo:
WITH t(a,b,c,d) AS (
VALUES
(1,2,3,4)
,(1,2,3,NULL)
,(2,2,3,NULL)
,(2,2,3,NULL)
,(2,2,3,4)
,(2,NULL,NULL,NULL)
,(NULL,NULL,NULL,NULL)
)
SELECT a, b, c, d, count(*)
FROM t
GROUP BY a, b, c, d
ORDER BY a, b, c, d;
 a | b | c | d | count
---+---+---+---+-------
 1 | 2 | 3 | 4 |     1
 1 | 2 | 3 |   |     1
 2 | 2 | 3 | 4 |     1
 2 | 2 | 3 |   |     2
 2 |   |   |   |     1
   |   |   |   |     1
There must be some other misunderstanding here.

I figured it out, it was something really stupid. I forgot to include the where e = 'abc' clause in the select count(*) I was comparing against. Thanks anyway for your help guys!
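The resolution above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example is self-contained), and the sample rows are invented for illustration. The point it shows: the grouped counts add up to the row count only when both queries use the same WHERE filter.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d INTEGER, e TEXT)")

# Hypothetical sample rows; only four of the five have e = 'abc'.
sample_rows = [
    (1, 2, 3, 4, "abc"),
    (1, 2, 3, 4, "abc"),
    (1, 2, 3, None, "abc"),
    (2, 2, 3, 4, "xyz"),
    (2, 2, 3, 4, "abc"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", sample_rows)

# Count of every (a, b, c, d) combination among rows with e = 'abc'.
grouped = conn.execute(
    "SELECT a, b, c, d, count(*) FROM t WHERE e = 'abc' GROUP BY a, b, c, d"
).fetchall()
sum_of_counts = sum(row[4] for row in grouped)

# The comparison count must use the same filter as the grouped query.
filtered_total = conn.execute("SELECT count(*) FROM t WHERE e = 'abc'").fetchone()[0]
unfiltered_total = conn.execute("SELECT count(*) FROM t").fetchone()[0]

print(sum_of_counts, filtered_total, unfiltered_total)
```

Comparing the sum against the unfiltered count is what produces the apparent mismatch.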


Given a row representing a path, union a total column

Say I have a table like the following that represents a path from a -> b -> c -> d -> e -> f:
+------+----+--------+
| from | to | weight |
+------+----+--------+
| a | b | 1 |
| b | c | 2 |
| c | d | 1 |
| d | e | 1 |
| e | f | 3 |
+------+----+--------+
Each row knows where it came from and where it is going
I would like to union a total row that takes the starting name, ending name, and a total weight like so:
+------+----+--------+
| from | to | weight |
+------+----+--------+
| a | f | 8 |
+------+----+--------+
The first table is a result of a CTE expression, and I can easily get the total of the previous query with SUM, but I'm unable to get the LAST_VALUE to work in a similar way to:
WITH RECURSIVE cte AS (
...
)
SELECT *
FROM cte
UNION ALL
SELECT 'total', FIRST_VALUE(from), LAST_VALUE(to), SUM(weight)
FROM cte
The FIRST_VALUE and LAST_VALUE functions require OVER clauses which seem to add unnecessary complications to what I would expect, so I think I am going the wrong direction with that. Any ideas on how to achieve this?
So I made a strange solution that:
Selects the first from value (partitioned by TRUE)
Selects the last to value (partitioned by TRUE again)
Cross joins the sum of all weights, limited to 1
WITH RECURSIVE cte AS (
...
)
SELECT *
FROM cte
UNION ALL (
SELECT FIRST_VALUE(from) OVER (PARTITION BY TRUE), LAST_VALUE(to) OVER (PARTITION BY TRUE), total
FROM cte
CROSS JOIN (
SELECT SUM(weight) as total
FROM cte
) tmp
LIMIT 1
);
Is it hacky? Yes. Does it work? Also yes. I'm sure there are better solutions, and I would love to hear them.
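One possible alternative, sketched here with Python's built-in sqlite3 module rather than the original database (an assumption, since the thread does not say which engine is used). It also assumes the CTE can expose an ordering column, here a hypothetical `step`, so that "first" and "last" are well defined. With an explicit frame covering the whole result, FIRST_VALUE, LAST_VALUE and SUM can share one window, and DISTINCT collapses the repeated row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "step" is a hypothetical ordering column, not part of the original table.
conn.execute('CREATE TABLE path (step INTEGER, "from" TEXT, "to" TEXT, weight INTEGER)')
conn.executemany(
    "INSERT INTO path VALUES (?, ?, ?, ?)",
    [(1, "a", "b", 1), (2, "b", "c", 2), (3, "c", "d", 1), (4, "d", "e", 1), (5, "e", "f", 3)],
)

query = '''
WITH cte AS (SELECT step, "from", "to", weight FROM path)
SELECT "from", "to", weight FROM cte
UNION ALL
SELECT DISTINCT
    FIRST_VALUE("from") OVER w,
    LAST_VALUE("to") OVER w,
    SUM(weight) OVER w
FROM cte
WINDOW w AS (ORDER BY step ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
'''
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The explicit frame matters for LAST_VALUE: once a window has an ORDER BY, the default frame ends at the current row, so without the frame clause LAST_VALUE would just return each row's own value.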

Is there an equivalent PostgreSQL window function (or alternate procedure) for the aggregate function bool_or()?

Given the following data:
select a,b from newtable;
a | b
---+---
a | f
a | f
a | f
b | f
b | f
b | t
(6 rows)
The statement
select a, bool_or(b) from newtable group by a;
a | bool_or
---+---------
a | f
b | t
will produce a single row per distinct value (as expected from an aggregate function).
I was looking for an equivalent window function, but it seems that there is no such function in PostgreSQL. Is there any way to get the same result? Just to be clear, I was looking for this result:
a | bool_or
---+---------
a | f
a | f
a | f
b | t
b | t
b | t
Although bool_or() is not listed on the PostgreSQL documentation page for window functions, any aggregate function, bool_or() included, can be used over a window.
It says so in the window function documentation:
any built-in or user-defined general-purpose or statistical aggregate
can be used as a window function
So to get the desired result use:
select a, bool_or(b) over w from newtable window w as (partition by a) ;
a | bool_or
---+---------
a | f
a | f
a | f
b | t
b | t
b | t
(6 rows)
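The same idea, an aggregate applied over a window so that every input row is kept, can be tried without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption). SQLite has no bool_or(), so MAX() over 0/1 integers is swapped in here; in PostgreSQL the bool_or(b) OVER (...) form shown above is the one to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Booleans are stored as 0/1 integers because SQLite has no boolean type.
conn.execute("CREATE TABLE newtable (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO newtable VALUES (?, ?)",
    [("a", 0), ("a", 0), ("a", 0), ("b", 0), ("b", 0), ("b", 1)],
)

# MAX() used as a window function: one output row per input row,
# each carrying the aggregate computed over its partition.
rows = conn.execute(
    "SELECT a, MAX(b) OVER (PARTITION BY a) AS any_true FROM newtable ORDER BY a"
).fetchall()
for row in rows:
    print(row)
```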

postgresql find similar word groups

I have a table1 containing a column A, where ~100,000 strings (varchar) are stored. Unfortunately, each string consists of multiple words separated by spaces. They also differ in length, i.e. one string can consist of 3 words while another contains 7 words.
Then I have a column B stored in a second table2 which contains only 100 strings of the same kind: multiple words per string, separated by spaces.
The goal is to find out how well each record of column B matches records of column A (possibly several of them), based on the words they contain. The result should also have a ranking. I was thinking of using full text search in a loop, but I don't know how to do this, or whether there is a proper way to achieve it.
I don't know if you can "turn" a table into a dictionary to use full text search for ranking here. But you can query it with some primitive ranking quite easily, e.g.:
t=# with a(a) as (values('a b c'),('a c d'),('b e f'),('r b t'),('q w'))
, b(i,b) as (values(1,'a b'), (2,'e'), (3,'b'))
, p as (select unnest(string_to_array(b.b,' ')) arr,i from b)
select a phrases,arr match_words,count(1) over (partition by arr) words_in_matches, count(1) over (partition by i) matches,i from a left join p on a.a like '%'||arr||'%';
phrases | match_words | words_in_matches | matches | i
---------+-------------+------------------+---------+---
r b t | b | 6 | 5 | 1
a b c | b | 6 | 5 | 1
b e f | b | 6 | 5 | 1
a b c | a | 2 | 5 | 1
a c d | a | 2 | 5 | 1
b e f | e | 1 | 1 | 2
r b t | b | 6 | 3 | 3
a b c | b | 6 | 3 | 3
b e f | b | 6 | 3 | 3
q w | | 1 | 1 |
(10 rows)
phrases are the rows from your big table.
match_words are the tokens from your small table (split on spaces).
words_in_matches is the number of result rows in which that token matched.
matches is the number of result rows for that small-table phrase.
i is the index of the phrase in the small table.
So you can order by the third or fourth column to get some sort of ranking...
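To make the ranking idea concrete, here is a minimal sketch in plain Python rather than SQL. It scores each big-table phrase by the number of whole words it shares with a small-table phrase. The function name `rank_matches` and the scoring rule are illustrative assumptions, and unlike the LIKE '%...%' join above it matches whole words, not substrings.

```python
def rank_matches(big_phrases, small_phrases):
    """For each small phrase, list big phrases sharing at least one word,
    ordered by number of shared words (descending), then alphabetically."""
    results = {}
    for query in small_phrases:
        query_words = set(query.split())
        scored = []
        for phrase in big_phrases:
            shared = query_words & set(phrase.split())
            if shared:
                scored.append((phrase, len(shared)))
        scored.sort(key=lambda item: (-item[1], item[0]))
        results[query] = scored
    return results


# Same sample data as in the query above.
big = ["a b c", "a c d", "b e f", "r b t", "q w"]
small = ["a b", "e", "b"]

ranking = rank_matches(big, small)
for query, matches in ranking.items():
    print(query, "->", matches)
```

For 100 phrases against ~100,000 this brute-force loop is about ten million set intersections, so it is only meant to show the scoring, not to replace an indexed search.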

What is the kdb+ capital C type?

While trying to query a kdb instance I ran into some type conversion problems (using qPython). When getting the meta data of the table using meta <tablename> it returns the following:
c | t f a
-----------| -----
time | t
sym | s g
OrderID | C
ClOrderID | g
OrigClOrdID| g
SecurityID | s
Symbol | s
Side | c
OrderQty | f
CumQty | f
LeavesQty | f
AvgPx | f
Currency | s
Commission | f
CommType | c
CommValue | f
Account | s
MsgType | s
OrdStatus | s
OrderTime | t
Now the OrderID column is the one causing me some trouble. I've looked at the kdb docs and can find the c type, which indicates the column type is a char, but I can't find anything on the (capital) C type.
I've tried treating it like a char, but that didn't work.
Any ideas on what this C means?
Capital types are nested lists, so the OrderID column is a list where each element is itself a list of type character, e.g.
q)meta ([]OrderID:("hello";"there");charlist:"ht")
c | t f a
--------| -----
OrderID | C
charlist| c
Also while defining an empty table, it is not possible to define the column type as a char list.
q)t:([] oid:();price:`float$();side:`char$() )
q)meta t
c | t f a
-----| -----
oid | //type undefined
price| f
side | c
q)meta t upsert `oid`price`side!("123";100.01;"B")
c | t f a
-----| -----
oid | C
price| f
side | c
Care must be taken when inserting the first record in this case, otherwise the column might take on an unintended type.
For example, if the oid is inserted as a symbol rather than a char list:
q)show meta t upsert `oid`price`side!(`$"123";100.01;"B")
c | t f a
-----| -----
oid | s
price| f
side | c

postgres counting one record twice if it meets certain criteria

I thought that the query below would naturally do what I explain, but apparently not...
My table looks like this:
id | name | g | partner | g2
1 | John | M | Sam | M
2 | Devon | M | Mike | M
3 | Kurt | M | Susan | F
4 | Stacy | F | Bob | M
5 | Rosa | F | Rita | F
I'm trying to get the id where either the g or g2 value equals 'M'... But, a record where both the g and g2 values are 'M' should return two lines, not 1.
So, in the above sample data, I'm trying to return:
$q = pg_query("SELECT id FROM mytable WHERE ( g = 'M' OR g2 = 'M' )");
1
1
2
2
3
4
But, it always returns:
1
2
3
4
Your query doesn't work because each row is returned only once whether it matches one or both of the conditions. To get what you want use two queries and use UNION ALL to combine the results:
SELECT id FROM mytable WHERE g = 'M'
UNION ALL
SELECT id FROM mytable WHERE g2 = 'M'
ORDER BY id
Result:
1
1
2
2
3
4
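The difference between UNION ALL and plain UNION is easy to verify with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption, so the example runs without a server) and the sample table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, g TEXT, partner TEXT, g2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "M", "Sam", "M"),
        (2, "Devon", "M", "Mike", "M"),
        (3, "Kurt", "M", "Susan", "F"),
        (4, "Stacy", "F", "Bob", "M"),
        (5, "Rosa", "F", "Rita", "F"),
    ],
)

# UNION ALL keeps every row from both branches, so ids 1 and 2 appear twice.
union_all_ids = [row[0] for row in conn.execute(
    "SELECT id FROM mytable WHERE g = 'M' "
    "UNION ALL "
    "SELECT id FROM mytable WHERE g2 = 'M' "
    "ORDER BY id"
)]

# Plain UNION removes duplicates, giving the same result as the OR query.
union_ids = [row[0] for row in conn.execute(
    "SELECT id FROM mytable WHERE g = 'M' "
    "UNION "
    "SELECT id FROM mytable WHERE g2 = 'M' "
    "ORDER BY id"
)]

print(union_all_ids)
print(union_ids)
```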
You might try a UNION ALL along these lines (a plain UNION removes duplicate rows, so it would again return each id only once):
"SELECT id FROM mytable WHERE ( g = 'M') UNION ALL SELECT id FROM mytable WHERE ( g2 = 'M')"
Hope this helps, Martin
SELECT id FROM mytable WHERE g = 'M'
UNION ALL
SELECT id FROM mytable WHERE g2 = 'M'