SQLite Manager: how to select columns in a table, keeping the count of the original occurrence of their values?

I am new to databases, and I am using the SQLite Manager Firefox add-on for a table like this:
rowid | col1 | col2 | col3
1 | N | Y | N
2 | N | N | N
3 | N | Y | Y
4 | N | Y | N
and I would like to reduce it to a table with a smaller number of columns (for instance 2) in the following way: each row should represent one possible combination of values (N,Y) of the 2 selected columns, associated with a count. This count should represent the number of rows in the original table where the other column assumed different values but the selected 2 columns have the given combination of values.
To be clear, if I select column 2 and 3, I would like to obtain:
col2 | col3 | count
Y | N | 2
N | N | 1
Y | Y | 1
while if I select columns 1 and 2:
col1 | col2 | count
N | Y | 3
N | N | 1
I have tried to use a combination of commands such as COUNT and GROUP BY, but without reaching my goal. For instance, I've tried to use:
SELECT *, COUNT (*) AS count FROM table_test
GROUP BY col2, col3
but it seems to work only for col2, giving me the # times col2=Y and the # times col2=N, but not in combination with col3...
Do you have any suggestion?
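A minimal sketch of one way to get these counts, assuming the table is named table_test as in the attempt above: select only the grouped columns rather than *, so that each output row is one combination of col2 and col3 together with its count. The CREATE TABLE and INSERT statements are only there to reproduce the sample data, and the TEXT column types are an assumption.

```sql
-- Sample data from the question (column types are assumed)
CREATE TABLE table_test (col1 TEXT, col2 TEXT, col3 TEXT);
INSERT INTO table_test (col1, col2, col3) VALUES
  ('N', 'Y', 'N'),
  ('N', 'N', 'N'),
  ('N', 'Y', 'Y'),
  ('N', 'Y', 'N');

-- One row per distinct (col2, col3) pair, with the number of source rows
SELECT col2, col3, COUNT(*) AS count
FROM table_test
GROUP BY col2, col3;
```

With the sample data this should return the combinations Y/N (count 2), N/N (count 1) and Y/Y (count 1); add an ORDER BY if the row order matters. For columns 1 and 2, put col1, col2 in both the SELECT list and the GROUP BY.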

Related

Efficient way to retrieve all values from a column that start with other values from the same column in PostgreSQL

For the sake of simplicity, suppose you have a table with numbers like:
| number |
----------
|123 |
|1234 |
|12345 |
|123456 |
|111 |
|1111 |
|2 |
|700 |
What would be an efficient way of retrieving the shortest numbers (call them roots or whatever) and all values derived from them, e.g.:
| root | derivatives |
--------------------------------
| 123 | 1234, 12345, 123456 |
| 111 | 1111 |
Numbers 2 & 700 are excluded from the list because they're unique, and thus have no derivatives.
An output as the above would be ideal, but since it's probably difficult to achieve, the next best thing would be something like below, which I can then post-process:
| root | derivative |
-----------------------
| 123 | 1234 |
| 123 | 12345 |
| 123 | 123456 |
| 111 | 1111 |
My naive initial attempt to at least identify roots (see below) has been running for 4h now with a dataset of ~500k items, but the real one I'd have to inspect consists of millions.
select number
from numbers n1
where exists(
select number
from numbers n2
where n2.number <> n1.number
and n2.number like n1.number || '_%'
);
This works if number is an integer or bigint:
select min(a.number) as root, b.number as derivative
from nums a
cross join lateral generate_series(1, 18) as gs(power)
join nums b
on b.number / (10^gs.power)::bigint = a.number
group by b.number
order by root, derivative;
EDIT: I moved a non-working query to the bottom. It fails for reasons outlined by @Morfic in the comments.
We can do a similar and simpler join using like for character types:
select min(a.number) as root, b.number as derivative
from numchar a
join numchar b on b.number like a.number||'%'
and b.number != a.number
group by b.number
order by root, derivative;
Updated fiddle.
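If the aggregated root | derivatives layout from the question is preferred, the LIKE join can probably be wrapped in a second aggregation. This is only a sketch: it assumes the same numchar table with a character-type number column, uses PostgreSQL's built-in string_agg, and the subquery alias m is arbitrary.

```sql
-- Inner query: one row per derivative with its shortest matching prefix as root.
-- Outer query: collapse the derivatives of each root into one comma-separated string.
select root,
       string_agg(derivative, ', ' order by derivative) as derivatives
from (
    select min(a.number) as root, b.number as derivative
    from numchar a
    join numchar b on b.number like a.number || '%'
                  and b.number != a.number
    group by b.number
) m
group by root
order by root;
```

Values without derivatives, such as 2 and 700 in the sample data, never appear in the inner join and so are left out of the result.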
Faulty Solution Follows
If number is a character type, then try this:
with groupings as (
select number,
case
when number like (lag(number) over (order by number))||'%' then 0
else 1
end as newgroup
from numchar
), groupnums as (
select number, sum(newgroup) over (order by number) as groupnum
from groupings
), matches as (
select min(number) over (partition by groupnum) as root,
number as derivative
from groupnums
)
select *
from matches
where root != derivative;
There should be only a single sort on groupnum in this execution since the column is your table's primary key.
db<>fiddle here

PostgreSQL for each row, generate new rows and merge

I have a table called example that looks as follows:
ID | MIN | MAX |
1 | 1 | 5 |
2 | 34 | 38 |
I need to take each ID and loop from its min to max, incrementing by 2, and thus get the following WITHOUT using INSERT statements, i.e. in a SELECT:
ID | INDEX | VALUE
1 | 1 | 1
1 | 2 | 3
1 | 3 | 5
2 | 1 | 34
2 | 2 | 36
2 | 3 | 38
Any ideas of how to do this?
The set-returning function generate_series does exactly that:
SELECT
id,
generate_series(1, (max-min)/2+1) AS index,
generate_series(min, max, 2) AS value
FROM
example;
(online demo)
The index can alternatively be generated with RANK() (example, see also @a_horse_with_no_name's answer) if you don't want to rely on the parallel sets.
Use generate_series() to generate the numbers and a window function to calculate the index:
select e.id,
row_number() over (partition by e.id order by g.value) as index,
g.value
from example e
cross join generate_series(e.min, e.max, 2) as g(value);
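As a possible further variant, assuming PostgreSQL 9.4 or later, WITH ORDINALITY can number the rows produced by generate_series() directly, so no window function is needed. The alias names g, value and idx are arbitrary.

```sql
-- WITH ORDINALITY appends a 1-based row counter to the function's output,
-- restarting for every row of example because the call is evaluated per row.
select e.id,
       g.idx as index,
       g.value
from example e
cross join generate_series(e.min, e.max, 2) with ordinality as g(value, idx)
order by e.id, g.idx;
```

The counter column is a bigint; cast it if an integer index is required.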

Aggregate all combinations of rows taken k at a time

I am trying to calculate an aggregate function for a field for a subset of rows in a table. The problem is that I'd like to find the mean of every combination of rows taken k at a time --- so for all the rows, I'd like to find (say) the mean of every combination of 10 rows. So:
id | count
----|------
1 | 5
2 | 3
3 | 6
...
30 | 16
should give me
mean of ids 1..10; ids 1, 3..11; ids 1, 4..12, and so on. I know this will yield a lot of rows.
There are SO answers for finding combinations from arrays. I could do this programmatically by taking 30 ids 10 at a time and then SELECTing them. Is there a way to do this with PARTITION BY, TABLESAMPLE, or another function (something like python's itertools.combinations())? (TABLESAMPLE by itself won't guarantee which subset of rows I am selecting as far as I can tell.)
The method described in the cited answer is static. A more convenient solution may be to use recursion.
Example data:
drop table if exists my_table;
create table my_table(id int primary key, number int);
insert into my_table values
(1, 5),
(2, 3),
(3, 6),
(4, 9),
(5, 2);
Query which finds 2 element subsets in 5 element set (k-combination with k = 2):
with recursive recur as (
select
id,
array[id] as combination,
array[number] as numbers,
number as sum
from my_table
union all
select
t.id,
combination || t.id,
numbers || t.number,
sum + t.number
from my_table t
join recur r on r.id < t.id
and cardinality(combination) < 2 -- param k
)
select combination, numbers, sum/2.0 as average -- param k
from recur
where cardinality(combination) = 2; -- param k
combination | numbers | average
-------------+---------+--------------------
{1,2} | {5,3} | 4.0000000000000000
{1,3} | {5,6} | 5.5000000000000000
{1,4} | {5,9} | 7.0000000000000000
{1,5} | {5,2} | 3.5000000000000000
{2,3} | {3,6} | 4.5000000000000000
{2,4} | {3,9} | 6.0000000000000000
{2,5} | {3,2} | 2.5000000000000000
{3,4} | {6,9} | 7.5000000000000000
{3,5} | {6,2} | 4.0000000000000000
{4,5} | {9,2} | 5.5000000000000000
(10 rows)
The same query for k = 3 gives:
combination | numbers | average
-------------+---------+--------------------
{1,2,3} | {5,3,6} | 4.6666666666666667
{1,2,4} | {5,3,9} | 5.6666666666666667
{1,2,5} | {5,3,2} | 3.3333333333333333
{1,3,4} | {5,6,9} | 6.6666666666666667
{1,3,5} | {5,6,2} | 4.3333333333333333
{1,4,5} | {5,9,2} | 5.3333333333333333
{2,3,4} | {3,6,9} | 6.0000000000000000
{2,3,5} | {3,6,2} | 3.6666666666666667
{2,4,5} | {3,9,2} | 4.6666666666666667
{3,4,5} | {6,9,2} | 5.6666666666666667
(10 rows)
Of course, you can remove numbers from the query if you do not need them.
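Since k appears in three places in the query, it may be convenient to keep it in one spot. The following is a sketch of that idea against the same my_table; the params CTE and the total column are names introduced here for illustration, not part of the original answer.

```sql
with recursive params(k) as (
    values (3)                              -- the only place k is set
), recur as (
    select id,
           array[id] as combination,
           number::numeric as total         -- running sum of the chosen numbers
    from my_table
    union all
    select t.id,
           r.combination || t.id,
           r.total + t.number
    from my_table t
    join recur r on r.id < t.id             -- ascending ids, so each subset appears once
    cross join params
    where cardinality(r.combination) < params.k
)
select combination, total / k as average
from recur
cross join params
where cardinality(combination) = k;
```

Note that the recursion also builds every combination smaller than k along the way, so for something like 30 rows taken 10 at a time the intermediate result can become very large.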

postgresql find similar word groups

I have a table1 containing a column A, where ~100,000 strings (varchar) are stored. Unfortunately, each string has multiple words which are separated by spaces. Furthermore, they have different lengths, i.e. one string can consist of 3 words while another string contains 7 words.
Then I have a column B stored in a second table2 which contains only 100 strings in the same manner, i.e. multiple words per string, separated by spaces.
The goal is to determine how likely a record of column B is to match one or more records of column A, based on the words they share. The result should also have a ranking. I was thinking of using full text search in a loop, but I don't know how to do this, or whether there is a proper way to achieve it.
I don't know if you can "turn" a table into a dictionary to use full text search for ranking here. But you can query it with some primitive ranking quite easily, e.g.:
t=# with a(a) as (values('a b c'),('a c d'),('b e f'),('r b t'),('q w'))
, b(i,b) as (values(1,'a b'), (2,'e'), (3,'b'))
, p as (select unnest(string_to_array(b.b,' ')) arr,i from b)
select a phrases,arr match_words,count(1) over (partition by arr) words_in_matches, count(1) over (partition by i) matches,i from a left join p on a.a like '%'||arr||'%';
phrases | match_words | words_in_matches | matches | i
---------+-------------+------------------+---------+---
r b t | b | 6 | 5 | 1
a b c | b | 6 | 5 | 1
b e f | b | 6 | 5 | 1
a b c | a | 2 | 5 | 1
a c d | a | 2 | 5 | 1
b e f | e | 1 | 1 | 2
r b t | b | 6 | 3 | 3
a b c | b | 6 | 3 | 3
b e f | b | 6 | 3 | 3
q w | | 1 | 1 |
(10 rows)
phrases are rows from your big table.
match_words are tokens from your small table (split on spaces).
words_in_matches is the number of result rows in which that token matched.
matches is the number of matches in the big table phrases for that small table phrase.
i is the index of the phrase in the small table.
So you can order by third or fourth column to get some sort of ranking...
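To make that ranking concrete, one option might be to count how many words of each small-table phrase occur in each big-table phrase and sort by that count. This is a sketch on the same inline sample data; the column names word, phrase and matched_words are introduced here for illustration.

```sql
with a(a) as (values ('a b c'), ('a c d'), ('b e f'), ('r b t'), ('q w'))
, b(i, b) as (values (1, 'a b'), (2, 'e'), (3, 'b'))
, p as (
    -- one row per word of each small-table phrase
    select i, unnest(string_to_array(b.b, ' ')) as word from b
)
select p.i,
       a.a as phrase,
       count(*) as matched_words   -- how many words of phrase i occur in this row
from p
join a on a.a like '%' || p.word || '%'
group by p.i, a.a
order by p.i, matched_words desc, phrase;
```

For phrase 1 ('a b') the row 'a b c' should rank first with two matched words. Like the query above, this uses substring matching, so a word also matches inside longer words; comparing against string_to_array(a.a, ' ') instead would restrict it to whole words.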

T-SQL. HOW to create a table with a sequence of values

I have a table with a list of names and indices. For example like this:
ID | Name | Index
1 | Value 1 | 3
2 | Value 2 | 4
...
N | Value N | NN
I need to create a new table, where every value from the field "Name" is repeated as many times as the "Index" field specifies. For example like this:
ID | Name_2 | ID_2
1 | Value 1 | 1
2 | Value 1 | 2
3 | Value 1 | 3
4 | Value 2 | 1
5 | Value 2 | 2
6 | Value 2 | 3
7 | Value 2 | 4
...
N | Value N | 1
N+1| Value N | 2
...
I have no idea how to write a loop to get such a result. Please give me some advice.
Here is a solution to repeat the rows based on a column value:
-- Table variable holding the sample data
declare @order table ( Id int, name varchar(20), indx int)
Insert into @order
(Id, name, indx)
VALUES
(1,'Value1',3),
(2,'Value2',4),
(3,'Value3',2)
-- Recursive CTE: each pass emits one more copy of a row, counting indx down to 1
;WITH cte AS
(
SELECT * FROM @order
UNION ALL
SELECT cte.[ID], cte.name, (cte.indx - 1) indx
FROM cte INNER JOIN @order t
ON cte.[ID] = t.[ID]
WHERE cte.indx > 1
)
-- Within each name, indx runs from 1 up to the original value, so it serves as Id_2
SELECT ROW_NUMBER() OVER(ORDER BY name ASC, indx ASC) AS Id, name as [name_2], indx as [Id_2]
FROM cte
ORDER BY 1