Permutations with ordering and restrictions - postgresql

I have some symbols and I need to arrange them and determine the number of ways in which they can be arranged:
(x11,x12,x13); (x21,x22,x23); (y11,y12,y13); (y21, y22, y23); (y31,y32,y33); (z11, z12, z13)
Note that while calculating the permutations, the order within each set of round brackets must be maintained. For instance, one possible order is x11, x21, x12, x22, x13, x23, ...; note that here x13 occurs after x12, which occurs after x11.
How do we determine the total number of permutations with this restriction?

One way would be to create 2 tables and do a cartesian join:
create table letters (letter char(1));
create table numbers (num numeric);

insert into letters values ('A'), ('B'), ('C');
insert into numbers values (1), (2), (3);

select letter, num
from letters
cross join numbers;
With a bit more code, you can do this without the "hard" tables and instead hardcode the data in arrays, then have Postgres treat them the same as tables; it just depends on how much data we're talking about.
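For example, a minimal sketch of that approach, assuming the same letters/numbers data as above, uses unnest() over array literals in place of the tables:

select l.letter, n.num
from unnest(array['A', 'B', 'C']) as l(letter)
cross join unnest(array[1, 2, 3]) as n(num);
-- same 3 x 3 = 9 rows as the cartesian join above, with no tables created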
Then again, I could be completely misunderstanding the problem you're trying to solve, since the number of permutations in my example is simply the number of rows in the letters table times the number of rows in the numbers table.
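If what you are after is just the closed-form count rather than a way to enumerate the arrangements, the standard identity for interleaving sequences while preserving their internal order applies: with six groups of three symbols each (18 distinct symbols in total), the number of valid arrangements is 18! / (3!)^6 = 137,225,088,000.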

Related

Delete rows from a Table when 2 specific columns are both zeros

I'm trying to delete rows of a table when the values in 2 specific columns are both equal to zero. I've tried to use ismember(col1 & col2, 0),:)=[]; but it deletes the rows when only one of the columns is zero.
Ideally, I would also like to do the opposite: delete every row where the cells of these 2 columns aren't both zero.
I know it would be easier if I wasn't using a table; unfortunately, there are some needed variables that aren't numeric.
It would be great if you know a way to do what I need with a table.
Cheers

Postgres: Get percentile of number not necessarily in table column

Imagine I have a column my_variable of floats in a table my_table. I know how to convert each of the rows in this my_variable column into percentiles, but my question is: I have a number x that is not necessarily in the table. Let's call it 7.67. How do I efficiently compute where 7.67 falls in the distribution of my_variable? I would like to be able to say "7.67 is in the 16.7th percentile" or "7.67 is larger than 16.7% of rows in my_variable." Note that 7.67 is not taken from the column; I'm supplying it in the SQL query itself.
I was thinking about ordering my_variable in ascending order and counting the number of rows that fall below the number I specify and dividing by the total number of rows, but is there a more computationally efficient way of doing this, perhaps?
If your data does not change too often, you can use a materialized view or a separate table, call it percentiles, in which you store 100 or 1,000 precomputed rows (depending on the precision you need). This table should have a descending index on the value column.
Each row contains the minimum value needed to reach a certain percentile, together with the percentile itself.
Then you just need to take the largest stored value that does not exceed the given number and read off its percentile.
In your example the table would contain 1,000 rows, and you could have something like:
 percentile | value
------------+-------
       16.9 |  7.71
       16.8 |  7.69
       16.7 |  7.66
       16.6 |  7.65
       16.5 |  7.62
And your query could be something like:
select percentile
from percentiles
where value <= 7.67
order by value desc
limit 1;
This is a valid solution if the number of SELECTs you run is much bigger than the number of updates to the my_table table.
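For instance, here is a rough sketch of how such a lookup could be built as a materialized view, assuming the my_variable column in my_table from the question and one-decimal percentile buckets:

create materialized view percentiles as
select p as percentile, min(value) as value   -- minimum value reaching each percentile
from (
    select my_variable as value,
           round((cume_dist() over (order by my_variable) * 100)::numeric, 1) as p
    from my_table
) s
group by p;

create index on percentiles (value desc);

-- lookup: largest stored value that does not exceed the input
select percentile
from percentiles
where value <= 7.67
order by value desc
limit 1;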
I ended up doing:
select avg(dummy_var::float)
from (
    select case when var_name < 3.14 then 1 else 0 end as dummy_var
    from table_name
    where var_name is not null
) as sub;
Where var_name was the variable of interest, table_name was the table of interest, and 3.14 was the number of interest.
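For what it's worth, the same fraction can be computed a bit more directly in Postgres with an aggregate FILTER clause (same var_name, table_name, and 3.14 as above):

select (count(*) filter (where var_name < 3.14))::float / count(*) as fraction_below
from table_name
where var_name is not null;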

D trigger scheme test

I want to test my D trigger scheme, but I don't know how to do that. I don't even know how a "truth table" works; maybe someone can help explain how to make a test for my D trigger scheme?
A logical table works as follows:
Each row represents a case, and each column represents a statement. A cell holds the logical value of a statement. It is wise to mentally differentiate atomic columns from molecular columns. In your case, you have three atomic columns; they represent the cases. These are the first three columns in your logical table. The other columns are molecular columns, that is, their value is composed from other columns.
D is said to be the value of XOR-ing the third and the fifth columns. If you look at the values of the third and the fifth columns in each row of the table, you will see that D4 is true (1) if they differ and false (0) if they are equal.
I am not sure what you mean by testing the D trigger scheme, but if you mean checking whether the formula matches the values, then yes, it matches, since the general concept of XOR holds in every row of your logical table.
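Just to make the XOR relationship concrete, here is its full truth table for two inputs, generated with a quick SQL query (using the # bitwise-XOR operator) only because that is the tooling used elsewhere on this page; any way of tabulating it gives the same values:

select a, b, a # b as a_xor_b
from (values (0, 0), (0, 1), (1, 0), (1, 1)) as t(a, b);
-- a_xor_b is 1 exactly when a and b differ, and 0 when they are equal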

Fitting code-chunks to a probability distribution function on two variables

I am sorry for this complicated problem, but I will try my best to explain myself.
This is basically a Hidden Markov Model question. I have two columns of data. The data in these two columns are independent of each other, but together they represent a specific movement which can be character-coded. I assign a character to a 3rd column by putting conditions on the column1 and column2 entries. Note: the set of characters is finite (~10-15).
For example:
if (column1(i) > 0.5) && (column2(i) < 15)
    column3(i) = 'D';
end
I end up with a string something like this:
AAAAADDDDDCCCCCFFFFAAAACCCCCFFFFFFDDD
So each character gets repeated, but not for a constant length (e.g. the first time the A's appear 5 times, while the second time they appear only 4 times).
Now, let us take the first chunk of A's (AAAAA), each A containing an ordered pair of column1 and column2 values. Comparing with the second chunk of A's (AAAA), the values of column1 and column2 should be similar to those of the first chunk. Usually, the values in each column would be either increasing, decreasing, or constant throughout a chunk, and the values of the columns in both chunks should be similar. For example, column1 goes from -1 to -5 in 5 unequal samples, but in the second chunk it goes from -1.2 to -5.1 in 4 unequal steps.
What I want is to fit a probability distribution over the column1 and column2 values (independently) for each run of repeated characters (e.g. for the A's, then the D's, then the C's, then the F's, and then the A's again).
And the final goal is the following:
given n elements in column1, column2, and column3, I want to predict what the (n+1)th element in column3 is going to be and how many times it is going to repeat itself (with probabilities, e.g. a 70% chance it repeats itself 4 times and a 20% chance it repeats itself 5 times), as well as what the probability distributions of column1 and column2 are going to be for the predicted character.
Please feel free to ask questions if I fail to explain it well.

Why does round sometimes produce extra digits? [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Occasionally, when selecting aggregate data (usually AVG()) from a database, I'll get a repeating decimal such as:
2.7777777777777777
When I apply ROUND(AVG(x), 2) to the value, I sometimes get a result like:
2.7800000000000002
I happen to know that the actual sample has 18 rows with SUM(x) = 50. So this command should be equivalent:
SELECT ROUND(50.0/18, 2)
However, it produces the expected result (2.78). Why does rounding sometimes produce wrong results with aggregate functions?
In order to verify the above result, I wrote a query against a dummy table:
declare @temp table (
    x float
);
insert into @temp values (1.0), (2.0), (1.0), (1.0), (1.0), (1.0), (1.0), (8.0), (9.0);
select round(avg(x), 2),
       sum(x), count(*)
from @temp;
I'm aware of the gotchas of floating point representation, but this simple case seems not to be subject to those.
Decimal numbers don't always (or even usually) map 1:1 to an exact representation in floating point.
In your case, the two closest numbers that double precision floating point can represent are:
0x40063D70A3D70A3D, which is approximately 2.77999999999999980460074766597
0x40063D70A3D70A3E, which is approximately 2.78000000000000024868995751604
There is no double precision number between those two. In this case the database chose the higher value, which, as you see, rounds to 2.7800000000000002.
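If you want to check this on your own system (SQL Server assumed here, appending to the same batch as the @temp example from the question), you can look at the raw bytes of the rounded average; on a system that reproduces the 2.7800000000000002 result, it should show the second bit pattern:

select convert(varbinary(8), round(avg(x), 2))
from @temp;
-- expected: 0x40063D70A3D70A3E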
Your table uses floats. When given floats as parameters, both AVG() and ROUND() return floats, and many decimal values cannot be represented exactly as floats. When you do ROUND(50.0/18, 2), you're giving it a NUMERIC(8,6), which it returns as a DECIMAL(8,6).
Try declaring your columns to use a more precise type like DECIMAL(18,9). The results should be more predictable.
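If changing the column types is not an option, a common workaround (sketched against the same @temp example, SQL Server assumed) is to cast the aggregate to an exact decimal type before rounding, so ROUND operates on a decimal rather than a float:

select round(cast(avg(x) as decimal(18, 9)), 2) as rounded_avg
from @temp;
-- the cast makes the value exact before ROUND is applied, so the result is 2.78 as a decimal, not a float artifact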