Filter Table by Range

Filter Table by Range - merge

I have a parent table and a child table. The parent table only lists ranges of attributes. I'm looking to merge the two to create a proper hierarchy, but I need a way to filter the child table by the parent range first, I believe.
Here is a sample of the parent table:
parent_item start_attribute end_attribute
A 10 120
B 130 130
C 140 200
And the child table:
child_item child_attribute
U 10
V 50
W 60
X 130
Y 140
Z 150
The output table I'd be looking for is such:
parent_item child_item
A U
A V
A W
B X
C Y
C Z
To further confuse things, the attributes are alphanumeric, which I believe rules out using a List.Generate() function. I think I'm looking for something similar to the EARLIER() function in DAX, but I'm not sure I'm even looking at this problem the right way. Here is my pseudocode as I'd see it working:
Table.AddColumn(
    #"parent_table",
    "child_item",
    each
        Table.SelectRows(
            child_table,
            each ([child_attribute] <= EARLIER(end_attribute) and [child_attribute] >= EARLIER(start_attribute))
        )
)
This is a simplification as the child table actually contains five attributes and the parent table contains five respective attribute ranges.

I found this blog post, which held the key to referencing the current row environment. The main takeaway is this:
Each is a keyword to create simple functions. Each is an abbreviation for (_) =>, in which the underscore represents (if you are in a table environment, as we are) the current row.
Using an explicit function parameter C for the rows of the child table, we can write
= Table.AddColumn(#"parent_table", "child_table", each
    Table.SelectRows(Child, (C) =>
        C[child_attribute] >= [start_attribute] and
        C[child_attribute] <= [end_attribute]))
or more explicitly as
= Table.AddColumn(#"parent_table", "child_table", (P) =>
    Table.SelectRows(Child, (C) =>
        C[child_attribute] >= P[start_attribute] and
        C[child_attribute] <= P[end_attribute]))
Once you add this column, just expand the child_item column from your new child_table column.
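For readers more comfortable with general-purpose code, the same idea can be sketched in Python. This is only an illustration, not Power Query: the dictionaries stand in for table rows, the data is copied from the question, and the helper name select_children is hypothetical.

```python
# Sample data from the question; each dict stands in for one table row.
parent_table = [
    {"parent_item": "A", "start_attribute": 10, "end_attribute": 120},
    {"parent_item": "B", "start_attribute": 130, "end_attribute": 130},
    {"parent_item": "C", "start_attribute": 140, "end_attribute": 200},
]
child_table = [
    {"child_item": "U", "child_attribute": 10},
    {"child_item": "V", "child_attribute": 50},
    {"child_item": "W", "child_attribute": 60},
    {"child_item": "X", "child_attribute": 130},
    {"child_item": "Y", "child_attribute": 140},
    {"child_item": "Z", "child_attribute": 150},
]


def select_children(parent_row, children):
    """Hypothetical helper: child rows whose attribute lies in the parent's range.

    parent_row plays the role of P and each child row the role of C.
    """
    return [
        c for c in children
        if parent_row["start_attribute"] <= c["child_attribute"] <= parent_row["end_attribute"]
    ]


# Rough equivalent of adding the nested column and then expanding child_item.
result = [
    (p["parent_item"], c["child_item"])
    for p in parent_table
    for c in select_children(p, child_table)
]
print(result)
```

The outer loop variable corresponds to the current parent row, and the inner filter can see it directly, which is what naming the two function parameters achieves in M.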

One possible approach is to do a full cross join and then filter out the rows you don't want.
Create a custom column on both tables with a constant value of, say, 1.
Merge the Child table into the Parent table matching on the new column.
Expand out the Child table to get a table like this:
Create a custom column with all your desired logic. For example,
if [child_attribute] >= [start_attribute] and
   [child_attribute] <= [end_attribute]
then 1
else 0
Filter out just the 1 values in this new column.
Remove all other columns except for parent_item and child_item.
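The cross-join steps above can be sketched in Python as follows. This is a minimal illustration using the sample data from the question; the tuple layout and variable names are assumptions made for the example, and itertools.product stands in for the constant-key merge.

```python
from itertools import product

# (parent_item, start_attribute, end_attribute)
parents = [("A", 10, 120), ("B", 130, 130), ("C", 140, 200)]
# (child_item, child_attribute)
children = [("U", 10), ("V", 50), ("W", 60), ("X", 130), ("Y", 140), ("Z", 150)]

# Merge on a constant column and expand: every parent paired with every child.
cross = [p + c for p, c in product(parents, children)]

# Custom column: 1 if the child attribute falls inside the parent range, else 0.
flagged = [row + (1 if row[1] <= row[4] <= row[2] else 0,) for row in cross]

# Keep only the flagged rows, then only parent_item and child_item.
matches = [(row[0], row[3]) for row in flagged if row[5] == 1]
print(matches)
```

Note that the intermediate table has len(parents) * len(children) rows, so this approach may become slow for large tables.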

Related

Query table by a value in the second dimension of a two dimensional array column

WHAT I HAVE
I have a table with the following definition:
CREATE TABLE "Highlights"
(
id uuid,
chunks numeric[][]
)
WHAT I NEED TO DO
I need to query the data in the table using the following predicate:
... WHERE id = 'some uuid' and chunks[????????][1] > 10 and chunks[????????][3] < 20
What should I put instead of [????????] in order to scan all items in the first dimension of the array?
Notes
I'm not entirely sure that chunks[][1] is even close to what I need.
All I need is to test a row, whether its chunks column contains a two dimensional array, that has in any of its tuples some specific values.

Maybe there's a better alternative, but this might do: you go over the first dimension of each array and test your condition:
select *
-- the table was created as "Highlights" (quoted), so the name is case-sensitive
from "Highlights" as h
where
    exists (
        select
        from generate_series(1, array_length(h.chunks, 1)) as tt(i)
        where
            -- your condition goes here
            h.chunks[tt.i][1] > 10 and h.chunks[tt.i][3] < 20
    )
db<>fiddle demo
Update: as #arie-r pointed out, it would be better to use the generate_subscripts function:
select *
-- the table was created as "Highlights" (quoted), so the name is case-sensitive
from "Highlights" as h
where
    exists (
        select *
        from generate_subscripts(h.chunks, 1) as tt(i)
        where
            h.chunks[tt.i][3] = 6
    )
db<>fiddle demo
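To make the logic of the EXISTS subquery concrete, here is a small Python sketch of the same test. The sample arrays are made up for illustration, and the function name has_matching_tuple is hypothetical. Note that PostgreSQL array subscripts start at 1, while Python indexes start at 0.

```python
def has_matching_tuple(chunks):
    """Return True if any inner array has element 1 > 10 and element 3 < 20.

    row[0] and row[2] correspond to chunks[i][1] and chunks[i][3] in PostgreSQL.
    """
    return any(row[0] > 10 and row[2] < 20 for row in chunks)


# Hypothetical sample values, not taken from the question.
print(has_matching_tuple([[1, 2, 3], [11, 5, 19]]))   # second tuple matches
print(has_matching_tuple([[11, 0, 25], [5, 0, 6]]))   # no single tuple matches
```

As in the SQL version, both conditions must hold for the same inner tuple, not for two different ones.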

Removing Duplicates From Multiple Unique Columns

I am accessing a table that records every encounter between two vehicles (I do not have permission to change this table). When an encounter occurs, it stores one row for each perspective of the encounter: one row for Vehicle X encountering Vehicle Y and another row for Vehicle Y encountering Vehicle X. Here's some sample data:
Location Vehicle1 Vehicle2
103923 5594800 54114
105938 40547 1855442
103923 2588603 5659158
103923 54114 5594800
103923 5659158 2588603
105938 1855442 40547
There are no duplicate rows; every row is unique. But every value in Vehicle1 also exists in Vehicle2. How would I get it so that only one row of each pair exists?

GREATEST and LEAST functions might help.
DELETE ... USING syntax
DELETE
FROM t a USING
    ( SELECT location,
             greatest(Vehicle1, Vehicle2) AS vehicle1,
             least(Vehicle1, Vehicle2) AS vehicle2
      FROM t
      GROUP BY 1, 2, 3 HAVING COUNT(*) > 1 ) b
WHERE a.location = b.location
  AND a.Vehicle1 = b.Vehicle1
  AND a.Vehicle2 = b.Vehicle2;
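The idea behind GREATEST and LEAST is to build an order-independent key for each pair. A minimal Python sketch of that idea, using the sample rows from the question, might look like this. One difference is assumed here: the sketch keeps the first row seen for each pair, whereas the DELETE above removes the row whose Vehicle1 is the larger value.

```python
# (location, vehicle1, vehicle2) rows from the question
rows = [
    (103923, 5594800, 54114),
    (105938, 40547, 1855442),
    (103923, 2588603, 5659158),
    (103923, 54114, 5594800),
    (103923, 5659158, 2588603),
    (105938, 1855442, 40547),
]

seen = set()
kept = []
for location, v1, v2 in rows:
    # min/max play the role of LEAST/GREATEST: (X, Y) and (Y, X) share one key.
    key = (location, min(v1, v2), max(v1, v2))
    if key not in seen:
        seen.add(key)
        kept.append((location, v1, v2))

print(kept)
```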

How to do a one-to-many join with conditions in postgresql

Forgive me, I don't know how to ask this question and google for an answer. May have already been answered elsewhere on Stack, let me know if it is.
I want to use postgresql to join one table, Table A, with Table B such that the values in one set of columns in Table A are joined and multiplied (one-to-many join) by the corresponding values in a set of columns in Table B, based on whether the values in the set of columns in Table B are within the range of the values in the set of columns in Table A.
Basically:
Where Start_A >= Start_B AND End_A <= End_B
Like so:

I think this can help you. However, your question says "Basically: Where Start_A >= Start_B AND End_A <= End_B", and I think that is a mistake, because in your expected result I see Start_A <= Start_B AND End_A >= End_B. Here is the query I would write:
SELECT *
FROM a LEFT JOIN b ON startA <= startB
WHERE endA >= endB

Min value with GROUP BY in Power BI Desktop

id  datetime             new_column           datetime_rankx
1   12.01.2015 18:10:10  12.01.2015 18:10:10  1
2   03.12.2014 14:44:57  03.12.2014 14:44:57  1
2   21.11.2015 11:11:11  03.12.2014 14:44:57  2
3   01.01.2011 12:12:12  01.01.2011 12:12:12  1
3   02.02.2012 13:13:13  01.01.2011 12:12:12  2
3   03.03.2013 14:14:14  01.01.2011 12:12:12  3
I want to add a new column that holds, for each row, the minimum datetime value within that row's id group.
How could I do it in Power BI desktop using DAX query?

Use this expression:
NewColumn =
CALCULATE(
    MIN(Table[datetime]),
    FILTER(Table, Table[id] = EARLIER(Table[id]))
)
In Power BI using a table with your data it will produce this:
UPDATE: Explanation and EARLIER function usage.
Basically, the EARLIER function gives you access to the values of an outer row context.
A calculated column is evaluated once for every table row, which creates the first (outer) row context. The FILTER function then iterates over the whole table again and evaluates every row against the filter condition, which creates a second (inner) row context.
So we have two row contexts: the outer one from the calculated column and the inner one created by FILTER. FILTER uses EARLIER to reach the outer row context. In our case, for every row in the outer context, FILTER returns the set of rows that share the current id, and CALCULATE evaluates MIN over that set.
If you have a programming background, this may look familiar: it is similar to a nested loop.
Hopefully this Python code illustrates the main idea:
outer_context = ['row1', 'row2', 'row3', 'row4']
inner_context = ['row1', 'row2', 'row3', 'row4']
for outer_row in outer_context:
    for inner_row in inner_context:
        if inner_row == outer_row:  # this line is what FILTER and EARLIER do
            # calculate the min datetime using the filtered rows
            ...
    ...
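A runnable version of the same nested loop, using the sample data from the question, could look like the sketch below. It is only an analogy for how the calculated column is evaluated, not how the DAX engine actually works internally, and it assumes the dates are in day.month.year order.

```python
from datetime import datetime

# (id, datetime) rows from the question
rows = [
    (1, "12.01.2015 18:10:10"),
    (2, "03.12.2014 14:44:57"),
    (2, "21.11.2015 11:11:11"),
    (3, "01.01.2011 12:12:12"),
    (3, "02.02.2012 13:13:13"),
    (3, "03.03.2013 14:14:14"),
]
# Assumption: day.month.year format.
parsed = [(i, datetime.strptime(s, "%d.%m.%Y %H:%M:%S")) for i, s in rows]

new_column = []
for outer_id, _ in parsed:  # outer row context (one pass per table row)
    # inner row context: the FILTER with EARLIER keeps rows sharing the outer id
    same_id = [dt for inner_id, dt in parsed if inner_id == outer_id]
    new_column.append(min(same_id))  # MIN over the filtered rows

for (row_id, _), value in zip(parsed, new_column):
    print(row_id, value)
```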
UPDATE 2: Adding a ranking column.
To get the desired rank you can use this expression:
RankColumn =
RANKX(
    CALCULATETABLE(Table, ALLEXCEPT(Table, Table[id])),
    Table[datetime],
    Table[datetime],
    1
)
This is the table with the rank column:
Let me know if this helps.
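For comparison, the per-id ascending rank can be sketched in Python as well. This is an illustration under the assumption that the rank of a row is one plus the number of rows with the same id and an earlier datetime; the data is the sample from the question.

```python
from datetime import datetime

# (id, datetime) rows from the question
rows = [
    (1, datetime(2015, 1, 12, 18, 10, 10)),
    (2, datetime(2014, 12, 3, 14, 44, 57)),
    (2, datetime(2015, 11, 21, 11, 11, 11)),
    (3, datetime(2011, 1, 1, 12, 12, 12)),
    (3, datetime(2012, 2, 2, 13, 13, 13)),
    (3, datetime(2013, 3, 3, 14, 14, 14)),
]

# Rank within each id group, ascending by datetime.
ranks = [
    1 + sum(1 for inner_id, inner_dt in rows
            if inner_id == outer_id and inner_dt < outer_dt)
    for outer_id, outer_dt in rows
]
print(ranks)
```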

filtering on a range of values in a db column with sqlalchemy orm

I have a postgresql database with one particular table that has many rows. One column in this table, called data, is a float array, REAL[], and gets filled with an array of ~4500 elements. I want to access this table through some query via SQLAlchemy and the ORM.
How do I select all rows in the table where a subset of this column satisfies some condition, e.g. contains a range of values? For example, I want to select all rows where the data contains values >= 10, or values between >= 10 and <= 20.
Can I do this with a straight session query like
rows = session.query(Table).filter(Table.data.(some conditional)).all()
where my conditional is something like "VALUES >= 10 and VALUES <= 20"?
Or do I need to define some special methods, or setup, when I'm defining my SQLAlchemy table class. For example, I have my table set up as
class Table(Base):
    __tablename__ = 'table'
    __table_args__ = {'autoload' : True, 'schema' : 'testdb', 'extend_existing':True}
    data = deferred(Column(ARRAY(Float)))

    def __repr__(self):
        return '<Table (pk={0})>'.format(self.pk)
Ideally I'd like to set it up so I can just do simple filtering in my session.query calls. Is this possible? I'm not super familiar with the ORM, so maybe it is?
I've had a look at the ARRAY Comparator sqlalchemy docs but those only seem to work on exact values. My data is precise to 6 sigfigs, and I don't know the exact values ahead of time.
What's the best way to do this? Thanks.
EDIT:
Based on the below comment, here is the code I'm using in attempting to select all rows (out of 1000) that have data (from 1 column) >= 1.0. There should be 537 rows.
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).all()
This gives the correct subset size: len(rows) = 537. However, I don't understand the logic of this operator: why do I use the le operator to select data >= 1.0? Along the same lines, there should be 234 rows that have data between >= 1.0 and <= 1.2, but this statement fails to give the correct subset:
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).filter(datadb.Table.data.any(1.2,operator=operators.ge)).all()
EDIT 2:
Here's an example of my database Table with a few rows. pk is an integer, and data is a real[].
db datadb
schema Table
pk data
0 [0.0,0.0,0.5,0.3,1.3,1.9,0.3,0.0,0.0]
1 [0.1,0.0,1.0,0.7,1.1,1.5,1.2,0.3,1.4]
2 [0.0,0.6,0.4,0.3,1.6,1.7,0.4,1.3,0.0]
3 [0.0,0.1,0.2,0.4,1.0,1.1,1.2,0.9,0.0]
4 [0.0,0.0,0.5,0.3,0.2,0.1,0.7,0.3,0.1]
I have 5 rows, 4 of them have data with values >= 1.0, while just 2 have values in the range >= 1.0 and <= 1.2. The query I would do to grab the rows is in the first case
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).all()
This should return the 4 rows, at pk=0,1,2,3. This query does what I expect. The second case
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).filter(datadb.Table.data.any(1.2,operator=operators.ge)).all()
and should return the 2 rows at pk=1,3. However this query just returns the 4 rows from the first query. For the second query, I also tried
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le),datadb.Table.data.any(1.2,operator=operators.ge)).all()
which also didn't work.

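Please read the documentation on ARRAY.Comparator, according to which you should be able to do the following:
from sqlalchemy.sql import operators

# any(value, operator=op) compares as "value op ANY(data)",
# so operators.le with 10 means "some element is >= 10"
rows = (session.query(Table)
        .filter(Table.data.any(10, operator=operators.le))
        .filter(Table.data.any(20, operator=operators.ge))
        ).all()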
EDIT:
# the combined filter does not work, because each any() can be satisfied
# by a different array element; applying one of them is still useful,
# as it reduces the result set
q = (session.query(MyTable)
     .filter(MyTable.data.any(1.0, operator=operators.le))
     # .filter(MyTable.data.any(1.2, operator=operators.ge))
     )
# filter in memory: keep rows where a single element lies in [1.0, 1.2]
items = [_row for _row in q.all()
         if any(1.0 <= item <= 1.2 for item in _row.data)]
for item in items:
    print(item)
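The difference between the two chained any() filters and the in-memory test can be reproduced without a database. The sketch below uses the five sample rows from EDIT 2 in the question as a plain dictionary; it only mimics the comparison logic and does not call SQLAlchemy.

```python
# pk -> data, copied from EDIT 2 of the question
table = {
    0: [0.0, 0.0, 0.5, 0.3, 1.3, 1.9, 0.3, 0.0, 0.0],
    1: [0.1, 0.0, 1.0, 0.7, 1.1, 1.5, 1.2, 0.3, 1.4],
    2: [0.0, 0.6, 0.4, 0.3, 1.6, 1.7, 0.4, 1.3, 0.0],
    3: [0.0, 0.1, 0.2, 0.4, 1.0, 1.1, 1.2, 0.9, 0.0],
    4: [0.0, 0.0, 0.5, 0.3, 0.2, 0.1, 0.7, 0.3, 0.1],
}

# Two independent tests: some element >= 1.0 AND some (possibly other) element <= 1.2.
independent = [
    pk for pk, data in table.items()
    if any(1.0 <= x for x in data) and any(x <= 1.2 for x in data)
]

# One combined test: a single element must lie in [1.0, 1.2].
same_element = [
    pk for pk, data in table.items()
    if any(1.0 <= x <= 1.2 for x in data)
]

print(independent)   # rows selected by the two chained filters
print(same_element)  # rows selected by the in-memory filter
```

Because almost every array also contains a small value such as 0.0, the second independent test adds no restriction, which is why the chained query returns the same 4 rows as the first one.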
