Here is my table:
GID | Distance (KM) | Subdistance (KM) | Iri_avg
-------------------------------------------------
1 | 13.952 | 0 | 0.34
2 | 13.957 | 0.005 | 0.22
3 | 13.962 | 0.010 | 0.33
4 | 13.967 | 0.015 | 0.12
5 | 13.972 | 0.020 | 0.35
...
I would like to find the average of Iri_avg for each range, for example:
every 5 metres (the default)
every 10 metres
every 100 metres
every 500 metres
What is the PostgreSQL query to solve this problem?
Your question is unclear. Your data has two distance columns; which one do you mean?
Here is an example of how to get averages based on the subdistance.
select floor(subdistance*1000/5.0)*5.0 as lower_bound, avg(iri_avg) as avg_iri_avg
from t
group by floor(subdistance*1000/5.0)*5.0
order by 1
The expression "floor(subdistance*1000/5.0)*5.0" gets the closest 5-metre increment less than or equal to the value. You can replace the "5" with "10" or "100" for other binnings.
This is meant as an illustration. It is unclear which column you want to bin, what you want to do about empty bins, and whether you are looking for all bin-widths in a single query versus the query to handle just one bin-width.
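To make the binning arithmetic concrete, here is a plain-Python sketch of the same calculation, using the sample rows from the question (the 10-metre width is just one of the widths the question asks about):

```python
import math
from collections import defaultdict

def bin_lower_bound(subdistance_km, width_m=5.0):
    """Lower bound of the bin, in metres: floor(subdistance*1000/width)*width."""
    return math.floor(subdistance_km * 1000 / width_m) * width_m

# Sample rows from the question: (subdistance in km, iri_avg)
rows = [(0.000, 0.34), (0.005, 0.22), (0.010, 0.33), (0.015, 0.12), (0.020, 0.35)]

# Group iri_avg into 10-metre bins and average, mirroring the GROUP BY
bins = defaultdict(list)
for sub, iri in rows:
    bins[bin_lower_bound(sub, 10.0)].append(iri)
averages = {b: sum(vals) / len(vals) for b, vals in sorted(bins.items())}
```

With these five rows the 10-metre bins come out as 0, 10, and 20 metres, averaging the first two, next two, and last readings respectively.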
I'm trying to display the number of satellites per planet. Satellites with a radius of less than 500 km must be ignored, and planets whose satellites are all ignored must still be displayed with a count of zero. Sort by number of satellites, then by planet name.
SELECT planet.name, COUNT(CASE WHEN satellite.radius > 500
THEN 1 END) AS "number of satellites"
FROM planet
JOIN satellite
ON (satellite.id_planet = planet.id)
GROUP BY planet.name
ORDER BY "number of satellites", planet.name;
When I run that, I get:
name | number of satellites
---------+----------------------
Earth | 1
Neptune | 1
Jupiter | 4
Uranus | 4
Saturn | 5
But I'm supposed to have this:
planet | number of satellites
---------+----------------------
Mars | 0
Mercury | 0
Venus | 0
Earth | 1
Neptune | 1
Jupiter | 4
Uranus | 4
Saturn | 5
I don't understand why the zeros are not present.
I have one dataframe with 3 columns and 20,000 rows. I need to convert all 20,000 transid values into columns.
table macro:
prodid | transid | flag
-------+---------+-----
A      | 1       | 1
B      | 2       | 1
C      | 3       | 1
...and so on
Expected output would be like this, with up to 20,000 columns:
prodid | 1 | 2 | 3
-------+---+---+----
A      | 1 | 1 | 1
B      | 1 | 1 | 1
C      | 1 | 1 | 1
I have tried the PIVOT/transpose function, but it takes too long for high-volume data: pivoting 20,000 rows into columns takes around 10 hours. For example:
val array =a1.select("trans_id").distinct.collect.map(x => x.getString(0)).toSeq
val a2=a1.groupBy("prodid").pivot("trans_id",array).sum("flag")
When I used pivot on 200-300 rows it ran fast, but as the number of rows grows, PIVOT performs poorly.
Can anyone please help me find a solution? Is there a method that avoids the PIVOT function, since PIVOT seems suited only to low-volume conversion? How should I deal with high-volume data?
I need this conversion for matrix multiplication: my input will look like the table below, and the final result will be the matrix product.
|col1|col2|col3|col4|
|----|----|----|----|
|1 | 0 | 1 | 0 |
|0 | 1 | 0 | 0 |
|1 | 1 | 1 | 1 |
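One way to sidestep PIVOT entirely: for matrix multiplication you never need the wide 20,000-column form, because the long (prodid, transid, flag) rows already are the matrix in sparse coordinate (COO) form, and you can multiply directly in that form (Spark's CoordinateMatrix works on the same representation). A plain-Python sketch of the idea, with made-up triples:

```python
from collections import defaultdict

# Hypothetical long-format rows, i.e. the sparse (COO) form of a 0/1 matrix:
# (row key, column key, value). The keys here are made up for illustration.
triples_a = [("A", 1, 1), ("A", 3, 1), ("B", 2, 1),
             ("C", 1, 1), ("C", 2, 1), ("C", 3, 1)]
triples_b = [(1, "x", 1), (2, "x", 1), (2, "y", 1), (3, "y", 1)]

def sparse_matmul(a, b):
    """Multiply two matrices given as (row, col, value) triples.

    Equivalent to pivoting both sides to dense form and multiplying, but it
    never materialises the wide matrix, so 20,000 distinct column keys are
    no problem.
    """
    b_by_row = defaultdict(list)
    for k, j, v in b:
        b_by_row[k].append((j, v))
    out = defaultdict(int)
    for i, k, v in a:          # join a's column key against b's row key
        for j, w in b_by_row[k]:
            out[(i, j)] += v * w
    return dict(out)

result = sparse_matmul(triples_a, triples_b)
```

In Spark the same join-then-aggregate shape distributes naturally, which is why the sparse route tends to scale where a 20,000-column pivot does not.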
I'm currently working on a recommendation engine which uses an item-based collaborative filter to recommend restaurants to the user. I want to calculate the similarity between restaurants using an adjusted cosine similarity, which all works fine.
Now I want to store these similarities in the database so I can just retrieve the similarity between subjects from there so I can predict a rating for a subject the user hasn't reviewed yet.
A matrix could look like this: (Where R1 is restaurant 1, R2 is restaurant 2, etc.)
| R1 | R2 | R3 | R4 |
R1 | 1 | 0.75 | 0.64 | 0.23 |
R2 | 0.75 | 1 | 0.45 | 0.98 |
R3 | 0.64 | 0.45 | 1 | 0.36 |
R4 | 0.23 | 0.98 | 0.36 | 1 |
This is a very small version of a matrix, since the amount of restaurants could exceed 20k rows in my database.
What would be the easiest/best way to store this in my database using Entity Framework? Thanks in advance!
Create a table with the following columns:
MatrixValueId
MatrixId
FirstIndex: string, contains values like 'R1', 'R2', ...
SecondIndex: string, contains values like 'R1', 'R2', ...
Value: float, contains values like 1.00, 0.23
The matrix in your example would be stored as 16 records sharing the same MatrixId.
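Here is a sketch of that schema in action, using SQLite as a stand-in for whatever database Entity Framework is mapped to (table and column names follow the answer; the values are the example matrix):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MatrixValue (
        MatrixValueId INTEGER PRIMARY KEY,
        MatrixId      INTEGER NOT NULL,
        FirstIndex    TEXT    NOT NULL,
        SecondIndex   TEXT    NOT NULL,
        Value         REAL    NOT NULL
    )
""")

# Store the 4x4 example matrix as 16 rows sharing MatrixId = 1.
names = ["R1", "R2", "R3", "R4"]
matrix = [
    [1.00, 0.75, 0.64, 0.23],
    [0.75, 1.00, 0.45, 0.98],
    [0.64, 0.45, 1.00, 0.36],
    [0.23, 0.98, 0.36, 1.00],
]
conn.executemany(
    "INSERT INTO MatrixValue (MatrixId, FirstIndex, SecondIndex, Value)"
    " VALUES (1, ?, ?, ?)",
    [(names[i], names[j], matrix[i][j]) for i in range(4) for j in range(4)],
)

# Looking up one similarity is then a single row fetch.
(sim,) = conn.execute(
    "SELECT Value FROM MatrixValue"
    " WHERE MatrixId = 1 AND FirstIndex = ? AND SecondIndex = ?",
    ("R2", "R4"),
).fetchone()
```

Since the similarity matrix is symmetric with 1s on the diagonal, you could halve the storage by keeping only the upper triangle and normalising the index order on lookup; an index on (MatrixId, FirstIndex, SecondIndex) keeps the fetch fast at 20k restaurants.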
I have a query that returns a count of events on dates over the last year.
e.g.
|Date | ev_count|
------------+----------
|2015-09-23 | 12 |
|2016-01-01 | 56 |
|2016-01-15 | 34 |
|2016-04-08 | 65 |
| ...
I want to produce a graph (date on the X-axis and value on Y) that will either show values for all dates (0 when no data), or at least place the dates where there are values in a correctly scaled place for the date along the time axis.
My current graph has just the values one after another. I have previously used dimple for generating graphs, and if you tell it that it's a time axis, it automagically places dates correctly spaced.
This is what I get
|
| *
| *
| *
|*_______________
9 1 1 4
This is what I want to have
|
| *
| *
| *
|_________*________________________________________
0 0 1 1 1 0 0 0 0 .....
8 9 0 1 2 1 2 3 4
Is there a function/trick in BIRT that will let me fill in the gaps with 0, or position/scale the dates somehow (e.g. based on a min/max)? Or do I have to join my data with a date-generator query to fill in the gaps?
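The date-generator join mentioned at the end is the usual fallback; here is a plain-Python sketch of what it amounts to, with made-up counts (in SQL the generated side would come from something like PostgreSQL's generate_series, left-joined against the event counts):

```python
from datetime import date, timedelta

# Sparse query results: only dates that actually had events (made up here).
counts = {
    date(2016, 1, 1): 56,
    date(2016, 1, 3): 34,
}

def fill_gaps(counts, start, end):
    """Return (date, count) pairs for every day in [start, end],
    filling days with no data with 0, like a LEFT JOIN on a date series."""
    out = []
    d = start
    while d <= end:
        out.append((d, counts.get(d, 0)))
        d += timedelta(days=1)
    return out

series = fill_gaps(counts, date(2016, 1, 1), date(2016, 1, 5))
```

With the series dense like this, even a plain category axis renders the dates correctly spaced, since every day gets a point.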
This is very similar to Cognos Calculate Variance Crosstab (Relational), but my data source is dimensional.
I have a simple crosstab such as this:
| 04-13-2013 | 04-13-2014
---------------------------------------
Sold | 75 | 50
Purchased | 10 | 15
Repaired | 33 | 44
Filter: The user selects 1 date and then we include that date plus 1 year ago.
Dimension: The date is the day level in a YQMD Hierarchy.
Measures: We are showing various measures from a Measure Dimension.
Sold
Purchased
Repaired
Here is what it looks like in Report Studio:
| <#Day#> | <#Day#>
---------------------------------------
<#Sold#> | <#1234#> | <#1234#>
<#Purchased#> | <#1234#> | <#1234#>
<#Repaired#> | <#1234#> | <#1234#>
I want to be able to calculate the variance as a percentage between the two time periods for each measure, like this:
| 04-13-2013 | 04-13-2014 | Var. %
-----------------------------------------------
Sold | 75 | 50 | -33%
Purchased | 10 | 15 | 50%
Repaired | 33 | 44 | 33%
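For reference, the Var. % column is just the current period divided by the prior period, minus one. A quick plain-Python check against the values in the table above:

```python
def pct_variance(prior, current):
    """Percent variance between two periods: current / prior - 1."""
    return current / prior - 1

# (prior, current) pairs from the desired-output table above
rows = {"Sold": (75, 50), "Purchased": (10, 15), "Repaired": (33, 44)}

# Rounded to whole percents, matching the Var. % column
variances = {m: round(pct_variance(a, b) * 100) for m, (a, b) in rows.items()}
```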
I added a Query Expression to the right of the <#Day#> as shown below, but I cannot get the variance calculation to work.
| <#Day#> | <#Variance#>
---------------------------------------
<#Sold#> | <#1234#> | <#1234#>
<#Purchased#> | <#1234#> | <#1234#>
<#Repaired#> | <#1234#> | <#1234#>
These are the expressions I've tried and the results that I get:
An expression that is hard coded works, but only for that 1 measure:
total(case when [date] = 2014-04-13 then [Sold] end)
/
total(case when [date] = 2013-04-13 then [Sold] end)
-1
I thought CurrentMember and PrevMember might work, but it produces blank cells:
CurrentMember( [YQMD Hierarchy] )
/
prevMember(CurrentMember([YQMD Hierarchy]))
-1
I think it is because prevMember produces blanks:
prevMember(CurrentMember([YQMD Hierarchy]))
Using only CurrentMember gives a total of both columns:
CurrentMember([YQMD Hierarchy])
What expression can I use to take advantage of my dimensional model and add a column with % variance?
These are the pages I used for research:
Variance reporting in Report Studio on Cognos 8.4?
Calculations that span dimensions - PDF
IBM Cognos 10 Report Studio: Creating Consumer-Friendly Reports
I hope there is a better way to do this, but I finally found a resource that describes one approach to this problem: using the tail and head functions, we can get the first and last periods, and thereby calculate the % variance.
item(tail(members([Day])),0)
/
item(head(members([Day])),0)
-1
This idea came from IBM Cognos BI – Using Dimensional Functions to Determine Current Period.
Example 2 – Find Current Period by Filtering on Measure Data
If the OLAP or DMR data source has been populated with time periods into the future (e.g. end of year or future years), then the calculation of current period is more complicated. However, it can still be determined by finding the latest period that has data for a given measure.
item(tail(filter(members([sales_and_marketing].[Time].[Time].[Month]),
tuple([Revenue], currentMember([sales_and_marketing].[Time].[Time]))
is not null), 1), 0)
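Read in plain terms: filter keeps only the months where Revenue has data, tail(..., 1) takes the last of those, and item(..., 0) extracts that single member. A plain-Python analogue with made-up months:

```python
# Made-up (month, revenue) pairs; None marks future months with no data yet.
months = [("Jan", 100), ("Feb", 120), ("Mar", None), ("Apr", None)]

# filter(..., [Revenue] ... is not null): keep months that have revenue data
with_data = [name for name, revenue in months if revenue is not None]

# tail(..., 1) then item(..., 0): the last month with data is the current period
current_period = with_data[-1]
```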