How to group by date and calculate the averages at the same time

I am quite new to this, so here it goes: I am trying to convert from Unix time to a date and then group by that date while calculating the average of another column. This is in MariaDB.
CREATE OR REPLACE
VIEW `history_uint_view` AS select
`history_uint`.`itemid` AS `itemid`,
date(from_unixtime(`history_uint`.`clock`)) AS `mydate`,
AVG(`history_uint`.`value`) AS `value`
from
`history_uint`
where
    month(from_unixtime(`history_uint`.`clock`)) = month(now() - interval 1 month)
    and `history_uint`.`value` in (1, 0)
    and `history_uint`.`itemid` in (54799, 54810, 54821, 54832, 54843, 54854, 54865, 54876, 54887, 54898, 54909, 54920, 58165, 58226, 59337, 59500, 59503, 59506, 60621, 60624, 60627, 60630, 60633, 60636, 60639, 60642, 60645, 60648, 60651, 60654, 60657, 60660, 60663, 60666, 60669, 60672, 60675, 60678, 60681, 60684, 60687, 60690, 60693, 60696, 60699, 64610)
GROUP by 'itemid', 'mydate', 'value'

When you select aggregate functions (like AVG) together with non-aggregated columns, you should list all of the non-aggregated columns in the GROUP BY clause.
So your group by should look like:
GROUP by itemid, mydate
If you use single quotes (like 'itemid'), MariaDB treats them as strings, not columns.
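Putting both fixes together (backtick-quoted identifiers instead of single-quoted strings, and dropping the aggregated value column from the grouping), the corrected view would look roughly like this:
CREATE OR REPLACE
VIEW `history_uint_view` AS select
`history_uint`.`itemid` AS `itemid`,
date(from_unixtime(`history_uint`.`clock`)) AS `mydate`,
AVG(`history_uint`.`value`) AS `value`
from
`history_uint`
where
    month(from_unixtime(`history_uint`.`clock`)) = month(now() - interval 1 month)
    and `history_uint`.`value` in (1, 0)
    and `history_uint`.`itemid` in (54799, 54810, 54821, 54832, 54843, 54854, 54865, 54876, 54887, 54898, 54909, 54920, 58165, 58226, 59337, 59500, 59503, 59506, 60621, 60624, 60627, 60630, 60633, 60636, 60639, 60642, 60645, 60648, 60651, 60654, 60657, 60660, 60663, 60666, 60669, 60672, 60675, 60678, 60681, 60684, 60687, 60690, 60693, 60696, 60699, 64610)
GROUP by `itemid`, `mydate`;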

Create rows from part of column names

Source data
I am working on an ELT project to load data from CSV files into PostgreSQL where I will transform it. The CSV files have many columns that are consistent across files, but also contain activity columns that are inconsistent with names like Date (05/19/2020), Type (05/19/2020), etc.
In the loading script I am merging all of the columns with dates in the column name into one jsonb column so I don't have to constantly add new columns to the raw data table.
The resulting jsonb column in the raw data table looks like this:
id          activity
12345678    {"Date (05/19/2020)": null, "Type (05/19/2020)": null, "Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}
98765432    {"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", "Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"}
JSON to columns
Using the amazing create_jsonb_flat_view function from this post, I can convert the jsonb to columns like this:
id        Date (05/19/2020)  Type (05/19/2020)  Date (06/03/2020)  Type (06/03/2020)  Date (10/23/2020)  Type (10/23/2020)
12345678  null               null               06/01/2020         E                  null               null
98765432  05/18/2020         B                  null               null               10/26/2020         T
Need to move part of column name to row
Now, this is where I'm stuck. I need to remove the portion of the column name that is the Activity Date (e.g. (05/19/2020)) and create a row for each id and ActivityDate with additional columns for Date and Type like this:
id        ActivityDate  Date        Type
12345678  05/19/2020    null        null
12345678  06/03/2020    06/01/2020  E
98765432  05/19/2020    05/18/2020  B
98765432  10/23/2020    10/26/2020  T
I followed your link to the create_jsonb_flat_view article yesterday and then forgot about this question. While I thank you for pointing me there, I think that mentioning it worked against you.
A more conventional approach using regexp_replace() works here. I left the date values as strings, but you can convert them with to_date() if needed:
with parse as (
    select id, e.k, e.v,
        regexp_replace(e.k, '\s+\([0-9/]{10}\)', '') as k_no_date,
        regexp_replace(e.k, '^.+([0-9/]{10}).+', '\1') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
    k_date_only as activity_date,
    min(v) filter (where k_no_date = 'Date') as date,
    min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
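If you do want real date values out of this, one option (a sketch, assuming the strings are always in MM/DD/YYYY form) is to wrap them in to_date():
with parse as (
    select id, e.k, e.v,
        regexp_replace(e.k, '\s+\([0-9/]{10}\)', '') as k_no_date,
        regexp_replace(e.k, '^.+([0-9/]{10}).+', '\1') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
    to_date(k_date_only, 'MM/DD/YYYY') as activity_date,
    to_date(min(v) filter (where k_no_date = 'Date'), 'MM/DD/YYYY') as date,
    min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;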
Mike Organek's answer works beautifully!
However, I was curious whether the regexp_replace() calls might be slowing the query down, and it seemed I could get the same results using a simpler function.
Since Mike gave me a great example to start with, I modified it to split on the space between Date and (05/19/2020).
For 20,000 rows, the average runtime on my local machine went from 7 seconds to 0.9 seconds.
Here is the resulting query:
with parse as (
    select id, e.k, e.v,
        split_part(e.k, ' ', 1) as k_no_date,
        trim(split_part(e.k, ' ', 2), '()') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
    k_date_only as activity_date,
    min(v) filter (where k_no_date = 'Date') as date,
    min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;

How to join 2 tables on fields which have different formats?

I have 2 tables with the following structure:
Table A:
id - number
a_d - text
where A.a_d has the text format: "yyyy-mm-dd 00:00:00" (examples: 2001-08-22 00:00:00, or 2002-03-23 00:00:00)
Table B:
id - number
a_d - text
where B.a_d has the text format: "dd-month-yyyy" (example: 01-jul-2002 or 09-feb-2005)
I want to run a join query on the text fields of those tables.
select a.a_d
from A a
join B b
on a.a_d =?= b.a_d
I can't change or update the tables, just get data from them.
How can I compare these 2 fields if they have different formats?
Use TO_DATE to convert the text dates into bona fide dates before comparing:
SELECT a.a_d
FROM A a
INNER JOIN B b
ON a.a_d::date = TO_DATE(b.a_d, 'DD-mon-YYYY');
Note that the a_d field in table A happens to be a text timestamp which can already be directly cast to date, so we only need TO_DATE for the B table.
Ideally you should store your dates and timestamps in proper columns rather than text. Then, the join would be possible without costly conversions.
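Since the tables themselves can't be changed, one workaround is to put the conversion behind a view and join against that (a sketch; the view name b_typed is illustrative):
-- Expose B's text dates as real dates through a view
CREATE VIEW b_typed AS
SELECT id, TO_DATE(a_d, 'DD-mon-YYYY') AS a_d
FROM B;

-- The join then reads naturally
SELECT a.a_d
FROM A a
INNER JOIN b_typed b
ON a.a_d::date = b.a_d;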

Replacing null values by average of values grouped by concatenated categories in Teradata

Suppose that I have a lot of NULL values (missing values) in a column named 'score'. I want to replace them with a specific average, computed not over all values of the 'score' column but within groups built from a cross-category of two concatenated categories:
This kind of query works for getting averages by groups:
SELECT
category1 || ' > ' || category2 AS crosscategory,
ROUND(CAST(AVG(score) AS FLOAT), 2) AS score_avg
FROM DatabaseName.TableName
GROUP BY crosscategory
ORDER BY score_avg;
This one works to replace NULL values by a constant:
SELECT
NVL(score, 0) AS score_without_missing_values
FROM DatabaseName.TableName
The problem that I cannot solve now is how to combine the two: replacing the NULL values not with a constant but with the group averages computed via AVG and GROUP BY.
Thank you very much for your help!
Seems you want a Group Average:
SELECT
t.*,
coalesce(score, AVG(score) OVER (PARTITION BY category1, category2)) AS score_avg
FROM DatabaseName.TableName AS t
I removed the ROUND/CAST, because AVG returns FLOAT by default and ROUND is probably not needed (if you do need it, you are better off casting to a DECIMAL).
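If you do want the two-decimal rounding from your original query, a sketch of that cast (the DECIMAL(10,2) precision is an assumption):
SELECT
    t.*,
    COALESCE(score, CAST(AVG(score) OVER (PARTITION BY category1, category2) AS DECIMAL(10,2))) AS score_avg
FROM DatabaseName.TableName AS t;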

Use sum function in calculated column

Is it possible to use a sum function in a calculated column?
If yes, I would like to create a calculated column that calculates the sum of a column in the same table where the date is smaller than the date of this entry. Is this possible?
And lastly, would this optimize repeated calls on this value compared to the view exemplified below?
SELECT ProductGroup, SalesDate, (
SELECT SUM(Sales)
FROM SomeList
WHERE (ProductGroup= KVU.ProductGroup) AND (SalesDate<= KVU.SalesDate)) AS cumulated
FROM SomeList AS KVU
Is it possible to use a sum function in a calculated column?
Yes, it's possible using a scalar valued function (scalar UDF) for your computed column, but this would be a disaster. Using scalar UDFs for computed columns destroys performance. Adding a scalar UDF that accesses data (which would be required here) makes things even worse.
It sounds to me like you just need a good ol' fashioned index to speed things up. First some sample data:
IF OBJECT_ID('dbo.somelist','U') IS NOT NULL DROP TABLE dbo.somelist;
GO
CREATE TABLE dbo.somelist
(
ProductGroup INT NOT NULL,
[Month] TINYINT NOT NULL CHECK ([Month] <= 12),
Sales DECIMAL(10,2) NOT NULL
);
INSERT dbo.somelist
VALUES (1,1,22),(2,1,45),(2,1,25),(2,1,19),(1,2,100),(1,2,200),(2,2,50.55);
and the correct index:
CREATE NONCLUSTERED INDEX nc_somelist ON dbo.somelist(ProductGroup,[Month])
INCLUDE (Sales);
With this index in place this query would be extremely efficient:
SELECT s.ProductGroup, s.[Month], SUM(s.Sales)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
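If you specifically need the running total from your original query rather than per-group totals, a window function avoids the correlated subquery entirely (SQL Server 2012+; a sketch against the sample table above):
SELECT s.ProductGroup, s.[Month], s.Sales,
    SUM(s.Sales) OVER (PARTITION BY s.ProductGroup
                       ORDER BY s.[Month]) AS cumulated
FROM dbo.somelist AS s;
Note that the default RANGE frame includes tied [Month] values in the running sum, which matches the <= comparison in your original subquery.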
If you needed to get a COUNT by month & product group you could create an indexed view like so:
CREATE VIEW dbo.vw_somelist WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month], TotalRows = COUNT_BIG(*)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist ON dbo.vw_somelist(ProductGroup, [Month]);
Once that indexed view was in place, your COUNTs would be pre-aggregated. You can include SUM in an indexed view as well, provided the summed expression is non-nullable and the view also selects COUNT_BIG(*); AVG, MIN, and MAX are not allowed.
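Since Sales is declared NOT NULL in the sample table, a SUM-based indexed view following the same pattern is legal (a sketch):
CREATE VIEW dbo.vw_somelist_sum WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month],
    SUM(s.Sales) AS TotalSales,
    COUNT_BIG(*) AS TotalRows -- required whenever an indexed view aggregates
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist_sum ON dbo.vw_somelist_sum(ProductGroup, [Month]);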

How to write date filter in MDX where clause?

I am new to MDX. Could you please suggest how to write the below T-SQL query in the MDX query language?
T-SQL:
SELECT wp.date,
    SUM(wp.bbls_oil) AS BBLSOIL_TOTAL,
    SUM(wp.bbls_water) AS BBLSWATER_TOTAL,
    SUM(wp.mcf_prod) AS MCF_PROD_TOTAL,
    SUM(wp.vent_flare) AS VENT_FLARE_TOTAL
FROM well_prod_bst_horiz_og_2_yrs wp, well_index wi
WHERE wp.fileno = wi.fileno
    AND wp.date <= :startDate
    AND wp.date >= :endDate
    AND wi.apino IN (:wellids)
GROUP BY wp.date
ORDER BY wp.date ASC
In the above query, Start and End date values are supplied dynamically.
Assuming you have measures named BBLSOIL, BBLSWATER, MCF_PROD, and VENT_FLARE_TOTAL, your date attribute is named [Date].[Date], your :startDate corresponds to [Date].[Date].&[20120101], your :endDate corresponds to [Date].[Date].&[20141231], and your cube is named Name of your Cube, you would write:
SELECT {
Measures.[BBLSOIL],
Measures.[BBLSWATER],
Measures.[MCF_PROD],
Measures.[VENT_FLARE_TOTAL]
}
ON COLUMNS,
[Date].[Date].&[20120101] : [Date].[Date].&[20141231]
ON ROWS
FROM [Name of your Cube]
That is, you put an MDX set containing the required measures on the columns axis, and a range (specified by :) on the rows axis. Aggregations like Sum and GROUP BY are not necessary in MDX; they are handled by the cube definition.
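If the dates arrive as run-time parameters (e.g. from a reporting tool), a common pattern is StrToMember; a sketch, assuming parameters named @StartDate and @EndDate that hold member unique names:
SELECT {
Measures.[BBLSOIL],
Measures.[BBLSWATER],
Measures.[MCF_PROD],
Measures.[VENT_FLARE_TOTAL]
}
ON COLUMNS,
STRTOMEMBER(@StartDate, CONSTRAINED) : STRTOMEMBER(@EndDate, CONSTRAINED)
ON ROWS
FROM [Name of your Cube]
The CONSTRAINED flag restricts the parameter strings to valid member names, which guards against MDX injection.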