Incredibly slow Materialized View creation when using string aggregation, any performance suggestions? - oracle10g

I've got a load of materialized views. Some of them take just a few seconds to create and refresh, whereas others can take up to 40 minutes to compile, if SQL Developer doesn't crash before that.
I need to aggregate some strings in my query, and I have the following function:
create or replace
function stragg
( input varchar2 )
return varchar2
deterministic
parallel_enable
aggregate using stragg_type
;
Then, in my MV I use a select statement such as:
SELECT
  hse.refno,
  STRAGG (DISTINCT per.person_name) AS persons
FROM
  HOUSES hse,
  PERSONS per
WHERE
  hse.refno = per.refno -- join and grouping implied by the output below
GROUP BY
  hse.refno
This is great, because it gives me the following:
refno persons
1 Dave, John, Mary
2 Jack, Jill
Instead of:
refno persons
1 Dave
1 John
1 Mary
2 Jack
2 Jill
It seems that when I use this STRAGG function, the time it takes to create/refresh an MV increases dramatically. Is there an alternative method to achieve a comma-separated list of values? I use this throughout my MVs, so it's a feature I really need.
Thanks

There are a number of techniques for string aggregation at the link below. They might provide better performance for you.
http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php
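For example, the ROW_NUMBER/SYS_CONNECT_BY_PATH technique from that article avoids the user-defined aggregate entirely and may perform better. Adapted to your HOUSES/PERSONS example (the refno join column is an assumption taken from your expected output), it would look something like:

SELECT refno,
       LTRIM(MAX(SYS_CONNECT_BY_PATH(person_name, ','))
             KEEP (DENSE_RANK LAST ORDER BY curr), ',') AS persons
FROM (SELECT hse.refno,
             per.person_name,
             ROW_NUMBER() OVER (PARTITION BY hse.refno ORDER BY per.person_name) AS curr,
             ROW_NUMBER() OVER (PARTITION BY hse.refno ORDER BY per.person_name) - 1 AS prev
      FROM HOUSES hse, PERSONS per
      WHERE hse.refno = per.refno)  -- assumed join column
GROUP BY refno
CONNECT BY prev = PRIOR curr AND refno = PRIOR refno
START WITH curr = 1;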

Related

Changing a functional qSQL query to involve multiple columns in calculation KDB+/Q

I have a ? exec query like so:
t:([]Quantity: 1 2 3;Price: 4 5 6;date:2020.01.01 2020.01.02 2020.01.03);
?[t;enlist(within;`date;(2020.01.01,2020.01.02));0b;(enlist `Quantity)!enlist (sum;(`Quantity))]
to get me the sum of the Quantity in the given date range. I want to adjust this to get the sum of the Notional (Quantity*Price) in the date range. So the result should be (1*4)+(2*5)=14.
I tried things like the following
?[t;enlist(within;`date;(2020.01.01,2020.01.02));0b;(enlist `Quantity)!enlist (sum;(`Price*`Quantity))]
but couldn't get it to work. Any advice would be greatly appreciated!
In such a scenario I would advise thinking about the qSQL-style query you are looking for and then working from there.
So in this case I believe you are looking to do something like:
select sum Quantity*Price from t where date within 2020.01.01 2020.01.02
You can then run parse on this to break it into its functional form, i.e. the ? query you refer to.
q)parse"select sum Quantity*Price from t where date within 2020.01.01 2020.01.02"
?
`t
,,(within;`date;2020.01.01 2020.01.02)
0b
(,`Quantity)!,(sum;(*;`Quantity;`Price))
This is the functional form you need: table, where clause, by clause, and aggregation.
You can see the aggregation here is just the sum of the product of the two columns.
q)?[t;enlist(within;`date;(2020.01.01;2020.01.02));0b;enlist[`Quantity]!enlist(sum;(*;`Quantity;`Price))]
Quantity
--------
14
You could also extend this to change the column as necessary and create a function for it too, if you so wish:
q)calcNtnl:{[sd;ed] ?[t;enlist(within;`date;(sd;ed));0b;enlist[`Quantity]!enlist(sum;(*;`Quantity;`Price))]}
q)calcNtnl[2020.01.01;2020.01.02]
Quantity
--------
14

How to select distinct combinations in T-SQL

I'm using SQL in the DevExpress dashboard designer. I want to select distinct combinations of two parameters.
DevExpress presumably uses Transact-SQL, but the GROUP BY clause never works for me.
DISTINCT BY somehow doesn't work either.
Example:
There are two IDs, 11 and 22.
There are two values of Date for 11, for example 21.01.2000 and 22.01.2000, and one for 22, for example 23.05.2008.
The problem is that I can't select DISTINCT by date alone, because many other IDs have the same dates.
So I expect to get each distinct combination of ID and Date.
Has anyone faced the same problem? Can you advise any solution or code example?
Using select distinct will filter duplicates if you leave unique row properties out of the selected fields.
so:
Mike Smit
Mike Smit
Will be reduced to
Mike Smit
But if you're also asking for a PK, like an Id field, you get the following, because the Id makes both rows distinct:
1 Mike Smit
2 Mike Smit
Does this help?
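For the ID/Date case from the question, a minimal sketch (the table name MyTable is an assumption) would be:

SELECT DISTINCT ID, [Date]
FROM MyTable;

-- GROUP BY produces the same distinct pairs, if DISTINCT misbehaves in your designer:
SELECT ID, [Date]
FROM MyTable
GROUP BY ID, [Date];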

How to use a (repeating) aggregate function value with other columns from the table I use the aggregate function on

Problem: I have to count the number of times a certain user has a certificate, then return the user's name, their number of certificates, and the difference between the maximum number of certificates across all users and this specific user's number. I succeeded in the first part (getting the number of certificates), which I'll denote as $query$ (because I have a feeling my problem has something to do with aliasing).
So $query$ looks like this:
User  | N_Certificates
Geoff | 4
Ann   | 2
Lisa  | 0
And my end result should look like this:
User  | N_Certificates | Difference
Geoff | 4              | 0
Ann   | 2              | 2
Lisa  | 0              | 4
I tried this query:
SELECT Sub.name, Sub.N_Certificates,
       MAX(Sub.N_Certificates) - Sub.N_Certificates AS Difference
FROM ($query$) AS Sub
but it returned an error (because I was trying to use an aggregate function in combination with a column I was not grouping by) or a wrong result (notably, Difference = 0 for all rows).
I tried a contraption with an INNER JOIN on another version of Sub (the same $query$ code with another alias), but it didn't work either, for the same reason. I could of course hard-code the max, but I don't think that's a good solution. My About screen tells me I'm using version 1.18 of pgAdmin.
You can't do it that way; SQL syntax doesn't allow it.
The easiest way is to use a subquery:
SELECT Sub.name, Sub.N_Certificates,
       (SELECT MAX(s.N_Certificates) FROM ($query$) AS s)
       - Sub.N_Certificates AS Difference
FROM ($query$) AS Sub
You can also use a common table expression:
WITH some_alias AS (
    SELECT * FROM ($query$) AS q
)
SELECT name, N_Certificates,
       (SELECT MAX(N_Certificates) FROM some_alias)
       - N_Certificates AS Difference
FROM some_alias
And you can use a window function: http://www.postgresql.org/docs/9.1/static/tutorial-window.html
SELECT Sub.name, Sub.N_Certificates,
       MAX(Sub.N_Certificates) OVER ()
       - Sub.N_Certificates AS Difference
FROM ($query$) AS Sub

Map Reduce for analyzing time series

I am new to the MapReduce concept and wonder whether the following problem can be solved with it.
We have a log of data that looks like this:
TransID  Date        Operation  DocumentID  User
1        01/01/2010  Open       aaa         Anne
2        01/11/2010  Close      aaa         Anne
3        01/12/2010  Open       bbb         Mary
4        01/12/2010  Close      bbb         Mary
We want to be able to calculate different time metrics, such as:
How much time passes between the Open and Close operations, on average, globally? or
How much time passes between Open and Close, on average, per user?
Is there a simple way to achieve this with MapReduce? We are considering MongoDB or Hadoop.
The amount of data can be large - billions of records. Thanks!
The trick here is that you need to "flatten" your data during the map phase and send it to the reducer for your calculation. Your key would be DocumentID (and maybe User, depending on your use case), and the value would be the time and the operation (put the time first if it sorts better that way). In your reducer, all the rows for a given key then arrive together, so you can loop through them and pair each Open with its Close. Here is an example of something very similar: https://allthingshadoop.com/2011/12/16/simple-hadoop-streaming-tutorial-using-joins-and-keys-with-python/
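For illustration only (this is not the MapReduce job itself), the metric each reduce call computes can be restated as a T-SQL-style self-join; doc_log, OpDate, and OpUser are hypothetical names for the log sample above:

-- Pair each Open with its Close by DocumentID, then average the interval.
-- Global average:
SELECT AVG(DATEDIFF(day, o.OpDate, c.OpDate)) AS avg_days_open
FROM doc_log o
JOIN doc_log c ON c.DocumentID = o.DocumentID
WHERE o.Operation = 'Open'
  AND c.Operation = 'Close';

-- Per-user average: same join, grouped by user
SELECT o.OpUser, AVG(DATEDIFF(day, o.OpDate, c.OpDate)) AS avg_days_open
FROM doc_log o
JOIN doc_log c ON c.DocumentID = o.DocumentID
WHERE o.Operation = 'Open'
  AND c.Operation = 'Close'
GROUP BY o.OpUser;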

SQL Server 2008: Pivot column with no aggregate function workaround

Yes I know, this question has been asked MANY times, but after reading all the posts I found that there wasn't an answer that fits my need. So, here's my question: I would like to take a column of values and pivot them into rows of 6 columns.
I want to take this:

G
081278
12
00123535
John Doe
123456

And turn it into this:

Letter  Date    Code  Ammount   Name      Account
G       081278  12    00123535  John Doe  123456
I have 110,000 values in this one column in one table called TempTable. I need all the values displayed, because each row is an entity unto itself. For instance, there is one unique entry for each of the Letter, Date, Code, Ammount, Name, and Account columns. I understand that an aggregate function is required, but is there a workaround that will allow me to get this desired result?
Just use a MAX aggregate.
If one row = one column (per group of 6 rows), then MAX of a single value = that row value.
However, the data you've posted is insufficient. I don't see anything to:
associate the 6 rows per group
distinguish whether a row is "Letter" or "Name"
There is no implicit row order or number to rely on to generate the groups (see the sketch below for what this could look like if such an order existed).
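If some column did provide that order (say, an identity column), the MAX pivot could look like the following sketch; RowId, Val, and the group-of-six arithmetic are all assumptions, not part of the posted data:

SELECT MAX(CASE seq WHEN 0 THEN Val END) AS Letter,
       MAX(CASE seq WHEN 1 THEN Val END) AS [Date],
       MAX(CASE seq WHEN 2 THEN Val END) AS Code,
       MAX(CASE seq WHEN 3 THEN Val END) AS Ammount,
       MAX(CASE seq WHEN 4 THEN Val END) AS Name,
       MAX(CASE seq WHEN 5 THEN Val END) AS Account
FROM (SELECT Val,
             (ROW_NUMBER() OVER (ORDER BY RowId) - 1) / 6 AS grp, -- which group of six
             (ROW_NUMBER() OVER (ORDER BY RowId) - 1) % 6 AS seq  -- position within the group
      FROM TempTable) t
GROUP BY grp;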
Unfortunately, the maximum number of columns in a SQL Server 2008 SELECT statement is 4,096, as per the MSDN Maximum Capacity page.
Instead of using a pivot, you might consider dynamic SQL to get what you want.
DECLARE @SQLColumns nvarchar(max), @SQL nvarchar(max)

-- Build a comma-separated list of quoted values from the column
SELECT @SQLColumns = (SELECT '''' + ColName + ''',' FROM TableName FOR XML PATH(''))
-- Trim the trailing comma
SET @SQLColumns = LEFT(@SQLColumns, LEN(@SQLColumns) - 1)
SET @SQL = 'SELECT ' + @SQLColumns
EXEC sp_executesql @SQL