I'm trying to use "New Named Query" to add a table to a Data Source View in SSAS. The script is as follows:
Declare @AvgInvestment float;
SELECT @AvgInvestment = SUM(Investment)/COUNT(DISTINCT meta_ID)
FROM AAAA
SELECT Player, Investment,
InvestmentRange =
Case When Investment >= 0 AND Investment < (@AvgInvestment/3) THEN 1
When Investment >= (@AvgInvestment/3) AND Investment < (4*@AvgInvestment/3) THEN 2
When Investment >= (4*@AvgInvestment/3) AND Investment < (6*@AvgInvestment/3) THEN 3
When Investment >= (2*@AvgInvestment) THEN 4
END
FROM AAAA
However, SSAS does not allow declaring variables in a SQL query for a DSV. Is there any way to rewrite the SQL statement without variables? I tried replacing @AvgInvestment with "SELECT SUM(Investment)/COUNT(DISTINCT meta_ID) FROM AAAA", but it's not working.
Thanks for any possible solutions!
You can join with your average instead of using a variable. For example:
SELECT Player, Investment,
InvestmentRange =
Case When Investment >= 0 AND Investment < (a.AvgInvestment/3) THEN 1
When Investment >= (a.AvgInvestment/3) AND Investment < (4*a.AvgInvestment/3) THEN 2
When Investment >= (4*a.AvgInvestment/3) AND Investment < (6*a.AvgInvestment/3) THEN 3
When Investment >= (2*a.AvgInvestment) THEN 4
END
FROM AAAA cross join
(select SUM(Investment)/COUNT(distinct meta_ID) as AvgInvestment from AAAA) a
This works perfectly in a Data Source View; stored procedures won't work there.
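For what it's worth, the scalar-subquery form the asker tried can also work in a DSV named query if each occurrence is parenthesized as an expression, though it repeats the subquery in every branch; a sketch of the first branch only:

SELECT Player, Investment,
InvestmentRange =
Case When Investment >= 0
      AND Investment < ((SELECT SUM(Investment)/COUNT(DISTINCT meta_ID) FROM AAAA) / 3)
     THEN 1
     -- the remaining When branches repeat the same parenthesized scalar subquery
END
FROM AAAA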
Let's say you have a Customer table, a simple customer table with just 4 columns:
customerCode numeric(7,0)
customerName char(50)
customerVATNumber char(11)
customerLocation char(35)
Keep in mind that the Customer table contains 3 million rows, because it holds all the customers of the last 40 years, but only 980,000 of them are active.
Suppose we then have a table called Sales structured in this way:
saleID integer
customerCode numeric(7,0)
agentID numeric(6,0)
productID char(2)
dateBeginSale date
dateEndSale date
There are about three and a half million rows in this table (here too we have data from 40 years ago), but the current supplies for the various products total one million. The company sells only 4 products. Each customer can purchase up to 4 products with 4 different contracts, even from 4 different agents. Most (90%) buy only one; the rest buy from two to 4 (those who take the complete assortment are just a handful).
I was asked to build a pivot table showing, for each customer, the name and location plus all the products they purchased and from which agent.
The proposed layout for this pivot table is:
customerCode
customerName
customerLocation
productID1
agentID1
saleID1
dateBeginSale1
dateEndSale1
productID2
agentID2
saleID2
dateBeginSale2
dateEndSale2
productID3
agentID3
saleID3
dateBeginSale3
dateEndSale3
productID4
agentID4
saleID4
dateBeginSale4
dateEndSale4
I built the pivot with a view.
First I created 4 views, one for each product ID on the Sales table, which are also useful for other statistical and reporting purposes:
View1 as
customerCode1
productID1
agentID1
saleID1
dateBeginSale1
dateEndSale1
View2 as
customerCode2
productID2
agentID2
saleID2
dateBeginSale2
dateEndSale2
and so on up to View4.
Then I joined the 4 views with the Customer table and created the PivotView I needed.
Now Select * from PivotView works perfectly.
So does Select * from PivotView Where customerLocation='NEW YORK CITY'.
Any other request, however, brings the machine to its knees. For example, selecting and counting the customers residing in LOS ANGELES who purchased their products from the same agent or from different agents: I can watch memory consumption grow (probably due to the construction of some temporary table or view), and the query often crashes.
However, if I create the same pivot as a table instead of a view, the times for the various selections drop dramatically, and even though the queries are still heavy (there are always about a million records to scan to verify the various conditions), they become acceptable.
Surely I am doing something wrong and/or there must be a better way to achieve the result: a pivot built on live data instead of one built from data extracted nightly.
I'll be happy to read your comments and suggestions.
I don't clearly understand your data layout and what you need, but I'll say that the usual problem with pivoting data on Db2 for IBM i is that there's no built-in way to dynamically pivot the data.
Given that you only have 4 products, the above limitation doesn't really apply.
Your problem would seem to be that by creating 4 views over the same table, you're processing records repeatedly. Instead, try to touch the data one time.
create view PivotSales as
select
customerCode,
-- product 1
max(case productID when '01' then productID end) as productID1,
max(case productID when '01' then agentID end) as agentID1,
max(case productID when '01' then saleID end) as saleID1,
max(case productID when '01' then dateBeginSale end) as dateBeginSale1,
max(case productID when '01' then dateEndSale end) as dateEndSale1,
-- product 2
max(case productID when '02' then productID end) as productID2,
max(case productID when '02' then agentID end) as agentID2,
max(case productID when '02' then saleID end) as saleID2,
max(case productID when '02' then dateBeginSale end) as dateBeginSale2,
max(case productID when '02' then dateEndSale end) as dateEndSale2
-- repeat the same five columns for products '03' and '04',
-- adding a comma after dateEndSale2 above
from Sales
group by customerCode;
Now you can have a CustomerSales view:
create view CustomerSales as
select *
from Customers join PivotSales using (customerCode);
Run your queries, using Visual Explain to see what indexes the system suggests. At a minimum, you should have indexes on the following (a DDL sketch follows the list):
Customers (customerCode)
Customers (customerLocation, customerCode)
Sales (customerCode)
I suspect that some Encoded Vector Indexes (EVIs) over various columns in Sales and Customers would prove helpful, especially since you mention "counting". An EVI keeps track of the counts of its symbols, so counting is "free". An example:
create encoded vector index customerLocEvi
on Customers (customerLocation);
-- this doesn't have to read any rows in Customers
select count(*)
from Customers
where customerLocation = 'LOS ANGELES';
Surely I am doing something wrong and/or there must be a better way
to achieve the result: a pivot built on live data instead of one
built from data extracted nightly.
Don't be too sure about that. The DB structure that best supports Business Intelligence type queries usually doesn't match the typical transactional data structure. A periodic "extract, transform, load (ETL)" is pretty typical.
For your particular use case, you could turn CustomerSales into a Materialized Query Table (MQT), build some supporting indexes for it, and just run queries directly over it. The nightly rebuild would be as simple as REFRESH TABLE CustomerSales;
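A sketch of what that might look like, assuming the CustomerSales definition above (Db2 for IBM i only supports user-maintained MQTs, per the next paragraph):

create table CustomerSales_MQT as
(select * from CustomerSales)
data initially deferred
refresh deferred
maintained by user;

-- nightly rebuild:
refresh table CustomerSales_MQT;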
Or, if you wanted to, since Db2 for IBM i doesn't support system-maintained MQTs, a trigger over Sales could automatically propagate data to CustomerSales instead of rebuilding it nightly.
So I have a complicated query; to simplify, let it be like this:
SELECT
t.*,
SUM(a.hours) AS spent_hours
FROM (
SELECT
person.id,
person.name,
person.age,
SUM(contacts.id) AS contact_count
FROM
person
JOIN contacts ON contacts.person_id = person.id
) AS t
JOIN activities AS a ON a.person_id = t.id
GROUP BY t.id
Such a query works fine in MySQL, but Postgres needs to know that the GROUP BY field is unique, and even though it actually is, in this case I would need to GROUP BY all the fields returned from the derived table t.
I can do that, but I don't believe it will work efficiently on big data.
I can't JOIN with activities directly in the first query, because a person can have several contacts, which would make the query count the activity hours several times, once for every joined contact.
Is there a Postgres way to make this query work? Maybe force Postgres to treat t.id as unique, or some other solution that achieves the same thing the Postgres way?
This query will not work on either database system: there is an aggregate function in the inner query, but you are not grouping by anything (unless you use window functions). Of course there is a special case for MySQL: you can make it run by removing only_full_group_by from sql_mode. So MySQL allows this usage because of its engine parameter, but you cannot do that in PostgreSQL.
I knew MySQL allowed indeterminate grouping, but I honestly never knew how it implemented it; it always seemed imprecise to me, conceptually.
So depending on what that means (I'm too lazy to look it up), you might need one of two possible solutions, or maybe a third.
If your intent is to see all rows (perform the aggregate function without consolidating/grouping rows), then you want a window function, invoked with PARTITION BY. Here is a really dumbed-down version of your query:
SELECT
t.*,
SUM (a.hours) over (partition by t.id) AS spent_hours
FROM t
JOIN activities AS a ON a.person_id = t.id
This means you get all records in table t, not one record per t.id, but each row will also contain the sum of the hours for all rows with that value of id.
For example the sum column would look like this:
Name    Hours   Sum Hours
-----   -----   ---------
Smith      20         120
Jones      30          30
Smith     100         120
Whereas a group by would have had Smith once and could not have displayed the hours column in detail.
If you really did only want one row per t.id, then Postgres will require you to tell it how to determine which row. In the example above for Smith, do you want to see the 20 or the 100?
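For instance, one way to tell Postgres which row to keep is DISTINCT ON; a minimal sketch, assuming you want the row with the most hours per person:

-- keep only the row with the most hours for each id
SELECT DISTINCT ON (t.id)
    t.*,
    a.hours
FROM t
JOIN activities AS a ON a.person_id = t.id
ORDER BY t.id, a.hours DESC;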
There is another possibility, but I think I'll let you reply first. My gut tells me option 1 is what you're after and you want the analytic function.
I have a dataset with over 33 million records that includes a name field. I need to flag records where this name field value also appears in a second dataset that includes about 5 million records. For my purposes, a fuzzy match would be both acceptable and beneficial.
I wrote the following program to do this. It works, but it has now been running for 4 days, so I'd like to find a more efficient way to write it.
proc sql noprint;
create table INDIV_MATCH as
select A.NAME, SPEDIS(A.NAME, B.NAME) as SPEDIS_VALUE, COMPGED(A.NAME,B.NAME) as COMPGED_SCORE
from DATASET1 A join DATASET2 B
on COMPGED(A.NAME, B.NAME) le 400 and SPEDIS(A.NAME, B.NAME) le 10
order by A.name;
quit;
Any help would be much appreciated!
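A common way to speed up a fuzzy join like this is to add a cheap "blocking" condition so the expensive comparisons only run on plausible pairs, and to pass COMPGED's optional cutoff argument so it can stop computing early; a sketch, assuming true matches share a first letter in your data:

proc sql noprint;
create table INDIV_MATCH as
select A.NAME,
       SPEDIS(A.NAME, B.NAME)  as SPEDIS_VALUE,
       COMPGED(A.NAME, B.NAME) as COMPGED_SCORE
from DATASET1 A join DATASET2 B
  on substr(A.NAME, 1, 1) = substr(B.NAME, 1, 1)  /* cheap blocking key */
 and COMPGED(A.NAME, B.NAME, 400) le 400          /* cutoff lets COMPGED quit early */
 and SPEDIS(A.NAME, B.NAME) le 10
order by A.NAME;
quit;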
In a table (store100) I have a list of storeIDs that make more than 100 sales a day. In another table I have sales. What I want to do is, for every storeID in table store100, see how many of product x they sold in the sales table. How do I achieve this? Obviously I don't want to be manually entering the storeIDs all the time, so I want it to take all the IDs in the table and compare them against sales of x in the sales table.
Table Structure:
store100 table:
ID
lon1
lon2
glas4
edi5
etc
Sales Table:
ID   | Location | Product | Quantity | Total Price
lon1 | London   | Wallet  | 5        | 50
edi5 | Manc     | Shoes   | 4        | 100
So, for example, I want a query that takes all the store100 IDs and shows how many wallets they sold.
If anyone has a better idea of achieving this, please tell me.
You will need a join for this:
SELECT S100.ID
,S.Product
,S.Quantity
FROM Store100 S100
INNER JOIN Sales S
ON (S100.ID = S.ID)
Of course you will still need your WHERE clause if you need one, and you can modify the SELECT to fit your needs.
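For the specific "how many wallets did each store sell" case, that might look like this (a sketch, assuming the product is stored literally as 'Wallet'):

SELECT S100.ID
      ,SUM(S.Quantity) AS WalletsSold
FROM Store100 S100
INNER JOIN Sales S
    ON (S100.ID = S.ID)
WHERE S.Product = 'Wallet'
GROUP BY S100.ID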
I'm a member of an MLM network and I'm also a developer. My question is regarding the database structure needed to build MLM software with infinite levels. Example:
Person 1 (6,000 people in his network, but only 4 directly linked to him)
How do I store that data and query how many points his network produces?
I could possibly do it with a many-to-many relationship, but once we have a lot of users and a huge network, it costs a lot to query and loop through these records.
In any database, if each member of the "tree" has the same properties, it's best to use a self-referencing table, especially if each node has one and only one direct parent.
E.g.:
HR
------
ID
first_name
last_name
department_id
sal
boss_hr_id (references HR.ID)
Usually the big boss would have a NULL boss_hr_id
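In Postgres, a minimal DDL sketch of that table might be:

-- self-referencing table; the big boss has boss_hr_id NULL
create table hr (
    id            serial primary key,
    first_name    text,
    last_name     text,
    department_id integer,
    sal           numeric(10,2),
    boss_hr_id    integer references hr(id)
);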
To query such a structure in Postgres, you can use recursive CTEs (the "with recursive" statement).
For the table above, a query like this will work:
with recursive ret(id, first_name, last_name, department_id, sal, boss_hr_id, lev) as
(
select hr.id, hr.first_name, hr.last_name, hr.department_id, hr.sal, hr.boss_hr_id, 0
from hr
where hr.id = **ID_OF_PERSON_YOU_ARE_QUERYING_STRUCTURE**
union all
select hr.id, hr.first_name, hr.last_name, hr.department_id, hr.sal, hr.boss_hr_id, ret.lev + 1
from hr
inner join ret on hr.boss_hr_id = ret.id  -- walk down to each member's direct recruits
)
select * from ret;
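To get the original question's "how many points does his network produce", you can aggregate over the same recursive walk; a sketch, assuming a hypothetical points column on each hr row:

with recursive downline(id, points) as
(
select id, points  -- points is hypothetical, not part of the schema above
from hr
where id = **ID_OF_PERSON_YOU_ARE_QUERYING_STRUCTURE**
union all
select hr.id, hr.points
from hr
inner join downline d on hr.boss_hr_id = d.id
)
select sum(points) as network_points from downline;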