Inefficient Azure mobile apps query consuming lots of Azure SQL DTUs - azure-mobile-services

I have an application that uses the Azure Mobile Apps .NET backend and a Windows client. One of the tables being synced between the backend and the client has 30,000 rows. When syncing this table, DTU usage spikes to around 60% of my 100-DTU (S3) tier, which is very bad. The monitoring graph shows the spike tapering off in steps over the course of the sync.
The table controller for this table is pretty basic:
public async Task<IQueryable<MyBigTable>> GetMyBigTable()
{
    return Query();
}
The following is the SQL generated by Azure Mobile Apps:
exec sp_executesql N'SELECT TOP (51)
[Project2].[Field1] AS [Field1],
[Project2].[C1] AS [C1],
[Project2].[C2] AS [C2],
[Project2].[Deleted] AS [Deleted],
[Project2].[C3] AS [C3],
...
FROM ( SELECT
[Project1].[Id] AS [Id],
...
FROM ( SELECT
[Extent1].[Id] AS [Id],
...
FROM [dbo].[MyBigTable] AS [Extent1]
WHERE ([Extent1].[UpdatedAt] >= @p__linq__0)
) AS [Project1]
ORDER BY [Project1].[UpdatedAt] ASC, [Project1].[Id] ASC
OFFSET @p__linq__1 ROWS FETCH NEXT @p__linq__2 ROWS ONLY
) AS [Project2]
ORDER BY [Project2].[UpdatedAt] ASC, [Project2].[Id] ASC',N'@p__linq__0 datetimeoffset(7),@p__linq__1 int,@p__linq__2 int',@p__linq__0='2017-02-28 03:48:49.4840000 +00:00',@p__linq__1=0,@p__linq__2=50
... not very pretty, and far too complicated for its ultimate purpose, but that's another matter. The problem, I think, is that the inner query aliased as [Project1] always returns every row from the given [UpdatedAt] to the end of the table. That inner query would have been the ideal place for a TOP 51 clause, but the limit instead appears only in the outer queries.
So although each paging call returns only 50 rows to the client, the first call makes the inner query produce all remaining rows; the next call, 50 rows fewer than the previous; and so on. This, I think, explains the descending shape of the graph.
Is there any way to influence how the SQL is generated, or even override the SQL with my own? Does this mean that I need to extract the OData query? What is the best way to do this?
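For reference, the shape I would have expected, with the limit applied directly against the base table, is something like the following sketch. The @skip/@take placeholders stand for the paging parameters (0 and 50 in the capture above), and the index is just a guess at what would support the seek, not something Azure Mobile Apps creates for you:

-- Sketch: fetch one sync page with the filter, sort, and limit all
-- applied directly to the base table.
SELECT Id, Field1, Deleted, UpdatedAt   -- remaining columns elided
FROM dbo.MyBigTable
WHERE UpdatedAt >= @p__linq__0
ORDER BY UpdatedAt ASC, Id ASC
OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY;

-- An index matching the sync predicate and sort order lets SQL Server
-- seek to the requested page rather than sort or scan the whole table.
CREATE NONCLUSTERED INDEX IX_MyBigTable_UpdatedAt_Id
    ON dbo.MyBigTable (UpdatedAt, Id);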

Related

Dynamic FROM clause in Postgres

Using PostgreSQL 9.1.13, I've written the following query to calculate some data:
WITH windowed AS (
SELECT a.person_id, a.category_id,
CAST(dense_rank() OVER w AS float) / COUNT(*) OVER (ORDER BY category_id) * 100.0 AS percentile
FROM (
SELECT DISTINCT ON (person_id, category_id) *
FROM performances s
-- Want to insert a FROM clause here
INNER JOIN person p ON s.person_id = p.ident
ORDER BY person_id, category_id, created DESC
) a
WINDOW w AS (PARTITION BY category_id ORDER BY score)
)
SELECT category_id, percentile FROM windowed
WHERE person_id = 1;
I now want to turn this into a stored procedure, but my issue is that in the middle, where I put the comment, I need to place a dynamic WHERE clause. For example, I'd like to add something like:
WHERE p.weight > 110 OR p.weight IS NULL
The calling application lets people pick filters, and so I want to be able to pass the appropriate filters into the query. There could be zero or many filters, depending on the caller, but I could pass them all in as a properly formatted WHERE clause in a string parameter, for example.
The calling application just sends values to a webservice, which then builds the string and calls the stored procedure, so SQL injection attacks won't really be an issue.
Too many cooks spoil the broth. Either let your webservice build the SQL statement or let Postgres do it; don't use both on the same query. That leaves two possible weak spots for SQL injection attacks and makes debugging and maintenance a lot harder.
Here is a full code example, from just two days ago, of a plpgsql function that builds and executes an SQL statement dynamically while making SQL injection impossible:
Robust approach for building SQL queries programmatically
Details heavily depend on exact requirements.
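As a minimal sketch of that approach for this particular query, with a single optional weight filter passed as a value rather than as SQL text (the function name, the parameter names, and the choice to expose exactly one filter are mine):

CREATE OR REPLACE FUNCTION category_percentiles(_person_id int,
                                                _min_weight numeric DEFAULT NULL)
RETURNS TABLE (category_id int, percentile float) AS
$func$
BEGIN
   RETURN QUERY EXECUTE
      'WITH windowed AS (
          SELECT a.person_id, a.category_id,
                 CAST(dense_rank() OVER w AS float)
                   / COUNT(*) OVER (ORDER BY category_id) * 100.0 AS percentile
          FROM (
             SELECT DISTINCT ON (person_id, category_id) *
             FROM performances s
             JOIN person p ON s.person_id = p.ident '
      -- the filter is appended as static SQL text; the value goes in via USING
      || CASE WHEN _min_weight IS NULL THEN ''
              ELSE 'WHERE p.weight > $2 OR p.weight IS NULL ' END
      || 'ORDER BY person_id, category_id, created DESC
          ) a
          WINDOW w AS (PARTITION BY category_id ORDER BY score)
       )
       SELECT category_id, percentile FROM windowed
       WHERE person_id = $1'
   USING _person_id, _min_weight;
END
$func$ LANGUAGE plpgsql;

Called as SELECT * FROM category_percentiles(1, 110), the value 110 is bound via USING and never becomes part of the SQL string, so there is nothing to inject.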

Does OrderBy followed by Skip and Take compile into query or run in memory?

Say I have a query like this one:
var result = collection.OrderBy(orderingFunction).Skip(start).Take(length);
Will the whole query run on SQL Server and return the result, or will it return the whole ordered table and then run Skip and Take in memory? I am concerned because I noticed that OrderBy returns IOrderedEnumerable.
How about something like this:
if (orderAscending)
    orderedCollection = collection.OrderBy(orderingFunction);
else
    orderedCollection = collection.OrderByDescending(orderingFunction);
var result = orderedCollection.Skip(start).Take(length);
Will the Skip and Take part run on the server or in memory in this case?
This query is translated into SQL, provided collection is an IQueryable<T> (for example, a DbSet). Queryable.OrderBy returns IOrderedQueryable<T>, not IOrderedEnumerable<T>, so the conditional version in your second example composes on the server as well, as long as orderedCollection is declared as IQueryable<T> or IOrderedQueryable<T> rather than IEnumerable<T>. An Entity Framework query such as
myTable.OrderBy(row => row.Id).Skip(10).Take(20);
will produce SQL resembling the following:
SELECT TOP (20) [Extent1].[Id] AS [Id]
FROM ( SELECT [Extent1].[Id], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [my_table] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 10
ORDER BY [Extent1].[Id] ASC
I recommend downloading LINQPad, a utility that lets you execute EF queries (and other queries) and see both the results and the corresponding SQL. It is an invaluable tool for developing high-quality EF queries.
Yes, it does translate to SQL. This is essential for paging.
You can verify this using SQL Profiler.
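For what it's worth, newer versions of Entity Framework targeting SQL Server 2012 or later express the same Skip/Take with OFFSET/FETCH instead of row_number(); a sketch of the equivalent paging against the same hypothetical table:

-- Equivalent paging for Skip(10).Take(20) in OFFSET/FETCH form.
SELECT [Extent1].[Id] AS [Id]
FROM [my_table] AS [Extent1]
ORDER BY [Extent1].[Id] ASC
OFFSET 10 ROWS FETCH NEXT 20 ROWS ONLY;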

PostgreSQL Aggregate groups with limit performance

I'm a newbie in PostgreSQL. Is there a way to improve the execution time of the following query:
SELECT s.id, s.name, s.url,
(SELECT array_agg(p.url)
FROM (
SELECT url
FROM pages
WHERE site_id = s.id ORDER BY created DESC LIMIT 5
) as p
) as last_pages
FROM sites s
I haven't found a way to put a LIMIT clause inside the aggregate call the way you can with ORDER BY. There are indexes on created (timestamp) and site_id (integer) in table pages, but unfortunately the foreign key from pages.site_id to sites.id is absent. The query is intended to return a list of sites, each with a sublist of its 5 most recently created pages.
PostgreSQL version is 9.1.5.
You need to start by thinking like the database management system, and think very carefully about what you are asking of it.
Your fundamental problem here is that the query likely triggers a very large number of separate index scans, one per site, when a single sequential scan may be quite a bit faster. The current form also gives the planner very little flexibility, because the subqueries must be correlated.
A much better way to do this would be with a view (inline or not) and a window function:
-- Rank each site's pages by recency, keep the newest 5 per site,
-- and aggregate them per site.
SELECT s.id, s.name, s.url,
       array_agg(p.url ORDER BY p.created DESC) AS last_pages
FROM sites s
JOIN (SELECT site_id, url, created,
             row_number() OVER (PARTITION BY site_id
                                ORDER BY created DESC) AS num
      FROM pages) p ON s.id = p.site_id
WHERE p.num <= 5
GROUP BY s.id, s.name, s.url;
This will likely change a very large number of index scans to a single large sequential scan.
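To confirm what the planner actually does on your data, run both forms under EXPLAIN ANALYZE and compare the number of index scans and the total runtime; for example, for the original correlated version:

EXPLAIN ANALYZE
SELECT s.id, s.name, s.url,
       (SELECT array_agg(p.url)
        FROM (SELECT url
              FROM pages
              WHERE site_id = s.id
              ORDER BY created DESC
              LIMIT 5) AS p) AS last_pages
FROM sites s;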

Feasibility of recreating complex SQL query in Crystal Reports XI

I have about 10 fairly complex SQL queries on SQL Server 2008 - but the client wants to be able to run them from their internal network (as opposed to from the non-local web app) through Crystal Reports XI.
The client's internal network does not allow us to (a) have write access to their proprietary db, or (b) set up an intermediary SQL server (meaning we cannot set up stored procedures or do other data cleaning).
The SQL contains multiple instances of row_number() over (partition by col1, col2), group by col1, col2 with cube|rollup, and/or (multiple) pivots.
Can this even be done? Everything I've read seems to indicate that this is only feasible via stored procedure and I would still need to pull the data from the proprietary db first.
The following is a stripped-back version of one of the queries (e.g., JOINs not directly related to functionality, WHERE clauses, and half a dozen columns have been removed)...
select sum(programID)
, sum([a.Asian]) as [Episodes - Asian], sum([b.Asian]) as [Eps w/ Next Svc - Asian], sum([c.Asian])/sum([b.Asian]) as [Avg Days to Next Svc - Asian]
, etc... (repeats for each ethnicity)
from (
select programID, 'a.' + ethnicity as ethnicityA, 'b.' + ethnicity as ethnicityB, 'c.' + ethnicity as ethnicityC
, count(*) as episodes, count(daysToNextService) as episodesWithNextService, sum(daysToNextService) as daysToNextService
from (
select programID, ethnicity, datediff(d, dateOfDischarge, nextDateOfService) as daysToNextService from (
select t1.userID, t1.programID, t1.ethnicity, t1.dateOfDischarge, t1.dateOfService, min(t2.dateOfService) as nextDateOfService
from TABLE1 as t1 left join TABLE1 as t2
on datediff(d, t1.dateOfService, t2.dateOfService) between 1 and 31 and t1.userID = t2.userID
group by t1.userID, t1.programID, t1.ethnicity, t1.dateOfDischarge, t1.dateOfService
) as a
) as a
group by programID
) as a
pivot (
max(episodes) for ethnicityA in ([A.Asian],[A.Black],[A.Hispanic],[A.Native American],[A.Native Hawaiian/ Pacific Isl.],[A.White],[A.Unknown])
) as pA
pivot (
max(episodesWithNextService) for ethnicityB in ([B.Asian],[B.Black],[B.Hispanic],[B.Native American],[B.Native Hawaiian/ Pacific Isl.],[B.White],[B.Unknown])
) as pB
pivot (
max(daysToNextService) for ethnicityC in ([C.Asian],[C.Black],[C.Hispanic],[C.Native American],[C.Native Hawaiian/ Pacific Isl.],[C.White],[C.Unknown])
) as pC
group by programID with rollup
Sooooooo.... can something like this even be translated into Crystal Reports XI?
Thanks!
When you create your report, instead of selecting a table or stored procedure, choose Add Command.
This allows you to put in any valid T-SQL statement you want. Using common table expressions (CTEs) and inline views, I've managed to create some rather large, complex statements (in excess of 400 lines) against both Oracle and SQL Server, so it is indeed feasible. However, if you use parameters, you should consider sp_executesql, and you'll have to figure out how to avoid SQL injection.
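A minimal sketch of the parameterized sp_executesql pattern (TABLE1 is the table from the question; the date filter and its value are invented for illustration):

-- The report parameter travels as a typed parameter instead of being
-- concatenated into the SQL string, so it cannot inject SQL.
DECLARE @sql nvarchar(max) = N'
    SELECT programID, COUNT(*) AS episodes
    FROM TABLE1
    WHERE dateOfService >= @startDate
    GROUP BY programID WITH ROLLUP;';

EXEC sp_executesql @sql,
                   N'@startDate date',
                   @startDate = '2013-01-01';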

Report Model Inefficiencies. Need Advice

I'm trying to help my power users have more access to our data so I don't have to interrupt my work (playing Pac-Man) 25 times a day writing Ad Hoc Queries and such.
I'm trying to use Data Source Views, Data Models, and Report Builder 2 and 3 to allow them to have access to cleansed data in which they can safely do their own basic analysis. I want to create generic Report Models covering business processes rather than a specific report model for each ad hoc report they would need.
I have to create the Data Source View (DSV) with a named query, because the source database lacks primary keys, though it does have unique clustered indexes on identity columns.
Here's my problem. When I use a relatively simple query like this:
SELECT SOM.fsono AS SalesNo
     , SOM.fcustno AS CustNo
     , SLC.fcompany AS CustName
     , SOM.fcustpono AS CustPONo
     , SOM.fsoldby AS SalesPerson
     , SOR.fenumber AS ItemNo
     , SOR.finumber AS IntItemNo
     , SOR.frelease AS Rels
     , SOI.fprodcl AS ProdClass
     , SOI.fgroup AS GroupCode
     , RTRIM(SOR.fpartno) AS PartNo
     , SOR.fpartrev AS PartRev
     , CAST(SOI.fdesc AS VARCHAR(20)) AS PartDescription
     , SOM.forderdate AS OrderDate
     , SOR.fduedate AS DueDate
     , SOR.forderqty AS QtyOrd
     , SOR.funetprice AS NetUnitPrice
     , (SOR.forderqty * SOR.funetprice) AS NetAmountOrdered
FROM slcdpm SLC
INNER JOIN somast SOM ON SLC.fcustno = SOM.fcustno
LEFT OUTER JOIN soitem SOI ON SOM.fsono = SOI.fsono
LEFT OUTER JOIN sorels SOR ON SOI.fsono = SOR.fsono
                          AND SOI.finumber = SOR.finumber
Let's assume the user takes the Report Model in Report Builder 3 and only requests SalesNo, PartNo, PartRev, OrderDate, and TotalNetAmount for their dataset.
The SQL Generated to pull that data is:
SET DATEFIRST 7
SELECT
CAST(1 AS BIT) [c0_is_agg],
CAST(1 AS BIT) [c1_is_agg],
CAST(1 AS BIT) [c2_is_agg],
CAST(1 AS BIT) [c3_is_agg],
4 [agg_row_count],
[CustomerSales].[TotalNetAmountOrdered] [TotalNetAmountOrdered],
[CustomerSales].[SalesNo] [SalesNo],
[CustomerSales].[PartNo] [PartNo],
[CustomerSales].[PartRev] [PartRev],
[CustomerSales].[OrderDate] [OrderDate]
FROM
(
SELECT
SUM([CustomerSales].[NetAmountOrdered]) [TotalNetAmountOrdered],
[CustomerSales].[SalesNo] [SalesNo],
[CustomerSales].[PartNo] [PartNo],
[CustomerSales].[PartRev] [PartRev],
[CustomerSales].[OrderDate] [OrderDate]
FROM
(
SELECT SOM.fsono AS SalesNo, SOM.fcustno AS CustNo, SLC.fcompany AS CustName, SOM.fcustpono AS CustPONo, SOM.fsoldby AS SalesPerson,
SOR.fenumber AS ItemNo, SOR.finumber AS IntItemNo, SOR.frelease AS Rels, SOI.fprodcl AS ProdClass, SOI.fgroup AS GroupCode, RTRIM(SOR.fpartno) AS PartNo,
SOR.fpartrev AS PartRev, CAST(SOI.fdesc AS VARCHAR(20)) AS PartDescription, SOM.forderdate AS OrderDate, SOR.fduedate AS DueDate, SOR.forderqty AS QtyOrd,
SOR.funetprice AS NetUnitPrice, SOR.forderqty * SOR.funetprice AS NetAmountOrdered
FROM slcdpm AS SLC INNER JOIN
somast AS SOM ON SLC.fcustno = SOM.fcustno LEFT OUTER JOIN
soitem AS SOI ON SOM.fsono = SOI.fsono LEFT OUTER JOIN
sorels AS SOR ON SOI.fsono = SOR.fsono AND SOI.finumber = SOR.finumber
) [CustomerSales]
WHERE
CAST(1 AS BIT) = 1
GROUP BY
[CustomerSales].[SalesNo], [CustomerSales].[PartNo], [CustomerSales].[PartRev], [CustomerSales].[OrderDate]
) [CustomerSales]
ORDER BY
[SalesNo], [PartNo], [PartRev], [OrderDate]
I would have expected only the fields the user requested in the report to be pulled, not every single field in the DSV. Also, if parameters are created to constrain the data, such as a beginning and ending date for OrderDate, the full data set is returned anyway.
Am I doing something wrong here?
Is there a better way to approach this?
Do other administrators find themselves with performance issues when using Report Models?
There are sometimes performance issues when dealing with report models. This is one of the reasons report models are not meant to be rolled out to all of your users as a replacement for all reports. The queries generated by the semantic query engine behind report models are not tunable, and they are often totally not the way you would write them yourself.
The engine essentially treats the named query as a view and expands it into the underlying query, just as it would an actual view. This is often an issue when building a model directly over your database.
The ideal situation, from my perspective, is to have a separate database (possibly a data warehouse), preferably housed on a separate server. That database would be flattened so you could optimize it for read performance. You could then use its tables directly in your data source view, and the semantic query engine behind the model should be able to produce better queries.
This ideal is often not possible due to economic or other constraints. Could you set up a job to more or less ETL from your base tables into a new set of tables, optimized for reporting, that would support your model?
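A minimal sketch of such a job, reusing the tables from the sample query (the rpt.CustomerSales target table is hypothetical, and in practice this would be scheduled, e.g. as a SQL Agent job):

-- Rebuild a flattened reporting table so the model's named query can
-- read one wide, indexable table instead of expanding a four-table join.
TRUNCATE TABLE rpt.CustomerSales;   -- hypothetical target table

INSERT INTO rpt.CustomerSales (SalesNo, CustNo, CustName, PartNo, PartRev,
                               OrderDate, NetAmountOrdered)
SELECT SOM.fsono, SOM.fcustno, SLC.fcompany, RTRIM(SOR.fpartno), SOR.fpartrev,
       SOM.forderdate, SOR.forderqty * SOR.funetprice
FROM slcdpm AS SLC
INNER JOIN somast AS SOM ON SLC.fcustno = SOM.fcustno
LEFT OUTER JOIN soitem AS SOI ON SOM.fsono = SOI.fsono
LEFT OUTER JOIN sorels AS SOR ON SOI.fsono = SOR.fsono
                             AND SOI.finumber = SOR.finumber;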