Optimize SQLite query that has multiple parameters using FMDB - iPhone

I have a query that starts pretty simple:
SELECT zudf16, zudf17, zudf18, zudf19, zudf20, zitemid, zItemName, zPhotoName, zBasePrice FROM zskus WHERE zmanufacturerid=? AND zudf16=? ORDER BY zItemID
As choices are made I build on it:
SELECT zudf16, zudf17, zudf18, zudf19, zudf20, zitemid, zItemName, zPhotoName, zBasePrice FROM zskus WHERE zmanufacturerid=? AND zudf16=? AND zudf19=? AND zudf20=? ORDER BY zItemID
For each parameter I add, the query drops in performance, especially on the iPad 1. Each of these columns has an index on it.
How can I increase the performance of this query as it gets more targeted, rather than lose performance?
This is across around 60,000 records that match the initial ManufacturerID, out of about 200,000 rows total in the database.
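One approach worth trying (my suggestion, not something from the original post) is a single composite index covering the filtered columns plus the ORDER BY column, so SQLite can satisfy the whole query from one index instead of using only one of the single-column indexes:
-- Hedged sketch: column names come from the query above, but the column order
-- is an assumption and should lead with the most selective predicates.
CREATE INDEX IF NOT EXISTS idx_zskus_filter
ON zskus (zmanufacturerid, zudf16, zudf19, zudf20, zItemID);
Running EXPLAIN QUERY PLAN on the parameterized statement before and after adding such an index is a quick way to confirm whether SQLite actually uses it.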

Related

DB2 Optimize for n rows

I'm learning DB2, and I came across this clause: OPTIMIZE FOR 1 ROW right after FETCH FIRST 100 ROWS ONLY.
I understand that FETCH FIRST 100 ROWS ONLY would give me the first 100 rows that qualified. But I don't understand what OPTIMIZE FOR 1 ROW is really doing here. I read this DB2 documentation, which says
Use OPTIMIZE FOR 1 ROW clause to influence the access path. OPTIMIZE FOR 1 ROW tells Db2 to select an access path that returns the first qualifying row quickly.
and this DB2 documentation, which says
In general, if you are retrieving only a few rows, specify OPTIMIZE FOR 1 ROW to influence the access path that Db2 selects.
But I'm still confused. Would using OPTIMIZE FOR n ROWS make a query more efficient?
I also found this post on SO and it seems like OPTIMIZE FOR n ROWS is equivalent to FETCH FIRST n ROWS ONLY per the accepted answer.
But when I experimented myself, using OPTIMIZE FOR n ROWS instead of FETCH FIRST n ROWS ONLY, the result set was not the same. With OPTIMIZE FOR n ROWS, the query returns all qualifying rows.
Could someone please explain it to me what OPTIMIZE FOR n ROWS really does? Thanks!
Would using OPTIMIZE FOR n ROWS make a query more efficient?
Not necessarily. However, it might cause your application to start receiving rows earlier than it otherwise would, if there is an alternative access plan that can find the first row matching the query criteria faster, even though the entire query will run longer as a result.
There's this bit in the Db2 for LUW docs that gives some examples specific to that platform:
Try specifying OPTIMIZE FOR n ROWS along with FETCH FIRST n ROWS ONLY, to encourage query access plans that return rows directly from the referenced tables, without first performing a buffering operation such as inserting into a temporary table, sorting, or inserting into a hash join hash table.
Applications that specify OPTIMIZE FOR n ROWS to encourage query access plans that avoid buffering operations, yet retrieve the entire result set, might experience poor performance. This is because the query access plan that returns the first n rows fastest might not be the best query access plan if the entire result set is being retrieved.
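As an illustration of the combination the docs describe (the table and column names here are hypothetical, not from the question):
-- Ask Db2 to pick an access path that returns the first qualifying row quickly,
-- while still capping the result set at 100 rows.
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = ?
ORDER BY order_date DESC
FETCH FIRST 100 ROWS ONLY
OPTIMIZE FOR 1 ROW;
Note that OPTIMIZE FOR only influences the access path; unlike FETCH FIRST, it does not limit how many rows the query returns, which is exactly the difference observed in the question.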

Statistics of all/many tables in FileMaker

I'm writing a kind of summary page for my FileMaker solution.
For this, I have defined a "statistics" table, which uses formula fields with ExecuteSQL to gather info from most tables, such as the number of records, recently changed records, etc.
This strangely takes a long time: around 10 seconds with a total of about 20k records across about 10 tables. The same SQL on any other database system shouldn't take more than a fraction of a second.
What could the reason be, what can I do about it, and where can I start debugging to figure out what's taking all this time?
The actual code looks like this:
SQLAusführen ( "SELECT COUNT(*) FROM " & _Stats::Table ; "" ; "" )
SQLAusführen ( "SELECT SUM(\"some_field_name\") FROM " & _Stats::Table ; "" ; "" )
Where "_Stats" is my statistics table, and it has a string field "Table" where I store the name of the other tables.
So each row in this _Stats table should have the stats for the table named in the "Table" field.
Update: I'm not using FileMaker server, this is a standalone client application.
We can definitely talk about why it may be slow. Usually this has mostly to do with the size and complexity of your schema, though "usually" is the operative word, as you have found.
Can you use the DDR (Database Design Report) instead? Much will depend on what you are actually doing with this data. Tools like FMPerception will also give you many of the stats you are looking for. Again, it depends on what you are doing with it.
Also, can you post your actual calculation? Is the statistics table using unstored calculations? Is the statistics table related to any of the other tables? These are a couple of things that will affect how ExecuteSQL performs.
One thing to keep in mind: whether it's ExecuteSQL, a Perform Find, or a relationship, it's all the same basic query under the hood. So if it would be slow doing it one way, it's likely going to be slow with any other directly related approach.
Taking these one at a time:
All records count.
Placing an unstored calc in the target table allows you to get the count of the records through the relationship, without triggering a transfer of all records to the client. You can get the value from the first record in the relationship. It's a super lightweight way to get that info versus using Count, which requires FileMaker to touch every record on the other side.
Sum of Records Matching a Value.
Using a field on the _Stats table with a relationship to the target table will reduce how much work FileMaker has to do to give you an answer.
Then having a summary field in the target table to sum the records may prove more efficient than using an aggregate function. The summary field will also only sum the records that match the relationship. (Just don't show that field on any of your layouts if you don't need it.)
ExecuteSQL is fastest when it can just rely on a simple index lookup. Once you get outside of that, it's primarily about testing to find the sweet-spot. Typically, I will use ExecuteSQL for retrieving either a JSON object from a user table, or verifying a single field value. Once you get into sorting and aggregate functions, you step outside of the optimizations of the function.
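For example, the kind of call that stays inside that sweet spot is a single-field lookup against a stored, indexed field; the table and field names below are made up for illustration:
// Hypothetical example: fetch one value with a simple indexed lookup
ExecuteSQL (
  "SELECT \"SettingsJSON\" FROM \"Users\" WHERE \"AccountName\" = ?" ;
  "" ; "" ; Get ( AccountName )
)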
Also note: if you have an open record (that means you, as the current user), FileMaker Server doesn't know what data you have on the client side, and so it sends ALL of the records. That's why I asked whether you were using unstored calcs with ExecuteSQL. It can seem slow when you can't control when the calculations fire. Often I will put the updating of that data into a scheduled script.

Data Lake Analytics - Large vertex query

I have a simple query which does a GROUP BY on two fields:
#facturas =
SELECT a.CodFactura,
Convert.ToInt32(a.Fecha.ToString("yyyyMMdd")) AS DateKey,
SUM(a.Consumo) AS Consumo
FROM #table_facturas AS a
GROUP BY a.CodFactura, a.DateKey;
#table_facturas has 4,100 rows, but the query takes several minutes to finish. Looking at the graph explorer, I see it uses 2,500 vertices because I have 2,500 unique CodFactura+DateKey rows. I don't know if this is normal ADLA behaviour. Is there any way to reduce the number of vertices and execute this query faster?
First: I am not sure your query will actually compile. You would need the Convert expression in your GROUP BY, or do it in a previous SELECT statement (see the sketch below).
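A hedged sketch of that second option, computing the converted date in a preceding SELECT (note the @ prefixes, which is what U-SQL expects for rowset variables; the rest mirrors the query in the question):
@facturas_pre =
    SELECT a.CodFactura,
           Convert.ToInt32(a.Fecha.ToString("yyyyMMdd")) AS DateKey,
           a.Consumo
    FROM @table_facturas AS a;

@facturas =
    SELECT CodFactura,
           DateKey,
           SUM(Consumo) AS Consumo
    FROM @facturas_pre
    GROUP BY CodFactura, DateKey;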
Secondly: In order to answer your question, we would need to know how the full query is defined. Where does #table_facturas come from? How was it produced?
Without this information, I can only give some wild speculative guesses:
If #table_facturas is coming from an actual U-SQL table, your table is over-partitioned/fragmented. This could be because:
you inserted a lot of data originally with a distribution on the grouping columns, and you either have a predicate that reduces the number of rows per partition and/or you do not have up-to-date statistics (run CREATE STATISTICS on the columns).
you did a lot of INSERT statements, each inserting a small number of rows into the table, thus creating a large number of individual files. This will "scale out" the processing as well. Use ALTER TABLE REBUILD to recompact.
If it is coming from a fileset, you may have too many small files in the input. See if you can merge them into fewer, larger files.
If the above does not help, you can also try to hint a small number of rows in the query that creates #table_facturas by adding OPTION(ROWCOUNT=4000).

Redshift table design for efficiency

I have a Redshift cluster with a single dc1.large node. I've got data writing into it, on the order of 50 million records a day, in the format of a timestamp, a user ID and an item ID. The item ID (varchar) is unique, the user ID (varchar) is not, and the timestamp (timestamp) is not.
In my Redshift DB of about 110m records, if I have a table with no sort key, it takes about 30 seconds to search for a single item ID.
If I have a table with a sort key on item ID, I get a single item ID search time of about 14-16 seconds.
If I have a table with an interleaved sort key on all three columns, the single item ID search time is still 14-16 seconds.
What I'm hoping to achieve is the ability to query for the records of thousands or tens of thousands of item IDs on the order of a second.
The query just looks like
select count(*) from rs_table where itemid = 'id123';
or
select count(*) from rs_table where itemid in ('id123','id124','id125');
This query comes back in 541 ms:
select count(*) from rs_table;
AWS documentation suggests that there is a compile time for queries the first time they're run, but I don't think that's what I'm seeing (and it would not be ideal if it were, since each unique set of 10,000 IDs might never be queried in exactly the same order again).
I have to assume I'm doing something wrong with either the sort key design, the query, or some combination of the two - for only ~10 GB of table space, something like Redshift shouldn't take this long to query, right?
Josh,
We probably need a few additional pieces of information to give you a good recommendation.
Here are some things to start thinking about.
Are most of your queries record lookups as you describe above?
What is your distribution key?
Do you join this table with other large fact tables?
If you load 50M records per day and you only have 110M records in the table, does that mean that you only store 2 days?
Do you do massive deletes and then load another 50M records per day?
Do you run ANALYZE after your loads?
If you deleted a large number of records, did you run VACUUM?
If all of your queries are similar to the ones that you describe, why are you using Redshift? Amazon DynamoDB or MongoDB (even Cassandra) would be great database choices for the types of queries that you describe.
If you run analytical workloads, Redshift is an excellent platform. If you are more interested in "record lookups", the NoSQL options, as well as MySQL or MariaDB, might give you better performance.
Also, if this is a dev/test environment and you have loaded and deleted large amounts of data without ever running a VACUUM, you would see significant performance degradation.
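For reference, a minimal sketch of the maintenance commands mentioned above, plus a hypothetical table definition keyed for single-item lookups (itemid follows the question's query; the other column names and types are guesses):
-- Reclaim space and re-sort after large deletes, then refresh planner statistics
VACUUM rs_table;
ANALYZE rs_table;

-- Hypothetical table definition that distributes and sorts on the item ID
CREATE TABLE rs_table (
    event_ts TIMESTAMP,
    userid   VARCHAR(64),
    itemid   VARCHAR(64)
)
DISTSTYLE KEY
DISTKEY (itemid)
SORTKEY (itemid);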

SQL Query with a little bit more complicated WHERE clause performance issue

I have some performance issue in this case:
Very simplified query:
SELECT COUNT(*) FROM Items WHERE ConditionA OR ConditionB OR ConditionC OR ...
Simply put, I have to determine how many Items the user has access to through some complicated conditions.
When there is a large number of records (100,000+) in the Items table and, say, ~10 complicated conditions concatenated in the WHERE clause, I get the result in about 2 seconds in my case. The problem is when very few conditions are met, e.g. when I get only 10 Items out of 100,000.
How can I improve the performance in this "Get my items" case?
Additional information:
the query is generated by EF 6.1
MS SQL 2012 Express
SQL Execution Plan
Add an additional table to your schema. Instead of building a long query, insert the value for each condition into this new table, along with a key for that user/session. Then, JOIN the two tables together.
This should perform much better, because it will allow the database engine to make better use of indexes on your Items table.
Additionally, this will position you to eventually pre-define sets of permissions for your users, such that you don't need to insert them right at the moment you do the check. The permission sets will already be in the table, and the new table can also be indexed, which will improve performance further.
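A rough sketch of what that could look like in T-SQL (everything except the Items table is hypothetical, and the join column depends on what your conditions actually test):
-- Hypothetical filter table: one row per allowed value per user/session
CREATE TABLE dbo.UserItemFilters (
    SessionKey  UNIQUEIDENTIFIER NOT NULL,
    FilterValue INT              NOT NULL
);
CREATE INDEX IX_UserItemFilters ON dbo.UserItemFilters (SessionKey, FilterValue);

-- Insert the current user's condition values, then join instead of chaining ORs
DECLARE @SessionKey UNIQUEIDENTIFIER = NEWID();
INSERT INTO dbo.UserItemFilters (SessionKey, FilterValue)
VALUES (@SessionKey, 1), (@SessionKey, 7), (@SessionKey, 42);

SELECT COUNT(*)
FROM dbo.Items AS i
JOIN dbo.UserItemFilters AS f
    ON f.FilterValue = i.CategoryId   -- hypothetical column on Items
WHERE f.SessionKey = @SessionKey;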