Duplicating PostgreSQL's window functions like lag, lead, over - MongoDB

How do I change a PostgreSQL query into a MongoDB BSON call? I have the same use case listed at http://archives.postgresql.org/pgsql-general/2011-10/msg00157.php and I would like to calculate the delta time between two log entries by using something like lag or lead. Is there anything similar in MongoDB to Postgres' lag / lead syntax?
select
index,
starttime,
endtime,
starttime - lag(endtime) over(order by starttime asc) as delta
from test
http://www.postgresql.org/docs/8.4/static/functions-window.html
I was looking at http://www.mongovue.com/2010/11/03/yet-another-mongodb-map-reduce-tutorial/ and it seems that map / reduce / finalize should do it: map the id, start and end time, reduce does nothing, then do an inner join on itself (the double fors) during the finalize. I can almost, kind of, sort of, see it...

This is something you'll have to do in your application. Right now, MongoDB doesn't support anything like this.

You can rewrite some of the window functions as subqueries; see if that's possible in the aggregation framework. This subquery should run after the filtering and grouping are done.
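For reference, here is a minimal sketch of that rewrite in plain SQL, using the test table from the question: the lag(endtime) is replaced by a correlated subquery that picks the endtime of the latest earlier row.
select
t.index,
t.starttime,
t.endtime,
t.starttime - (
  select t2.endtime
  from test t2
  where t2.starttime < t.starttime
  order by t2.starttime desc
  limit 1
) as delta
from test t;
Like lag(), the subquery yields NULL for the first row, since no earlier row exists.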
Couchbase is going to have the standard window functions. https://blog.couchbase.com/on-par-with-window-functions-in-n1ql/

Related

where column in (single value) performance

I am writing dynamic SQL code and it would be easier to use a generic where column in (<comma-separated values>) clause, even when the clause might have 1 term (it will never have 0).
So, does this query:
select * from table where column in (value1)
have any different performance than
select * from table where column=value1
?
All my tests result in the same execution plans, but if there is some knowledge/documentation that sets it in stone, that would be helpful.
This might not hold true for every RDBMS, nor for every query with its specific circumstances.
The engine will translate WHERE id IN(1,2,3) to WHERE id=1 OR id=2 OR id=3.
So your two ways to articulate the predicate will (probably) lead to exactly the same interpretation.
As always: we should not really worry about the way the engine "thinks"; that part was done pretty well by the developers :-) Through a statement we tell the engine what we want to get, not how we want it to get it.
Some more details here, especially the first part.
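If you want to verify this on your own system, a quick hedged check (MySQL syntax; my_table and its id index are placeholders):
explain select * from my_table where id = 1;
explain select * from my_table where id in (1);
-- both should report the same access type (const or ref on the id index)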
I think this will depend on the platform you are using (the optimizer of the given SQL engine).
I did a little test using MySQL Server:
When I query select * from table where id = 1; I get 1 total, and the query took 0.0043 seconds.
When I query select * from table where id IN (1); I get 1 total, and the query took 0.0039 seconds.
I know this depends on the server, the PC and so on, but the results are very close.
One caveat you will sometimes read is that IN is non-sargable (not search-argument-able) while = is; in practice, though, an IN list of constants is expanded into OR'ed equality tests, as noted above, so both forms can use the index.
If you want to know which one is best, you should test them in your environment, because they both work well!!

About SQLite OFFSET - what do I not understand?

If you check the SQLite SELECT syntax (https://www.sqlite.org/lang_select.html),
you will see that OFFSET demands an expr to get the correct result from your database query.
And when I went to check what an expr is (please see this: https://www.sqlite.org/syntax/expr.html), I saw that theoretically there should be a way to use a function after OFFSET. For example:
select * from my_table limit 50 offset count(id);
The count function would give you a numeric value; however, we know this is not possible. So my question is: is there any way to use functions in OFFSET, or am I reading things the wrong way from the links?
It is possible to use functions in the LIMIT/OFFSET expressions:
SELECT 42 LIMIT length('x') OFFSET round(0.123);
The count() function does not work here because it is an aggregate function, and inside the OFFSET clause, there is no table or group over which it could be applied.
It does not work in general. You have to select your count(id) in an extra query.
For more help look here:
Sqlite LIMIT / OFFSET query
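As a hedged sketch of that workaround: OFFSET accepts any expr, so a scalar subquery may be accepted in place of the aggregate (depending on your SQLite version); if it is not, run the count in a separate query first and splice the resulting number into the OFFSET.
select * from my_table
limit 50
offset (select count(id) from my_table) / 2;
The / 2 here is only an illustrative choice to land somewhere inside the table; offsetting by the full count, as in the question, would skip every row.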

pipelinedb aggregates in where clause

I am using pipelinedb to test some analysis of data streams from sensors.
I want to be able, as an example, to find events in a stream that are defined by an aggregate, e.g. find events where the difference between max(temperature) and min(temperature) in the last 5 minutes exceeds a certain range.
When trying to put aggregates in the WHERE clause I get an error message saying something like 'aggregates not allowed in continuous views where clauses'.
Am I missing something here or is it just not possible?
Otherwise I like pipelinedb very very much!
Well, pipelinedb says: "continuous queries don't support HAVING clauses".
What I'm trying to do is the following:
I have a stream named geo_vital_stream, which sends some sensor data along with a geolocation. At the moment I am interested in the eda values.
insert into geo_vital_stream (device_id, user_id, measured_at, heartrate, energy, eda, lon, lat) VALUES( 'A005D8-E4 2.0',1,'2015-10-08 15:04:33.134000+02',96.8497201823,351.056269367,0.505791,8.07154018407,52.9531484103 );
My cv looks like this:
CREATE CONTINUOUS VIEW cv_sensor_eda AS
SELECT user_id::integer,
MAX(eda::numeric) - MIN(eda::numeric) as range_eda
FROM geo_vital_stream
WHERE (measured_at > clock_timestamp() - interval '1 minutes')
GROUP BY user_id
Now, I am interested only in those "events" where the range (range_eda) exceeds a certain value in the last minute.
Using an aggregate in a WHERE clause actually isn't legal SQL. That is accomplished using a HAVING clause, but it doesn't seem like that's what you need here. Since aggregates compute values across multiple rows, it's not clear to me how you'd retrieve individual events based on aggregates (min, max) across multiple events. Could you provide an example of what each event looks like?
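One hedged sketch that may cover this case, assuming the cv_sensor_eda view above: a continuous view can itself be queried like an ordinary table, and at read time range_eda is just a column, so the threshold can go into a plain WHERE clause of that outer query.
select user_id, range_eda
from cv_sensor_eda
where range_eda > 0.5;
Here 0.5 is a made-up threshold; substitute whatever range counts as an event for your sensors.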

T-SQL speed comparison between LEFT() vs. LIKE operator

I'm creating result paging based on the first letter of a certain nvarchar column, rather than the usual paging on the number of results.
And now I'm faced with a choice: whether to filter results using the LIKE operator or the equality (=) operator.
select *
from table
where name like #firstletter + '%'
vs.
select *
from table
where left(name, 1) = #firstletter
I've tried searching the net for speed comparison between the two, but it's hard to find any results, since most search results are related to LEFT JOINs and not LEFT function.
"Left" vs "Like" -- one should always use "Like" when possible where indexes are implemented because "Like" is not a function and therefore can utilize any indexes you may have on the data.
"Left", on the other hand, is function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is SQL server has to evaluate the function for every record that's returned.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = #firstletter
That's because most databases are read far more often than written, and this will amortise the cost of the calculation (done only for writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
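If you are on SQL Server, a persisted computed column is a hedged alternative to the insert/update trigger; the column and index names below are made up for illustration, and #firstletter is the same placeholder as above.
alter table dbo.people add name_first_char_lower as lower(left(name, 1)) persisted;
create index IX_people_first_letter on dbo.people (name_first_char_lower);
select * from dbo.people where name_first_char_lower = #firstletter;
The engine maintains the computed value itself, so there is no trigger code to keep in sync.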
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds
My data comes back faster with the LEFT version. As an aside, my overall query does hit some indexes.
I would always suggest using the LIKE operator when the search column has an index. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3) = 'AAA' OR left(column_name,3) = 'ABA' OR ... up to 9 OR clauses. My count shows 7,301,477 records, taking 4 seconds with left and 1 second with like, i.e. where column_name like 'AAA%' OR column_name like 'ABA%' OR ... up to 9 LIKE clauses.
Calling a function in the WHERE clause is not best practice. See http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.StartsWith(...) and you'll get just a LIKE condition in the generated SQL instead of all this 'LEFT' craziness!
Depending upon your needs you will probably need to preprocess searchString.
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non-Core) EntityFunctions, so I'm not sure how to do it for EF6.

Optimising (My)SQL Query

I usually use an ORM instead of SQL and I am slightly out of touch with the different JOINs...
SELECT `order_invoice`.*
, `client`.*
, `order_product`.*
, SUM(product.cost) as net
FROM `order_invoice`
LEFT JOIN `client`
ON order_invoice.client_id = client.client_id
LEFT JOIN `order_product`
ON order_invoice.invoice_id = order_product.invoice_id
LEFT JOIN `product`
ON order_product.product_id = product.product_id
WHERE (order_invoice.date_created >= '2009-01-01')
AND (order_invoice.date_created <= '2009-02-01')
GROUP BY `order_invoice`.`invoice_id`
The tables/columns are logically named... it's a shop-type application... the query works... it's just very, very slow...
I use the Zend Framework and would usually use Zend_Db_Table_Row::find(Parent|Dependent)Row(set)('TableClass'), but I have to make lots of joins and I thought it would improve performance to do it all in one query instead of hundreds...
Can I improve the above query by using more appropriate JOINs or a different implementation? Many thanks.
The query is wrong: the GROUP BY is wrong. All columns in the SELECT part that are not inside an aggregate function have to be in the GROUP BY, and you mention only one column.
Change the SQL Mode, set it to ONLY_FULL_GROUP_BY.
When this is done and you have a correct query, use EXPLAIN to find out how the query is executed and what indexes are used. Then start optimizing.
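As a hedged sketch of one valid rewrite (using only the column names from the query above): aggregate the product costs per invoice in a derived table, so the outer query no longer mixes aggregated and non-aggregated columns. Note that order_product.* is dropped, since individual product rows are ambiguous once you aggregate per invoice.
SELECT `order_invoice`.*
, `client`.*
, totals.net
FROM `order_invoice`
LEFT JOIN `client`
ON order_invoice.client_id = client.client_id
LEFT JOIN (
SELECT order_product.invoice_id
, SUM(product.cost) AS net
FROM `order_product`
JOIN `product`
ON order_product.product_id = product.product_id
GROUP BY order_product.invoice_id
) AS totals
ON order_invoice.invoice_id = totals.invoice_id
WHERE order_invoice.date_created >= '2009-01-01'
AND order_invoice.date_created <= '2009-02-01';
With the derived table in place, EXPLAIN should also become easier to read, since each join now touches at most one aggregated row per invoice.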