BETWEEN Versus >= and <= in DB2 SQL Query - Performance

I have the following queries:
SELECT ID, ADDRESS
FROM EMPLOYEE A
WHERE ID=12345
AND CURRENT DATE BETWEEN A.EFF_DT AND A.EXP_DT;
SELECT ID, ADDRESS
FROM EMPLOYEE A
WHERE ID=12345
AND CURRENT DATE >= A.EFF_DT AND CURRENT DATE <= A.EXP_DT;
Which of these two queries performs better?
The second one uses the operators >= and <= instead of BETWEEN.
Please advise.
Thanks in advance.

Based on my knowledge of DB2 for z/OS, both of those should give you exactly the same execution profile (the LUW product may be different, but I doubt it).
If you're really concerned, run EXPLAIN on both queries and compare the access plans to see whether there are any differences.
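If you want to see what such a plan comparison looks like without a DB2 instance at hand, here is a minimal sketch using Python's built-in sqlite3 module. It is only an analogue: SQLite's EXPLAIN QUERY PLAN stands in for DB2's EXPLAIN, the index name idx_emp_id is hypothetical, and a fixed date string replaces CURRENT DATE. On DB2 itself you would compare the EXPLAIN output of the two statements instead.

```python
import sqlite3

# Hypothetical stand-in for the DB2 EMPLOYEE table in an in-memory SQLite database.
# SQLite's EXPLAIN QUERY PLAN is only an analogue of DB2's EXPLAIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (ID INTEGER, ADDRESS TEXT, EFF_DT TEXT, EXP_DT TEXT);
    CREATE INDEX idx_emp_id ON EMPLOYEE (ID);
""")

# A fixed ISO date string replaces CURRENT DATE so the comparison is reproducible.
TODAY = "2024-06-15"

Q_BETWEEN = """SELECT ID, ADDRESS FROM EMPLOYEE A
               WHERE ID = 12345 AND ? BETWEEN A.EFF_DT AND A.EXP_DT"""
Q_RANGE = """SELECT ID, ADDRESS FROM EMPLOYEE A
             WHERE ID = 12345 AND ? >= A.EFF_DT AND ? <= A.EXP_DT"""


def plan(sql, params):
    """Return the human-readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


plan_between = plan(Q_BETWEEN, (TODAY,))
plan_range = plan(Q_RANGE, (TODAY, TODAY))
print(plan_between)
print(plan_range)
```

In this toy setup both statements are resolved through the same index lookup on ID; the date predicates are simply checked on the rows that lookup returns.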

BETWEEN is simply shorthand for >= and <= (both endpoints are inclusive).
If you want more detail, see: Is the 'BETWEEN' function very expensive in SQL Server?
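To make the equivalence concrete, the sketch below runs both predicates against rows whose EFF_DT or EXP_DT falls exactly on the comparison date. It again uses Python's sqlite3 as a stand-in for DB2, with hypothetical rows and a fixed date in place of CURRENT DATE. Because BETWEEN includes both endpoints, both forms keep the same two boundary rows.

```python
import sqlite3

# Hypothetical effective-dated rows; ISO date strings compare correctly as text.
ROWS = [
    (12345, "expired yesterday", "2020-01-01", "2024-06-14"),
    (12345, "starts today", "2024-06-15", "2024-12-31"),
    (12345, "ends today", "2023-01-01", "2024-06-15"),
    (12345, "starts tomorrow", "2024-06-16", "2025-01-01"),
]
TODAY = "2024-06-15"  # stands in for CURRENT DATE

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (ID INTEGER, ADDRESS TEXT, EFF_DT TEXT, EXP_DT TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", ROWS)

between_rows = conn.execute(
    "SELECT ID, ADDRESS FROM EMPLOYEE A "
    "WHERE ID = 12345 AND ? BETWEEN A.EFF_DT AND A.EXP_DT ORDER BY ADDRESS",
    (TODAY,),
).fetchall()

range_rows = conn.execute(
    "SELECT ID, ADDRESS FROM EMPLOYEE A "
    "WHERE ID = 12345 AND ? >= A.EFF_DT AND ? <= A.EXP_DT ORDER BY ADDRESS",
    (TODAY, TODAY),
).fetchall()

# Both bounds are inclusive in either form, so the two boundary rows are kept.
print(between_rows)
print(range_rows)
```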

Related

JBPM Business central - Data set not working with aggregate function

I have an issue with the Data set - Execution server. I am using PostgreSQL as the DB. I want to calculate the difference between two date columns for my report. The queries I have used in the DB are:
Query 1:
SELECT end_date as end,
start_date as start,
processid as pidd,
AGE(end_date, start_date) as duration
from processinstancelog
Query 2:
select end_date, start_date, processid, end_date - start_date as duration
from processinstancelog
Both queries return the expected result in the Postgres DB, but when I use the same queries in Data set > Execution server, the "duration" column is not shown.
Question
Can anyone advise why the data set is not showing the duration column?
Many Thanks
How are you using the query in the execution server? Are you using the advanced query functionality? If so, can you share the exact steps you are following and your advanced query definition so it can be reviewed?
Answer: I deleted the old setup and installed a fresh JBPM setup, and the data set started appearing.

PostgreSQL: How to do a SELECT with a condition iterating through a range of values?

Hi everyone. This is my first post on Stack Overflow, so sorry if it is clumsy in any way.
I work in Python and make PostgreSQL requests to a Google BigQuery database. The data looks like this:
(image: sample of data)
where time is in nanoseconds and is not regularly spaced (it is captured in real time).
What I want to do is select, say, the mean price over a minute, for each minute in a time range that I would like to pass as a parameter.
This time range is currently a list of timestamps that I build externally, and I make sure they are one minute apart:
[1606170420000000000, 1606170360000000000, 1606170300000000000, 1606170240000000000, 1606170180000000000, ...]
My question is: how can I compute this list of mean prices for that list of time intervals?
Ideally I'd expect something like
SELECT AVG(price) OVER( PARTITION BY (time BETWEEN time_intervals[i] AND time_intervals[i+1] for i in range(len(time_intervals))) )
FROM table_name
but I know that doesn't make sense...
My temporary solution is to combine many SELECT ... UNION DISTINCT queries, one for each minute interval. But as you can imagine, this is not very efficient (I need up to 60*24 = 1440 samples).
There may well already be an answer to this question, but since I'm not even sure how to phrase it, I haven't found anything yet. Any link or tip would be a great help.
Many thanks in advance.
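First of all, your sample data appears to be at nanosecond resolution, and you are looking for averages at minute (sixty-second) resolution. Integer-dividing time by 60000000000 (the number of nanoseconds in a minute) gives each row a minute bucket to group on.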
Please try this:
select div(time, 60000000000) as minute,
pair,
avg(price) as avg_price
from your_table
group by div(time, 60000000000), pair
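A quick way to sanity-check the bucketing is to run the same GROUP BY locally. This sketch uses Python's sqlite3 rather than BigQuery: the / operator on two integers truncates in SQLite and stands in for BigQuery's DIV(), and the table contents and the pair value are made up. Two prices 30 seconds apart share a minute bucket, and a price 61 seconds after the first starts the next one.

```python
import sqlite3

# Hypothetical tick data: (time in ns since the epoch, pair, price).
TICKS = [
    (1606170420000000000, "BTC-USD", 100.0),
    (1606170450000000000, "BTC-USD", 102.0),  # +30 s, same minute
    (1606170481000000000, "BTC-USD", 110.0),  # +61 s, next minute
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (time INTEGER, pair TEXT, price REAL)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", TICKS)

# INTEGER / INTEGER truncates in SQLite, standing in for BigQuery's DIV().
# 60000000000 ns = 60 s, so each row is assigned to a whole-minute bucket.
minute_avgs = conn.execute(
    """SELECT time / 60000000000 AS minute, pair, AVG(price) AS avg_price
       FROM your_table
       GROUP BY time / 60000000000, pair
       ORDER BY minute, pair"""
).fetchall()
print(minute_avgs)
```

Multiplying the minute value back by 60000000000 gives the bucket's start time in nanoseconds, if you need it in the same unit as the input.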
If you want to control the intervals as you said in your comment, then please try something like this (I do not have access to BigQuery):
with time_ivals as (
select tick,
lead(tick) over (order by tick) as next_tick
from unnest(
[1606170420000000000, 1606170360000000000,
1606170300000000000, 1606170240000000000,
1606170180000000000, ...]) as tick
)
select t.tick, y.pair, avg(y.price) as avg_price
from time_ivals t
join your_table y
on y.time >= t.tick
and y.time < t.next_tick
group by t.tick, y.pair;
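To try the interval-join approach outside BigQuery, here is a sketch in Python's sqlite3 (window functions such as LEAD need SQLite 3.25 or later). A VALUES list replaces UNNEST, and the boundaries and prices are made up. LEAD pairs each boundary with the next larger one, so the list can be supplied in any order; each interval is half-open, and the largest boundary only serves as the upper edge of the interval before it.

```python
import sqlite3

# Hypothetical minute boundaries (ns), given newest-first as in the question.
boundaries = [1606170420000000000, 1606170360000000000, 1606170300000000000]

# Hypothetical tick data: (time in ns, pair, price).
rows = [
    (1606170310000000000, "BTC-USD", 10.0),
    (1606170350000000000, "BTC-USD", 20.0),
    (1606170365000000000, "BTC-USD", 40.0),
    (1606170425000000000, "BTC-USD", 99.0),  # after the last boundary: no interval
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute("CREATE TABLE your_table (time INTEGER, pair TEXT, price REAL)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)

# One "(?)" row per boundary replaces BigQuery's UNNEST([...]) array literal.
values_rows = ", ".join("(?)" for _ in boundaries)
sql = f"""
WITH ticks(tick) AS (VALUES {values_rows}),
time_ivals AS (
    SELECT tick, LEAD(tick) OVER (ORDER BY tick) AS next_tick
    FROM ticks
)
SELECT t.tick, y.pair, AVG(y.price) AS avg_price
FROM time_ivals t
JOIN your_table y
  ON y.time >= t.tick
 AND y.time < t.next_tick   -- half-open interval; a NULL next_tick matches nothing
GROUP BY t.tick, y.pair
ORDER BY t.tick
"""
interval_avgs = conn.execute(sql, boundaries).fetchall()
print(interval_avgs)
```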

Identifying the next closest record by date in Tableau

I have a table of users and another table of transactions.
The transactions all have a date against them. What I am trying to ascertain for each user is the average time between transactions.
User | Transaction Date
-----+-----------------
A | 2001-01-01
A | 2001-01-10
A | 2001-01-12
Consider the above transactions for user A. I am looking for the gap between each transaction and the next one in chronological order.
There are 9 days between transactions one and two, and 2 days between transactions two and three. The average of these is (9 + 2) / 2 = 5.5, so I would want the average time between user A's transactions to be 5.5 days.
Any idea how to achieve this in Tableau?
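The arithmetic can be checked with a few lines of plain Python (this is not Tableau code): sort each user's dates, take the differences between neighbouring dates, and average them. For user A the gaps are 9 and 2 days.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Transactions from the question: (user, transaction date).
transactions = [
    ("A", date(2001, 1, 1)),
    ("A", date(2001, 1, 10)),
    ("A", date(2001, 1, 12)),
]


def average_gap_days(rows):
    """Average number of days between consecutive transactions, per user."""
    by_user = defaultdict(list)
    for user, day in rows:
        by_user[user].append(day)
    result = {}
    for user, days in by_user.items():
        days.sort()
        # Pair each transaction with the next one chronologically.
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        if gaps:  # a user with a single transaction has no gap to average
            result[user] = mean(gaps)
    return result


print(average_gap_days(transactions))  # {'A': 5.5}
```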
I am trying to create a calculated field for each transaction to identify the date of the "next" transaction but I am struggling.
{ FIXED [user id] : MIN(IF [Transaction Date] > **this transaction date** THEN [Transaction Date]) }
I am not sure what to replace this transaction date with or whether this is the right approach at all.
Any advice would be greatly appreciated.
LOD expressions don't have direct access to previous values, so you need to create a self-join in your data connection. Follow the steps below to achieve what you want.
Create a self-join on your data with the following criteria
Create an LOD calculation as below:
{FIXED [User],[Transaction Date]:
MIN(DATEDIFF('day',[Transaction Date],[Transaction Date (Data1)]))
}
Build the View
PS: If you want better performance, Custom SQL might be the way to go.
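The join criteria for the self-join are not reproduced in the answer, so the sketch below assumes the usual ones: join each row to rows of the same user with a strictly later date. It uses Python's sqlite3 as a stand-in for the Tableau data source (the table and column names are hypothetical), and taking MIN of the day differences mirrors the LOD calculation. The last transaction of each user has no later match and drops out of the inner join; averaging the remaining values (9 and 2) gives 5.5 days for user A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id TEXT, transaction_date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("A", "2001-01-01"), ("A", "2001-01-10"), ("A", "2001-01-12")],
)

# Assumed self-join criteria: same user, and the joined row is strictly later.
# MIN over the later dates then picks the next transaction, like the LOD calculation.
days_to_next = conn.execute(
    """SELECT t1.user_id, t1.transaction_date,
              MIN(julianday(t2.transaction_date) - julianday(t1.transaction_date)) AS days_to_next
       FROM transactions t1
       JOIN transactions t2
         ON t2.user_id = t1.user_id
        AND t2.transaction_date > t1.transaction_date
       GROUP BY t1.user_id, t1.transaction_date
       ORDER BY t1.user_id, t1.transaction_date"""
).fetchall()
print(days_to_next)
```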
The only type of calculation that can take order sequence into account (e.g., when the value for a calculated field depends on the value of the immediately preceding row) is a table calc. You can't use an LOD calc for this kind of problem.
You'll need to understand how partitioning and addressing work with table calcs, along with specifying your sort order criteria. See the online help. You can then, for example, define days_since_last_transaction as:
if first() < 0 then min([Transaction Date]) -
lookup(min([Transaction Date]), -1) end
FIRST() returns 0 on the first row of the partition and negative offsets on later rows, so first() < 0 skips the first transaction, which has no previous one.
If you have very large data, or for other reasons want to do your calculations in the database instead of in Tableau with a table calc, then use SQL window (aka analytic) functions instead, via Tableau's Custom SQL.
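As an illustration of the windowing route, here is the same calculation written with LAG, run through Python's sqlite3 rather than Tableau Custom SQL. SQLite 3.25 or later is needed for window functions, and the table, columns, and user B's rows are hypothetical. Date arithmetic differs between databases, so julianday() would need to be replaced with your database's equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute("CREATE TABLE transactions (user_id TEXT, transaction_date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("A", "2001-01-01"), ("A", "2001-01-10"), ("A", "2001-01-12"),
     ("B", "2001-02-01"), ("B", "2001-02-04")],
)

# LAG plays the role of LOOKUP(..., -1): the previous date within each user's partition.
avg_gaps = conn.execute(
    """WITH gaps AS (
           SELECT user_id,
                  julianday(transaction_date)
                  - julianday(LAG(transaction_date) OVER (
                        PARTITION BY user_id ORDER BY transaction_date)) AS gap_days
           FROM transactions
       )
       SELECT user_id, AVG(gap_days) AS avg_gap_days  -- AVG skips each user's NULL first gap
       FROM gaps
       GROUP BY user_id
       ORDER BY user_id"""
).fetchall()
print(avg_gaps)
```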
Please attach an example workbook and anything you tried, along with any error you get.
This might not be useful if you cannot set the User ID field as a filter.
If you can, set User ID as a filter. Then following the steps mentioned here will lead you to calculating the difference between any two dates. Ideally, if you select any one value in the filter, the calculated field from the link should give you the difference between the dates in the Transaction Date column.

KDB - Filter List Column Based on Another Column

I'm struggling with eliminating data from my query. I have attached a picture with my data results (the data itself is too large and contains customer info, so I can't include it). I have two tables that I'm joining by SKU to show when we enter a SKU into the system and when we sell it. We reuse SKUs based on vendors, which isn't best practice but is currently a necessity. What I'd like to do is eliminate the InvoiceDates where InvoiceDate < TransferDate, so that the InvoiceDate column only shows the dates highlighted in yellow for the first few rows.
Please let me know if you have any questions and thanks for the help!
This would work:
q) update InvoiceDate:{x where x >= y}'[InvoiceDate;TransferDate] from tbl
Explanation:
The query above uses each-both (') to iterate over the InvoiceDate and TransferDate values pairwise (that is, row by row), passing each pair to the lambda as x and y, and then keeps the x values (InvoiceDate) that are >= y (TransferDate).
Your question is cut off, but I'm guessing you want to filter on whether a particular date is in your invoiceDate lists. You can do this as follows:
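For readers less familiar with q, the sketch below reproduces the same row-by-row filtering in Python. The data is made up, and it illustrates the semantics only, not how kdb+ executes the query.

```python
from datetime import date

# Hypothetical rows: each InvoiceDate cell holds a list, TransferDate a single date.
invoice_dates = [
    [date(2019, 1, 1), date(2019, 3, 5), date(2019, 6, 9)],
    [date(2018, 12, 1), date(2019, 2, 2)],
]
transfer_dates = [date(2019, 2, 1), date(2019, 1, 15)]

# Same pairing as q's each-both ('): walk both columns in lockstep and apply
# the filter {x where x >= y} to each (InvoiceDate list, TransferDate) pair.
filtered = [
    [d for d in invoices if d >= transfer]
    for invoices, transfer in zip(invoice_dates, transfer_dates)
]
print(filtered)
```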
q)select from tbl where in[2019.01.01;] each invoiceDate
If this isn't what you are looking for, please clarify above with an example

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have an arbitrary amount of input data, which always has a DATE field. Previously, the input dates (after some processing) were written to a table as output. Now I am asked to output both the input dates and every date between the minimum date received and one year after it. If there was originally no input for some day between these two dates, all of its fields must be 0, or equivalent.
Example: I have two inputs, one with '18/03/2017' and the other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/03/2018'. So, output '19/03/2017' with every field set to 0, and the same for the 20th, the 21st, and so on.
I know how to do this programmatically, but not in PowerCenter. I've been told to do the following (which I have done, but I would like to know a better method):
Get the minimum date, day0. Then, with an Aggregator, create 365 fields holding day0+1, day0+2, and so on, to build an artificial year.
After that we apply several transformations, like sorting the dates and a union between them, to get the data ready for a Joiner. The idea of the Joiner is to do a full outer join between the original data and the data with all fields set to 0 that we got from the previous Aggregator.
Then a Router sends the rows that had actual dates (and fields without nulls) to one group and the rows where all fields are null to another, where those fields are set to 0 before finally being written to a table.
I am wondering how this can be achieved, for starters without having to add 365 days to a date one field at a time. If I were to do this same process for 10 years instead of one, the task would get ridiculous really quickly.
I was wondering about an XOR type of operation, or some other function that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need 5 steps just to know which dates are missing between two dates: a minimum and one year from that point.
I have tried to be as clear as possible, but if I failed at any point please let me know!
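The "XOR" idea is essentially a set difference: build the full calendar from the minimum date and subtract the dates that actually arrived. Here is a minimal Python sketch of that logic outside PowerCenter, assuming "one year" means 365 days and that both endpoints are included.

```python
from datetime import date, timedelta


def missing_dates(input_dates, span_days=365):
    """Dates in [min(input), min(input) + span_days] that have no input row."""
    day0 = min(input_dates)
    full_range = {day0 + timedelta(days=i) for i in range(span_days + 1)}
    # Set difference: every calendar date that the input did not supply.
    return sorted(full_range - set(input_dates))


inputs = [date(2017, 3, 18), date(2018, 3, 18)]
gaps = missing_dates(inputs)
print(len(gaps), gaps[0], gaps[-1])
```

Extending this to 10 years only changes span_days; the number of steps stays the same.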
I'm not sure what the Aggregator is supposed to do.
The same goes for the 'full outer' join: a normal join on a constant port is fine :)
Can you calculate the needed number of 'duplicates' before the Joiner? In that case a Lookup configured to return 'all rows', with a less-than-or-equal condition, can help make the mapping much more readable.
In any case, you will need a helper table (or file) with a sequence of numbers from 1 to the number of potential duplicates (or more).
I use the time dimension in our warehouse, which has one row per day starting at 1753-01-01 for the following 200,000 days, and an integer primary key column with values from 1 upward.
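Here is a sketch of the helper-table idea, using Python's sqlite3 in place of the warehouse. The table names, the amount column, and the 1,000-row numbers table are all hypothetical. Each number n becomes day0 + (n - 1), and a LEFT JOIN back to the input fills days with no data with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE input_data (date_field TEXT, amount INTEGER);
    -- Hypothetical helper table: one row per integer, like a time dimension's key.
    CREATE TABLE numbers (n INTEGER PRIMARY KEY);
""")
conn.executemany("INSERT INTO input_data VALUES (?, ?)",
                 [("2017-03-18", 5), ("2017-03-20", 7)])
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, 1001)])

SPAN_DAYS = 365  # assumption: "one year" = day0 plus 365 further days

rows = conn.execute(
    """WITH day0 AS (SELECT MIN(date_field) AS d FROM input_data),
       calendar AS (
           SELECT date(day0.d, '+' || (n - 1) || ' days') AS cal_date
           FROM day0, numbers
           WHERE n <= ? + 1
       )
       SELECT c.cal_date, COALESCE(i.amount, 0) AS amount  -- 0 for days with no input
       FROM calendar c
       LEFT JOIN input_data i ON i.date_field = c.cal_date
       ORDER BY c.cal_date""",
    (SPAN_DAYS,),
).fetchall()
print(len(rows), rows[:3])
```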
You've said you know how to do this programmatically, and to be fair this problem is better suited to that sort of solution. That doesn't exclude PowerCenter by any means, though: just feed the two dates into a Java transformation and apply some code that produces every date between them, outputting one record for each. The Java transformation is ideal for record generation.
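The record-generation logic itself is short. It is sketched here in Python for illustration only. In PowerCenter it would live in the Java transformation's code, whose API is not shown. The field names are hypothetical, and days with no input get every field set to 0.

```python
from datetime import date, timedelta


def generate_records(start, end, input_rows):
    """Yield one record per day from start to end inclusive.

    input_rows maps a date to its fields; missing days get zeroed fields.
    """
    zero_fields = {"amount": 0, "count": 0}  # hypothetical field names
    day = start
    while day <= end:
        fields = input_rows.get(day, zero_fields)
        yield {"date": day, **fields}
        day += timedelta(days=1)


inputs = {date(2017, 3, 18): {"amount": 5, "count": 1},
          date(2018, 3, 18): {"amount": 8, "count": 2}}
records = list(generate_records(date(2017, 3, 18), date(2018, 3, 18), inputs))
print(len(records), records[1])
```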
OK, so you could override your Source Qualifier to do this in the selection query itself (I'm giving an Oracle-based example since that's what I'm used to, and I'm assuming your input data comes from a table). I looked up the CONNECT BY syntax here:
SQL to generate a list of numbers from 1 to 100
SELECT MIN(tablea.DATEFIELD) + levquery.n - 1 AS Port1 FROM tablea, (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) levquery GROUP BY levquery.n ORDER BY levquery.n
(Check whether the query works for you; I haven't got access to PC to test it at the minute.)
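Without an Oracle instance, the same row-generator pattern can be tried in Python's sqlite3. Here a recursive CTE stands in for CONNECT BY LEVEL, and date(..., '+N days') stands in for Oracle's date + number arithmetic. The contents of tablea are made up. Note that 365 levels starting at day0 stop the day before the anniversary; use 366 if the end date should be included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (DATEFIELD TEXT)")
conn.executemany("INSERT INTO tablea VALUES (?)", [("2017-03-20",), ("2017-03-18",)])

# WITH RECURSIVE counts 1..365, playing the role of Oracle's
# SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365.
dates = [row[0] for row in conn.execute(
    """WITH RECURSIVE levquery(n) AS (
           SELECT 1
           UNION ALL
           SELECT n + 1 FROM levquery WHERE n < 365
       )
       SELECT date((SELECT MIN(DATEFIELD) FROM tablea), '+' || (n - 1) || ' days') AS Port1
       FROM levquery
       ORDER BY n"""
)]
print(len(dates), dates[0], dates[-1])
```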