ESPER. send data with min value - complex-event-processing

I put data into Esper with a type:
{"symbol" :string
"price" :double}
I want to get the symbol of the min price for every minute. When I do something like this:
select min(price), symbol
from Market.win:time_batch(60 sec)
I get a lot of events with the same price (the min price) but different symbols, and I want only one event per minute, with a single symbol and price.

It's similar in behavior to an SQL query: it provides the symbol for each row along with the batch minimum.
Just like in SQL, you can use "group by" to control at what level the aggregation operates.
select min(price), symbol from Market.win:time_batch(60 sec) group by symbol
By the way, the batch window retains events in memory. There is also an "output snapshot" clause, so that a batch window is not needed.
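A minimal sketch of that output-snapshot variant, assuming the Market event type from the question:
// emit the current per-symbol aggregation once per minute instead of
// retaining a 60-second batch of events in memory
select symbol, min(price)
from Market
group by symbol
output snapshot every 60 seconds
Note that without the batch window, min(price) is a running minimum since statement start rather than a minimum over the last minute, so which form fits depends on the semantics you want.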

Related

From group by to window function postgres

My goal is to compare Event Engagement Rate = (Event Subscribed / Event Participated) for the newsletter source vs the email source by using window functions.
I managed to do it by using group by:
select source, sum(event_subscribed/event_participated) as "engagement rate"
from events
group by source
I keep failing with the window function, getting too many rows and wrong engagement rates.
Thanks for your help.
First you need to understand the difference between an aggregate function and the window variant of the same function. The aggregate function builds groups as specified in the GROUP BY clause and reduces the result to a single row per group. The window version also builds groups, as specified in the PARTITION BY clause; however, it does not reduce the number of rows, instead it generates the same result in each of the rows within the group. See the example below. To reduce the window version to a single row per group, use a DISTINCT ON expression.
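For illustration, a sketch against the events table from the question (column names assumed as written there):
-- aggregate: one row per source
select source,
       sum(event_subscribed / event_participated) as "engagement rate"
from events
group by source;

-- window variant: the same per-source sum repeated on every row of the partition
select source,
       sum(event_subscribed / event_participated) over (partition by source) as "engagement rate"
from events;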
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the “first row” of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first.
Your query becomes
select distinct on (source)
       source,
       sum(event_subscribed / event_participated) over (partition by source) as "engagement rate"
from events
order by source;

How can I select only data within a specific window in KSQL?

I have a table with a tumbling window, e.g.
CREATE TABLE total_transactions_per_1_days AS
SELECT
    sender,
    count(*) AS count,
    sum(amount) AS total_amount,
    histogram(recipient) AS recipients
FROM
    completed_transactions
WINDOW TUMBLING (
    SIZE 1 DAYS
)
GROUP BY sender;
Now I need to select only data from the current window, i.e. windowstart <= current time and windowend >= current time. Is that possible? I could not find any example.
Depends what you mean when you say 'select data' ;)
ksqlDB supports two main query types, (see https://docs.ksqldb.io/en/latest/concepts/queries/).
If what you want is a pull query, i.e. a traditional SQL query where you want to pull back the current window as a one-time result, then what you want may be possible, though pull queries are a recent feature and not fully featured yet. As of version 0.10 you can only look up a known key. For example, if sender is the key of the table, you could run a query like:
SELECT * FROM total_transactions_per_1_days
WHERE sender = some_value
AND WindowStart <= UNIX_TIMESTAMP()
AND WindowEnd >= UNIX_TIMESTAMP();
This would require the table to have processed data with a timestamp close to the current wall clock time for it to pull back data, i.e. if the system was lagging, or if you were processing historic or delayed data, this would not work.
Note: the above query will work on ksqlDB v0.10. Your success on older versions may vary.
There are plans to extend the functionality of pull queries, so keep an eye out for updates to ksqlDB.

kdb/q: use function in a select from partitioned table

I'm trying to get max drawdown from a partitioned table across multiple dates. The query works fine when run with a date constrained to a specific day. E.g.
select {max neg x-maxs x} pnl from trades where date=last date
It's getting map-reduced over multiple dates so the above query no longer works. I can make the query run over multiple dates by adding another aggregation:
select max {max neg x-maxs x} pnl from trades
but it's not getting the max drawdown over the continuous sequence of trades, only the maximum of the daily drawdowns.
I wonder if there's a way to make it work with a single select without chaining selects like
select {max neg x-maxs x} pnl from select pnl from trades
I've got a rather big query to pull a lot of various metrics on the trades, where max drawdown is just one of them. Using chained selects means I need to break the big query into two queries, map-reduced and non-map-reduced, and then join them back, which would make the query look ugly.
Thanks!
A select query on a partitioned database runs on each date in the partition, applies the function to each date's values, and finally aggregates the per-date results depending on the call (user-defined functions behave differently from plain 'q' functions here).
So I don't think you can combine that into one query. But there are ways to make your query more generalized and reusable for different scenarios.
For example, convert your query to functional form and use variables in that query for the column name and the user function. Put this in one function which will accept a column name and a user function. Now you can call this function with different (column; function) pairs. Something like:
runF:{[col;usrfunc] functional_query_using_col_usrfunc }
All this depends on your use cases. Also check memory usage, as you'll be taking a lot of data into memory.
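A minimal sketch of what that could look like with a functional select, assuming the trades table from the question (runF, maxDrawdown and the res column are illustrative names; the empty where-clause is where a date constraint would go):
maxDrawdown:{max neg x - maxs x}    / user-supplied aggregation
runF:{[col;usrfunc] ?[trades; (); 0b; enlist[`res]!enlist (usrfunc; col)]}
runF[`pnl; maxDrawdown]    / reusable for other columns and functions
As the answer notes, on a partitioned table this is still evaluated per date, so it helps with reuse rather than with the cross-date drawdown itself.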

Volume of an Incident Queue at a Point in Time

I have an incident queue, consisting of a record number (string), an open time (datetime), and a close time (datetime). The records go back a year or so. What I am trying to get is a line graph displaying the queue volume as it was at 8PM each day. So if a ticket was opened before 8PM on that day, or anytime on a previous day, but not closed as of 8PM, it should be contained in the population.
I tried the below, but this won't work because it doesn't really take into account multiple days.
If DATEPART('hour',[CloseTimeActual])>18 AND DATEPART('minute',[CloseTimeActual])>=0 AND DATEPART('hour',[OpenTimeActual])<=18 THEN 1
ELSE 0
END
Has anyone dealt with this problem before? I am using Tableau 8.2, cannot use 9 yet due to company license so please only propose 8.2 solutions. Thanks in advance.
For tracking history of state changes, the easiest approach is to reshape your data so each row represents a change in an incident state. So there would be a row representing the creation of each incident, and a row representing each other state change, say assignment, resolution, cancellation etc. You probably want columns to represent an incident number, date of the state change and type of state change.
Then you can write a calculated field that returns +1, -1 or 0 to express how the state change affects the number of currently open incidents. Then you use a running total to see the total number open at a given time.
You may need to show missing date values or add padding if state changes are rare. For other analytical questions, structuring your data with one record per incident may be more convenient. To avoid duplication, you might want to use database views or custom SQL with UNION ALL clauses to allow both views of the same underlying database tables.
It's always a good idea to be able to fill in the blank for "Each record in my dataset represents exactly one _________"
Tableau 9 has some reshaping capability in the data connection pane, or you can preprocess the data or create a view in the database to reshape it. Alternatively, you can specify a Union in Tableau with some calculated fields (or similarly custom SQL with a UNION ALL clause). Here is a brief illustration:
select open_date as Date,
       'OPEN' as Action,
       1 as Queue_Change,
       <other columns if desired>
from incidents
UNION ALL
select close_date as Date,
       'CLOSE' as Action,
       -1 as Queue_Change,
       <other columns if desired>
from incidents
where close_date is not null
Now you can use a running sum of SUM(Queue_Change) to see the number of open incidents over time. If you have other columns like priority, department, type etc., you can filter and group as usual in Tableau. This data source can be in addition to your previous one. You don't have to have a single view of the data for every worksheet in your workbook. Sometimes you want a few different connections to the same data at different levels of detail or from different perspectives.
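As a sketch of that running sum (assuming the Queue_Change column from the union above), a Tableau table calculation along these lines, computed along Date, shows the open count over time:
// running count of open incidents, computed along the Date axis
RUNNING_SUM(SUM([Queue_Change]))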

FileMaker: Is there a way to build an export order in a script?

Question: Is there a way to build an export order while performing a script? I would prefer a FileMaker-native or FileMaker-called AppleScript solution, if one is possible.
Project: The project is a reporting tool which summarizes sales information (units, price, cost) by user-selectable criteria such as: week, quarter, year, location, product, supplier, etc. I would like a way to specify, at runtime, an export based on the user-selected criteria.
Example: If a user selected units sold summarized by supplier per quarter I would like to be able to have the script select:
Group by:
quarter
supplier
Export Order
quarter
units summary by quarter
supplier
units summary by supplier
There are obviously many permutations, so setting up an individual export for each set of options is infeasible.
If the target format is text-based (i.e. tab- or comma-separated), then I'd export to XML and write an XSLT to summarize it as necessary. To pass parameters to the XSLT, I normally export a small XML file to the same folder.
A solution I can think of is to export calculations rather than the original fields. With the example you give, assume that the user can export up to two fields. You create two calculation fields and two text fields. The text fields store the names of the fields to export, and the calculation fields use Evaluate (or GetField) to get the contents of those fields. It gets complicated if you're also exporting date and time fields, but it's still workable. If you need to include the field names in the export, you create an extra record and work your calculations so that, for that record, they contain the names of the fields the user has selected.
Not trivial, but still possible.
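A minimal sketch of that calculation approach (ExportCalc1 and ExportFieldName1 are illustrative names):
// ExportCalc1 (calculation, result type Text):
// return the contents of whatever field the user chose for slot 1
GetField ( ExportFieldName1 )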
Building on Mikhail's and Chuck's suggestions, I think the best method for this particular project is going to be to build the contents of a .csv in a global field and then Export Field Contents. The basic outline of what I'm doing:
Go to the first record
Loop
WriteTheRows (see below), comma delimited, to a global field
Set $thisGroup to the count of records summarized by this summary field
Exit Loop If Get (CurrentRecord) + $thisGroup >= Get (FoundCount)
Go to record [Get (CurrentRecord) + $thisGroup]
End Loop
Export Field Contents [global field]
WriteTheRows is a custom function that does the following:
The output I'm trying to write can be sorted by up to 7 different criteria at the same time (for example: I could summarize supplier sales by quarter or I could summarize quarter sales by supplier)
Compare the highest level sort field's value to the last value we found for the highest level sort field.
If they're different, WriteALine to the global field for this sort field, the next sort field, and all sort fields down to the lowest level.
If they're the same, compare the (highest level sort field - 1) to the stored value for the (highest level sort field - 1)
If they're the same, WriteALine to the global field for the (highest level sort field - 1) on down to the lowest level sort field
... repeat until we're down to the lowest sort field
WriteALine is another custom function which adds the appropriate labels, commas and values using GetSummary ( revenueSummary ; Evaluate ( "summaryField" & summaryFieldNumber ) ), as Chuck suggests in his answer.