Create continuous-aggregates from base table - postgresql

i have a base table which update with more than 500 item per second and whole table contain almost 500 million data. how create continuous-aggregates from this table by interval 1m?
if i create the Create continuous-aggregates will it goes to loop? because handle this much data will take time while in this time new data comes
i want a way to create continuous-aggregates first calculate all data and don't look at update. then update news

You can use the WITH NO DATA option to later refresh_continuous_aggregates only of the data you want. You can select the begin and end of the timeframe that you want to update.
Example:
CALL refresh_continuous_aggregate('caggs_name', '2020-01-01', '2020-02-01');
See the API docs here.

Related

Loop through database ANYLOGIC

In my model I want to loop through the database which contains multiple columns (see example) by an event. The idea behind it is that I want to create dynamic events based on the rows in the database.
But I've no clue how to iterate through a database in anylogic and also was not able to find an example of a loop with a database.
The dummycode of my problem would look something like this:
For order in orderdatabase:
Create order based on (order.name, order.quantity, order.arrivaltime, order.deliverylocation)
Where order in the loop is every row of the database, and the value on which the creation is based based on the different column values of that specific row.
Can somebody give me a simple example of how to create such a loop for this specific problem.
Thanks in advance.
Use the database query wizard:
put your cursor into a code field
this will allow you to open the database wizard
select what you need (in your case, you want the "iterate over returned rows and do something" option
Click ok
adjust the dummy code to make it do what you want
For details and examples, check the example models and the AnyLogic help, explaining all options in detail.

iSQLOutput - Update only Selected columns

My flow is simple and I am just reading a raw file into a SQL table.
At times the raw file contains data corresponding to existing records. I do not want to insert a new record in that case and would only want to update the existing record in the SQL table. The challenge is, there is a 'record creation date' column which I initialize at the time of record creation. The update operation overwrites that column too. I just want to avoid overwriting that column, while updating the other columns from the information coming from the raw file.
So far I am having no idea about how to do that. Could someone make a recommendation?
I defaulted the creation column to auto-populate in the SQL database itself. And I changed my flow to just update the remaining records. Talend job is now not touching that column. Problem solved.
Yet another reminder of 'Simplification is underrated'. :)

Postgres count(*) optimization idea

I'm currently working on a project involving keeping track of users and their actions with my database (PostgreSQL as the RDMS), and I have run into an issue when trying to perform COUNT(*) on occurrences of each user. What I want is to be able to, efficiently, count the number of times each user appears from every record, and also be able to achieve looking at counts on a particular date range.
So, the problem is how do we achieve counting the total number of times a user appears from the tables contents, and how do we count the total number on a date range.
What I've tried
As you might know, Postgres doesn't support COUNT(*) very well using indices, so we have to consider other ways to reduce the # of records it looks at in order to speed up the query. So my first approach is to create a table to keep track of the number of times a user has a log message associated with them, and on what day (similar to the idea behind a materialized view, but I dont want continually refresh the materialized view with my count query). Here is what I've come up with:
CREATE TABLE users_counts(user varchar(65536), counter int default 0, day date);
CREATE RULE inc_user_date_count
AS ON INSERT TO main_table
DO ALSO UPDATE users_counts SET counter = counter + 1
WHERE user = NEW.user AND day = DATE(NEW.date_);
What this does is every time a new record is inserted into my 'main_table', we update the current users_counts table to increment the records whose date is equal to the new records date, and the user names are the same.
NOTE: the date_ column in 'main_table' is a timestamp so I must cast the new records date_ to be a DATE type.
The problem is, what if the user column value doesn't already exist in my new table 'users_count' for the current day, then nothing is updated.
Here is my question:
How do I write the rule such that we check if a user exists for the current day, if so increment that counter, otherwise insert new row with user, day, and counter of 1;
I also would like to know if my approach makes sense to do, or is there any ideas I am missing that I just haven't thought about. As my database grows, it is increasingly inefficient to perform counting, so I want to avoid any performance bottlenecks.
EDIT 1: I was able to actually figure this out by creating a separate RULE but I'm not sure if this is correct:
CREATE RULE test_insert AS ON INSERT TO main_table
DO ALSO INSERT INTO users_counts(user, counter, day)
SELECT NEW.user, 1, DATE(NEW.date)
WHERE NOT EXISTS (SELECT user FROM users.log_messages WHERE user = NEW.user_);
Basically, an insert happens if the user doesn't already exist in my CACHED table called user_counts, and the first rule above updates the count.
What I'm unsure of is how do I know when which rule is called first, the update rule or insert.. And there must be a better way, how do I combine the two rules? Can this be done with a function?
It is true that postgresql is notoriously slow when it comes to count(*) queries. However if you do have a where clause that limits the number of entries the query will be much faster. If you are using postgresql 9.2 or newer this query will be just as fast as it's in mysql because of index only scans which was added in 9.2 but it's best to explain analyze your query to make sure.
Does my solution make sense?
Very much so provided that your explain analyze show that index only scans are not being used. Trigger based solutions like the one that you have adapted find wide usage. But as you have realized the problem with the initial state arises (whether to do an update or an insert).
which rule is called first
Multiple rules on the same table and same event type are applied in
alphabetical name order.
from http://www.postgresql.org/docs/9.1/static/sql-createrule.html
the same applies for triggers. If you want a particular rule to be executed first change it's name so that it comes up higher in the alphabetical order.
how do I combine the two rules?
One solution is to modify your rule to perform an upsert (Look right at the bottom of that page for a sample upsert ). The other is to populate the counter table with initial values. The trick is to create the trigger at the same time to avoid errors. This blog post explains it really well.
While the initial setup will be slow each individual insert will probably be faster. The two opposing factors being the slowness of a WHERE NOT EXISTS query vs the overhead of catching an exception.
Tip: A block containing an EXCEPTION clause is significantly more
expensive to enter and exit than a block without one. Therefore, don't
use EXCEPTION without need.
Source the postgresql documentation page linked above.

What's the best way to temporarily persist results of a long running SP?

I have a TSQL stored procedure that can run for a few minutes and return a few million records, I need to display that data in an ASP.NET Grid (Infragistics WebDataGrid to be precise). Obviously I don't want return all data at once and need to setup some kind of paging options - every time user selects another page - another portion of data is loaded from the DB. But I can't run the SP every time new page is requested - it would take too much time.
What would be the best way to persist data from the SP, so when user selects a new page - new data portion would be loaded by a simple SELECT... WHERE from that temp data storage?
A few options
One:
If the user only pages forward then you could just hold the connection open and use a DataReader. Just .Read() as needed.
Two:
Create a #temp table using the userID as part of the name to store the results. I don't like this as if user aborts sometimes tables are left over. About 1/2 second hit to create and drop the #temp. Store the entire results or just the PK and create the page detail on demand.
Three:
Use a DataReader to read the the PK into a List<>. It is faster than you would guess. That List is only going to IIS (not to the browser). List can be referenced by ordinal [] and preserves the sort. Get the detail for a page as required. The problem here is where PK in (3,9,2,6) will not return them in that order. I use TVP to pass the order, PK so the page is sorted by order. I do exactly this and get pages loads for objects with 20 properties 40 rows at a time and it takes less than 1/2 second. Do one query per table (NOT one per row) then assemble assign properties in .NET. Use DataReader (not DataTable). And you can even run the reader on a backgroundworker and pass back the first page of PKs using progresschanged.
Have you look at Server Side Paging (article is 2005, but will work with 2008 and CTEs). Also - just wondering, is there any reason you are returning that many rows? I can't see a very good use of a human paging through a million records even if the page size was 1000.

iPhone Dev - Trying to access every row of a sqlite3 table sequentially

this is my first time using SQL at all, so this might sound basic. I'm making an iPhone app that creates and uses a sqlite3 database (I'm using the libsqlite3.dylib database as well as importing "sqlite3.h"). I've been able to correctly created the database and a table in it, but now I need to know the best way to get stuff back from it.
How would I go about retrieving all the information in the table? It's very important that I be able to access each row in the order that it is in the table. What I want to do (if this helps) is get all the info from the various fields in a single row, put all that into one object, and then store the object in an array, and then do the same for the next row, and the next, etc. At the end, I should have an array with the same number of elements as I have rows in my sql table. Thank you.
My SQL is rusty, but I think you can use SELECT * FROM myTable and then iterate through the results. You can also use a LIMIT/OFFSET(1) structure if you do not want to retrieve all elements at one from your table (for example due to memory concerns).
(1) Note that this can perform unexpectedly bad, depending on your use case. Look here for more info...
How would I go about retrieving all the information in the table? It's
very important that I be able to access each row in the order that it
is in the table.
That is not how SQL works. Rows are not kept in the table in a specific order as far as SQL is concerned. The order of rows returned by a query is determined by the ORDER BY clause in the query, e.g. ORDER BY DateCreated, or ORDER BY Price.
But SQLite has a rowid virtual column that can be used for this purpose. It reflects the sequence in which the rows were inserted. Except that it might change with a VACUUM. If you make it an INTEGER PRIMARY KEY it should stay constant.
order by rowid