different batch size according to table column in database - anylogic

I am new to AnyLogic. I am trying to model a Source with arrival dates and multiple agents per arrival, both of which have been imported successfully from a database table. It works well until I try to batch the units per arrival and set the batch size from the same "multiple agents per arrival" column: that doesn't work. Why is this happening? No error occurs, but when I try to run it the model doesn't open. What is wrong and how can I do this? Thanks in advance. [source, batch block](https://i.stack.imgur.com/0Oui3.png)

Your SQL statement in the Batch block is not sufficient: you are always just querying the first entry in shipments_data.amount.
You need to add a "Choice condition" just below it such that you specify which row from shipments_data to use for the amount value.
(Think about the table: How would an agent in the Batch block know which amount to use without a specific condition?)
If you cannot find a good condition, you should probably store the amount value as a parameter in your MyAgent type upfront in the Source block and use that parameter in the "Batch size" field instead of querying the database.
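For example, a minimal sketch of that alternative (assuming MyAgent has an int parameter amount that the Source block fills from the shipments_data.amount column of the arrival's own row, e.g. via the Source's option to set agent parameters from database columns):
// Batch block -> "Batch size" expression; the Source is assumed to have filled agent.amount already
agent.amount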

Does parameters variation not update the built-in database?

I notice that whenever I run a ParametersVariation model, the built-in database does not update... I have PLE, so there is no way for me to write my own database. I am currently able to pull data from various logs present in the database, but only from a normal simulation run. Is there a way to have the parameters variation write its data to the database after each simulation run?
I am currently running this code in After simulation run
Database myFile = new Database(this, "A DB from Excel", "C:/Users/Downloads/DataExport.xlsx"); // external Excel file used as the export target
ModelDatabase modelDB = getEngine().getModelDatabase(); // the model's built-in database
modelDB.exportToExternalDB("flowchart_stats_time_in_state_log", myFile.getConnection(), "Sheet", false, true); // copy the log table into sheet "Sheet"
The export works perfectly, but the data never changes. This is confirmed by exporting a distribution from a histogram that changes with every simulation run: for this export it's the same data as was written to the database by the last standard (non-ParametersVariation) simulation run.
Model log database tables aren't produced for multi-run experiments. It's not specifically stated anywhere, but they're designed more for testing/debugging (single runs of) models.
(Also, notice that the log tables don't have columns specifying a run ID or similar, so there's no way that you would have been able to distinguish rows for different runs anyway, even if there were rows written in multi-run experiments.)
Unfortunately, because they are one of the only ways to 'automatically' produce certain forms of output data (like the contents of datasets or histograms), many people try to use them for that, even though they have a pretty un-useful 'internal' format. In general you should write to your own internal database tables for any persistent outputs. That also lets you govern whether you store outputs for multiple runs or not, which requires you to calculate some form of unique run ID and use it in a column to differentiate outputs per run, plus have logic or UI elements to determine when the table data is cleared for a new run and when it isn't.
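As a rough sketch of the own-table approach (all table and column names here are assumptions, not AnyLogic defaults): suppose you added an internal database table my_run_stats with columns run_id, output_name and output_value; then something like the following, run e.g. in "After simulation run", writes one row per run:
int runId = getEngine().getRunCount(); // simple run identifier; assumed to be sufficient for your experiment setup
double someOutput = 42.0;              // placeholder: replace with a statistic you collect yourself
insertInto(my_run_stats)
    .columns(my_run_stats.run_id, my_run_stats.output_name, my_run_stats.output_value)
    .values(runId, "time_in_state_mean", someOutput)
    .execute();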
NB: the kinds of data the model log tables (like flowchart_stats_time_in_state_log, which you mention) create can in virtually all cases be determined and created 'manually' via your own model code. That table in particular has a large amount of detail on what's happened in each block and, in any given case, it's probably only a fraction of that data (or a simplification/aggregation of it) that you really want/need.

EF Core 3.1: Update a single record

What would be the best solution for building a number sequence generator? I have to apply several different text formats to a sequence of numbers, and I would like to save the current number values in a table of my own. When I need to draw a new number, I would like to increase the number in my table and save the new value back. But I have to make this independent of other (bigger) transactions which may be running at the same time.
For a better understanding of my issue: I have to import a batch of sales orders. During the import I have to generate a new number sequence value, but this happens right in the middle of a transaction. After processing some header information I can generate my number from the sequence. At this point I have to save it to the database, so that the next user draws a new number. After the number generation I have to process the order lines. Only if this finishes successfully am I allowed to save the whole sales order to the database. If errors occur, the whole sales order will be imported manually later; the used number of the sequence must not be used again.
So I need a possibility to save single records from the ChangeTracker instead of writing all modifications at once. Any idea how to deal with this?

How to create all agents at once by a database?

I'm generating my agents in AnyLogic based on a database table that I've created. In this DB I have some characteristics of my agent. This agent is supposed to be my "scheduling agent"; since my focus is on rescheduling, it is important that my production orders are saved as agents in a queue. My problem is that, when generating the agents, I can't tell the system to generate all of them at once (i.e. "import" the lines of my DB and transform each line into an agent with its characteristics).
I tried doing it by adding a 1 s difference between every production order, but when the last date is reached my simulation gives an error and stops working. Could someone help me achieve this? Do you think there would be a better solution?
I am not 100% sure what you are trying to do, but I think I have solved a similar problem this way.
I have a database of batches that I want to load all at once.
(screenshot of the Source block settings)
This is going to load the batches one at a time with 0 interarrival time. This means that batches will flow continuously. Also important is the Limited number of arrivals option, which will stop the loading when the end of the database is reached.
Also, after the source, I added a queue with Maximum capacity set to infinite.
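As a different, code-based sketch (not the screenshot approach above; all names are illustrative): if the Source's arrival mode is instead set to "Calls of inject() function", all agents can be created at model time 0 with a single call, for example in Main's startup code. Copying the DB characteristics onto each agent (for instance in the Source's "On at exit" action) is then up to your own model logic.
int numberOfOrders = 100;      // placeholder: replace with the row count of your production-orders table
source.inject(numberOfOrders); // creates all order agents at once; they then wait in the downstream queue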
Hope that helps

Incremental upload/update to PostgreSQL table using Pentaho DI

I have the following flow in Pentaho Data Integration to read a txt file and map it to a PostgreSQL table.
The first time I run this flow everything goes ok and the table gets populated. However, if later I want to do an incremental update on the same table, I need to truncate it and run the flow again. Is there any method that allows me to only load new/updated rows?
In the PostgreSQL Bulk Load operator, I can only see "Truncate/Insert" options and this is very inefficient, as my tables are really large.
See my implementation:
Thanks in advance!!
Looking around for possibilities, some users say that the only advantage of the Bulk Loader is performance with very large batches of rows (upwards of millions). But there are ways of countering this.
Try using the Table output step with a batch size ("Commit size" in the step) of 5000, and alter the number of copies executing the step (depending on the number of cores your processor has) to, say, 4 copies (dual-core CPU with 2 logical cores each). You can alter the number of copies by right-clicking the step in the GUI and setting the desired number.
This will parallelize the output into 4 groups of inserts, of 5000 rows per 'cycle' each. If this causes a memory overload in the JVM, you can easily adapt that and increase the memory via the PENTAHO_DI_JAVA_OPTIONS option: simply double the amounts set for Xms (minimum) and Xmx (maximum); mine are set to "-Xms2048m" "-Xmx4096m".
The only peculiarity I found with this step and PostgreSQL is that you need to specify the Database Fields in the step, even if the incoming rows have exactly the same layout as the table.
You are looking for an incremental load. You can do it in two ways.
1. There is a step called "Insert/Update", which can be used to do the incremental load. You will have the option to specify key columns to compare; then, under the Fields section, select "Y" for Update (and "N" for the columns you are using in the key comparison).
2. Use the Table output step and uncheck the "Truncate table" option. While retrieving the data from the source table, use a variable in the where clause: first get the max value from your target table, set that value into a variable, and include it in the where clause of your query (a plain JDBC sketch of this pattern follows below).
Edit:
If your data source is a flat file, then, as described above, get the max value (date/int) from the target table and join it with your data; after that, use a Filter Rows step to keep only the incremental rows.
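For illustration only, here is the max-value pattern from option 2 expressed outside Pentaho as plain JDBC; the connection details and the table/column names (sales_source, sales_target, updated_at) are assumptions:
import java.sql.*;

public class IncrementalLoadSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

            // 1. Get the high-water mark from the target table.
            Timestamp maxLoaded;
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT COALESCE(MAX(updated_at), '1970-01-01') FROM sales_target")) {
                rs.next();
                maxLoaded = rs.getTimestamp(1);
            }

            // 2. Copy only the rows that are newer than that mark.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO sales_target SELECT * FROM sales_source WHERE updated_at > ?")) {
                ps.setTimestamp(1, maxLoaded);
                System.out.println("Copied " + ps.executeUpdate() + " new/updated rows");
            }
        }
    }
}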
Hope this will help.

Neo4j's MERGE command on big datasets

Currently, I am working on a project implementing a Neo4j (v2.2.0) database in the field of web analytics. After loading some samples, I'm trying to load a big data set (>1GB, >4M lines). The problem I am facing is that the usage of the MERGE command takes exponentially more time as the data size grows. Online sources are ambiguous on the best way to load big sets of data when not every line has to be loaded as a node, and I would like some clarity on the subject. To emphasize, in this situation I am just loading the nodes; relations are the next step.
Basically there are three methods
i) Set a uniqueness constraint for a property, and create all nodes. This method was used mainly before the MERGE command was introduced.
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
followed by
USING PERIODIC COMMIT 250
LOAD CSV WITH HEADERS FROM "file:C:\\path\\file.tsv" AS row FIELDTERMINATOR '\t'
CREATE (:Book {isbn: row.isbn, title: row.title, etc})
In my experience, this will return an error if a duplicate is found, which stops the query.
ii) Merging the nodes with all their properties.
USING PERIODIC COMMIT 250
LOAD CSV WITH HEADERS FROM "file:C:\\path\\file.tsv" AS row FIELDTERMINATOR '\t'
MERGE (:Book {isbn: row.isbn, title: row.title, etc})
I have tried loading my set in this manner, but after letting the process run for over 36 hours and coming to a grinding halt, I figured there should be a better alternative, as ~200K of my eventual ~750K nodes were loaded.
iii) Merging nodes based on one property, and setting the rest after that.
USING PERIODIC COMMIT 250
LOAD CSV WITH HEADERS FROM "file:C:\\path\\file.tsv" AS row FIELDTERMINATOR '\t'
MERGE (b:Book {isbn: row.isbn})
ON CREATE SET b.title = row.title
ON CREATE SET b.author = row.author
etc
I am running a test now (~20K nodes) to see if switching from method ii to iii will improve execution time, as a smaller sample gave conflicting results. Are there methods which I am overlooking that could improve execution time? If I am not mistaken, the batch inserter only works for the CREATE command, and not the MERGE command.
I have permitted Neo4j to use 4GB of RAM, and judging from my task manager this is enough (uses just over 3GB).
Method iii) should be the fastest solution since you MERGE against a single property. Do you create the uniqueness constraint before you do the MERGE? Without an index (constraint or normal index), the process will take a long time with a growing number of nodes.
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE
Followed by:
USING PERIODIC COMMIT 20000
LOAD CSV WITH HEADERS FROM "file:C:\\path\\file.tsv" AS row FIELDTERMINATOR '\t'
MERGE (b:Book {isbn: row.isbn})
ON CREATE SET b.title = row.title
ON CREATE SET b.author = row.author
This should work; you can also increase the PERIODIC COMMIT value.
I can add a few hundred thousand nodes within minutes this way.
In general, make sure you have indexes in place. Merge a node first on the basis of the properties that are indexed (to exploit fast lookup) and then modify that node's properties as needed with SET.
Beyond that, both of your approaches are going through the transaction layer. If you need to jam a lot of data into the DB really quickly, you probably don't want to use transactions to do that, because they're giving you functionality you might not need, and they require overhead that's slowing you down. So a larger solution would be to not insert data with LOAD CSV but go another route entirely.
If you're using the 2.2 series of Neo4j, you can go for the batch inserter via Java, or the neo4j-import tool (sadly not available prior to 2.2). What they both have in common is that they don't use transactions.
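For completeness, here is a rough sketch of what the Java batch inserter route looks like against the 2.2.x API (store path, label and property names are assumptions; TSV parsing, deduplication and error handling are left out, and the exact inserter(...) signature differs slightly between 2.x releases):
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.Label;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BookBatchInsert {
    public static void main(String[] args) {
        BatchInserter inserter = BatchInserters.inserter("C:/path/to/graph.db");
        try {
            Label book = DynamicLabel.label("Book");
            // One createNode call per TSV row. There is no MERGE here, so duplicates
            // must be filtered by your own code (e.g. a set of already-seen ISBNs).
            Map<String, Object> props = new HashMap<>();
            props.put("isbn", "978-0000000000");
            props.put("title", "Example title");
            inserter.createNode(props, book);
        } finally {
            inserter.shutdown(); // flushes everything to the store files; nothing is visible before this
        }
    }
}
The store directory it writes can then simply be opened by the regular Neo4j server afterwards.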
Finally, whichever way you go, you should read Michael Hunger's article on importing data into Neo4j, as it provides a good conceptual discussion of what's happening and why you need to skip transactions if you're going to load huge piles of data into Neo4j.