Reconciling schedule data (Solve a logic problem)

I have a problem I need to solve using Node. My question is about the best logical approach to solving it. Any advice is appreciated.
Summary
You will build a tool that imports train schedules from an external data source and stores them in an internal database.
External Data Source
There is a service that provides a list of trains:
[{"id":1,"name":"A EXPRESS"},{"id":2,"name":"B EXPRESS"},{"id":3,"name":"C EXPRESS"},{"id":4,"name":"D EXPRESS"},{"id":5,"name":"E EXPRESS"}]
And, for each train, a list of station calls (the train's schedule):
Example for train "A EXPRESS":
[{"arrival":"2019-04-30T11:48:00.000Z","departure":"2019-05-01T05:42:00.000Z","service":"Loop 5","station":{"id":"ST1","name":"Waterloo"}},{"arrival":"2019-05-13T18:00:00.000Z","departure":"2019-05-14T05:00:00.000Z","service":"Loop 5","station":{"id":"ST2","name":"Paddington"}},{"arrival":"2019-05-15T04:00:00.000Z","departure":"2019-05-15T11:00:00.000Z","service":"Loop 5","station":{"id":"ST3","name":"Heathrow"}},{"arrival":"2019-05-16T20:00:00.000Z","departure":"2019-05-17T10:00:00.000Z","service":"Loop 5","station":{"id":"ST4","name":"Wimbledon"}},{"arrival":"2019-05-18T15:00:00.000Z","departure":"2019-05-19T21:00:00.000Z","service":"Loop 5","station":{"id":"ST5","name":"Reading"}},{"arrival":"2019-05-21T04:00:00.000Z","departure":"2019-05-21T21:00:00.000Z","service":"Loop 5","station":{"id":"ST6","name":"Algate"}},{"arrival":"2019-05-31T03:00:00.000Z","departure":"2019-05-31T15:00:00.000Z","service":"Loop 5","station":{"id":"ST1","name":"Waterloo"}}]
Note: this train stops at station "ST1" ("Waterloo") twice.
To do
For each imported station call, we want to maintain the latest information as well as the history of the station call: 
-What station is the train calling? 
-What are the latest arrival and departure dates? 
-When was the station call first imported? 
-When was the station call last updated? 
-How did the station call change over time? (evolution of arrival and departure dates with time) This kind of information helps us understand how often trains are delayed, when schedule changes happen, and whether there are patterns to these changes.
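A minimal sketch of a record shape that could answer the questions above. The field names (`firstImportedAt`, `lastUpdatedAt`, `history`, `seenAt`) and the helpers `createStationCall` and `applyUpdate` are hypothetical, not part of the external service. It also assumes that "last updated" means the last time the arrival or departure date actually changed.

```javascript
// Hypothetical record: the latest values plus an append-only history of observed dates.
function createStationCall(trainId, call, importedAt) {
  return {
    trainId,
    stationId: call.station.id,
    stationName: call.station.name,
    arrival: call.arrival,
    departure: call.departure,
    firstImportedAt: importedAt,
    lastUpdatedAt: importedAt,
    history: [{ seenAt: importedAt, arrival: call.arrival, departure: call.departure }],
  };
}

// Apply a newer observation of the same station call.
// A history entry is appended only when the arrival or departure date changed.
function applyUpdate(record, call, importedAt) {
  const changed =
    record.arrival !== call.arrival || record.departure !== call.departure;
  if (!changed) return record;
  return {
    ...record,
    arrival: call.arrival,
    departure: call.departure,
    lastUpdatedAt: importedAt,
    history: [
      ...record.history,
      { seenAt: importedAt, arrival: call.arrival, departure: call.departure },
    ],
  };
}
```

With this shape, the latest dates are read directly from the record, and the evolution over time is the `history` array in import order.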
How it works
-The external data source is a simulation of train schedule forecasts 
-The data covers a time range from January 1st 2019 to May 31st 2019 
-This 5-month time window is compressed and simulated over a 24-hour cycle 
-This 24-hour cycle restarts every day at 00:00 UTC 
-The data source provides endpoints to request train schedules
-A train list endpoint provides a dynamic list of trains that you can import (see Data above)
-A train schedules endpoint provides a dynamic list of station calls for a specific train (see Data above)
-A train schedule consists of a list of station calls with a varying amount of past station calls and future station calls
-This external data source does not provide a unique identifier for each station call
-This means that merging station calls is not straightforward. This is the crux of my question: reconciling external station calls with the existing ones in the database. 
-Station call arrival and departure dates routinely change, sometimes by multiple days. Sometimes calls swap, get deleted, or new ones appear
-Station calls can sometimes be deleted (the train will not stop in that specific station)
-New station calls can sometimes be created (the train will make an unscheduled stop)
-The train schedules endpoint changes the data returned every 15 minutes. 
Specific requirements
You need to capture 24 hours of all train schedules starting at 00:00:00 UTC on one day and ending at 23:59:59 UTC on the same day.   
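A small sketch of how the capture window and the polling times could be computed in Node. The helper names `captureWindow` and `pollTimes` are hypothetical, and the 15-minute default is taken from the refresh interval described above.

```javascript
// Returns the UTC capture window (00:00:00 to 23:59:59) for the day containing `now`.
function captureWindow(now) {
  const start = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate())
  );
  // One day minus one second after the start, i.e. 23:59:59 UTC on the same day.
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000 - 1000);
  return { start, end };
}

// Lists the poll times inside the window, one per refresh interval.
function pollTimes(range, intervalMinutes = 15) {
  const stepMs = intervalMinutes * 60 * 1000;
  const times = [];
  for (let t = range.start.getTime(); t <= range.end.getTime(); t += stepMs) {
    times.push(new Date(t));
  }
  return times;
}
```

For a full day at a 15-minute interval this yields 96 poll times, the first at 00:00 UTC and the last at 23:45 UTC.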
Question
As you can see from the task above, the newly imported data needs to be reconciled with the existing data as it changes.
There is no ID that can be used to match station visits, but these visits need to be updated when the external data changes.
We do have a train ID and a station ID as well as the date of visits.
What is the logic I can apply to keep the database data accurate and up to date?
Thank you
Answer?
My initial thoughts are to do the following, however I am not sure if this is the best solution. If you have a better solution, or see a problem with mine, please let me know.
The list of station calls retrieved from the external service will always be saved to the internal database under its retrieval timestamp. The latest database entry will always be displayed as the current information in the UI.
Every 15 minutes a call is made to the external service, and a new latest entry is added to the internal database and reflected in the UI.
The status of the latest entry needs to reflect the difference between the latest entry and the previous entry, e.g. delayed by X min, cancelled, etc. This is the tricky part, because the current station visit needs to be matched with a previous station visit.
My thinking is, for a specific train, to just find the matching station ID.
If there is no previous matching station ID, then the status is "new".
If there is a previous station ID and no matching current station ID, the status is "cancelled".
If there is exactly one matching station ID, then their times are compared, and the status is updated to "early" or "delayed".
If there is more than one matching station ID, the previous and current station visits with the closest timestamps are matched, and their statuses are updated accordingly to "early" or "delayed".
Is my logic correct?
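The matching rules above can be sketched as follows. This is only an illustration under stated assumptions, not a verified solution: the function name `reconcile` is hypothetical, only arrival times are compared, pairs within a station are formed greedily by smallest arrival gap (which is not guaranteed to be the best overall pairing), and an extra "unchanged" status is assumed for calls whose arrival did not move.

```javascript
// Match the previous and current station calls of ONE train.
// Calls are grouped by station ID; inside a group the closest arrival times are paired first.
function reconcile(previous, current) {
  const results = [];
  const stationIds = new Set([...previous, ...current].map((c) => c.station.id));
  for (const stationId of stationIds) {
    const prev = previous.filter((c) => c.station.id === stationId);
    const curr = current.filter((c) => c.station.id === stationId);

    // Every candidate pair for this station, sorted by arrival gap (closest first).
    const pairs = [];
    for (const p of prev) {
      for (const c of curr) {
        const gap = Math.abs(Date.parse(c.arrival) - Date.parse(p.arrival));
        pairs.push({ p, c, gap });
      }
    }
    pairs.sort((a, b) => a.gap - b.gap);

    const usedPrev = new Set();
    const usedCurr = new Set();
    for (const { p, c } of pairs) {
      if (usedPrev.has(p) || usedCurr.has(c)) continue; // each call is matched at most once
      usedPrev.add(p);
      usedCurr.add(c);
      const deltaMs = Date.parse(c.arrival) - Date.parse(p.arrival);
      // "unchanged" is an assumption added for this sketch.
      const status = deltaMs > 0 ? "delayed" : deltaMs < 0 ? "early" : "unchanged";
      results.push({ stationId, status, previous: p, current: c, deltaMinutes: deltaMs / 60000 });
    }
    // Leftovers: previous calls with no partner are cancelled, current ones are new.
    for (const p of prev) {
      if (!usedPrev.has(p)) results.push({ stationId, status: "cancelled", previous: p, current: null });
    }
    for (const c of curr) {
      if (!usedCurr.has(c)) results.push({ stationId, status: "new", previous: null, current: c });
    }
  }
  return results;
}
```

One case this sketch handles differently from a plain station ID lookup is a train that calls at the same station twice (as "A EXPRESS" does at "ST1"): each previous call is paired with at most one current call. Large shifts of several days can still produce wrong pairs, so the position of the call in the schedule may be worth using as an additional matching signal.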

Related

Measuring system time of specific agent in anylogic

I've got 3 different product types of agent, each of which follows its own path through the factory. How can I measure the average time each product type spends in the system?
I wanted to start the measurement in the first service block and complete it in the last service block.
Now I get some really high numbers, which are clearly wrong. The process itself works fine: if you run the measurement with the code "//agent.enteredSystemP1 = time()", you get a mean of 24 minutes per product. But how can I get the mean per product type?

Just use the same if-elseif-else distinction in the 2nd service block as well.
Currently, any agent leaving the system adds time to any systemTimeDistribution

In AnyLogic, how can I model a truck that delivers orders to multiple clients?

I'm creating a model based on the product delivery example provided by AnyLogic. In my own model, I want a truck to deliver multiple orders in one trip instead of one. My process diagram is shown below. Here, an order enters via the enter block and several orders are accumulated in the batch block. Every order has a specified destination. How do I model the truck such that it combines two orders and moves to the nearest delivery location first, then to the second, and so on?
The main problem is that I don't know the code that accesses the parameter "Delivery location" in each order.
Additional information:
The order agents are generated and the delivery location is stored in a parameter called "client"
The batch block combines (let's say 2) orders into a batch of the type Order (advanced settings set to population of agents)
The service block pulls a truck from the resource pool and sends the batch of orders to the truck agent using send(batch.unit)
The truck agent stores the order/orders(?) in a variable called "order"
Then, a moveTo function should deliver the order to the first destination
What would be the code to move to the first, second etc., destination?

Here is the conceptual piece you are probably not aware of, and that should help you move in the right direction:
You can have "for loops" as part of your process flows. Below, you see an example where an agent keeps driving to places until it has no more parcels.
Obviously, the details of the blocks depend on your model but in each, you can access the truck's orders if you have them in your Truck agent type (which is needed, obviously).

Exploring date and address variables

I have a dataset that contains date variables, quantitative and qualitative predictor variables, and a binary dependent variable. The goal of my analysis is to find the percent of success in CORRECT and to learn more about the relationship between CORRECT and the independent variables.
There are people we can call trackers that live throughout the US. Each one has the job of keeping track of the addresses for the participants of our program in their location. The problem is that some of these trackers are not regularly updating the addresses for the group of participants that they are responsible for. Some of the addresses in their database can be outdated or incorrect in some other way. I’m looking to explore these correct/incorrect addresses and their relationship with the other variables. Below are some of the variables that I have in my dataset:
CORRECT: a binary variable to indicate whether or not the RECORDER entered the correct address
RECORDER_ADDRESS: the address that the RECORDER put into their database for the participant
ACTUAL_ADDRESS: the address where the participant is actually located
ZIP_CODE: the zip code for the participant
PARTICIPANT_ID: The unique ID for the participant
CREATED_DATE: date when the initial address for the participant was recorded
MODIFIED_DATE: dates when any variable was modified
PARTICIPANT_START_DATE: the start date for the participant on the job
PARTICIPANT_END_DATE: the date when this participant's duty will end
RECORDER: name of the recorder that is in charge of tracking this entry
TRAINING: the type of training that the participant received
I’ve figured out the accuracy of the RECORDERs. I found they were correct about 56% of the time. Now I’m trying to look further into these incorrect and correct addresses. I’ve tried logistic regression to predict CORRECT, but none of the predictor variables were significant. I made a stacked bar plot using the CORRECT variable and STATE, along with CORRECT and RECORDER. Now I’m resorting to using the 4 date variables along with ZIP_CODE, RECORDER_ADDRESS, and ACTUAL_ADDRESS to learn about the success and failure of the RECORDERs. Are there visualization ideas or analyses that can use the date variables and/or address variables to gain insight into the correct/incorrect recordings?
One idea is to create another variable that’s the difference in time between CREATED_DATE and MODIFIED_DATE, and another for the difference between PARTICIPANT_START_DATE and MODIFIED_DATE.

AnyLogic Model- Patient Scheduling Project

I am currently working on a project to model outpatient appointment scheduling for a local hospital. The goal of this project is to model their current situation and then adjust different factors to reduce the wait time until the next available appointment. We are using AnyLogic to create the model. At this hospital the current system is as follows:
A patient calls and schedules an appointment with one of the hospital's 19 sub-specialties.
- appointment will either be a first time consultation lasting 1 hour or a follow up appointment lasting 15 minutes.
Patient waits 1 week-6 months until their appointment date (based on sub-specialty)
Patient is seen by a doctor and then exits the system
We have approached the problem in two ways, the first was to attach the schedule to the resource pool which consists of the doctors for a single sub specialty. This would allow the schedule to change as the number of doctors change. The second approach was to attach the schedule to the source which consists of the patients entering the system. This better controls the flow of individuals into the system.
We are having difficulties figuring out how to configure the model so that it accurately shows the result of adding more doctors while still allowing the flexibility we need in scheduling different length appointments in multiple sub-specialties.
If anyone has experience with AnyLogic Scheduling, has dealt with a similar problem, or has any advice on how to proceed, I would appreciate the input.
Thanks!

If I understand correctly, you want to change the number of resources you have according to a certain schedule...
In the schedule, you need to use the integer type and then define the schedule however you want. In the action, you can use "value" as a variable that holds the schedule's current value. The action in the schedule is triggered every time the schedule changes, so you can simply do resourcePool.set_capacity(value);
With this you have the flexibility to use different length appointments... You can use one different schedule for each sub-specialty

How to programmatically manage startWatch and stopWatch in an AnyLogic model

I'm building a model with AnyLogic using the Process Model Library (PML).
In my network I have 4 "source" elements that emit agents. They are all of the same type but carry a different "category" string ID (saved as a variable) to differentiate them; they are purchase orders from different departments. I have also inserted blocks to measure the time the agents take to exit the assembler elements (you will see them in the picture inside the red circles). This is the time that I want to plot in a graph to show how fast they are.
After some testing and reading the documentation, I have seen that when an agent (of any department) passes the start photocell a timer is started, and when an agent (of any department) passes the stop photocell the timer is stopped. This also happens when the agents have different category IDs.
How can I synchronize the timers so that they measure the time of objects with the same category ID? If the first source element "Category Nilo&Salmoni" produces an agent, I want the stopWatch to measure the time of that agent and not of another one, emitted from another category, that reaches the stopwatch first.