How to Load Only One Track at a Time in Liquidsoap - liquidsoap

I have a MySQL database that stores all my tracks and their associated information. One of the tables in the database is a queue table from which I pull a track for Liquidsoap to play. I am providing those tracks to play with Liquidsoap by using the request.dynamic.list.
def get_track() =
# Get the first line of my external process
result = list.hd(default="", get_process_lines(scripts ^ "get_track.py"))
print(result)
# Create and return a request using this result
[request.create(result)]
end
# Create the source
sourcetrack = request.dynamic.list(id="play_queue", conservative=false, get_track)
The get_track.py script retrieves a record from a queue table in the database.
I noticed that Liquidsoap will grab two tracks when in starts up. Two get "accepted" and one is "prepared."
Is there a way to get Liquidsoap to only accept one track at a time and wait to accept the next one only when reaching near the end of the currently playing track?
I also have scheduled programs that get added to the queue table in the database and when this occurs, all tracks are cleared from the queue table in the database and the program is then added to the queue table.
Since Liquidsoap appears to have a track already loaded in its queue while playing the "prepared" track, is there a way to remove that track so Liquidsoap will not play that track next, but rather call again the get_track.py script to load new track from queue table in database?

Liquidsoap always prepares stream's next items in advance, and it's a fundamental principle of its scheduler. This allows to start a download before playing the downloaded track, for example. As long as you are using request.dynamic.list, the called script must take care of this. In other words, you can't only rely on clock time to evaluate the track to return.
As far as I understand your use case you might prefer using a request.queue source, and have your script push each request on time via the telnet server.

Related

mimic test_start test_stop events in distributed mode worker

In my locustfile I defined test_on_start and test_on_stop events to read a file needed for the test and to write detailed statistics in a CSV at the end of the test. when running in distributed mode, these events occur on the master, not the worker. I am assembling a list of detailed stats for each task in a task sequence and at the end of the test writing a CSV file when the test stops. I found this stackoverflow question which references a setup and teardown. I added these to my class User(HttpUser): but they appear to not be executed.
How can I mimic these events when the test is running on a worker in distributed mode?
Is there a better way?
I am using User on_start and on_stop already - my on_start calls a function to select a random user from a list which was created when the #events.test_start.add_listener is fired, which only happens on the master and not on the workers, so the worker doesn't have any user login data.
It seems counter productive to open the file, read it, select a user at random and close it every time the User on_start method is called. User on_start also sets up the iteration list [] which is where i store the times per task.
When the task sequence is done, meaning the last task is executed, i do a self.interrupt() which runs on_stop, which is where I take the iteration times, and put them into a second list, which is later written using the CSV module. maybe it would be better to just write the data to the CSV during on_stop
The setup/teardown for individual Users has been removed (because they were confusing, as it was run on the first instance of that User class, and when people set properties on that instance got very confused by the fact that later instances didnt get that). Tbh, I wish they had just been replaced by class methods...
The User still has on_start/stop methods though, and if you combine that with a flag it may be able to do what you want. Something like this:
class MyUser(HttpUser):
stopped = False
...
def on_stop(self):
if not MyUser.stopped:
MyUser.stopped = True
# write your csv
# this doesnt guarantee that all your Users are finished though.
https://docs.locust.io/en/stable/writing-a-locustfile.html#on-start-and-on-stop-methods

Complicated job aggregate

I have a very complicated job process and it's not 100% clear to me where to handle what.
I don't want to have code, it just the question who is responsible for what.
Given is the following:
There is a root directory "C:\server"
Inside are two directories "ftp" and "backup"
Imagine the following process:
An external customer sends a file into the ftp directory.
An importer application get's the file and now the fun starts.
A job aggregate have to be created for this file.
The command "CreateJob(string file)" is fired.
?. The file have to be moved from ftp to backup. Inside the CommandHandler or inside the Aggregate or on JobCreated event?
StartJob(Guid jobId) get's called. A third folder have to be created "in-progress", File have to be copied from backup to in-progress. Who does it?
So it's unclear for me where Filesystem things have to be handled if the Aggregate can not work correctly without the correct filesystem.
Because my first approach was to do that inside an Infrastructure layer/lib which listen to the events from the job layer. But it seems not 100% correct?!
And top of this, what is with replaying?
You can't replay things/files that were moved, you have to somehow simulate that a customer sends the file to the ftp folder...
Thankful for answers
The file have to be moved from ftp to backup. Inside the CommandHandler or inside the Aggregate or on JobCreated event?
In situations like this, I move the file to the destination folder in the Application service that sends the command to the Aggregate (or that calls a command-like method on the Aggregate, it's the same) before the command is sent to the Aggregate. In this way, if there are some problems with the file-system (not enough permissions or space is not available etc) the command is not sent. These kind of problems should not reach our Aggregate. We most protect it from the infrastructure. In fact we should keep the Aggregate isolated from anything else; it must contain only pure business logic that is used to decide what events get generated.
Because my first approach was to do that inside an Infrastructure layer/lib which listen to the events from the job layer. But it seems not 100% correct?!
Indeed, this seems like over engineering to me. You must KISS.
StartJob(Guid jobId) get's called. A third folder have to be created "in-progress", File have to be copied from backup to in-progress. Who does it?
Whoever's calling the StartJob could do the moving, before the StartJob gets called. Again, keep the Aggregate pure. In this case it depends on your framework/domain details.
And top of this, what is with replaying? You can't replay things/files that where moved, you have to somehow simulate that a customer sends the file to the ftp folder...
The events are loaded from the event store and replayed in two situations:
Before every command gets sent to the Aggregate, the Aggregate Repository loads all the events from the event store then it applies every one of them to the Aggregate, probably calling some applyThisEvent(TheEvent) method on the Aggregate. So, this methods should be with no side effects (pure) otherwise you change the outside world again and again at every command execution and you don't want that.
The read-models (the projections, the query-models) that present data to the user listen to those events and update the database tables that hold the data that the users see. The events are sent to those read-models after they are generated and every time the read-models are being recreated. When you invent a new read-model, you must pass it all the events that were previous generated by the aggregates in order to build the correct/complete state on them. If your read-model's event listeners have side effects what do you think happens when you replay those long past events? The outside world is modified again and again and you don't want that! The read-models only interpret the events, they don't generate other events and they don't change the outside world.
There is a special third case when events reach another type of model, a Saga. A Saga must receive an event only once! This is the case that you thought to use in Because my first approach was to do that inside an Infrastructure layer/lib which listen to the events from the job layer. You could do this in your case but is not KISS.
I have a very complicated job process and it's not 100% clear to me where to handle what. I don't want to have code, it just the question who is responsible for what.
The usual answer is that the domain model -- aka the "aggregate" makes decisions, and saves them. Observing those decisions, some event handler induces side effects.
And top of this, what is with replaying? You can't replay things/files that where moved, you have to somehow simulate that a customer sends the file to the ftp folder...
You replay the events to the aggregate, so that it is restored to the state where it made the last decision. That's a separate concern from replaying the side effects -- which is part of the motivation for handling the side effects elsewhere.
Where possible, of course, you prefer to have the side effects be idempotent, so that a duplicated message doesn't create a problem. But notice that from the point of view of the model, it doesn't actually matter whether the side effect succeeds or not.

Data store in an actor system

I'm working on an event processing pipeline based on Akka actors. I have 3 actors for each step of the pipeline: FilterWorker, EnrichWorker and ProcessWorker; plus a supervisor actor that makes sure the events are sent from one step of the pipeline to the next.
The enrich step might need to query some external database for extra data or even create new data that I'll want to persist. For example, the enrich step of a web analytics system might want to enrich a click event with the user that made the click and store that user information in a database.
Keeping in mind that example, I see the following options:
1.Use a singleton; e.g. UserStore that keeps in memory all the users gathered so far and saves them to the database once in a while; has all the logic to fetch users that are not yet in memory. Doesn't seem like a good idea to use a singleton in an actor system however (?).
Use a store actor. Use tell to add a new user and ask to fetch it.
Is there a better pattern for this?
Thanks!
In order to not leave this unanswered, I went with my second option and johanandren's suggestion of having an Actor fill the data store role. Works pretty well!

What are some options for keeping track of temporary results and re-use them after restart, in case the program dies while running?

(Suggestions for improving the title of this question are welcomed.)
I have a perl script that uses web APIs to fetch a user's "liked" posts on various sites (tumblr, reddit, etc.), then download some portion of each post (for example, an image that's linked from the post).
Right now, I have a JSON-encoded file that keeps track of the posts that have already been fetched (for tumblr, it just records the total number of likes, for reddit, it records, the "id" of the last post fetched) so that the script can just pick up with the newly "liked" items the next time it runs. This means that after the program is finished archiving a new batch of links, the new "stopping point" is recorded in the JSON file.
However, if the program croaks for some reason (or is killed with ctrl+c, say), the progress is not recorded (since the progress is only recorded at the end of the "fetching"). So the next time the program runs, it looks in the tracking file and gets the last recorded stopping point (the last time it successfully completed fetching and recorded the progress), and picks up there again, downloading duplicates up to the point where it croaked the last time.
My question is, what's the best (i.e. simplest, most efficient, take your pick--I'm open to options here) way to record progress with each incremental archived item, so that if the program dies for some reason, it always knows exactly where to pick up where it left off? Adapting the current method (literally print-ing to the tracking file at the end of each fetch) to do the same thing after each individual item is definitely not the best solution because it's got to be pretty inefficient.
Edited for clarity
Let me make clearer that the file used to track the downloaded posts is not large, and does not grow appreciably with each "fetch" operation. There is only one element for each api (tumblr, etc.) that contains either the total number of likes for the account (in other words, the number that we have already downloaded, so we query the api for the current total, subtract the number in the file, and we know how many new items to fetch), or the ID of the last item fetched (reddit uses this, so we can ask the api for all items "after" the one in the file and only get the new stuff).
My problem is not an ever growing list of fetched posts, rather it is writing to the tracking file every time one single post is downloaded (and there could be thousands of posts downloaded in a single run).
Some ideas I would consider:
Write to the file more often or use an interrupt handler to 'safely' handle the interrupt signal. When it's called, allow the script to write to your file so it's as current as possible and elegantly quit.
Use a better storage mechanic than writing to a flat file. I would consider, depending on the need, using a database to store the ids. I groan when database starts getting in play due to the complexities it adds, however it doesn't have to be. I've used SQLite for queuing but also consider DBD::CSV which just writes to a CSV while allowing SQL syntax (haven't used it myself). In your code you could then check if the id is already in the database and know to skip it. I would imagine that SQLite is also more 'efficient' than reading/writing a flat file and, imo, would be easier to code than having to write code to read a file yourself.
I'd just use a hash, tied to an NDBM file, to keep track of what is loaded and what isn't.
When you start a new batch of URLs, you delete the NDBM file.
Then, in your code, at the start of the program, you do
tie(%visited, 'NDBM_File', 'visitedurls', O_RDWR|O_CREAT, 0666)
(don't worry about the O_CREAT, the file will remain intact if it exists unless you pass O_TRUNC as well)
Assuming your main loop looks like this:
while ($id=<INFILE>) {
my $url=id_to_url($id);
my $results=fetch($url);
save_results($url, $results);
}
you change that to
while ($id=<INFILE>) {
my $url=id_to_url($id);
my $results;
if ($visited{$url}) {
$results=$visited{$url};
} else {
$results=fetch($url);
$visited{$url}=$results;
}
save_results($url, $results);
}
So whenever you fetch a new URL, you write the results to the NDBM file, and whenever you restart your program, the results that have already been fetched will be in the NDBM file and fetched from there instead of reading the URL.
This assumes $results is a scalar, else you won't be able to store/retrieve it in this way. But as you're producing JSON anyway, the "partial json" for each URL will probably be what you want to store.

Using Mojo Event Loop for a long-running script processing huge text files?

I have a script implemented as a Mojo::Command.
It reads a huge text file and extracts data from it. The file contains simple tab separated (C/TSV) records. One record per line.
How can I use the Mojo Event loop to store those records in small files - one file per record - so my script does not wait for each record to be stored but continues to the next record.
Here is a stripped down example:
package My::task;
use Mojo::Base 'Mojolicious::Command';
#in My::task::run
#use Text::CSV to open and read the file
while (!$csv->eof()) {
my $row = $csv->getline($fh)
do_something_time_consuming_and_store_the_record_somewhere($row)
}
I was thinking Mojo Event Loop can be used and avoid forking/threading.
I used successfully previously Parallel::Forker, but I was thinking Mojo has what to offer to speedup the execution.
Is that possible? How?
It depends on the nature of do_something_time_consuming. If that is something that has your process CPU-busy, then you're looking for parallelism, which an event loop doesn't try to give you. In that event you might want to feed each row to redis (via mojo::redis) and have worker processes consume, process, store each record. Then throughput is down to how many parallel workers you can run.
On the other hand, if do_something_time_consuming involves a lot of waiting, eg post to a web service and wait for results, then an event loop (incl mojo's) can be a big win, and handle the concurrency that you want. It's hard to guess which of the non-blocking UserAgent examples is closest to your scenario, since you're short on detail. The gist is to create a callback that does what you want (eg store_the_record_somewhere) when it gets a response back from the remote service.