How are Toloka task pages created in a pool?

I've uploaded a file with tasks to a pool, but the pool page has 0 task pages and the pool doesn't start. How can I fix it?

When you upload individual tasks, Toloka automatically groups them into task pages. For example, if "smart mixing" (the mixer_config key for the pool) is set to 7 main tasks, 2 control tasks, and 1 training task, but you uploaded only the main tasks, the system cannot generate task pages with the configuration you set. Upload all the task types used in the pool, or change the mixer settings, to start the pool.
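
For reference, the example above corresponds roughly to a pool mixer_config like the following (the field names are the ones the Toloka API uses for main, control, and training tasks; treat this as an illustrative sketch and check it against the current API reference):

"mixer_config": {
  "real_tasks_count": 7,
  "golden_tasks_count": 2,
  "training_tasks_count": 1
}

With this configuration, Toloka can only assemble a task page once the pool contains all three task types.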

Related

How to archive a pool?

I want to archive an old pool in Toloka that I no longer need. I have created new pools to label new batches, and the old ones just clutter my work, but I can't find a button for this.
In Toloka, there is a restriction on archiving pools with unreviewed or rejected tasks. This is because you can't change a task's status in an archived pool, such as reviewing a submitted response or accepting a previously rejected task. In particular, you can't archive a pool that contains tasks rejected less than 9 days ago.
https://yandex.com/support/toloka-requester/concepts/pool-archive.html

Check for Agent Copies in Wait Blocks then Free them

I am modeling a development process where projects (Epics) come from the Source and are broken down into their components (Features), which go through development. Once those components (Features) are completed, the project (Epic) is complete. Here are the model details (see the screenshot too).
Epic agents (the main projects) leave the source and are copied, with a random number of copies, by the Split block into Feature agents (the components of the main project).
Each Epic agent is linked bidirectionally with its Feature agents. In "On exit copy", the connection is made with agent.connectTo(original).
The Feature agents then move to one of two Service blocks for development, based on the probability set in a SelectOutput.
The Epic moves immediately to a Wait block (epicWait) and remains there until all the copied agents (Features) have moved through their development Service blocks.
When a Feature leaves its development Service block, it goes to a Wait block (featureWait1 or 2), where it needs to check whether all the Feature agents that went to the other development block are complete. If so, the Feature agent should free itself, the other Feature agents in the other featureWait, and the matching Epic agent waiting in epicWait. This signifies that the project (Epic) is complete.
So my questions are: how do I write the "On enter" function to check the other featureWait block for other linked Feature agents? And then how do I free them to signify that the Epic is now complete? Thanks!
Instead of using featureWait1 and 2, what I would do on sinkFeature1 and 2 is:
// if this Feature is the last one still connected to its Epic, the Epic is complete
if (agent.epicLink.getConnectedAgent().featureLink.getConnectionsNumber() == 1) {
    epicWait.free(agent.epicLink.getConnectedAgent());
}
// drop this Feature's connection so the count of remaining Features stays accurate
agent.epicLink.disconnect();
where epicLink is the connection in the Feature agent that links it to the associated Epic, and featureLink is the corresponding connection in the Epic agent that links it to its Features.

Way to persist process spawned by a task on an agent?

I'm developing an Azure DevOps extension with tasks in it. In one of the tasks, I start a process and configure it. In another task, I access that same process's API to consume it. This works perfectly fine, but I notice that after the job is done, my process is killed. I was planning to let the user do the configuration on an agent and then access it in another job or pipeline.
Is there a way to persist a process on an agent? I feel like the agent kills every child process it created during cleanup. Where can I find documentation on this?
Edit: I managed to find this thread that mentions a certain Process.clean variable, but there isn't any more information about it and I couldn't find documentation on it.
Your feeling is correct. Agents clean up spawned processes when the job finishes, and that's by design. A single machine can have multiple agents on it, and multiple agents can be running tasks in parallel. What if you have one machine with 10 agents on it, and they all start up this process at once?
IMO, the approach you're taking is suspect. If you need to persist information across jobs, there are numerous ways to do so (for example, an output variable containing JSON) that don't involve spawning a service that continues running outside the scope of the job that started it.
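As an illustration of the output-variable route, here is a minimal sketch of a two-job pipeline; the job names, variable name, and JSON payload are placeholders rather than anything from the question:

jobs:
- job: Configure
  steps:
  # Write the configuration as JSON into an output variable.
  - bash: 'echo "##vso[task.setvariable variable=appConfig;isOutput=true]{\"endpoint\":\"https://example.test\"}"'
    name: setConfig

- job: Consume
  dependsOn: Configure
  variables:
    appConfig: $[ dependencies.Configure.outputs['setConfig.appConfig'] ]
  steps:
  # Read the configuration back instead of calling a long-lived process.
  - bash: 'echo "$(appConfig)"'

The second job reads the value from the first job's output, so nothing has to keep running on the agent between jobs.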

Azure DevOps: how to check if there are items waiting in the pool queue

I want to monitor a private pool queue for waiting items. If there is an item waiting (which means there are not enough agents to serve the requests), I want to add more VMs with agents. But I could not find any API endpoint that would tell me whether there are any items in the current pool queue.
I was not able to locate any API that can tell me how many tasks are currently queued for an agent pool, so I found a way around it:
Query https://dev.azure.com/{instanceName}/_apis/distributedtask/pools/{poolId}/agents - this shows how many agents I have and how many of them are online.
Query https://dev.azure.com/{instanceName}/_apis/distributedtask/pools/{poolId}/jobrequests - this shows all jobs in this pool, including running ones (their status will be null).
So, if the number of jobs is lower than the number of online agents, I am OK. As soon as the number of jobs exceeds the number of online agents, I can use the SDK to add more agents to the VMSS (as far as the license permits, though).
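For illustration, here is a minimal sketch of that comparison in Java using the two endpoints above. The organization name, pool id, and PAT are placeholders, and the response fields it relies on ("status" on agents, a missing "result" on unfinished job requests) are assumptions to verify against your own responses:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class PoolQueueCheck {
    static final String ORG = "myOrg";                    // placeholder organization
    static final int POOL_ID = 1;                         // placeholder pool id
    static final String PAT = System.getenv("AZDO_PAT");  // personal access token

    static JsonNode get(HttpClient client, ObjectMapper mapper, String url) throws Exception {
        String auth = Base64.getEncoder().encodeToString((":" + PAT).getBytes());
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + auth)
                .GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return mapper.readTree(response.body());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        ObjectMapper mapper = new ObjectMapper();
        String base = "https://dev.azure.com/" + ORG + "/_apis/distributedtask/pools/" + POOL_ID;

        // Count agents reported as online in the pool.
        int onlineAgents = 0;
        for (JsonNode agentNode : get(client, mapper, base + "/agents").path("value")) {
            if ("online".equalsIgnoreCase(agentNode.path("status").asText())) {
                onlineAgents++;
            }
        }

        // Count job requests that have not finished yet (assumed: no "result" field).
        int unfinishedJobs = 0;
        for (JsonNode job : get(client, mapper, base + "/jobrequests").path("value")) {
            if (!job.hasNonNull("result")) {
                unfinishedJobs++;
            }
        }

        // More unfinished jobs than online agents means a backlog: scale out the VMSS.
        if (unfinishedJobs > onlineAgents) {
            System.out.println("Queue backlog detected: add agents to the VMSS");
        }
    }
}

Run this on a schedule (or from a monitoring job) and trigger the VMSS scale-out when it reports a backlog.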

How to modify the scheduler of Pegasus WMS

I'm interested in scientific workflow scheduling. I'm trying to figure out and modify the existing scheduling algorithm inside the Pegasus workflow management system from http://pegasus.isi.edu/, but I don't know where it is or how to do so. Thanks!
Pegasus has a notion of site selection during its mapping phase, where it maps the jobs to the various sites defined in the site catalog. The site selection is explained in the documentation here:
https://pegasus.isi.edu/wms/docs/latest/running_workflows.php#mapping_refinement_steps
Internally, there is a site selector interface that you can implement to incorporate your own scheduling algorithms.
You can access the javadoc at
https://pegasus.isi.edu/wms/docs/latest/javadoc/edu/isi/pegasus/planner/selector/SiteSelector.html
There are some implementations included in that package.
There is also a version of HEFT implemented there. The algorithm is implemented in the following class:
edu.isi.pegasus.planner.selector.site.heft.Algorithm
Looking at the HEFT implementation of the site selector will give you a good template for incorporating other site selection algorithms.
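If you first want to prototype the selection logic on its own before wiring it into the SiteSelector interface, the heart of a greedy, HEFT-style choice is just ranking candidate sites by estimated completion time. Here is a hypothetical stand-alone sketch; the class and the two maps are placeholders, not Pegasus classes:

import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Stand-alone illustration of a greedy site-selection heuristic. A real
// implementation would live in a class implementing the SiteSelector
// interface from the javadoc linked above.
public class GreedySiteChooser {

    // Earliest time each candidate site becomes free, in arbitrary time units.
    private final Map<String, Double> siteAvailableAt;

    // Estimated runtime of the job on each site, e.g. derived from past runs.
    private final Map<String, Double> estimatedRuntime;

    public GreedySiteChooser(Map<String, Double> siteAvailableAt,
                             Map<String, Double> estimatedRuntime) {
        this.siteAvailableAt = siteAvailableAt;
        this.estimatedRuntime = estimatedRuntime;
    }

    // Pick the candidate site with the smallest estimated completion time.
    public String chooseSite(List<String> candidateSites) {
        return candidateSites.stream()
                .min(Comparator.comparingDouble(site ->
                        siteAvailableAt.getOrDefault(site, 0.0)
                                + estimatedRuntime.getOrDefault(site, Double.MAX_VALUE)))
                .orElseThrow(() -> new IllegalArgumentException("no candidate sites"));
    }
}

A real SiteSelector implementation would compute this kind of ranking from the site catalog and per-job estimates, and then assign the chosen site to the job during the mapping phase.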
However, you need to keep in mind that Pegasus maps the workflow to various sites and then hands the workflow over to Condor DAGMan for execution. Condor DAGMan looks at which jobs are ready to run and releases them to the local Condor queue (managed by the Condor Schedd). The jobs are then submitted to the remote sites by the Condor Schedd. The actual node on which a job gets executed is determined by the local resource scheduler on the site. For example, if you submit the jobs in a workflow to a site that is running PBS, then PBS decides the actual nodes on which the jobs run.
In the case of Condor, you can associate requirements with your jobs to help steer them to specific nodes, etc.
With a workflow, you can also associate job priorities that determine the priority of each job in the local Condor queue on the submit host. You can use that to control which job the Schedd submits first if there are multiple jobs in the queue.