Internals |
Understanding the task, host, and agent life cycle can be crucial to effectively writing applications. Most of these details are very technical and are not needed to successfully write an application. The information in this section can prove useful in helping to understand why the system behaves in certain ways.
In this section the following is explained:
The coordinator internally has a task queue. When a task is submitted to the coordinator, the task gets put in the queue. The task's order in the queue is based on the job's priority (Job.Priority). The coordinator takes the first task from the queue whose preconditions (Job.getTaskPreconditions()) are satisfied.
Once a task is taken from the queue, the coordinator assigns the task to an available agent based on the job's specified scheduling algorithm (Job.AgentSelectionPreference). The default agent selection algorithm first sort agents based on priority, then it cycles through each agent until the agent's maximum task capacity is reached.
Tasks are run in hosts, which are isolated processes. If a host process crashes, it will not affect the other host processes or the agent.
When a host starts up, the host calls TaskEnvironment.setup() on the task environment it receives. Then the host loops for two states:
The chart below shows this state diagram:
Most of the workflow associated with the agent centers around comparing task environment identifications and checking whether a host has already loaded the environment. As a refresher, this topic explains how task environment identifications are compared.
When an agent is assigned a task, its workflow is as follows. First, the agent checks the task environment identification of the task. If an idle host process with that task environment loaded already exists, the agent sends the task to that host. If no hosts have the specified task environment loaded, the agent checks if it can start another host. If the agent cannot start another host because an empty host slot is not available, the agent will cause one of the idle host processes to exit and start a new host process in its place. This procedure is called recycling a host. The new host is then given the new task environment
Note |
---|
Here are a few scenarios and a description of what happens internally in each situation. For simplicity, the task environment identification GUIDS have been replaced with letters. To learn more about the task environment identification comparison mechanism, read this.
Situation: One idle host (task environment id "A"), One empty host slot. Task is received with environment id "A".
Situation: One idle host (task environment id "A"), One empty host slot. Task is received with environment id "B".
Situation: Two idle hosts (task environment id "A" and "B" respectively). Task is received with environment id "C".
Situation: Task completes on a host (host completed 1 task so far). Task Environment's HostRecycleSettings.setFixedNumberOfTasks() option is set to 32.
Situation: Task completes on a host (host completed 32 tasks so far). Task Environment's HostRecycleSettings.setFixedNumberOfTasks() option is set to 32.
|
After a host completes a task, the agent will check whether the host has met any of the requirements of its task environment's HostRecycleSettings. Various recycling strategies are set through the TaskEnvironment.RecycleSettings's properties. Recycling strategies fall into three categories: time, memory, and job based.
Time Based
Memory Based
Job Based
package stkparallelcomputingserversdk; import java.util.UUID; import agi.parallel.client.ClusterJobScheduler; import agi.parallel.client.IJobScheduler; import agi.parallel.client.Job; import agi.parallel.infrastructure.HostRecycleSettings; import agi.parallel.infrastructure.Task; public class RecyclingStrategies { public static void main(String[] args) { try (IJobScheduler scheduler = new ClusterJobScheduler("localhost")) { scheduler.connect(); // Create HostRecycleSetting to use in TaskEnvironment HostRecycleSettings settingsTaskEnvironmentNoLongerReferenced = new HostRecycleSettings(); settingsTaskEnvironmentNoLongerReferenced.setTaskEnvironmentNoLongerReferenced(true); HostRecycleSettings settingsJobCompletion = new HostRecycleSettings(); settingsJobCompletion.setJobCompletion(true); // Create jobs and set TaskEnvironments Job job1 = scheduler.createJob(); Job job2 = scheduler.createJob(); Job job3 = scheduler.createJob(); Job job4 = scheduler.createJob(); Job job5 = scheduler.createJob(); Job job6 = scheduler.createJob(); job1.getTaskEnvironment().setRecycleSettings(settingsTaskEnvironmentNoLongerReferenced); job2.getTaskEnvironment().setRecycleSettings(settingsTaskEnvironmentNoLongerReferenced); job3.getTaskEnvironment().setRecycleSettings(settingsTaskEnvironmentNoLongerReferenced); job4.getTaskEnvironment().setRecycleSettings(settingsTaskEnvironmentNoLongerReferenced); job5.getTaskEnvironment().setRecycleSettings(settingsJobCompletion); job6.getTaskEnvironment().setRecycleSettings(settingsJobCompletion); // Jobs 1 and 2 have identical IDs job1.getTaskEnvironment().setId(UUID.fromString("A457B1FA-BE37-49b4-8B95-A53A16E2AA5F")); job2.getTaskEnvironment().setId(UUID.fromString("A457B1FA-BE37-49b4-8B95-A53A16E2AA5F")); // Jobs 3 and 4 have different IDs job3.getTaskEnvironment().setId(UUID.fromString("B457B1FA-BE37-49b4-8B95-A53A16E2AA5F")); job4.getTaskEnvironment().setId(UUID.fromString("C457B1FA-BE37-49b4-8B95-A53A16E2AA5F")); // Jobs 5 and 6 have identical IDs job5.getTaskEnvironment().setId(UUID.fromString("D457B1FA-BE37-49b4-8B95-A53A16E2AA5F")); job6.getTaskEnvironment().setId(UUID.fromString("D457B1FA-BE37-49b4-8B95-A53A16E2AA5F")); for (int i = 0; i < 100; i++) { job1.addTask(new BlankTask()); job2.addTask(new BlankTask()); job3.addTask(new BlankTask()); job4.addTask(new BlankTask()); job5.addTask(new BlankTask()); job6.addTask(new BlankTask()); } /* * Hosts will be recycled after both jobs complete because the task environments * are identical. */ // Submit Jobs with identical Environment (TaskEnvironmentNoLongerReferenced) job1.submit(); job2.submit(); job1.waitUntilDone(); job2.waitUntilDone(); /* * Hosts will be recycled after each individual job completes because the task * environments are different. */ // Submit Jobs with different Environment (TaskEnvironmentNoLongerReferenced) job3.submit(); job4.submit(); job3.waitUntilDone(); job4.waitUntilDone(); /* * Hosts will be recycled after each individual job completes even though their * task environments are identical. */ // Submit Jobs with identical Environment (JobCompletion) job5.submit(); job6.submit(); job5.waitUntilDone(); job6.waitUntilDone(); } } public static class BlankTask extends Task { @Override public void execute() { } } }
Changes to hosts will be displayed in the AgentService.exe window. Shown below is the output after jobs 1 and 2 finish running (each job has 100 tasks).
STK Parallel Computing Server 2.9 API for Java