Click or drag to resize

Internals

Understanding the task, host, and agent life cycle can be crucial to effectively writing applications. Most of these details are very technical and are not needed to successfully write an application. The information in this section can prove useful in helping to understand why the system behaves in certain ways.

In this section the following is explained:

Agent selection

The coordinator internally has a task queue. When a task is submitted to the coordinator, the task gets put in the queue. The task's order in the queue is based on the job's priority (Job.Priority). The coordinator takes the first task from the queue whose preconditions (Job.getTaskPreconditions()) are satisfied.

Once a task is taken from the queue, the coordinator assigns the task to an available agent based on the job's specified scheduling algorithm (Job.AgentSelectionPreference). The default agent selection algorithm first sort agents based on priority, then it cycles through each agent until the agent's maximum task capacity is reached.

Process isolation

Tasks are run in hosts, which are isolated processes. If a host process crashes, it will not affect the other host processes or the agent.

Task workflow

When a host starts up, the host calls TaskEnvironment.setup() on the task environment it receives. Then the host loops for two states:

  1. If the agent sends a task to the host, call Task.execute(), and then send the result back to the client. If the task fails, then the host is shutdown.
  2. If the agent recycles the host, then call TaskEnvironment.teardown() and shut down the host process.

The chart below shows this state diagram:

Host Logical State Diagram
Host assignment and recycling

Most of the workflow associated with the agent centers around comparing task environment identifications and checking whether a host has already loaded the environment. As a refresher, this topic explains how task environment identifications are compared.

When an agent is assigned a task, its workflow is as follows. First, the agent checks the task environment identification of the task. If an idle host process with that task environment loaded already exists, the agent sends the task to that host. If no hosts have the specified task environment loaded, the agent checks if it can start another host. If the agent cannot start another host because an empty host slot is not available, the agent will cause one of the idle host processes to exit and start a new host process in its place. This procedure is called recycling a host. The new host is then given the new task environment

Note Note

Here are a few scenarios and a description of what happens internally in each situation. For simplicity, the task environment identification GUIDS have been replaced with letters. To learn more about the task environment identification comparison mechanism, read this.

Situation: One idle host (task environment id "A"), One empty host slot. Task is received with environment id "A".
Agent will... assign the task to the idle host.

Situation: One idle host (task environment id "A"), One empty host slot. Task is received with environment id "B".
Agent will... create a new host in the empty slot. New host will be sent environment id "B".

Situation: Two idle hosts (task environment id "A" and "B" respectively). Task is received with environment id "C".
Agent will... recycle one of the idle hosts and start a new host with environment id "C".

Situation: Task completes on a host (host completed 1 task so far). Task Environment's HostRecycleSettings.setFixedNumberOfTasks() option is set to 32.
Agent will... do nothing. Host task count is still below HostRecycleSettings.getFixedNumberOfTasks(). Host process remains idle.

Situation: Task completes on a host (host completed 32 tasks so far). Task Environment's HostRecycleSettings.setFixedNumberOfTasks() option is set to 32.
Agent will... recycle the host.

After a host completes a task, the agent will check whether the host has met any of the requirements of its task environment's HostRecycleSettings. Various recycling strategies are set through the TaskEnvironment.RecycleSettings's properties. Recycling strategies fall into three categories: time, memory, and job based.

Time Based

  • HostRecycleSettings.RegularInterval: Continually recycles hosts after a set number of milliseconds, measured from the time the host started.

    Host Recycle Settings - RegularInterval
    Above, one job with five tasks is submitted to run on two hosts. The task environment’s HostRecycleSettings.RegularInterval property is set to a designated period. The hosts will recycle after the specified period of time, as measured from when the host process first started. If a host is recycled, the regular interval is measured from the time the new host process starts.

  • HostRecycleSettings.IdleTimeout: Recycles a host after it has spent a set number of milliseconds not executing any tasks.

    Host Recycle Settings - Idle Timeout
    Above, a job with two tasks is submitted to run on a host. The task environment’s HostRecycleSettings.IdleTimeout property is set to a period of time. The host will recycle after the designated period, measured from the time the last task finished executing.

  • HostRecycleSettings.SpecificTimes: Recycles hosts at specified times of day (input as strings, times separated by commas ',').

    Host Recycle Settings - Specific Times
    Above, one job with seven tasks is submitted to run on two hosts. The task environment’s HostRecycleSettings.SpecificTimes property is set to a list of three times. Each started host is recycled following the specified times after it has finished executing its most recent task.

Memory Based

Job Based

  • HostRecycleSettings.FixedNumberOfTasks: Recycles after the host completes a set number of tasks.

    Host Recycle Settings - Fixed Number of Tasks
    Above, a job with seven tasks is submitted to run on two hosts. The task environment’s HostRecycleSettings.FixedNumberOfTasks property is set to three. Each host will recycle after every three tasks it executes. If a host executes less than three tasks, the host process remains idle.

  • HostRecycleSettings.JobCompletion: Recycles the hosts used for a given job once the job completes.

    Host Recycle Settings - Job Completion
    Above, two jobs, each with three tasks and identical task environments, are submitted to the coordinator and ran on two available hosts. The task environment’s HostRecycleSettings.JobCompletion property is set to true. Each host exits after all of its Job A tasks are completed. New host processes are started to execute Job B’s tasks, after which the hosts exit.

  • HostRecycleSettings.TaskEnvironmentNoLongerReferenced: Recycles when the host's task environment is no longer needed. If two jobs have the same task environment, the hosts will not recycle when moving on to the second job since the task environment is still in use. The host will recycle once the environment is no longer used (i.e. both jobs complete).

    Host Recycle Settings - Task Environment No Longer Referenced
    Above, two jobs, each with three tasks and identical task environments, are submitted to the coordinator and ran on two available hosts. The task environment's HostRecycleSettings.TaskEnvironmentNoLongerReferenced property is set to true. The two host processes stay alive until all of Job A and Job B's tasks have been completed, after which no more tasks require the shared task environment and the host processes exit.

The code sample below highlights the differences between the JobCompletion and TaskEnvironmentNoLongerReferenced recycling strategies:

Java
package stkparallelcomputingserversdk;

import java.util.UUID;
import agi.parallel.client.ClusterJobScheduler;
import agi.parallel.client.IJobScheduler;
import agi.parallel.client.Job;
import agi.parallel.infrastructure.HostRecycleSettings;
import agi.parallel.infrastructure.Task;

public class RecyclingStrategies {
    public static void main(String[] args) {
        try (IJobScheduler scheduler = new ClusterJobScheduler("localhost")) {
            scheduler.connect();

            // Create HostRecycleSetting to use in TaskEnvironment
            HostRecycleSettings settingsTaskEnvironmentNoLongerReferenced = new HostRecycleSettings();
            settingsTaskEnvironmentNoLongerReferenced.setTaskEnvironmentNoLongerReferenced(true);
            HostRecycleSettings settingsJobCompletion = new HostRecycleSettings();
            settingsJobCompletion.setJobCompletion(true);

            // Create jobs and set TaskEnvironments
            Job job1 = scheduler.createJob();
            Job job2 = scheduler.createJob();
            Job job3 = scheduler.createJob();
            Job job4 = scheduler.createJob();
            Job job5 = scheduler.createJob();
            Job job6 = scheduler.createJob();
            job1.getTaskEnvironment().setRecycleSettings(settingsTaskEnvironmentNoLongerReferenced);
            job2.getTaskEnvironment().setRecycleSettings(settingsTaskEnvironmentNoLongerReferenced);
            job3.getTaskEnvironment().setRecycleSettings(settingsTaskEnvironmentNoLongerReferenced);
            job4.getTaskEnvironment().setRecycleSettings(settingsTaskEnvironmentNoLongerReferenced);
            job5.getTaskEnvironment().setRecycleSettings(settingsJobCompletion);
            job6.getTaskEnvironment().setRecycleSettings(settingsJobCompletion);

            // Jobs 1 and 2 have identical IDs
            job1.getTaskEnvironment().setId(UUID.fromString("A457B1FA-BE37-49b4-8B95-A53A16E2AA5F"));
            job2.getTaskEnvironment().setId(UUID.fromString("A457B1FA-BE37-49b4-8B95-A53A16E2AA5F"));
            // Jobs 3 and 4 have different IDs
            job3.getTaskEnvironment().setId(UUID.fromString("B457B1FA-BE37-49b4-8B95-A53A16E2AA5F"));
            job4.getTaskEnvironment().setId(UUID.fromString("C457B1FA-BE37-49b4-8B95-A53A16E2AA5F"));
            // Jobs 5 and 6 have identical IDs
            job5.getTaskEnvironment().setId(UUID.fromString("D457B1FA-BE37-49b4-8B95-A53A16E2AA5F"));
            job6.getTaskEnvironment().setId(UUID.fromString("D457B1FA-BE37-49b4-8B95-A53A16E2AA5F"));

            for (int i = 0; i < 100; i++) {
                job1.addTask(new BlankTask());
                job2.addTask(new BlankTask());
                job3.addTask(new BlankTask());
                job4.addTask(new BlankTask());
                job5.addTask(new BlankTask());
                job6.addTask(new BlankTask());
            }

            /*
             * Hosts will be recycled after both jobs complete because the task environments
             * are identical.
             */
            // Submit Jobs with identical Environment (TaskEnvironmentNoLongerReferenced)
            job1.submit();
            job2.submit();
            job1.waitUntilDone();
            job2.waitUntilDone();

            /*
             * Hosts will be recycled after each individual job completes because the task
             * environments are different.
             */
            // Submit Jobs with different Environment (TaskEnvironmentNoLongerReferenced)
            job3.submit();
            job4.submit();
            job3.waitUntilDone();
            job4.waitUntilDone();

            /*
             * Hosts will be recycled after each individual job completes even though their
             * task environments are identical.
             */
            // Submit Jobs with identical Environment (JobCompletion)
            job5.submit();
            job6.submit();
            job5.waitUntilDone();
            job6.waitUntilDone();
        }
    }

    public static class BlankTask extends Task {
        @Override
        public void execute() {
        }
    }

}

Changes to hosts will be displayed in the AgentService.exe window. Shown below is the output after jobs 1 and 2 finish running (each job has 100 tasks).

AgentService Host Recycling
See Also

STK Parallel Computing Server 2.9 API for Java