Internals

Understanding the task, host, and agent life cycle can help you write applications more effectively. Most of these details are highly technical and are not required to write a working application, but they help explain why the system behaves the way it does.

Overview

Agent Selection

The coordinator internally has a task queue. When a task is submitted to the coordinator, the task gets put in the queue. The task’s order in the queue is based on the job’s priority (Job.priority). The coordinator takes the first task from the queue whose preconditions (Job.task_preconditions) are satisfied.

Once a task is taken from the queue, the coordinator assigns the task to an available agent based on the job’s specified scheduling algorithm (Job.agent_selection_preference). The default agent selection algorithm first sorts agents by priority, then cycles through each agent until the agent’s maximum task capacity is reached.
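As a rough illustration of the queue behavior described above, the following Python sketch takes the first task, in priority order, whose precondition is satisfied. QueuedTask, always_ready, and take_next_task are hypothetical names invented for this example, and the rule that a larger number means higher priority is an assumption; none of this is the coordinator's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def always_ready() -> bool:
    """Default precondition: the task can run immediately."""
    return True


@dataclass(eq=False)
class QueuedTask:
    # Hypothetical stand-in for a submitted task; not part of the real API.
    name: str
    priority: int  # assumption: a larger number means higher priority
    precondition: Callable[[], bool] = always_ready


def take_next_task(queue: List[QueuedTask]) -> Optional[QueuedTask]:
    """Remove and return the first task, in priority order, that is ready."""
    # sorted() is stable, so tasks with equal priority keep submission order.
    for task in sorted(queue, key=lambda t: -t.priority):
        if task.precondition():
            queue.remove(task)
            return task
    return None  # nothing is runnable right now


queue = [
    QueuedTask("low", 1),
    QueuedTask("blocked", 9, lambda: False),  # highest priority, but not ready
    QueuedTask("high", 5),
]
first = take_next_task(queue)
print(first.name)
```

Here the "blocked" task has the highest priority but is skipped because its precondition is not satisfied, so the "high" task is taken first.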

Process Isolation

Tasks are run in hosts, which are isolated processes. If a host process crashes, it will not affect the other host processes or the agent.

Task Workflow

When a host starts up, the host calls TaskEnvironment.setup() on the task environment it receives. The host then loops, handling two cases:

  1. If the agent sends a task to the host, call Task.execute(), and then send the result back to the client. If the task fails, the host is shut down.

  2. If the agent recycles the host, then call TaskEnvironment.teardown() and shut down the host process.

The chart below shows this state diagram:

HostLogicalStateDiagram
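The host workflow described above can be sketched as a simple message loop. This is a minimal model, not the host's real code: run_host, the RECYCLE sentinel, DemoEnvironment, and DemoTask are hypothetical names, and a standard-library Queue stands in for the channel between the agent and the host. Whether teardown() runs after a failed task is not stated here, so the sketch simply stops.

```python
from queue import Queue

RECYCLE = object()  # hypothetical message the agent sends to recycle the host


class DemoEnvironment:
    """Stand-in for a task environment; records life cycle calls."""

    def __init__(self):
        self.events = []

    def setup(self):
        self.events.append("setup")

    def teardown(self):
        self.events.append("teardown")


class DemoTask:
    """Stand-in for a task; execute() doubles its input."""

    def __init__(self, value):
        self.value = value

    def execute(self):
        return self.value * 2


def run_host(environment, inbox: Queue, results: list) -> str:
    """Run the host loop and return the reason the host stopped."""
    environment.setup()  # called once, when the host starts
    while True:
        message = inbox.get()
        if message is RECYCLE:
            environment.teardown()  # case 2: the agent recycles the host
            return "recycled"
        try:
            results.append(message.execute())  # case 1: run a task
        except Exception as error:
            results.append(error)
            return "task failed"  # a failed task shuts the host down


env, inbox, results = DemoEnvironment(), Queue(), []
for item in (DemoTask(2), DemoTask(5), RECYCLE):
    inbox.put(item)
reason = run_host(env, inbox, results)
```

In this run the host sets up once, executes two tasks, and tears down only when the recycle message arrives.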

Host Assignment and Recycling

Most of the workflow associated with the agent centers around comparing task environment identifications and checking whether a host has already loaded the environment. As a refresher, this topic explains how task environment identifications are compared.

When an agent is assigned a task, its workflow is as follows. First, the agent checks the task environment identification of the task. If an idle host process with that task environment loaded already exists, the agent sends the task to that host. If no hosts have the specified task environment loaded, the agent checks if it can start another host. If the agent cannot start another host because an empty host slot is not available, the agent will cause one of the idle host processes to exit and start a new host process in its place. This procedure is called recycling a host. The new host is then given the new task environment.
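The decision sequence in the paragraph above can be modeled in a few lines of Python. This is a sketch under stated assumptions: Host, Agent, max_hosts, and assign are hypothetical names, the choice of which idle host to recycle (the first one found) is arbitrary, and the "no host available" result covers a case this section does not describe.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Host:
    # Hypothetical model of a host process.
    environment_id: str
    busy: bool = False


@dataclass
class Agent:
    # Hypothetical model of an agent with a fixed number of host slots.
    max_hosts: int
    hosts: List[Host] = field(default_factory=list)

    def assign(self, environment_id: str) -> str:
        """Place a task and report which path was taken."""
        # 1. Prefer an idle host that already has the environment loaded.
        for host in self.hosts:
            if not host.busy and host.environment_id == environment_id:
                host.busy = True
                return "reused idle host"
        # 2. Otherwise start a new host if an empty slot is available.
        if len(self.hosts) < self.max_hosts:
            self.hosts.append(Host(environment_id, busy=True))
            return "started new host"
        # 3. Otherwise recycle an idle host and start a new one in its place.
        for index, host in enumerate(self.hosts):
            if not host.busy:
                self.hosts[index] = Host(environment_id, busy=True)
                return "recycled idle host"
        return "no host available"  # assumption: not covered by the text


agent = Agent(max_hosts=2, hosts=[Host("A"), Host("B")])
outcome = agent.assign("C")
```

With two idle hosts holding environments "A" and "B" and no empty slot, a task for environment "C" forces one idle host to be recycled.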

Note

Here are a few scenarios and a description of what happens internally in each situation. For simplicity, the task environment identification GUIDs have been replaced with letters. To learn more about the task environment identification comparison mechanism, read this.

Situation: One idle host (task environment id “A”), One empty host slot. Task is received with environment id “A”.

Agent will… assign the task to the idle host.

Situation: One idle host (task environment id “A”), One empty host slot. Task is received with environment id “B”.

Agent will… create a new host in the empty slot. New host will be sent environment id “B”.

Situation: Two idle hosts (task environment id “A” and “B” respectively). Task is received with environment id “C”.

Agent will… recycle one of the idle hosts and start a new host with environment id “C”.

Situation: Task completes on a host (host completed 1 task so far). Task Environment’s HostRecycleSettings.fixed_number_of_tasks option is set to 32.

Agent will… do nothing. Host task count is still below HostRecycleSettings.fixed_number_of_tasks. Host process remains idle.

Situation: Task completes on a host (host completed 32 tasks so far). Task Environment’s HostRecycleSettings.fixed_number_of_tasks option is set to 32.

Agent will… recycle the host.

After a host completes a task, the agent checks whether the host has met any of the requirements of its task environment’s HostRecycleSettings. Recycling strategies are configured through the properties of TaskEnvironment.recycle_settings and fall into three categories: time based, memory based, and job based.

Time Based

  • HostRecycleSettings.regular_interval: Continually recycles hosts after a set number of milliseconds, measured from the time the host started.

    HostRecyclingSettings_RegularInterval

    Above, one job with five tasks is submitted to run on two hosts. The task environment’s HostRecycleSettings.regular_interval property is set to a designated period. The hosts will recycle after the specified period of time, as measured from when the host process first started. If a host is recycled, the regular interval is measured from the time the new host process starts.

  • HostRecycleSettings.idle_timeout: Recycles a host after it has spent a set number of milliseconds not executing any tasks.

    HostRecyclingSettings_IdleTimeout

    Above, a job with two tasks is submitted to run on a host. The task environment’s HostRecycleSettings.idle_timeout property is set to a period of time. The host will recycle after the designated period, measured from the time the last task finished executing.

  • HostRecycleSettings.specific_times: Recycles hosts at specified times of day (input as strings, times separated by commas ‘,’).

    HostRecyclingSettings_SpecificTimes

    Above, one job with seven tasks is submitted to run on two hosts. The task environment’s HostRecycleSettings.specific_times property is set to a list of three times. Each running host is recycled once a specified time has passed and the host has finished executing its most recent task.
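The two interval-style checks above reduce to simple comparisons of millisecond timestamps. The sketch below is illustrative only: the function and parameter names are hypothetical, and treating the threshold as inclusive is an assumption. HostRecycleSettings.specific_times is left out because the format of its time strings is not given in this section.

```python
def due_for_regular_interval(host_started_ms: int, now_ms: int,
                             regular_interval_ms: int) -> bool:
    """True once the interval has elapsed since the host process started."""
    return now_ms - host_started_ms >= regular_interval_ms


def due_for_idle_timeout(last_task_finished_ms: int, now_ms: int,
                         idle_timeout_ms: int, busy: bool) -> bool:
    """True once an idle host has gone the full timeout without a task."""
    if busy:
        return False  # a host that is executing a task is not idle
    return now_ms - last_task_finished_ms >= idle_timeout_ms


# A host started at t=1000 ms with a 5000 ms regular interval.
interval_due = due_for_regular_interval(1_000, 6_000, 5_000)
# A host whose last task finished at t=2000 ms, with a 2000 ms idle timeout.
idle_due = due_for_idle_timeout(2_000, 4_500, 2_000, busy=False)
```

Because the regular interval is measured from the host's start time, a freshly recycled host restarts the clock with its own start time.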

Memory Based

Job Based

  • HostRecycleSettings.fixed_number_of_tasks: Recycles after the host completes a set number of tasks.

    HostRecyclingSettings_FixedNumberOfTasks

    Above, a job with seven tasks is submitted to run on two hosts. The task environment’s HostRecycleSettings.fixed_number_of_tasks property is set to three. Each host will recycle after every three tasks it executes. If a host executes fewer than three tasks, the host process remains idle.

  • HostRecycleSettings.job_completion: Recycles the hosts used for a given job once the job completes.

    HostRecyclingSettings_JobCompletion

    Above, two jobs, each with three tasks and identical task environments, are submitted to the coordinator and run on two available hosts. The task environment’s HostRecycleSettings.job_completion property is set to True. Each host exits after all of its Job A tasks are completed. New host processes are started to execute Job B’s tasks, after which the hosts exit.

  • HostRecycleSettings.task_environment_no_longer_referenced: Recycles when the host’s task environment is no longer needed. If two jobs have the same task environment, the hosts will not recycle when moving on to the second job since the task environment is still in use. The host will recycle once the environment is no longer used (i.e. both jobs complete).

    HostRecyclingSettings_TaskEnvironmentNoLongerReferenced

    Above, two jobs, each with three tasks and identical task environments, are submitted to the coordinator and run on two available hosts. The task environment’s HostRecycleSettings.task_environment_no_longer_referenced property is set to True. The two host processes stay alive until all of Job A and Job B’s tasks have been completed, after which no more tasks require the shared task environment and the host processes exit.
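The three job-based strategies can be combined into a single check that the agent might run after a task completes. This is a sketch, not the product's logic: DemoRecycleSettings is a hypothetical stand-in that only borrows the property names from HostRecycleSettings, its default values are assumptions, and should_recycle with its parameters is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoRecycleSettings:
    # Hypothetical stand-in; defaults are assumptions, not the real defaults.
    fixed_number_of_tasks: Optional[int] = None
    job_completion: bool = False
    task_environment_no_longer_referenced: bool = False


def should_recycle(settings: DemoRecycleSettings,
                   tasks_completed: int,
                   host_job_finished: bool = False,
                   jobs_using_environment: int = 1) -> bool:
    """Return True if any enabled job-based strategy says to recycle."""
    limit = settings.fixed_number_of_tasks
    if limit is not None and tasks_completed >= limit:
        return True  # the host has completed its quota of tasks
    if settings.job_completion and host_job_finished:
        return True  # the job this host was serving has completed
    if (settings.task_environment_no_longer_referenced
            and jobs_using_environment == 0):
        return True  # no remaining job needs this task environment
    return False


settings = DemoRecycleSettings(fixed_number_of_tasks=32)
after_one_task = should_recycle(settings, tasks_completed=1)
after_32_tasks = should_recycle(settings, tasks_completed=32)
```

The last two lines mirror the earlier scenarios: with the limit set to 32, a host that has completed one task stays alive, and a host that has completed 32 tasks is recycled.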
