Problem

You want to minimize the time spent on an expensive setup method that a group of tasks shares.

Solution

Create a user-defined task environment and place the common setup logic inside its setup() method. Then assign the task environment to the job by calling job.setTaskEnvironment().

Java
package stkscalabilitysdk.howto;


import java.util.UUID;

import agi.parallel.client.ClusterJobScheduler;
import agi.parallel.client.IJobScheduler;
import agi.parallel.client.Job;
import agi.parallel.infrastructure.Task;
import agi.parallel.infrastructure.TaskEnvironment;

public class TaskEnvironmentExample {
    public static void main(String[] args) {
        IJobScheduler scheduler = new ClusterJobScheduler("localhost");
        try {
            scheduler.connect();

            // Job 1
            long elapsed = submitJobUsingTaskEnvironment(scheduler);
            System.out.println("First job took " + elapsed + " ms");

            // Job 2
            elapsed = submitJobUsingTaskEnvironment(scheduler);
            System.out.println("Second job took " + elapsed + " ms");
        } finally {
            scheduler.dispose();
        }

        /*
         * The output of the application should resemble:
         * First job took 2514 ms
         * Second job took 13 ms
         */
    }

    private static long submitJobUsingTaskEnvironment(IJobScheduler scheduler) {
        // Time how long it takes
        long startTime = System.nanoTime();
        Task task = new SimpleTask();
        Job job = scheduler.createJob();
        job.addTask(task);

        // Make sure you set the task environment
        job.setTaskEnvironment(new SimpleTaskEnvironment());
        job.submit();
        job.waitUntilDone();

        long endTime = System.nanoTime();
        long duration = endTime - startTime;
        return duration / 1000000;
    }

    public static class SimpleTask extends Task {
        public SimpleTask() {}

        @Override
        public void execute() {}
    }

    public static class SimpleTaskEnvironment extends TaskEnvironment {
        public SimpleTaskEnvironment() {
            // Setting the task environment ID to a constant GUID allows the
            // task environment to be reused past the lifetime of a job.
            // Because the task environment is not recycled, changes to the
            // task or task environment assemblies may not be visible.
            // During development, you may want to set the GUID to a different
            // value every time you update your code so that your changes
            // appear immediately.
            setId(UUID.fromString("28C296A2-D105-4e9c-83FA-5E6FEB57842E"));
        }

        @Override
        public void setup() {
            // The Setup method will be called once before tasks are executed on a host.
        }

        @Override
        public void teardown() {
            // The Teardown method will be called once when the host gets recycled.
        }
    }
}

Discussion

For a thorough treatment of the Task Environment concept, read the section here.

Running common code before and after a group of tasks is useful when several tasks share setup work that only needs to happen once. A task environment, much like a task, is a user-defined class that is serialized and called on the host. When a host starts a job, the task environment's setup() method is called once before any tasks are executed. When the host is recycled, the task environment's teardown() method is called before the process exits.
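
The call order described above can be illustrated with a small stand-alone model. This is only a sketch: LifecycleSketch, SketchEnvironment, and simulate are hypothetical names invented for illustration and are not part of the agi.parallel API, and the real host decides on its own when to recycle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the host-side call order; not part of agi.parallel.
public class LifecycleSketch {

    // Assumed shape of an environment: one setup hook and one teardown hook.
    interface SketchEnvironment {
        void setup();
        void teardown();
    }

    // Records the order of calls for a host that runs taskCount tasks
    // and is then recycled.
    static List<String> simulate(int taskCount) {
        List<String> events = new ArrayList<>();
        SketchEnvironment env = new SketchEnvironment() {
            @Override
            public void setup() { events.add("setup"); }

            @Override
            public void teardown() { events.add("teardown"); }
        };

        env.setup();                      // once, before any task runs
        for (int i = 1; i <= taskCount; i++) {
            events.add("task " + i);      // each task reuses the prepared state
        }
        env.teardown();                   // once, when the host is recycled
        return events;
    }

    public static void main(String[] args) {
        // prints [setup, task 1, task 2, teardown]
        System.out.println(simulate(2));
    }
}
```

However many tasks run, the setup and teardown entries each appear exactly once, which is the property that makes the environment a good home for expensive initialization.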

The task environment can be reused for multiple jobs. In the code example above, notice how each of the two jobs is assigned a new SimpleTaskEnvironment instance, but the setup method is called only once. This happens because the task environment's identification is identical for the two jobs. For more information on how task environment identifications are compared and the policy for reusing task environments, read the section here.
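
To make the reuse rule concrete, the following sketch models a host that runs setup only for an identification it has not seen before. It assumes that environments are matched purely by UUID equality, which is a simplification of the actual comparison policy, and EnvironmentReuseSketch and countSetups are hypothetical names that do not exist in the agi.parallel API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical model of a host process; none of these names come from agi.parallel.
public class EnvironmentReuseSketch {

    // Returns how many times setup would run when one job is submitted
    // per given environment ID, assuming the host caches by ID equality.
    static int countSetups(UUID... jobEnvironmentIds) {
        Set<UUID> cachedIds = new HashSet<>();
        int setups = 0;
        for (UUID id : jobEnvironmentIds) {
            // Set.add returns true only for an ID that is not cached yet
            if (cachedIds.add(id)) {
                setups++;
            }
        }
        return setups;
    }

    public static void main(String[] args) {
        UUID fixed = UUID.fromString("28C296A2-D105-4e9c-83FA-5E6FEB57842E");

        // Two jobs sharing one ID: setup runs once.
        System.out.println(countSetups(fixed, fixed)); // prints 1

        // Two jobs with different IDs: setup runs for each.
        System.out.println(countSetups(UUID.randomUUID(), UUID.randomUUID())); // prints 2
    }
}
```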

Important

A common trap when developing with task environments deserves special attention. Because the assemblies in a task environment are not reloaded in the host process, changes to task environment assemblies with the same identification will not take effect during the lifetime of the host process. In other words, the host process caches the task environment, and any changes made to its assemblies will not be visible until the host process recycles. There are two workarounds for this issue:

  • Create a new identification every time the assemblies are changed.
  • Restart the coordinator every time the assemblies are changed.
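
One possible way to apply the first workaround is to choose the identification through a small helper: a fresh random UUID during development and the constant one otherwise. This is a sketch under assumptions: EnvironmentIdSketch, chooseId, and the development flag are hypothetical and not part of the agi.parallel API. Only UUID.randomUUID() and UUID.fromString() are standard Java calls.

```java
import java.util.UUID;

// Hypothetical helper for picking a task environment ID; not part of agi.parallel.
public class EnvironmentIdSketch {

    // The constant ID used in the example above, for stable builds.
    static final UUID STABLE_ID =
            UUID.fromString("28C296A2-D105-4e9c-83FA-5E6FEB57842E");

    // In development, return a new ID on every call so a cached environment
    // is never matched; otherwise return the constant ID so it can be reused.
    static UUID chooseId(boolean development) {
        return development ? UUID.randomUUID() : STABLE_ID;
    }

    public static void main(String[] args) {
        // Stable mode: the ID is the same on every run.
        System.out.println(chooseId(false).equals(STABLE_ID)); // prints true

        // Development mode: two calls yield two different IDs.
        System.out.println(chooseId(true).equals(chooseId(true))); // prints false
    }
}
```

Inside a task environment constructor, the result would be passed to setId(), as in setId(chooseId(development)). The trade-off is that every development run pays the full setup cost again, since nothing is reused.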

See Also