Optimize Task Setup Time

Problem

You want to reduce the time spent calling an expensive setup method for a group of tasks.

Solution

Create a user-defined task environment and place the common setup logic inside its setup() method. Then attach the environment to the job before submitting it, as the listing below does by assigning to the job’s task_environment.

import time
import uuid

from agiparallel.client import *
from agiparallel.infrastructure import *
from agiparallel.infrastructure.TaskEnvironment import TaskEnvironment


class SimpleTask(object):
    def execute(self):
        pass


class MyEnvironment(TaskEnvironment):
    def __init__(self):
        # Set a unique_id to reuse the same task environment
        # between jobs and reduce host process setup overhead
        super().__init__()
        self.unique_id = uuid.UUID("94CFE45C-5EA9-4592-BB64-B83A5E72DB77")

    def setup(self):
        # Expensive one-time setup shared by the tasks goes here
        pass

    def teardown(self):
        # Matching cleanup goes here
        pass


def submit_job_using_task_environment(client):
    # Time one job from creation to completion, in seconds
    start = time.time()
    job = client.create_job()
    job.add_task(SimpleTask())
    job.task_environment = MyEnvironment()
    job.submit()
    job.wait_until_done()
    return time.time() - start


if __name__ == "__main__":
    with ClusterJobScheduler("localhost") as client:
        client.connect()
        elapsed = submit_job_using_task_environment(client)
        print("First job took " + str(elapsed * 1000) + "ms")

        # Same unique_id as the first job, so the environment is reused
        elapsed = submit_job_using_task_environment(client)
        print("Second job took " + str(elapsed * 1000) + "ms")

Discussion

For a thorough treatment of the Task Environment concept, read the section Task Environment.

Running common code before and after a group of tasks is useful for work that needs to happen only once rather than once per task. A task environment, much like a task, is a user-defined class that is serialized and called in the host. When a host starts a job, the task environment’s setup() method is called once before any tasks execute. When the host is recycled, the task environment’s teardown() method is called before the process exits.

The task environment can be reused for multiple jobs. If the task environment’s identification is the same, setup() is called only once. Read more about task environment identification here.
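The reuse rule can be illustrated with a small, self-contained model. This is only a sketch of the behavior described above, not the library's implementation: ToyHost and ToyEnvironment are hypothetical stand-ins that do not depend on agiparallel, and the caching by unique_id is an assumption drawn from the description in this section.

```python
import uuid


class ToyEnvironment:
    """Hypothetical stand-in for a task environment; counts lifecycle calls."""

    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.setup_calls = 0
        self.teardown_calls = 0

    def setup(self):
        self.setup_calls += 1

    def teardown(self):
        self.teardown_calls += 1


class ToyHost:
    """Hypothetical model of a host process that caches environments by id."""

    def __init__(self):
        self._environments = {}

    def run_job(self, environment, tasks):
        # Reuse the cached environment when the identification matches;
        # otherwise run setup() once and cache the new environment.
        if environment.unique_id not in self._environments:
            environment.setup()
            self._environments[environment.unique_id] = environment
        return [task() for task in tasks]

    def recycle(self):
        # teardown() runs once per cached environment before the host exits.
        for environment in self._environments.values():
            environment.teardown()
        self._environments.clear()


SHARED_ID = uuid.UUID("94CFE45C-5EA9-4592-BB64-B83A5E72DB77")

host = ToyHost()
first = ToyEnvironment(SHARED_ID)
second = ToyEnvironment(SHARED_ID)      # same identification as `first`
other = ToyEnvironment(uuid.uuid4())    # different identification

host.run_job(first, [lambda: 1, lambda: 2])
host.run_job(second, [lambda: 3])       # setup() is skipped for this job
host.run_job(other, [lambda: 4])        # new identification, setup() runs
host.recycle()
```

In this model the second job pays no setup cost because its environment shares an identification with the first, which is the effect the recipe relies on when it hard-codes unique_id.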

Note

A common trap when developing task environments deserves special attention. The host process caches the task environment and does not reload its assemblies, so changes to task environment assemblies that keep the same identification do not take effect until the host process recycles. There are two workarounds for this issue:

  • Create a new identification every time the assemblies are changed.

  • Restart the coordinator every time the assemblies are changed.
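One way to apply the first workaround without editing the identification by hand is to derive it from the code itself. The sketch below uses only the Python standard library; environment_id_for and ENVIRONMENT_NAMESPACE are hypothetical names, and it assumes that unique_id accepts any uuid.UUID value, as the listing in the Solution suggests.

```python
import hashlib
import uuid

# Hypothetical fixed namespace for this project; any constant UUID works.
ENVIRONMENT_NAMESPACE = uuid.UUID("94CFE45C-5EA9-4592-BB64-B83A5E72DB77")


def environment_id_for(source_code: bytes) -> uuid.UUID:
    """Return a UUID that changes whenever the given source bytes change."""
    digest = hashlib.sha256(source_code).hexdigest()
    # uuid5 is deterministic: the same namespace and name give the same UUID.
    return uuid.uuid5(ENVIRONMENT_NAMESPACE, digest)


old_id = environment_id_for(b"def setup(self): load_v1()")
same_id = environment_id_for(b"def setup(self): load_v1()")
new_id = environment_id_for(b"def setup(self): load_v2()")
```

Unchanged code yields the same identification, so the cached environment is still reused between jobs, while any edit produces a new identification. During development, a task environment could pass the bytes of its own source file to such a helper when setting unique_id.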
