Optimize Task Setup Time

Problem

You want to reduce the time spent calling an expensive setup method that is shared by a group of tasks.

Solution

Create a user-defined task environment and place the common setup logic inside its setup() method. Attach the environment to the job by assigning it to job.task_environment before submitting the job.

import uuid

from agiparallel.client import *
from agiparallel.infrastructure import *
from agiparallel.infrastructure.TaskEnvironment import TaskEnvironment


class SimpleTask(object):
    def execute(self):
        pass


class MyEnvironment(TaskEnvironment):
    def __init__(self):
        # Set a unique_id to reuse the same task environment
        # between jobs and reduce host process setup overhead
        super().__init__()
        self.unique_id = uuid.UUID("94CFE45C-5EA9-4592-BB64-B83A5E72DB77")

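    def setup(self):
        # Expensive, shared initialization goes here.
        # Called once in the host before any tasks execute.
        pass

    def teardown(self):
        # Cleanup goes here. Called when the host is recycled,
        # before the process exits.
        pass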


def submit_job_using_task_environment(client):
    import time
    start = time.time()
    job = client.create_job()
    job.add_task(SimpleTask())
    job.task_environment = MyEnvironment()
    job.submit()
    job.wait_until_done()
    return time.time() - start


if __name__ == "__main__":
    with ClusterJobScheduler("localhost") as client:
        client.connect()
        elapsed = submit_job_using_task_environment(client)
        print("First job took " + str(elapsed * 1000) + "ms")

        elapsed = submit_job_using_task_environment(client)
        print("Second job took " + str(elapsed * 1000) + "ms")

Discussion

For a thorough treatment of the Task Environment concept, read the section Task Environment.

Code that must run once before a group of tasks, and cleanup that must run once afterwards, can be grouped in a task environment. A task environment, much like a task, is a user-defined class that is serialized and invoked on the host. When a host starts a job, the task environment’s setup() method is called once before any tasks execute. When the host is recycled, the task environment’s teardown() method is called before the process exits.

A task environment can be reused across multiple jobs. If the jobs use task environments with the same identification, setup() is called only once. Read more about task environment identification here.
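The reuse behavior described above can be illustrated with a toy model. This is only a sketch of the described semantics, not the library's implementation: ToyHost and ToyEnvironment are hypothetical stand-ins, and the assumption is that a host process caches environments keyed by their identification.

```python
import uuid


class ToyEnvironment:
    """Hypothetical stand-in for a task environment; counts lifecycle calls."""

    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.setup_calls = 0
        self.teardown_calls = 0

    def setup(self):
        self.setup_calls += 1

    def teardown(self):
        self.teardown_calls += 1


class ToyHost:
    """Hypothetical model of a host process that caches environments by id."""

    def __init__(self):
        self._cache = {}

    def run_job(self, environment, tasks):
        if environment.unique_id not in self._cache:
            # First job with this id: run setup once and remember the environment.
            environment.setup()
            self._cache[environment.unique_id] = environment
        return [task() for task in tasks]

    def recycle(self):
        # Teardown runs only when the host process is recycled.
        for environment in self._cache.values():
            environment.teardown()
        self._cache.clear()


shared_id = uuid.UUID("94CFE45C-5EA9-4592-BB64-B83A5E72DB77")
host = ToyHost()
first = ToyEnvironment(shared_id)
second = ToyEnvironment(shared_id)  # same id, so the host reuses `first`

host.run_job(first, [lambda: 1])
host.run_job(second, [lambda: 2])
print(first.setup_calls, second.setup_calls)  # 1 0

host.recycle()
print(first.teardown_calls)  # 1
```

In this model the second job pays no setup cost because its environment shares an identification with one the host has already set up, which is the saving the recipe aims for.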

Note

One trap comes up often enough during development that it deserves special attention. The host process caches a task environment and does not reload its assemblies, so changes to the assemblies of a task environment with the same identification do not take effect until the host process recycles. There are two workarounds for this issue:

Create a new identification every time the assemblies are changed.
Restart the coordinator every time the assemblies are changed.
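One possible way to apply the first workaround is to derive the identification from a version label that you bump whenever the environment code changes. This is a sketch under assumptions: the environment_id helper, the ENVIRONMENT_NAMESPACE constant, and the version labels are hypothetical and not part of the library. Only uuid.uuid5 from the Python standard library is used.

```python
import uuid

# Hypothetical namespace for this project's environments; any fixed UUID works.
ENVIRONMENT_NAMESPACE = uuid.UUID("94CFE45C-5EA9-4592-BB64-B83A5E72DB77")


def environment_id(version):
    """Return a stable UUID for a given version label of the environment code."""
    return uuid.uuid5(ENVIRONMENT_NAMESPACE, version)


# Same label gives the same id, so an unchanged environment is still reused.
print(environment_id("1.0") == environment_id("1.0"))  # True
# Bumping the label after a code change yields a new id.
print(environment_id("1.0") == environment_id("1.1"))  # False
```

The result could then be assigned to self.unique_id in the environment's __init__, in place of a hard-coded UUID. Compared with generating a random identification on every run, this keeps reuse across jobs while the code is unchanged.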
