Synchronize Tasks On Same Machine
Problem
You need to synchronize tasks so that only one task at a time runs a given critical section.
Solution
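Guard the critical section with a Lock from Python's threading module: acquire it before entering the section and release it in a finally block. Note that a threading.Lock only coordinates threads within a single process. See the Python documentation for the threading module for details.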
from agiparallel.client import ClusterJobScheduler
from socket import gethostname
from datetime import datetime
from time import sleep
from os import getpid
from threading import Lock


class Task:
    def __init__(self, task_number):
        self.TaskNumber = task_number

    def execute(self):
        # NOTE: this Lock is created inside execute(), so each call gets its
        # own lock, and a threading.Lock is never shared between processes.
        # It therefore does not block tasks running in other host processes;
        # the sample output below shows different PIDs overlapping in time.
        mutex = Lock()

        # acquire the mutex
        mutex.acquire()

        try:
            self.user_resource()
        finally:
            # release the mutex, even if user_resource() raises
            mutex.release()

    def user_resource(self):
        # access the resource and record when this task entered and left it
        self.set_property(
            "resource-start",
            "Agent={0} PID={1} accessing resource at {2}".format(
                gethostname(), getpid(), datetime.now()))
        sleep(5)
        self.set_property(
            "resource-stop",
            "Agent={0} PID={1} done accessing resource at {2}".format(
                gethostname(), getpid(), datetime.now()))


def synchronize_tasks_example():
    with ClusterJobScheduler("localhost") as scheduler:
        scheduler.connect()
        job = scheduler.create_job()

        # queue eight tasks that all touch the same resource
        for i in range(8):
            job.add_task(Task(i + 1))

        job.submit()
        job.wait_until_done()

        for task in job.tasks:
            print(task.properties["resource-start"])
            print(task.properties["resource-stop"])
            print()

        # output should resemble:
        # Agent=AgentMachine1 PID=15904 accessing resource at 2019-10-07 09:12:24.410432
        # Agent=AgentMachine1 PID=15904 done accessing resource at 2019-10-07 09:12:29.411153
        #
        # Agent=AgentMachine1 PID=16748 accessing resource at 2019-10-07 09:12:24.398437
        # Agent=AgentMachine1 PID=16748 done accessing resource at 2019-10-07 09:12:29.399157
        #
        # Agent=AgentMachine1 PID=508 accessing resource at 2019-10-07 09:12:24.452412
        # Agent=AgentMachine1 PID=508 done accessing resource at 2019-10-07 09:12:29.453130
        #
        # Agent=AgentMachine1 PID=11680 accessing resource at 2019-10-07 09:12:24.452412
        # Agent=AgentMachine1 PID=11680 done accessing resource at 2019-10-07 09:12:29.453130
        #
        # Agent=AgentMachine1 PID=20464 accessing resource at 2019-10-07 09:12:24.463407
        # Agent=AgentMachine1 PID=20464 done accessing resource at 2019-10-07 09:12:29.464126
        #
        # Agent=AgentMachine1 PID=728 accessing resource at 2019-10-07 09:12:24.467405
        # Agent=AgentMachine1 PID=728 done accessing resource at 2019-10-07 09:12:29.468124
        #
        # Agent=AgentMachine1 PID=10048 accessing resource at 2019-10-07 09:12:24.479399
        # Agent=AgentMachine1 PID=10048 done accessing resource at 2019-10-07 09:12:29.480117
        #
        # Agent=AgentMachine1 PID=2372 accessing resource at 2019-10-07 09:12:24.497390
        # Agent=AgentMachine1 PID=2372 done accessing resource at 2019-10-07 09:12:29.498112


if __name__ == "__main__":
    synchronize_tasks_example()
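To illustrate what a Lock from the threading module does guarantee, here is a minimal sketch that is independent of agiparallel. It shares one Lock among several threads in a single process and records when each thread was inside the critical section. The names shared_lock, critical_section, and run_threads are illustrative only and are not part of the recipe.

```python
import threading
import time

shared_lock = threading.Lock()  # one Lock object, shared by every thread
intervals = []                  # (start, stop) pairs recorded inside the section


def critical_section(hold_seconds=0.05):
    # "with" acquires the lock and releases it even if the body raises.
    with shared_lock:
        start = time.monotonic()
        time.sleep(hold_seconds)  # stand-in for work on the shared resource
        intervals.append((start, time.monotonic()))


def run_threads(count=4):
    """Run `count` threads through the critical section; return sorted intervals."""
    intervals.clear()
    threads = [threading.Thread(target=critical_section) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(intervals)


if __name__ == "__main__":
    for start, stop in run_threads():
        print("entered at {0:.3f}, left at {1:.3f}".format(start, stop))
```

Because every thread uses the same Lock object, no interval begins before the previous one ends. The guarantee holds only inside one process; a Lock created separately in each process, or in each call, serializes nothing.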
Discussion
A frequently occurring pattern is synchronizing access to a shared resource on a local machine. That is, only one host on a given machine should access the resource at a time. Examples include:
Updating an entry in the Windows Registry.
Writing a file to a common location.
Deleting a file or directory from a common location all hosts share.
Checking if an expensive operation has been done to make sure it is performed only once.
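Cases like these involve separate host processes, so the lock has to be visible across processes on the machine. One possible approach, sketched here with the standard library only, is a lock file created atomically with os.open. The helper name machine_wide_lock, the lock-file location in the temp directory, and the polling and timeout values are all assumptions made for illustration; this is not an agiparallel API.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def machine_wide_lock(name, poll_interval=0.05, timeout=60.0):
    """Hold an exclusive lock backed by a lock file (hypothetical helper)."""
    # Assumption: every process on the machine resolves the same temp directory.
    path = os.path.join(tempfile.gettempdir(), name + ".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes creation atomic: only one process succeeds.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except (FileExistsError, PermissionError):
            # Another process holds the lock (or is just now deleting the file).
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire lock file " + path)
            time.sleep(poll_interval)
    try:
        # Record the owner's PID to help diagnose a stale lock file.
        os.write(fd, str(os.getpid()).encode())
        yield path
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    with machine_wide_lock("shared-resource"):
        print("PID {0} holds the lock".format(os.getpid()))
```

A task could wrap its critical section in such a block, provided the helper is importable wherever the task runs. The main trade-off of this sketch is that a process that dies while holding the lock leaves the file behind, so waiters time out until the file is removed by hand.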