To do this, I tested Python's multiprocessing and ParallelPython modules. Without them, the task took 38 seconds; with multiprocessing it went down to 29 seconds, and with ParallelPython to 30 seconds. Not a great deal, but better than nothing. By the way, ParallelPython is far more complicated to use than multiprocessing.
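Presumably these are wall-clock timings; here is a minimal sketch of how one might take them (the timed helper and the run_conversion name are hypothetical, not from the original script):

import time

def timed(label, fn):
    # Measure wall-clock time around a single run of fn()
    start = time.time()
    fn()
    print("%s took %.1f seconds" % (label, time.time() - start))

# timed("multiprocessing", run_conversion)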
Here is the code for ParallelPython:
import translator
import os
from utils.pp import pp

base = "/some/path"
organisms = ["organism1", "organism2", ...]

def convert_organism(base, organism):
    t = translator.BiogridOspreyTranslator()
    # uses os module here
    t.translate()

if __name__ == '__main__':
    job_server = pp.Server(ppservers=())
    # Submit one job per organism; submit() returns a callable job object
    jobs = [(organism, job_server.submit(convert_organism, (base, organism), (), ("os", "translator")))
            for organism in organisms]
    for organism, job in jobs:
        # Calling the job object blocks until that job has finished
        job()
ParallelPython requires you to tell it which modules each function depends on. I didn't like that.
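To make that explicit, here is how the positional arguments in the submit call above line up (just the call from the script, annotated):

job_server.submit(
    convert_organism,       # the function to execute
    (base, organism),       # arguments passed to the function
    (),                     # functions it depends on (none here)
    ("os", "translator"),   # modules the worker process must import
)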
And here is the code for multiprocessing:
from translator import BiogridOspreyTranslator
import os
from multiprocessing import Pool

base = "/some/path"
organisms = ["organism1", "organism2", ...]

def convert_organism(organism):
    t = BiogridOspreyTranslator()
    # uses os module here
    t.translate()

if __name__ == '__main__':
    # Two worker processes; map() blocks until every organism is done
    pool = Pool(processes=2)
    pool.map(convert_organism, organisms)
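If you'd rather not hardcode the worker count, a small variation sizes the pool to the machine and shuts the workers down explicitly (a sketch; cpu_count is part of multiprocessing, the rest mirrors the script above):

from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    # One worker per CPU core instead of a fixed 2
    pool = Pool(processes=cpu_count())
    pool.map(convert_organism, organisms)
    # Clean up the worker processes once the map is done
    pool.close()
    pool.join()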