Thursday, December 16, 2010

ParallelPython vs multiprocessing

Today I'm working on parallelising a process I have written. The process is a simple text conversion: I have a translator class, organism directories, and files inside these directories. What I want to do is split the data among the processors and perform the operation faster.

To do this, I tested the multiprocessing and ParallelPython modules of Python. Without these modules it took 38 seconds to perform the task, whereas with them it went down to 29 seconds (multiprocessing) and 30 seconds (ParallelPython). Not a great deal, but better than nothing. By the way, ParallelPython is far more complicated than multiprocessing.
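
The timings are plain wall-clock measurements; here is a minimal sketch of how such a measurement can be taken, using the convert_organism function and organisms list from the listings below:

import time

start = time.time()
for organism in organisms:    # the serial version: one organism at a time
    convert_organism(organism)
print("Took %.1f seconds" % (time.time() - start))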

Here is the code for ParallelPython:

import translator
import os
from utils.pp import pp

base = "/some/path"
organisms = [ "organism1", "organism2", ...]

def convert_organism(base, organism):
    t = translator.BiogridOspreyTranslator()
    # uses os module here
    t.translate()

if __name__ == '__main__':
    job_server = pp.Server(ppservers=())
    # submit(function, args, depfuncs, modules): every module the function needs must be listed by name
    jobs = [(organism, job_server.submit(convert_organism, (base, organism,), (), ("os", "translator",))) for organism in organisms]
    for organism, job in jobs:
        job()    # calling the job object blocks until its result is ready

ParallelPython requires you to tell it which modules the function depends on. I didn't like that.
And here is the code for multiprocessing:

from translator import BiogridOspreyTranslator
import os
from multiprocessing import Pool

base = "/some/path"
organisms = [ "organism1", "organism2", ...]

def convert_organism(organism):
    t = BiogridOspreyTranslator()
    # uses os module here
    t.translate()

if __name__ == '__main__':
    pool = Pool(processes=2)
    pool.map(convert_organism, organisms)    # blocks until every organism is converted
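
One tweak worth trying, assuming your machine has more than two cores: size the pool with cpu_count() instead of hard-coding 2. A minimal sketch:

from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    pool = Pool(processes=cpu_count())    # one worker per core
    pool.map(convert_organism, organisms)
    pool.close()    # no more work will be submitted
    pool.join()     # wait for the workers to exit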

Tuesday, December 7, 2010

Uploading on sourceforge

Today we're trying to publish our project Robinviz. It'll be a 1.0-beta release, with separate source and binary packages for Windows and Linux. The binary files are around 400 MB, so they're a bit hard to upload. Google Code, which we use for project management, limits the maximum file size for uploads, so we had to switch to SourceForge. But uploading over HTTP was still a problem, so I found this solution: rsync over SSH.

At first I found the URL format hard to understand, but once I figured it out I wanted to share it with you. With the following command, I was able to send my 400 MB file with resume support:

rsync -avP -e ssh robinviz-1.0-beta-linux-binary.tar.gz aladagemre,robinviz@frs.sourceforge.net:/home/frs/project/r/ro/robinviz/linux-1.0-beta

robinviz-1.0-beta-linux-binary.tar.gz: filename
aladagemre: username on sourceforge
robinviz: project name
/r/ro/robinviz: the first letter, the first two letters, and the full name of the project, joined by /
linux-1.0-beta: the folder I'd like to put my file in.
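
Since the destination is fully determined by the project name, you can also derive it in Python; build_frs_path below is just my own little helper, not part of any SourceForge tooling:

def build_frs_path(user, project, folder):
    # /home/frs/project/ + first letter / first two letters / full project name
    prefix = "/home/frs/project/%s/%s/%s" % (project[0], project[:2], project)
    return "%s,%s@frs.sourceforge.net:%s/%s" % (user, project, prefix, folder)

print(build_frs_path("aladagemre", "robinviz", "linux-1.0-beta"))
# prints: aladagemre,robinviz@frs.sourceforge.net:/home/frs/project/r/ro/robinviz/linux-1.0-beta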

Hope it's useful.