Parallel executions

The Daisi platform supports parallel executions: multiple workers can be assigned to a single Daisi. pydaisi currently provides a straightforward map / monolithic-reduce framework, which makes it easy to address embarrassingly parallel problems.

Warning

This feature is in alpha and currently has many limitations.

Set and monitor workers

By default, every Daisi is assigned 1 worker.

daisi.workers.set sets the number of workers to the specified value: if more workers are currently allocated than requested, the extra workers are deleted; if fewer, new workers are created.

Warning

This setting applies to the Daisi itself, meaning it affects all of its users.

import pydaisi as pyd

daisi = pyd.Daisi('exampledaisies/Add Two Numbers')

# number of workers before the update
print(daisi.workers.number)

# asynchronous call: returns immediately while the pool is resized
worker_number = 50
daisi.workers.set(worker_number)

# deleting or creating workers takes a while, so the count may
# not have changed yet right after the call
print(daisi.workers.number)

# check the status of the workers update, i.e.
# increasing, decreasing, ready_to_update
print(daisi.workers.status)
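
Because daisi.workers.set returns before the scaling completes, you may want to wait for the pool to settle before launching a parallel run. A minimal polling sketch, assuming that "increasing" and "decreasing" (listed above) are the only transitional statuses:

import time

# Poll until the worker pool has finished scaling.
# Assumption: "increasing" and "decreasing" are the only
# transitional statuses; anything else means the pool is stable.
while daisi.workers.status in ("increasing", "decreasing"):
    time.sleep(2)

print(f"{daisi.workers.number} workers available")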

Running a Daisi in parallel

In a map framework, the same function will be applied to each input.

pydaisi lets you pass a list of inputs to a Daisi function with the map method:

import pydaisi as pyd

with pyd.Daisi("Add Two Numbers") as my_daisi:
    dbe = my_daisi.map(func="compute",
                       args_list=[{"firstNumber": 5, "secondNumber": x}
                                  for x in range(10)])
    print(dbe.value)
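
For comparison, the map call above is functionally equivalent to looping over the inputs and calling the endpoint once per item, just without the parallelism. A sequential sketch, assuming the compute endpoint is exposed as a method on the Daisi object (as the get_news endpoint is in the next example):

import pydaisi as pyd

with pyd.Daisi("Add Two Numbers") as my_daisi:
    # One blocking call per input; map distributes these across workers
    results = [my_daisi.compute(firstNumber=5, secondNumber=x).value
               for x in range(10)]
    print(results)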

A more realistic example

Consider the example below, which combines two Daisies:

  1. A Daisi to fetch news from Google News
  2. A Daisi to analyze the sentiment of each title

Running one execution of the Sentiment Analysis Daisi takes about 700 ms of wall time, so querying 100 news results amounts to roughly 70 s of computation. By distributing the task across 4 workers, we can get it done in about 10 s.

import time

import pydaisi as pyd
import pandas as pd

google_news = pyd.Daisi("exampledaisies/GoogleNews")

# the "GoogleNews" Daisi returns a Pandas Dataframe.
# We will put the titles in a list.
news_title = google_news.get_news(query = "Apple", 
                                  nb = 100).value['title'].to_list()

classify = pyd.Daisi("exampledaisies/Zero Shot Text Classification")

# Prepare a parallel execution

dbe = classify.map(func="compute", 
                   args_list=[{"text": title, 
                               "candidate_labels": "positive, negative"} for title in news_title])

dbe.start()

# Wait for completion, polling until no execution is still running
while "RUNNING" in dbe.value:
    time.sleep(1)

# All executions have finished; collect one result per input
results = dbe.value