Parallel executions¶
The Daisi platform supports parallel executions, meaning that multiple workers
can be assigned to a Daisi. pydaisi
currently supports a straightforward map and monolithic reduce
framework, making it easy to address embarrassingly parallel problems.
Warning
This feature is an alpha version and currently has many limitations.
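Conceptually, the pattern is the familiar one from functional programming: the same function is applied independently to every input (the map step), and all partial results are then combined in a single final step (the monolithic reduce). A plain-Python illustration of the pattern, with no Daisi calls and a made-up squaring function:
# Map: one independent call per input.
# On the platform, these calls are what get distributed across workers.
inputs = range(10)
partial_results = [x * x for x in inputs]
# Monolithic reduce: a single combining step over all partial results
total = sum(partial_results)
print(total)  # 285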
Set and monitor workers¶
By default, every Daisi is assigned 1 worker.
daisi.workers.set
will set the number of workers to the specified value:
if more workers are currently available than requested, the extra workers are deleted; if fewer, new workers are created.
Warning
This setting applies to the Daisi itself, meaning that it will affect all of its users.
import pydaisi as pyd
daisi = pyd.Daisi('exampledaisies/Add Two Numbers')
# Number of workers before the update
print(daisi.workers.number)
# daisi.workers.set is an asynchronous call
worker_number = 50
daisi.workers.set(worker_number)
# After the update request: deleting or creating workers takes a while
print(daisi.workers.number)
# Check the status of the workers update:
# increasing, decreasing, or ready_to_update
print(daisi.workers.status)
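Since the update is asynchronous, a simple way to block until it completes is to poll daisi.workers.status. A minimal sketch, assuming ready_to_update is the status reported when no scaling is in progress (as listed in the comments above):
import time
import pydaisi as pyd

daisi = pyd.Daisi('exampledaisies/Add Two Numbers')
daisi.workers.set(50)

# Poll until the platform is no longer scaling up or down
while daisi.workers.status != "ready_to_update":
    time.sleep(5)

print(daisi.workers.number)  # should now report 50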
Running a Daisi in parallel¶
In a map
framework, the same function is applied to each input.
pydaisi
allows you to pass a list of inputs as an argument of a Daisi, using the map
method:
import pydaisi as pyd
with pyd.Daisi("exampledaisies/Add Two Numbers") as my_daisi:
    dbe = my_daisi.map(func="compute",
                       args_list=[{"firstNumber": 5, "secondNumber": x} for x in range(10)])
    print(dbe.value)
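Each dictionary in args_list produces one execution of compute. A short follow-up sketch, assuming dbe.value collects the individual results as a list in the same order as args_list (here, the sums 5 + x), so that the monolithic reduce step is plain Python over that list:
import pydaisi as pyd

with pyd.Daisi("exampledaisies/Add Two Numbers") as my_daisi:
    dbe = my_daisi.map(func="compute",
                       args_list=[{"firstNumber": 5, "secondNumber": x} for x in range(10)])
    # Assumption: dbe.value is a list with one result per args_list entry
    results = dbe.value
    print(sum(results))  # (5+0) + (5+1) + ... + (5+9) = 95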
A more realistic example¶
Consider the example below, which combines two Daisies:
- A Daisi to fetch news from Google News
- A Daisi to analyze the sentiment of each title
Running one execution of the Sentiment Analysis Daisi has a wall time of about 700ms. If we query 100 news results, that's about 70s of computation. By distributing this task across 4 workers, we can get it done in about 10s.
import pydaisi as pyd
import pandas as pd
import time

google_news = pyd.Daisi("exampledaisies/GoogleNews")

# The "GoogleNews" Daisi returns a Pandas DataFrame.
# We will put the titles in a list.
news_title = google_news.get_news(query="Apple",
                                  nb=100).value['title'].to_list()

classify = pyd.Daisi("exampledaisies/Zero Shot Text Classification")

# Prepare a parallel execution
dbe = classify.map(func="compute",
                   args_list=[{"text": title,
                               "candidate_labels": "positive, negative"} for title in news_title])
dbe.start()

# Wait for completion, polling until no execution is still running
while "RUNNING" in dbe.value:
    time.sleep(1)