Transition to d6tflow¶
Current Workflow Using Functions¶
Your code currently probably looks like the example below. How do you turn it into a d6tflow workflow?
import pandas as pd
def get_data():
data = pd.read_csv('rawdata.csv')
data = clean(data)
data.to_pickle('data.pkl')
def preprocess(data):
data = scale(data)
return data
# execute workflow
get_data()
df_train = pd.read_pickle('data.pkl')
do_preprocess = True
if do_preprocess:
df_train = preprocess(df_train)
Workflow Using d6tflow Tasks¶
In a d6tflow workflow, you define your own task classes and then execute the workflow by running the final downstream task which will automatically run required upstream dependencies.
The function-based workflow example will transform to this:
import d6tflow
import pandas as pd
class TaskGetData(d6tflow.tasks.TaskPqPandas):
# no dependency
def run(): # from `def get_data()`
data = pd.read_csv('rawdata.csv')
data = clean(data)
self.save(data) # save output data
class TaskProcess(d6tflow.tasks.TaskPqPandas):
do_preprocess = luigi.BoolParameter(default=True) # optional parameter
def requires(self):
return TaskGetData() # define dependency
def run(self):
data = self.input().load() # load input data
if self.do_preprocess:
data = scale(data) # # from `def preprocess(data)`
self.save(data) # save output data
flow = d6tflow.Workflow(TaskProcess)
flow.run() # execute task with dependencies
data = flow.outputLoad() # load output data
Learn more about Writing and Managing Tasks and Running Workflows.
Interactive Notebook¶
Live mybinder example http://tiny.cc/d6tflow-start-interactive
Design Pattern Templates for Machine Learning Workflows¶
See code templates for a larger real-life project at https://github.com/d6t/d6tflow-template. Clone & code!