Notebook to train a model¶

We will train a model on a Kaggle dataset to detect credit card fraud. The model will then be uploaded as a versioned model artifact so that it can be used in a production context.

Creation of a pex with python dependencies¶

Some basic Python modules are already available in Punch images (e.g. pandas). If you need specific dependencies, you must generate a pex. The same pex is used both in development and in production, which limits version drift between the two environments.

Here we only need the scikit-learn and mlflow modules, but you can pass any list of modules if needed (see punch_pex).

In [1]:
%punch_pex -l scikit-learn mlflow --group demo --artifact dependencies -v 1.0.0 -o
  adding: dependencies-1.0.0.pex (deflated 2%)
  adding: metadata.yml (deflated 26%)
++ java -Xmx1g -Xms256m -Dlog4j.configurationFile=/punch/conf/log4j2/log4j2-stdout.xml -cp /punch/resourcectl.jar com.github.punchplatform.resourcectl.ResourceCtl -u http://artifacts-server.punch-artifacts:4245 upload -f /punch/punch_pex/dependencies-1.0.0.zip -o
Resource uploaded : additional-pex:demo:dependencies:1.0.0
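If you need more libraries, simply extend the list passed to -l. The line below is a hypothetical example (the extra xgboost dependency and the bumped version are illustrative, not part of this demo):

%punch_pex -l scikit-learn mlflow xgboost --group demo --artifact dependencies -v 1.1.0 -o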

Adding dependencies to the environment¶

This notebook only needs the pex created above, so we load it via the punch_dependencies magic cell.

In [ ]:
%%punch_dependencies
additional-pex:demo:dependencies:1.0.0
++ java -Xmx1g -Xms256m -Dlog4j.configurationFile=/punch/conf/log4j2/log4j2-stdout.xml -cp /punch/resourcectl.jar com.github.punchplatform.resourcectl.ResourceCtl -u http://artifacts-server.punch-artifacts:4245 download -r additional-pex:demo:dependencies:1.0.0 -o /usr/share/punch/extlib/python
Resource additional-pex:demo:dependencies:1.0.0 downloaded to /usr/share/punch/extlib/python/dependencies-1.0.0.pex

Importing modules¶

In [1]:
from sklearn import tree
from sklearn.metrics import accuracy_score
import mlflow

Reading data from s3¶

Punch provides magic cells to read data from different sources. If your Jupypunch was deployed with preconfigured databases, you do not need to re-enter your login credentials.

Here, the training dataset is downloaded from a MinIO bucket named "demo". We read the file and store the data in a variable called "train". The test dataset is loaded into the "test" variable (see punch_source).

In [2]:
%%punch_source --type s3 --name train -o 
bucket: demo
prefix: train/train.csv
Data is available in train variable.
Execution time: 0:00:00.418062
In [3]:
%%punch_source --type s3 --name test -o 
bucket: demo
prefix: test/test.csv
Data is available in test variable.
Execution time: 0:00:00.165612
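As a quick sanity check, the loaded data can be inspected like any pandas DataFrame. This is a sketch, assuming the magic exposes the data as pandas DataFrames (as the head() calls below suggest):

# Inspect the shape and columns of the loaded training data;
# the _ppf_* bookkeeping columns added by the source node appear alongside the features.
print(train.shape)
print(train.columns.tolist())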

Removing unused columns¶

The Punch source node adds some columns, such as _ppf_path and _ppf_last_modified, which are useful in some contexts but unnecessary for our example, so we keep only the feature and label columns.

In [4]:
train = train[['distance_from_home', 'distance_from_last_transaction',
       'ratio_to_median_purchase_price', 'repeat_retailer', 'used_chip',
       'used_pin_number', 'online_order', 'fraud']]
train.head(2)
Out[4]:
distance_from_home distance_from_last_transaction ratio_to_median_purchase_price repeat_retailer used_chip used_pin_number online_order fraud
0 4.805367 1.379477 1.236960 1.0 0.0 0.0 0.0 0.0
1 27.052054 1.766070 0.415689 1.0 0.0 0.0 0.0 0.0
In [5]:
test = test[['distance_from_home', 'distance_from_last_transaction',
       'ratio_to_median_purchase_price', 'repeat_retailer', 'used_chip',
       'used_pin_number', 'online_order', 'fraud']]
test.head(2)
Out[5]:
distance_from_home distance_from_last_transaction ratio_to_median_purchase_price repeat_retailer used_chip used_pin_number online_order fraud
0 11.188842 0.067784 1.659848 1.0 0.0 0.0 1.0 0.0
1 8.359728 0.186258 0.495259 1.0 1.0 0.0 0.0 0.0
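Alternatively, instead of listing the columns to keep, one could drop only the Punch-specific columns. A minimal sketch, assuming _ppf_path and _ppf_last_modified are the only extra columns added by the source node:

# Drop only the Punch-added bookkeeping columns and keep every feature plus the label.
# errors="ignore" avoids a KeyError if a column is missing from one of the datasets.
train = train.drop(columns=["_ppf_path", "_ppf_last_modified"], errors="ignore")
test = test.drop(columns=["_ppf_path", "_ppf_last_modified"], errors="ignore")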

Training the model¶

We train a decision tree classifier on the training data.

In [6]:
model = tree.DecisionTreeClassifier()
model = model.fit(train.drop("fraud", axis=1).values, train["fraud"].values)
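The classifier is trained with its default hyperparameters. For a reproducible run, or to limit overfitting, hyperparameters can be set explicitly; the values below are hypothetical and not tuned for this dataset:

# Hypothetical hyperparameters: cap the tree depth and fix the random seed for reproducibility.
model = tree.DecisionTreeClassifier(max_depth=10, random_state=42)
model = model.fit(train.drop("fraud", axis=1).values, train["fraud"].values)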

Testing the model¶

We evaluate the model's accuracy on the test data.

In [7]:
prediction = model.predict(test.drop("fraud", axis=1))
accuracy_score(test["fraud"], prediction)
/root/.pex/installed_wheels/a4f411ae99491abcca22021235750d2a9cc0bfbab39d5aa7e62f9861f905f58a/scikit_learn-1.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl/sklearn/base.py:402: UserWarning: X has feature names, but DecisionTreeClassifier was fitted without feature names
  warnings.warn(
Out[7]:
0.9999866666666667
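The UserWarning above is raised because the model was fitted on a plain NumPy array (.values) while predict received a DataFrame with column names. Passing an array to predict as well keeps both calls consistent and silences the warning:

# Predict on a plain array, matching how the model was fitted, to avoid the feature-names warning.
prediction = model.predict(test.drop("fraud", axis=1).values)
accuracy_score(test["fraud"], prediction)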

Saving the model and uploading it as an artifact¶

Once we are satisfied with the model's results, we can upload it in the desired packaging format (here MLflow) via a lambda function.

In [8]:
%%punch_upload_model -g demo -n credit_card -v 1.0.0 -o
lambda path: mlflow.sklearn.save_model(model, path)
2022/12/14 10:59:39 WARNING mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /tmp/punch_upload_model/demo/credit_card/1.0.0/model.pkl, flavor: sklearn), fall back to return ['scikit-learn==1.2.0', 'cloudpickle==2.2.0']. Set logging level to DEBUG to see the full traceback.
/root/.pex/installed_wheels/57f6f22bde4e042978bcd50176fdb381d7c21a9efa4041202288d3737a0c6a54/setuptools-65.6.3-py3-none-any.whl/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
2022/12/14 10:59:39 WARNING mlflow.utils.environment: Failed to resolve installed pip version. ``pip`` will be added to conda.yaml environment spec without a version specifier.
++ java -Xmx1g -Xms256m -Dlog4j.configurationFile=/punch/conf/log4j2/log4j2-stdout.xml -cp /punch/resourcectl.jar com.github.punchplatform.resourcectl.ResourceCtl -u http://artifacts-server.punch-artifacts:4245 upload -f /tmp/punch_upload_model/demo/credit_card/1.0.0/artifact_credit_card_1.0.0.zip -o
Resource uploaded : model:demo:credit_card:1.0.0
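Before consuming the artifact in production, the locally saved MLflow model can be reloaded as a sanity check. This sketch assumes the local path reported in the MLflow log above is the saved model directory; the actual path may differ depending on the magic's internals:

# Hypothetical: reload the MLflow-packaged model from the local path shown in the log above
# and verify that it still produces predictions on the test features.
reloaded = mlflow.sklearn.load_model("/tmp/punch_upload_model/demo/credit_card/1.0.0/model.pkl")
print(reloaded.predict(test.drop("fraud", axis=1).values)[:5])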