Georgi Iliev
06/16/2023, 7:56 AM
ONNX files and uploading them to S3 automatically using "only" the catalog definition.
Broadly speaking, the main flow of what we're trying to build is the following:
1. There is a process that trains and creates some artifacts (PCA, scaler, some K-Means models, etc.) and saves them as Pickle files so they can be shared between different nodes.
2. Once the main pipeline is done, we're ready to distribute the model to our services.
3. We're using ONNX because our services are not built in Python and the ONNX libraries we use are a bit faster.
4. So taking this into account, we now have a publish pipeline that takes these Pickle files, converts them to ONNX using convert_sklearn, and then uploads them to S3.
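For reference, the manual publish step described in item 4 might look roughly like the sketch below. It is only an illustration under stated assumptions: the bucket name, the key prefix, the input name "input", and the n_features argument are hypothetical and not taken from the actual pipeline, and the model is assumed to accept a single float tensor.

```python
import pickle
from pathlib import Path


def onnx_key_for(pickle_path: str, prefix: str = "06_models") -> str:
    """Map a local pickle path to an S3 object key with an .onnx suffix.

    The default prefix is a hypothetical choice for illustration.
    """
    return f"{prefix}/{Path(pickle_path).with_suffix('.onnx').name}"


def publish(pickle_path: str, n_features: int, bucket: str) -> str:
    """Convert one pickled scikit-learn object to ONNX and upload it to S3."""
    # Imported lazily so onnx_key_for stays usable without these packages.
    import boto3
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    # Only unpickle files that your own training pipeline produced.
    with open(pickle_path, "rb") as f:
        model = pickle.load(f)

    # Assumption: one float input of shape (batch, n_features).
    initial_types = [("input", FloatTensorType([None, n_features]))]
    onnx_model = convert_sklearn(model, initial_types=initial_types)

    key = onnx_key_for(pickle_path)
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=onnx_model.SerializeToString(),
    )
    return f"s3://{bucket}/{key}"
```

Calling publish("data/06_models/pca.pkl", n_features=10, bucket="my-bucket") would then upload the converted model under 06_models/pca.onnx, given valid AWS credentials and the hypothetical bucket name.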
So, my main question here is: Is there a way to implement this so that the transformation and the S3 upload are done automatically?
• I know that we can specify an S3 path in the catalog, but I didn't see how to set the .onnx file type.
Juan Luis
06/16/2023, 8:03 AM
# conf/base/catalog.yml
regressor:
  type: kedro_onnx.io.OnnxDataSet
  filepath: s3://data/06_models/reg.onnx
  backend: sklearn
(adapted from https://kedro-onnx.readthedocs.io/en/latest/usage.html)
let me know if that works for you!
Georgi Iliev
06/16/2023, 8:11 AM