Eduardo Romero López
07/12/2023, 12:37 PM
Eduardo Romero López
07/12/2023, 12:38 PM
Eduardo Romero López
07/12/2023, 12:39 PM
J. Camilo V. Tieck
07/12/2023, 9:32 PM
Leslie Wu
07/13/2023, 12:51 PM
kedro.io.core.DataSetError: Failed while loading data from data set ExcelDataSet(filepath=my/s3/path/file.xlsx, load_args={'engine': openpyxl, 'sheet_name': Sheet1}, protocol=s3, save_args={'index': False}, writer_args={'engine': xlsxwriter}).
my/s3/path/file.xlsx
I have no issues with other formats (parquet / csv / PDF). Has anyone seen this before, or have any insight into where I am going wrong?
FYI, I am using kedro==0.17.7
Michel van den Berg
07/13/2023, 1:00 PM
Merel
07/13/2023, 4:09 PM
.xlsx file into a Kedro SparkDataSet?
Rachid Cherqaoui
07/14/2023, 5:30 PM
Nelson Zambrano
07/15/2023, 11:02 PM
Dawid Bugajny
07/17/2023, 9:14 AM
with KedroSession.create(...) as session:
    context = session.load_context()
    cat = context.catalog
    return SequentialRunner().run(catalog=cat, pipeline=pipeline)[...]
I have just discovered that my API is single-threaded: new requests have to wait until previous requests finish. Does anybody have a solution for this problem, or know how to make the API multithreaded?
Eduardo Romero López
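For the single-threaded API question above: if an `async def` endpoint calls a blocking Kedro run directly, the event loop is stuck until that run finishes, so other requests queue up. A minimal stdlib sketch of the usual workaround is to push the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). `run_pipeline` is a hypothetical stand-in for the `KedroSession`/`SequentialRunner` code, simulated with `time.sleep`. (FastAPI also runs plain `def` endpoints in a thread pool on its own.)

```python
import asyncio
import time


def run_pipeline(request_id: int) -> str:
    """Hypothetical blocking stand-in for a KedroSession/SequentialRunner run."""
    time.sleep(0.2)  # simulate work that blocks the calling thread
    return f"done {request_id}"


async def handle_request(request_id: int) -> str:
    # Run the blocking call in a worker thread so the event loop stays free
    # to start other requests in the meantime.
    return await asyncio.to_thread(run_pipeline, request_id)


async def main() -> list[str]:
    # Simulate three requests arriving at the same time.
    return await asyncio.gather(*(handle_request(i) for i in range(3)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    # Elapsed time should be close to one sleep (~0.2s), not three.
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Whether several Kedro runs can safely share one process concurrently depends on the pipeline and datasets involved, so this only addresses the event-loop blocking.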
07/17/2023, 9:49 AM
Jo Stichbury
07/17/2023, 3:41 PM
Higor Carmanini
07/17/2023, 7:26 PM
pylance incorrectly inferring that the pipeline function (as imported from kedro.pipeline) is actually a module. It gets in the way of showing the proper documentation for kedro.pipeline.modular_pipeline.pipeline(), and I figure it could turn some less Kedro-savvy devs away by making them think they're doing it wrong (me a while back 🙃)
Rachid Cherqaoui
07/17/2023, 9:04 PM
PartitionedDataSet function from kedro.io to load data, but I've just seen that it doesn't take the delimiter into account. How can I solve this? (I'm working on CSV files locally.) Here is the code used:
data_set = PartitionedDataSet(
    path="data/01_raw/Tableaux",
    dataset=CSVDataSet,
    filename_suffix=".csv",
    load_args={"delimiter": ";", "header": 0, "encoding": "utf-8"},
)
Marc Gris
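For the `PartitionedDataSet` delimiter question above: as far as I know (Kedro 0.17/0.18 API), the `load_args` given to `PartitionedDataSet` itself are passed to the filesystem's `find()` call when listing partitions, not to each CSV read. Per-file reader options belong inside the `dataset` definition instead, e.g. `dataset={"type": "pandas.CSVDataSet", "load_args": {"sep": ";", "encoding": "utf-8"}}`. The stdlib sketch below only mimics that split of responsibilities (list the folder once, apply the delimiter in each partition's own lazy loader); `load_partitions` and `read_partition` are hypothetical helpers, not Kedro code.

```python
import csv
import functools
from pathlib import Path
from typing import Callable


def read_partition(path: Path, delimiter: str, encoding: str) -> list[list[str]]:
    """Hypothetical per-file loader: reader options like the delimiter live here."""
    with path.open(newline="", encoding=encoding) as f:
        return list(csv.reader(f, delimiter=delimiter))


def load_partitions(
    folder: str, suffix: str = ".csv", delimiter: str = ";", encoding: str = "utf-8"
) -> dict[str, Callable[[], list[list[str]]]]:
    """Hypothetical helper: map partition id -> lazy loader, like PartitionedDataSet.load()."""
    partitions = {}
    for path in sorted(Path(folder).glob(f"*{suffix}")):
        partition_id = path.name[: -len(suffix)] if suffix else path.name
        # Bind the per-file reader options into each partition's loader.
        partitions[partition_id] = functools.partial(
            read_partition, path, delimiter, encoding
        )
    return partitions
```

Calling one of the returned loaders, e.g. `load_partitions("data/01_raw/Tableaux")[partition_id]()`, reads just that file with `;` as the delimiter.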
07/18/2023, 4:47 AM
Jackson
07/18/2023, 6:54 AM
import chromadb


class VectorStore:
    def __init__(self, client_path, embedding_func) -> None:
        self.collections = None
        self.client = chromadb.PersistentClient(path=client_path)
        self.embedding_func = embedding_func

    def create_collections(self, collection_name):
        # Pass the embedding function by keyword: it is not
        # create_collection's second positional parameter.
        self.collections = self.client.create_collection(
            collection_name, embedding_function=self.embedding_func
        )
        return self.collections


def add_docs(collections, embeddings, metadatas, ids):
    # Add documents to a collection returned by VectorStore.create_collections().
    collections.add(
        embeddings=embeddings,
        metadatas=metadatas,
        ids=ids,
    )
However, putting this inside nodes.py doesn't seem ideal, because I still have other classes (like a model class) and I believe mixing everything inside nodes.py is an anti-pattern. But writing a standalone function in nodes.py like the one below seems redundant:
def create_collections(collections, collections_name):
    return collections.create_collections(collections_name)
So my question is: what is the best way to separate classes and nodes while also avoiding redundant code?
Daniel Lee
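For the classes-vs-nodes question above, one layout that avoids one-line wrappers (a suggestion, not an official Kedro rule) is to keep classes in their own module and write each node as a whole pipeline step that orchestrates several method calls on plain inputs and outputs. In this self-contained sketch, `InMemoryStore` is a hypothetical stand-in for `VectorStore` (no chromadb), and `index_documents` is a hypothetical node name.

```python
from dataclasses import dataclass, field


# --- vector_store.py (hypothetical module): domain classes live here ---
@dataclass
class InMemoryStore:
    """Hypothetical stand-in for VectorStore; keeps documents in nested dicts."""

    collections: dict[str, dict[str, dict]] = field(default_factory=dict)

    def create_collection(self, name: str) -> dict[str, dict]:
        return self.collections.setdefault(name, {})

    def add_docs(self, name: str, ids: list[str], metadatas: list[dict]) -> None:
        collection = self.collections[name]
        for doc_id, metadata in zip(ids, metadatas):
            collection[doc_id] = metadata


# --- nodes.py: one node per pipeline step, not one node per method ---
def index_documents(documents: list[dict], collection_name: str) -> dict[str, int]:
    """Hypothetical node: plain data in, plain data out (a document count)."""
    store = InMemoryStore()
    store.create_collection(collection_name)
    store.add_docs(
        collection_name,
        ids=[doc["id"] for doc in documents],
        metadatas=[{"source": doc["source"]} for doc in documents],
    )
    return {collection_name: len(store.collections[collection_name])}
```

In a pipeline this would be wired with something like `node(index_documents, inputs=["documents", "params:collection_name"], outputs="index_summary")`; those dataset names are made up for illustration.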
07/18/2023, 8:42 AM
DataCatalog, I would like pandas.ParquetDataSet to partition by the date in the dataset and save into different folders by date in Parquet, like we can do with spark.SparkDataSet. Is there a way we could partition using pandas?
Zemeio
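For the partition-by-date question above: one option, assuming the output is declared as a `PartitionedDataSet` (with e.g. `pandas.ParquetDataSet` as the underlying dataset and `filename_suffix=".parquet"`), is to have the node return a dictionary of partition id to data; on save, `PartitionedDataSet` writes one file per key. With pandas the split would be `{str(d): g for d, g in df.groupby("date")}`. The stdlib sketch below shows the same grouping on plain records; `split_by_date` and the `date` field are hypothetical.

```python
from collections import defaultdict


def split_by_date(rows: list[dict]) -> dict[str, list[dict]]:
    """Hypothetical node: group records by their 'date' field.

    Returning a dict like this lets a PartitionedDataSet output save one
    partition per key.
    """
    partitions: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        partitions[str(row["date"])].append(row)
    return dict(partitions)
```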
07/18/2023, 9:26 AM
{%- for item in mylist %}
out.blind_predictions_{{ item-}}:
  type: pandas.CSVDataSet
  filepath: ${filepath1}_{{ item-}}.csv
  layer: out
{% endfor %}
Globals:
mylist:
  - item1
  - item2
(For obvious reasons I removed the actual names from the text here.)
Does anyone know how to accomplish this (i.e. a for loop here)?
Marc Gris
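For the Jinja2 loop question above: one workaround that does not depend on how the config loader exposes globals to the Jinja2 pass is to write the list inline in the loop (`{% for item in ["item1", "item2"] %}`), or to generate the entries in Python. The sketch below builds the same catalog entries as a plain dict, which could then be handed to something like `DataCatalog.from_config`; `make_prediction_entries` and `filepath_prefix` are hypothetical names.

```python
def make_prediction_entries(items: list[str], filepath_prefix: str) -> dict[str, dict]:
    """Hypothetical helper: one pandas.CSVDataSet entry per item, mirroring the loop."""
    return {
        f"out.blind_predictions_{item}": {
            "type": "pandas.CSVDataSet",
            "filepath": f"{filepath_prefix}_{item}.csv",
            "layer": "out",
        }
        for item in items
    }
```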
07/18/2023, 1:12 PM
catalog.yml values that are defined in parameters.yml
ex:
in conf/base/parameters.yml
tenant_id: xyz
and in conf/base/catalog.yml
_tenant_id: ${tenant_id}
Thx in advance
Rachid Cherqaoui
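For the parameters-in-catalog question above: as far as I know, Kedro's templated config loading resolves `${...}` placeholders in `catalog.yml` from a globals mapping (e.g. `globals.yml` with `TemplatedConfigLoader`), not from `parameters.yml`, so `tenant_id` would need to be defined there instead. Python's `string.Template` uses the same `${name}` syntax, so it can illustrate the substitution step; this is an analogy, not Kedro's implementation, and `resolve_placeholders` is a hypothetical helper.

```python
from string import Template


def resolve_placeholders(entry: dict, globals_: dict) -> dict:
    """Hypothetical helper: recursively substitute ${name} in string values."""
    resolved = {}
    for key, value in entry.items():
        if isinstance(value, dict):
            resolved[key] = resolve_placeholders(value, globals_)
        elif isinstance(value, str):
            # substitute() raises KeyError if a placeholder has no value.
            resolved[key] = Template(value).substitute(globals_)
        else:
            resolved[key] = value
    return resolved
```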
07/18/2023, 3:00 PM
Rachid Cherqaoui
07/19/2023, 9:29 AM
kedro run --async, it takes significantly less time than when I use KedroSession.create().run() with FastAPI (even though my POST function is defined with async def). My question is: how can I use the async option with KedroSession, whether at the hooks level or otherwise? Thank you in advance.
Marc Gris
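For the `--async` question above: as far as I know, `kedro run --async` builds the runner with `is_async=True`, and `KedroSession.run()` accepts a `runner` argument, so `session.run(runner=SequentialRunner(is_async=True))` should be the programmatic equivalent (worth checking against the installed Kedro version). In async mode a node's inputs are loaded, and its outputs saved, concurrently in threads. The stdlib sketch below imitates only the input-loading step with a `ThreadPoolExecutor`; `load` and the dataset names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load(name: str) -> str:
    """Hypothetical slow dataset load (e.g. a network read)."""
    time.sleep(0.2)
    return f"data:{name}"


def load_inputs_concurrently(names: list[str]) -> dict[str, str]:
    # Submit every load first, then collect the results, so the I/O overlaps.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(load, name) for name in names}
        return {name: future.result() for name, future in futures.items()}
```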
07/19/2023, 10:01 AM
Marc Gris
07/19/2023, 11:19 AM
Rachid Cherqaoui
07/19/2023, 1:23 PM
Marc Gris
07/19/2023, 2:11 PM
ModularPipelineError: Inputs should be free inputs to the pipeline
Could someone kindly unpack / explain it?
Thx
Cyril Verluise
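For the `ModularPipelineError` question above: my understanding is that `pipeline(..., inputs=...)` only lets you remap the pipeline's free inputs, i.e. datasets that some node consumes but no node in that pipeline produces. Mapping an intermediate dataset (one produced inside the pipeline) triggers this error; I believe such names can be remapped through `outputs=` instead. The stdlib sketch below computes free inputs the same way, with nodes modelled as hypothetical `(inputs, outputs)` pairs rather than Kedro `Node` objects.

```python
def free_inputs(nodes: list[tuple[set[str], set[str]]]) -> set[str]:
    """Hypothetical helper: datasets consumed by some node but produced by none."""
    consumed = set().union(*(ins for ins, _ in nodes))
    produced = set().union(*(outs for _, outs in nodes))
    return consumed - produced


def check_input_mapping(
    nodes: list[tuple[set[str], set[str]]], inputs: dict[str, str]
) -> None:
    """Hypothetical check: every key in `inputs` must be a free input."""
    bad = set(inputs) - free_inputs(nodes)
    if bad:
        raise ValueError(f"Inputs should be free inputs to the pipeline: {sorted(bad)}")
```

With nodes `raw -> clean` and `clean, features -> model`, only `raw` and `features` are free inputs, so mapping `clean` is rejected.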
07/19/2023, 5:55 PM
Vincent Liagre
07/20/2023, 11:44 AM
src/my_module) with pip install -e src; now kedro is looking for data within the my_module folder from the root, for some reason. Any clue what's going on here and how I can solve this?
Happy to provide more details if required 🙂
Marc Gris
07/20/2023, 2:47 PM
local/catalog.yml does not override the `base/catalog.yml`…
Any idea what could cause this behavior?
Thx
M.
Christos Malliopoulos
07/21/2023, 11:18 AM
Nok Lam Chan
07/21/2023, 3:32 PM
df.describe()
• Needs to work on Windows and Linux, so wc is not an option
• Needs to be fast
• Bonus: is it possible to generalise this to the Excel file type?
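For the row-count question above, assuming the goal is a cross-platform replacement for `wc -l` on a CSV: reading the file in binary chunks and counting newline bytes avoids parsing entirely and behaves the same on Windows and Linux (`\r\n` still contains `\n`). Caveats: quoted fields containing line breaks are over-counted, a header row is included in the count, and this does not carry over to Excel, since `.xlsx` is a zipped XML format rather than lines of text. `count_lines` is a hypothetical helper.

```python
def count_lines(path: str, chunk_size: int = 1 << 20) -> int:
    """Hypothetical helper: count lines by scanning raw bytes for newlines."""
    count = 0
    last_byte = b""
    with open(path, "rb") as f:  # binary mode: no newline translation
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        count += 1
    return count
```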