Polly
04/04/2023, 4:30 PM
Stephanie Kaiser
04/05/2023, 2:23 PM
Polly
04/26/2023, 2:20 PM
Merel
06/01/2023, 8:54 AM
Juan Luis
06/21/2023, 8:54 AM
Nok Lam Chan
08/09/2023, 8:26 PM
Yetunde
08/22/2023, 5:16 PM
catalog.yml & parameters.yml straight from a Jupyter/Databricks/AWS SageMaker notebook without a project template or an IDE.
• party wizard Use the project creation wizard to add features to your project template. Don't need the files and folders for linting, testing, and documentation? No worries! Just skip those to get a simpler template.
We'd love your help testing these ideas! If you can spare 30 minutes to try either of them, indicate your interest with the jupyter or party wizard emoji. Your feedback will help make Kedro more flexible.
datajoely
09/28/2023, 1:56 PM
Juan Luis
10/13/2023, 12:17 PM
Deepyaman Datta
10/22/2023, 1:32 PM
PartitionedDataset users out there! We have a question for you, related to enabling versioning for PartitionedDataset: which of the options below makes the most sense to you?
1. https://github.com/kedro-org/kedro/pull/521 proposes to enable versioning of the underlying dataset by specifying versioned: true in the dataset config:
station_data:
  type: PartitionedDataset
  path: data/03_primary/station_data
  dataset:
    type: pandas.CSVDataset
    versioned: true
On the plus side, putting versioned: true in the dataset config makes it clear that the versioning is applied to the underlying dataset, not to the PartitionedDataset. However, there are some edge cases (see https://github.com/kedro-org/kedro/pull/521#issuecomment-744653023, if you're keen).
2. Alternatively, we can move the versioned: true flag to the top-level PartitionedDataset config:
station_data:
  type: PartitionedDataset
  path: data/03_primary/station_data
  versioned: true
  dataset:
    type: pandas.CSVDataset
Note that it is still the underlying dataset that gets versioned (e.g. data/03_primary/station_data/first_station.csv/<version>/first_station.csv), even though the flag is set at the top level.
3. Neither of these options makes sense; what you really need is versioning of the top-level dataset. (Note that we don't have a solution designed for this case, but it would be great to know nonetheless!)
Please feel free to vote using 1️⃣2️⃣3️⃣, and elaborate further on your thoughts in the thread below!
Juan Luis
11/02/2023, 1:37 PM
~/.miniconda, ~/.virtualenvs), or next to the code (~/Projects/spaceflights/.venv)?
• when you create a new Kedro project, what steps do you usually follow? for example: 1. create and activate a conda environment, 2. pip install kedro, 3. kedro new
• what do you think of the current process?
(please leave a reply on the thread 🧵, 1 comment per person to keep the conversation tidy)
your feedback and ideas are very much welcome 🙏🏼
Роман Белый
11/02/2023, 1:53 PM
Juan Luis
11/06/2023, 9:23 AM
requirements.txt and then read them in pyproject.toml
https://github.com/kedro-org/kedro/blob/93dc1a91e4bb476287040ea3db4a610696cacb0c/k[…]project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/pyproject.toml
but you can also just avoid requirements.txt files entirely. what do you think of this approach?
Juan Luis
11/06/2023, 10:20 AM
kedro new if you haven't installed Kedro yet? 🙃 cc @Lukas Innig
Juan Luis
12/07/2023, 11:57 AM
A node cannot have the same inputs and outputs) so it requires you to define a read-only version of the dataset and an appendable version, both referring to the same underlying storage.
any thoughts on this approach?
Nok Lam Chan
01/16/2024, 4:35 PM
Thomas Huyghebaert
01/29/2024, 4:15 PM
Deepyaman Datta
02/05/2024, 12:29 PM
datajoely
02/23/2024, 1:54 PM
rename_payments node
https://datajoely.github.io/jaffle-shop-lineage/
datajoely
02/23/2024, 1:54 PM
Nok Lam Chan
02/26/2024, 7:07 PM
aadi
02/26/2024, 7:13 PM
Nok Lam Chan
03/06/2024, 3:50 PM
Usage: kedro [OPTIONS] COMMAND [ARGS]...
Try 'kedro -h' for help.
Error: No such command 'run'.
In case you are running outside of a project, you will see a slightly more helpful message.
It may be considered a slight breaking change for plugin developers if your plugin relies on Kedro commands always running from the project root. Thoughts?
Nok Lam Chan
03/08/2024, 10:08 AM
Nok Lam Chan
03/13/2024, 5:59 PM
kedro run --runner ParallelRunner to speed up your pipeline. If not, why? (Other than Spark not working with multiprocessing)
Juan Luis
03/20/2024, 4:16 PM
Rashida Kanchwala
03/27/2024, 4:32 PM
Nok Lam Chan
04/02/2024, 10:15 AM
Juan Luis
04/08/2024, 12:33 PM
Artur Dobrogowski
04/12/2024, 6:45 PM