# questions
  • n

    Nelson Zambrano

    07/23/2023, 8:25 PM
    Is it possible to disable `_validate_unique_outputs(nodes)` via hooks or by implementing a modified `Pipeline` class?
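    A minimal sketch of one possible workaround, assuming `_validate_unique_outputs` is still a module-level helper in `kedro.pipeline.pipeline` (as in recent 0.18.x releases). Patching a private API like this is unsupported and may break on upgrade:
    Copy code
    # Hypothetical placement: run this before any Pipeline objects are constructed,
    # e.g. near the top of src/<package>/settings.py.
    import kedro.pipeline.pipeline as kedro_pipeline_module

    # Swap the private validator for a no-op so duplicate outputs are no longer rejected.
    kedro_pipeline_module._validate_unique_outputs = lambda nodes: None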
  • b

    Baden Ashford

    07/24/2023, 9:59 AM
    Hi all, Has anyone used Kedro for building pipelines in a repo which also houses non-pipeline code, like lambda functions? I am bringing in Kedro, but also need some way of porting over our existing lambda functions to live in the same repo as our pipelines. Splitting them out into a separate repo is not really feasible due to the common code used by each and the extra work/dependency management that would introduce. We use aws sam as a framework of sorts within each of our lambda functions, so we could just put them in `src/my_repo/lambdas/` next to `src/my_repo/pipelines/` and have a third directory with shared code, `src/my_repo/shared/`, but I thought there may be a different way to go about this! Thanks!
  • a

    Aleksander Jaworski

    07/24/2023, 11:22 AM
    [Kedro version: 0.18.6 currently] Hi, I am working on a sort of 'pipeline monorepo' where I have dozens of pipelines. I have a question: would some sort of lazy configuration validation be a useful feature for kedro? I have two reasons for asking:
    1. It feels a bit cumbersome that even a simple hello_world.py takes several seconds to run when the configuration is large enough: first you see all the logs, and all the setup is done for the data catalog etc., none of which would actually end up being used in a hello_world.py.
    2. When setting up the project for someone, it is impossible to provide a credentials file with just the required credentials; in Kedro, all of them need to be filled in right now, as everything is validated at once. In a lazy version, only the dependencies that follow from the pipeline would need to be evaluated.
    Are there any solutions or modifications I could use to improve my approach here? Thanks in advance! :)
    🎉 1
  • s

    Sid Shetty

    07/24/2023, 2:13 PM
    Hello team, I was wondering if there's an approach to break a pandas dataframe into chunks, run a few operations on each, and write each chunk to a parquet file in append mode (without concatenating the chunks back)? So the kedro node would have multiple writes.
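    One pattern that may fit, sketched under the assumption that the node's output is configured in the catalog as a PartitionedDataset of pandas.ParquetDataSet (all names below are hypothetical): return a dict of callables, so each chunk is materialised and written as its own partition and the chunks are never concatenated back together.
    Copy code
    from typing import Callable, Dict

    import pandas as pd


    def process_in_chunks(df: pd.DataFrame, chunk_size: int = 100_000) -> Dict[str, Callable[[], pd.DataFrame]]:
        """Split the frame and return one lazy callable per chunk.

        PartitionedDataset evaluates callable values only at save time, so each
        chunk is processed and written independently.
        """
        partitions = {}
        for i in range(0, len(df), chunk_size):
            chunk = df.iloc[i : i + chunk_size]
            # bind the current chunk; replace .assign(...) with the real operations
            partitions[f"chunk_{i // chunk_size:05d}"] = lambda c=chunk: c.assign(processed=True)
        return partitions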
  • j

    Jon Cohen

    07/24/2023, 3:15 PM
    Hi! My team wants to have separate client data ingestion pipelines which are kept separately from each other. We then want to be able to import our standard data processing pipeline from a central repo. Is it possible to use something like modular pipelines in this way?
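    A minimal sketch of how modular pipelines can be reused this way, assuming the standard processing pipeline is packaged and installed as a library from the central repo (the `standard_processing` package and function names here are hypothetical):
    Copy code
    # pipeline_registry.py of a client-specific Kedro project
    from typing import Dict

    from kedro.pipeline import Pipeline, pipeline

    from standard_processing.pipelines import create_pipeline  # hypothetical installed package


    def register_pipelines() -> Dict[str, Pipeline]:
        shared = create_pipeline()
        # Namespace the shared pipeline so its dataset names don't clash with
        # the client-specific ingestion pipeline's names.
        return {"__default__": pipeline(shared, namespace="client_a")}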
  • j

    Jon Cohen

    07/24/2023, 3:17 PM
    Thank you! Wow, fast response time
    ⏩ 4
  • e

    Emilio Gagliardi

    07/24/2023, 5:40 PM
    hi kedronauts, I'm trying to get my first pipeline working and I'm confused on a few pieces I'm hoping you can correct my thinking on. I have one custom DataSet that connects to an RSS feed. I have another custom DataSet that stores the processed feed items and saves them to a mongo db. I'm confused about how to set up the catalog entries and node functions, i.e. how the catalog values get passed into the DataSets. How do I create a catalog entry that combines with values from credentials.yml? 'mongo_url' contains my username and password, which I stored in credentials.yml. Catalog entries:
    Copy code
    rss_feed_extract:
      type: kedro_workbench.extras.datasets.RSSDataSet.RSSFeedExtract
      url: https://api.msrc.microsoft.com/update-guide/rss
    
    rss_feed_load:
      type: kedro_workbench.extras.datasets.RSSDataSet.RSSFeedLoad
      mongo_url: "mongodb+srv://<username>:<password>@bighatcluster.wamzrdr.mongodb.net/"
      mongo_db: "TBD"
      mongo_collection: "TBD"
      mongo_table: "TBD"
      credentials: mongo_atlas
    nodes.py
    Copy code
    def extract_rss_feed() -> Dict[str, Any]:
        raw_rss_feed = RSSFeedExtract() # Q. how does the catalog 'url' value get passed to the __init__ method?
        raw_rss_feed.load()
        
        return {'key_1':'value_1', 'key_2': 'value_2'}
        
        
    def transform_rss_feed(raw_rss_feed: Dict[str, Any]) -> List[Dict[str, Any]]:
        
        return [{'key_1_T':'value_1_T', 'key_2_T': 'value_2_T'}]
        
        
    def load_rss_feed(prepped_rss_items: List[Dict[str, Any]]) -> None:
        rss_feed_load = RSSFeedLoad(prepped_rss_items) # not clear how to create the custom dataset that takes data from catalog and credentials and the previous node
        rss_feed_load.save()
    pipeline.py
    Copy code
    pipeline([
        node(
            func=extract_rss_feed,
            inputs=None,
            outputs='rss_feed_for_transforming',
            name="extract_rss_feed",
        ),
        node(
            func=transform_rss_feed,
            inputs="rss_feed_for_transforming",
            outputs='rss_for_loading',
            name="transform_rss_items",
        ),
        node(
            func=load_rss_feed,
            inputs="rss_for_loading",
            outputs="rss_feed_load",
            name="load_rss_items",
        ),
    ])
    custom datasets
    Copy code
    class RSSFeedExtract(AbstractDataSet):
        def __init__(self, url: str):
            self._url = url  # note: the lowercase constructor argument, not an undefined URL name
    
    class RSSFeedLoad(AbstractDataSet):
        def __init__(self, mongo_url: str, mongo_db: str, mongo_collection: str, mongo_table: str, credentials: Dict[str, Any], data: Any = None):
            self._data = data # comes from the previous node
            self._mongo_url = mongo_url
            self._mongo_db = mongo_db
            self._mongo_collection = mongo_collection
            self._mongo_table = mongo_table
            self._username = credentials['username']
            self._password = credentials['password']
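    For what it's worth, a rough sketch of the flow, assuming standard DataCatalog behaviour: every key in a catalog entry other than `type` is passed to the dataset's `__init__` as a keyword argument, and `credentials: mongo_atlas` is resolved against credentials.yml and injected as a dict. The nodes themselves never instantiate the datasets; they only receive and return data.
    Copy code
    # conf/local/credentials.yml (hypothetical):
    # mongo_atlas:
    #   username: my_user
    #   password: my_pass

    # Roughly what Kedro does when it builds the catalog entry 'rss_feed_load' (simplified):
    dataset = RSSFeedLoad(
        mongo_url="mongodb+srv://<username>:<password>@bighatcluster.wamzrdr.mongodb.net/",
        mongo_db="TBD",
        mongo_collection="TBD",
        mongo_table="TBD",
        credentials={"username": "my_user", "password": "my_pass"},  # resolved from credentials.yml
    )

    # At run time the runner calls dataset._load() for inputs and dataset._save(data)
    # for outputs, so load_rss_feed only needs to return the prepped items.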
  • j

    Jon Cohen

    07/24/2023, 6:17 PM
    I'm noticing that warnings like SyntaxErrors and type errors are considered "warnings" by Kedro, which continues to try to run the pipeline. Is there a setting to escalate these to Errors so they can abort the pipeline run?
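    If these surface through Python's warnings machinery, one generic way to escalate them is the standard warnings filter; this is a sketch of plain Python behaviour, not a Kedro setting, and it won't help if the failures are reported some other way:
    Copy code
    # Hypothetical placement: src/<package>/settings.py, so it runs before the pipeline starts.
    import warnings

    # Turn every warning into an exception so the run aborts instead of continuing.
    warnings.simplefilter("error")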
  • j

    Jon Cohen

    07/24/2023, 6:25 PM
    I also noticed in Kedro Viz (this is from the modular pipelines part of the tutorial) that two pipelines with the same static structure are rendering differently, which is a little frustrating for visual scanning
  • j

    Jon Cohen

    07/24/2023, 8:10 PM
    More newb questions (sorry). I'm having trouble following the tutorial for running a packaged project. I have made a new directory with a new Kedro project (we expect each client of ours to have their own Kedro project) and have installed the built wheel. However Kedro is looking for nodes and pipelines locally in the new project and can't find the ones in the installed project. Does this mean I have to copy over all of my pipelines manually from the installed Kedro project?
  • v

    VIOLETA MARÍA RIVERA

    07/25/2023, 10:55 PM
    Hello, I am new to kedro so I was doing the spaceflights tutorial. When I try to use kedro viz, a tab in my browser opens up but it's just a white screen, everything is missing. I tried saving the pipeline to a .json file and it isn't empty, so I don't know what is causing this display issue. I'd be grateful for any help. Thanks!
  • s

    Suyash Shrivastava

    07/26/2023, 3:23 PM
    Hi Everyone! Has anyone used matplotlib 2.0.0 with Kedro 0.17.7 before? I am getting an error. I have installed PyQt5 and pyside2 but am still getting the same error. I'd be grateful for any help. Thanks a lot!
    Copy code
    File "/usr/local/lib/python3.6/site-packages/matplotlib/backends/qt_compat.py", line 175, in <module>
        "Matplotlib qt-based backends require an external PyQt4, PyQt5,\n"
    ImportError: Matplotlib qt-based backends require an external PyQt4, PyQt5,
    or PySide package to be installed, but it was not found.
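    One workaround that may apply if the figures only need to be saved to files rather than shown interactively: force a non-Qt backend before pyplot is imported. This is a general matplotlib technique, not a Kedro-specific fix:
    Copy code
    import matplotlib

    matplotlib.use("Agg")  # headless backend, no PyQt4/PyQt5/PySide needed

    import matplotlib.pyplot as plt  # import pyplot only after selecting the backend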
  • s

    Sid Shetty

    07/26/2023, 5:24 PM
    Hello team, when I split a pandas dataframe and store it using a partitioned dataset, loading the partitions back together appears to find schema differences, since a few columns have nulls. Is there any workaround here that avoids me having to add another node to put these partitions together, ideally just reading them as a pandas.ParquetDataSet? Perhaps passing the schema of the original dataframe, or even specifying it explicitly?
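    One possible workaround, sketched on the assumption that the mismatch comes from all-null chunks being inferred with a different parquet type: cast every chunk to the same explicit (nullable) dtypes before it is written, so all partitions carry identical schemas. Column names and dtypes here are hypothetical.
    Copy code
    import pandas as pd

    # Nullable pandas dtypes keep their type information even when a chunk is entirely null.
    SCHEMA = {"customer_id": "Int64", "amount": "Float64", "comment": "string"}


    def normalise_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
        """Pin each partition to the same dtypes before saving."""
        return chunk.astype(SCHEMA)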
  • l

    Lim H.

    07/26/2023, 6:51 PM
    Hi everyone, is it possible to pass credentials of the underlying dataset when using it with CachedDataSet? e.g.
    Copy code
    test:
      type: CachedDataset
      versioned: true
      dataset:
        type: pandas.JSONDataSet
        filepath: ...
        load_args:
          lines: True
        credentials: ...
    doesn’t work but this works
    Copy code
    test:
      type: pandas.JSONDataSet
      filepath: ...
      load_args:
        lines: True
      credentials: ...
    I thought this was working at some point? I might be hallucinating though. Just want to double check quickly before I create my own CachedDataSet
    ✅ 1
    👀 1
  • j

    J. Camilo V. Tieck

    07/26/2023, 7:21 PM
    hi everyone, how can I access the current env from python? Is there an env_name variable somewhere? I want to use the env_name as a suffix for loading a file.
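    A minimal sketch of one way to get at it, assuming a recent 0.18.x release where `KedroContext` exposes the run environment as `context.env`; the hook class is hypothetical, and you would register it in settings.py via `HOOKS` and read the stored value wherever the suffix is needed:
    Copy code
    from kedro.framework.hooks import hook_impl


    class EnvCaptureHooks:
        """Stores the active Kedro environment (e.g. 'local' or 'base') for later use."""

        current_env = None

        @hook_impl
        def after_context_created(self, context):
            EnvCaptureHooks.current_env = context.env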
  • e

    Emilio Gagliardi

    07/26/2023, 8:57 PM
    I'm trying to get logging working and was hoping someone could point me in the right direction. When you run Kedro out of the box, it automatically writes node and pipeline details to the console; I'd like to keep that as it is. What I'm not figuring out is how to use a logger inside a module to write log entries to a file and not to the console. I have a large JSON object I want to print to a file so I can look at it. I tried setting up my logging.yml file but I'm not understanding something.
    Copy code
    logging.yml
    handlers:
      ...other built-in kedro handlers...
      debug_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: DEBUG
        formatter: simple
        filename: logs/debug.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8
        delay: True
    
    loggers:
      kedro:
        level: INFO
    
      kedro_workbench:
        level: INFO
    
      DataSets:
        level: DEBUG
        handlers: [debug_file_handler]
    
    root:
      handlers: [rich, info_file_handler, error_file_handler]
    In my module I used:
    Copy code
    import logging
    logger = logging.getLogger('DataSets')
    logger.debug(output)
    but when I run the pipeline, the contents of output are still written to the console. What am I missing here? thanks kindly!
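    For reference, one likely culprit under standard Python logging semantics (a guess about this setup, not a confirmed diagnosis): without `propagate: False`, records from the `DataSets` logger also bubble up to the root logger, whose `rich` handler prints them to the console. The same idea expressed in plain Python:
    Copy code
    import logging

    logger = logging.getLogger("DataSets")
    # With propagation off (the code equivalent of `propagate: False` under the
    # DataSets logger in logging.yml), records reach only the handlers attached to
    # "DataSets" itself, e.g. debug_file_handler, and no longer hit the root console handler.
    logger.propagate = False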
  • f

    Fazil B. Topal

    07/27/2023, 8:50 AM
    hey all, Is there some way I can see a high-level overview of how kedro functions? I find hooks nice, but without the high-level order of execution I'm not sure I can do what I want. Context: I am trying to play around with the data versioning to change it a bit, since ideally I would run each node in a different k8s pod. That means the dataset versioning should match across pods. From what I gather the `Session` class has this info, but I'm trying to find a proper way to make sure the same code version + some envs end up using the same data version etc. Any help is appreciated 🙂
  • r

    Rahul Kumar

    07/27/2023, 8:57 AM
    Hi all, Any specific reason why versioning is not supported in PartitionedDataset?
  • m

    meharji arumilli

    07/27/2023, 3:08 PM
    Hello all, I have packaged the kedro project with `kedro package` and created the DAG with `kedro airflow create`. This created the .whl and the DAGs. Then, using the Dockerfile below, I built the docker image for the kedro project:
    FROM apache/airflow:2.6.3-python3.8
    # install project requirements
    WORKDIR /opt/test-fi/
    COPY src/requirements.txt .
    USER root
    RUN chmod -R a+rwx /opt/test-fi/
    # Install necessary packages
    RUN sudo apt-get update && apt-get install -y wget gnupg2 libgomp1 && apt-get -y install git
    USER airflow
    COPY data/ data/
    COPY conf/ conf/
    COPY logs/ logs/
    COPY src/ src/
    COPY output/ output/
    COPY dist/ dist/
    COPY pyproject.toml .
    RUN --mount=type=bind,src=.env,dst=conf/.env . conf/.env && python -m pip install --upgrade pip && python -m pip install -r requirements.txt && python -m pip install dist/test_fi-0.1-py3-none-any.whl
    EXPOSE 8888
    CMD ["kedro", "run"]
    The docker image is built with `docker build -t test_fi .`. Then I installed Airflow using a docker-compose.yml file on an EC2 instance and attached the docker image to the worker and scheduler services. I tested the image test_fi by `docker exec`-ing into the container and running `kedro run`, and the project runs as expected. However, when the DAG is triggered, I get the error below in the Airflow UI, without much information in the logs to debug. The log below was captured with `logging_level = DEBUG`:
    *** Found local files:
    ***   * /opt/airflow/logs/dag_id=test-fi/run_id=scheduled__2023-06-27T14:37:54.602904+00:00/task_id=define-project-parameters/attempt=1.log
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1037} DEBUG - previous_execution_date was called
    [2023-07-27, 14:37:56 UTC] {__init__.py:51} DEBUG - Loading core task runner: StandardTaskRunner
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1037} DEBUG - previous_execution_date was called
    [2023-07-27, 14:37:56 UTC] {base_task_runner.py:68} DEBUG - Planning to run as the  user
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:789} DEBUG - Refreshing TaskInstance <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> from DB
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Trigger Rule' PASSED: True, The task instance did not have any upstream tasks.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Task Instance State' PASSED: True, Task state queued was valid.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Task Instance Not Running' PASSED: True, Task is not in running state.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]>
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Trigger Rule' PASSED: True, The task instance did not have any upstream tasks.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Task Concurrency' PASSED: True, Task concurrency is not set.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1112} DEBUG - <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]> dependency 'Pool Slots Available' PASSED: True, There are enough open slots in default_pool to execute the task
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [queued]>
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1308} INFO - Starting attempt 1 of 2
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1327} INFO - Executing <Task(KedroOperator): define-project-parameters> on 2023-06-27 14:37:54.602904+00:00
    [2023-07-27, 14:37:56 UTC] {standard_task_runner.py:57} INFO - Started process 85 to run task
    [2023-07-27, 14:37:56 UTC] {standard_task_runner.py:84} INFO - Running: ['***', 'tasks', 'run', 'test-fi', 'define-project-parameters', 'scheduled__2023-06-27T14:37:54.602904+00:00', '--job-id', '884', '--raw', '--subdir', 'DAGS_FOLDER/test_fi_dag.py', '--cfg-path', '/tmp/tmpu1fp72mc']
    [2023-07-27, 14:37:56 UTC] {standard_task_runner.py:85} INFO - Job 884: Subtask define-project-parameters
    [2023-07-27, 14:37:56 UTC] {cli_action_loggers.py:65} DEBUG - Calling callbacks: [<function default_action_log at 0x7f4f6b6038b0>]
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1037} DEBUG - previous_execution_date was called
    [2023-07-27, 14:37:56 UTC] {task_command.py:410} INFO - Running <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [running]> on host e1be34e2e4d4
    [2023-07-27, 14:37:56 UTC] {settings.py:353} DEBUG - Disposing DB connection pool (PID 85)
    [2023-07-27, 14:37:56 UTC] {settings.py:212} DEBUG - Setting up DB connection pool (PID 85)
    [2023-07-27, 14:37:56 UTC] {settings.py:285} DEBUG - settings.prepare_engine_args(): Using NullPool
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:789} DEBUG - Refreshing TaskInstance <TaskInstance: test-fi.define-project-parameters scheduled__2023-06-27T14:37:54.602904+00:00 [running]> from DB
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1037} DEBUG - previous_execution_date was called
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:868} DEBUG - Clearing XCom data
    [2023-07-27, 14:37:56 UTC] {retries.py:80} DEBUG - Running RenderedTaskInstanceFields.write with retries. Try 1 of 3
    [2023-07-27, 14:37:56 UTC] {retries.py:80} DEBUG - Running RenderedTaskInstanceFields._do_delete_old_records with retries. Try 1 of 3
    [2023-07-27, 14:37:56 UTC] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='***' AIRFLOW_CTX_DAG_ID='test-fi' AIRFLOW_CTX_TASK_ID='define-project-parameters' AIRFLOW_CTX_EXECUTION_DATE='2023-06-27T14:37:54.602904+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2023-06-27T14:37:54.602904+00:00'
    [2023-07-27, 14:37:56 UTC] {__init__.py:117} DEBUG - Preparing lineage inlets and outlets
    [2023-07-27, 14:37:56 UTC] {__init__.py:158} DEBUG - inlets: [], outlets: []
    [2023-07-27, 14:37:57 UTC] {store.py:32} INFO - `read()` not implemented for `BaseSessionStore`. Assuming empty store.
    [2023-07-27, 14:37:57 UTC] {session.py:50} WARNING - Unable to git describe /opt/test-fi
    [2023-07-27, 14:37:57 UTC] {logging_mixin.py:150} INFO - Model version 20230727-143757
    [2023-07-27, 14:37:57 UTC] {common.py:123} DEBUG - Loading config file: '/opt/test-fi/conf/base/logging.yml'
    [2023-07-27, 14:37:57 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code Negsignal.SIGABRT
    Can anyone offer help to fix this? It seems to be related to the line ``DEBUG - Loading config file: '/opt/test-fi/conf/base/logging.yml'``.
  • j

    jyoti goyal

    07/27/2023, 6:20 PM
    Hi everyone, I am working on a problem that requires conditional data output, i.e. the dataset should be exported only when a parameter is set to True. Is there a way I can incorporate this logic in kedro? Any help is highly appreciated!
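    Kedro has no built-in conditional outputs, but one workaround that may fit, sketched under the assumption that the output can be declared as a PartitionedDataset in the catalog (an empty dict then simply writes nothing); the names and the `params:export_enabled` parameter are hypothetical:
    Copy code
    from typing import Dict

    import pandas as pd


    def maybe_export(df: pd.DataFrame, export_enabled: bool) -> Dict[str, pd.DataFrame]:
        """Return one partition when the flag is on, and nothing otherwise."""
        # With an empty dict, the PartitionedDataset save step has no partitions to write.
        return {"export": df} if export_enabled else {}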
  • e

    Emilio Gagliardi

    07/28/2023, 4:52 AM
    Still trying to wrap my head around custom datasets and how the pipeline works. So I created a custom dataset where the _save() method saves the data to a mongo db. In the pipeline, I define the node so that the inputs equal the data and the outputs equal the custom dataset. The part I don't understand clearly is: if the class handles the actual save process, what do I put in the node function? The function doesn't do anything, so I'm not sure what to do with it.
    Copy code
    pipeline([
        node(
            func=extract_rss_feed,
            inputs='rss_feed_extract',
            outputs='rss_feed_for_transforming',
            name="extract_rss_feed",
        ),
        node(
            func=transform_rss_feed,
            inputs=['rss_feed_for_transforming', 'params:rss_1'],
            outputs='rss_feed_for_loading',
            name="transform_rss_feed",
        ),
        node(
            func=load_rss_feed,
            inputs='rss_feed_for_loading',  # incoming data (in memory)
            outputs='rss_feed_load',  # calls the _save() of the class
            name="load_rss_feed",
        ),
    ])
    nodes.py: if all the save logic is in the class, then there's nothing for the function to do... What am I missing here? What typically goes in the function whose output is a dataset?
    Copy code
    def load_rss_feed(preprocessed_rss_feed):
        pass
    When I try to run the pipeline, I get the following error:
    DatasetError: Saving 'None' to a 'Dataset' is not allowed
    thanks for your thoughts!
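    For reference, a minimal sketch of the usual pattern, assuming standard catalog behaviour: the node just returns the data, and because 'rss_feed_load' is declared as the node's output, the runner then calls the dataset's _save() with that return value. Returning None is exactly what triggers the DatasetError above.
    Copy code
    def load_rss_feed(preprocessed_rss_feed):
        # No I/O here: hand the data back so Kedro passes it to RSSFeedLoad._save().
        return preprocessed_rss_feed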
  • r

    Rachid Cherqaoui

    07/28/2023, 8:54 AM
    Hello everyone, I have a problem with the sqlalchemy connection to MySQL Server: it occurs every morning around 9:00 AM and then does not reproduce. My database exists and everything else is fine. Here is the code used in `catalog.yml`:
    Copy code
    _mysql : &mysql
      type: pandas.SQLQueryDataSet
      credentials: 
          con: mysql+mysqlconnector://${mysql_connect.username}:${mysql_connect.password}@${mysql_connect.host}:${mysql_connect.port}/${mysql_connect.database}
    
    table_insurers: 
      <<: *mysql
      sql: select * from underwriter_insurers
    
    table_ccns: 
      <<: *mysql
      sql: select * from underwriter_ccns
    
    table_departments: 
      <<: *mysql
      sql: select * from underwriter_departments
    and this is the error produced:
    Copy code
    2023-07-26 09:33:48 - src.api.tarificateur_compte - ERROR - An error occurred in tarificateur_compte():
    Traceback (most recent call last):
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1808, in _execute_context
        context = constructor(
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 1346, in _init_statement
        self.cursor = self.create_cursor()
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 1530, in create_cursor
        return self.create_default_cursor()
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 1533, in create_default_cursor
        return self._dbapi_connection.cursor()
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1494, in cursor
        return self.dbapi_connection.cursor(*args, **kwargs)
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/mysql/connector/connection_cext.py", line 678, in cursor
        raise OperationalError("MySQL Connection not available.")
    mysql.connector.errors.OperationalError: MySQL Connection not available.
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/kedro/io/core.py", line 210, in load
        return self._load()
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/kedro_datasets/pandas/sql_dataset.py", line 512, in _load
        return pd.read_sql_query(con=engine, **load_args)
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/pandas/io/sql.py", line 467, in read_sql_query
        return pandas_sql.read_query(
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/pandas/io/sql.py", line 1736, in read_query
        result = self.execute(sql, params)
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/pandas/io/sql.py", line 1560, in execute
        return self.con.exec_driver_sql(sql, *args)
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1772, in exec_driver_sql
        ret = self._execute_context(
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1814, in _execute_context
        self._handle_dbapi_exception(
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2326, in _handle_dbapi_exception
        raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1808, in _execute_context
        context = constructor(
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 1346, in _init_statement
        self.cursor = self.create_cursor()
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 1530, in create_cursor
        return self.create_default_cursor()
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 1533, in create_default_cursor
        return self._dbapi_connection.cursor()
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1494, in cursor
        return self.dbapi_connection.cursor(*args, **kwargs)
      File "/home/debian/anaconda3/envs/env_tarificateur/lib/python3.10/site-packages/mysql/connector/connection_cext.py", line 678, in cursor
        raise OperationalError("MySQL Connection not available.")
    sqlalchemy.exc.OperationalError: (mysql.connector.errors.OperationalError) MySQL Connection not available.
    Can anyone help me fix this problem? I have tried everything I can but have not managed to solve it. Thank you in advance.
  • m

    Mate Scharnitzky

    07/28/2023, 11:21 AM
    Hi Team, I'm working in a SageMaker notebook which is in the /notebook directory. I'm trying to load some nodes that I created locally, but it doesn't find the path. Two questions:
    • How can I load the kedro context into this notebook?
    • How can I load Python modules developed as part of the kedro project?
    Thank you! I looked into this but somehow I can't make it work: https://docs.kedro.org/en/0.18.11/notebooks_and_ipython/kedro_and_notebooks.html
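    A minimal sketch of one way to do this from a notebook that sits outside the project root, assuming Kedro 0.18.x APIs; the relative path is hypothetical and depends on where the project root actually is. (The `%load_ext kedro.ipython` / `%reload_kedro <project_root>` magics from the linked page are the other route.)
    Copy code
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = Path.cwd().parent  # notebook lives in <project>/notebook, so go one level up
    bootstrap_project(project_path)   # also puts src/ on sys.path, so project modules become importable

    with KedroSession.create(project_path=project_path) as session:
        context = session.load_context()
        catalog = context.catalog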
  • h

    Hygor Xavier AraĂșjo

    07/28/2023, 5:59 PM
    Hi, everyone. Is it possible to use pandas.CSVDataSet to read a compressed (zip) password protected CSV? It's a local file. The zip is password protected, not the csv inside it
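    As far as I know pandas' own compression handling has no password support, so this would likely need a small custom dataset or a plain helper around the standard library; a minimal sketch (path, member name and password are hypothetical, and zipfile only handles legacy ZipCrypto encryption, not AES zips):
    Copy code
    import io
    import zipfile

    import pandas as pd

    with zipfile.ZipFile("data/01_raw/protected.zip") as zf:
        with zf.open("data.csv", pwd=b"my_password") as fh:
            df = pd.read_csv(io.TextIOWrapper(fh, encoding="utf-8"))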
  • j

    J. Camilo V. Tieck

    07/28/2023, 8:28 PM
    hi all, I have a question regarding kedro docker. how can I build an image for a different platform? I have a mac, and for aws ECS I need to build the image with a different architecture. I use this command to build the images directly with docker:
    docker buildx build --platform=linux/amd64 -t <image-name> .
    Is there a ‘kedro docker’ way of doing this? thanks!
  • e

    Erwin

    07/29/2023, 2:36 AM
    Hello Team, I would like to know the recommended approach for implementing schema evolution in Delta tables within Databricks. Currently, I am encountering an issue with the dataset kedro_datasets.databricks.managed_table_dataset: whenever I attempt to add new columns using the upsert mode, an Exception is raised (there is a check in the dataset implementation). Fortunately, I have control over the schema before performing the upsert operation, so once I approve the schema changes, I expect to be able to use schema evolution during the upsert. In this context, I believe the exception raised on schema changes should be made optional, allowing for a smoother schema evolution process. In my view, accepting or denying schema evolution should be up to the Spark session:
    # Enable automatic schema evolution
    spark.sql("SET spark.databricks.delta.schema.autoMerge.enabled = true")
  • d

    Daniel Lee

    07/31/2023, 5:44 AM
    Hello team, I'm currently using an M1 Mac with kedro version 0.18.3 and was trying to run kedro to import the Pick DataSet that needs to use the lightgbm package. However, even after I ran `brew install libomp`, I'm encountering an error that says `(mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64'))`. Do you know how I can get around this issue if it's related to the architecture? And how is this normally resolved?
  • b

    Baden Ashford

    07/31/2023, 11:16 AM
    Hi team, How can I conduct parallel IO with kedro? I have a larger than memory partitioned dataset. I'd like to run each partition through the node in some parallel fashion. Can I utilise ParallelRunner for this? Thank you 😁
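    One pattern that may help, sketched under the assumption that both the input and the output are PartitionedDatasets, so the node receives a dict of partition-id to load-callable and can return callables as well; the `.assign(...)` call stands in for the real per-partition transform. Each partition is then loaded, processed and saved one at a time, which keeps memory bounded. ParallelRunner parallelises across nodes rather than within one node, so true parallel IO inside the node would need something like a thread pool on top of this.
    Copy code
    from typing import Callable, Dict

    import pandas as pd


    def process_partitions(
        partitions: Dict[str, Callable[[], pd.DataFrame]]
    ) -> Dict[str, Callable[[], pd.DataFrame]]:
        """Lazily process each partition of a larger-than-memory dataset."""
        return {
            name: (lambda load=load: load().assign(processed=True))  # replace with the real transform
            for name, load in partitions.items()
        }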
  • f

    Fazil B. Topal

    07/31/2023, 11:46 AM
    hey all, Is it possible to have this PR code as a plugin to integrate with kedro? I'm not sure how plugins work in general, and I don't know whether it would work as a plugin or would need to be merged into the main repo in order to work.