# ingestion
  • stocky-television-65849

    12/10/2021, 8:17 PM
    Hello. Is DynamoDB supported?
  • calm-airplane-47634

    12/10/2021, 9:05 PM
    Hi folks, has anyone seen issues with PostgreSQL ingestion? I am getting this error when I enable profiling:
    215      if not sqlalchemy:
    --> 216          raise DatasourceInitializationError(
        217              name, "ModuleNotFoundError: No module named 'sqlalchemy'"
        ..................................................
         self = <great_expectations.datasource.sqlalchemy_datasource.SqlAlchemyDatasource object at 0x13624d2b0>
         name = 'my_sqlalchemy_datasource-bfb77e2d-9b3d-4e9b-a714-4675fa7c6f08'
         data_context = None
         data_asset_type = {'module_name': 'great_expectations.dataset',
                            'class_name': 'SqlAlchemyDataset'}
     credentials = {'url': 'postgresql+psycopg2://username:***@localhost:5432/databasename'}
         batch_kwargs_generators = None
         kwargs = {'engine': <sqlalchemy.engine.base.Connection object at 0x13620c400>}
         sqlalchemy = None
         DatasourceInitializationError = <class 'great_expectations.exceptions.exceptions.DatasourceInitializationError'>
        ..................................................
    
    ---- (full traceback above) ----
    File "/Users/ankur.chauhan/.pyenv/versions/3.9.5/lib/python3.9/site-packages/great_expectations/data_context/data_context.py", line 1869, in _instantiate_datasource_from_config
        ] = self._build_datasource_from_config(name=name, config=config)
    File "/Users/ankur.chauhan/.pyenv/versions/3.9.5/lib/python3.9/site-packages/great_expectations/data_context/data_context.py", line 1935, in _build_datasource_from_config
        datasource = instantiate_class_from_config(
    File "/Users/ankur.chauhan/.pyenv/versions/3.9.5/lib/python3.9/site-packages/great_expectations/data_context/util.py", line 121, in instantiate_class_from_config
        class_instance = class_(**config_with_defaults)
    File "/Users/ankur.chauhan/.pyenv/versions/3.9.5/lib/python3.9/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 64, in sqlalchemy_datasource_init
        underlying_datasource_init(self, *args, **kwargs, engine=conn)
    File "/Users/ankur.chauhan/.pyenv/versions/3.9.5/lib/python3.9/site-packages/great_expectations/datasource/sqlalchemy_datasource.py", line 216, in __init__
        raise DatasourceInitializationError(
    
    DatasourceInitializationError: Cannot initialize datasource my_sqlalchemy_datasource-bfb77e2d-9b3d-4e9b-a714-4675fa7c6f08, error: ModuleNotFoundError: No module named 'sqlalchemy'
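    For reference, a minimal recipe sketch that enables profiling on a Postgres source (all values are placeholders; the profiling block follows the standard SQL-source options and is an assumption, not taken from this thread):

    source:
      type: postgres
      config:
        host_port: localhost:5432
        database: databasename
        username: username
        password: "***"
        # Profiling is what exercises the great_expectations/SQLAlchemy path in the traceback above
        profiling:
          enabled: true
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"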
  • mysterious-zebra-62364

    12/10/2021, 10:47 PM
    Does DataHub dbt ingestion work with the latest version of dbt==1.0.0?
  • cuddly-telephone-51804

    12/12/2021, 10:42 AM
    Hi team, I've ingested metadata from my Oracle database, but it didn't include the "monthly queries" stat that the Snowflake demo table shows. Can I do this with my Oracle database?
  • breezy-controller-54597

    12/13/2021, 12:24 AM
    Hi, I tried to ingest metadata from PostgreSQL, but I got the following error. Do you have any idea?
    $ ./scripts/datahub_docker.sh ingest -c ./postgres.yml
    ................................
    OperationalError: (psycopg2.OperationalError) server didn't return client encoding
  • delightful-jackal-88844

    12/13/2021, 11:39 AM
    Hi! How can I specify the name of the source in the ingest.yml? I have different sources and need something like "hive_prod, hive_test, hive_cloud", etc. I couldn't find an answer in the documentation. ty
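    One pattern that may address this (an assumption, not an answer from the thread): keep one recipe file per source and use the env field to label each one, e.g.:

    source:
      type: hive
      config:
        host_port: hive-prod.internal:10000  # hypothetical host
        env: PROD  # e.g. PROD for hive_prod, DEV for hive_test; tags all emitted datasets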
  • orange-flag-48535

    12/13/2021, 11:50 AM
    Filed a new bug: https://github.com/linkedin/datahub/issues/3724
  • red-pizza-28006

    12/13/2021, 1:42 PM
    Does anyone know, when ingesting dbt files here - https://datahubproject.io/docs/metadata-ingestion/source_docs/dbt - if I can specify S3 files?
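    For context, the dbt source config takes explicit file paths; whether those can be s3:// URIs is exactly the open question here. A minimal sketch with local paths:

    source:
      type: dbt
      config:
        manifest_path: ./target/manifest.json
        catalog_path: ./target/catalog.json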
  • gentle-florist-49869

    12/13/2021, 9:39 PM
    I'm seeing this recipe (source x sink) for MSSQL that configures just the database (DemoData), but is it possible to ingest one specific table that we have in DemoData?
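    A sketch of one common approach (assuming the standard table_pattern filtering; the table name is hypothetical):

    source:
      type: mssql
      config:
        host_port: localhost:1433
        database: DemoData
        username: user
        password: pass
        table_pattern:
          allow:
            - "DemoData.dbo.MyTable"  # regex matched against fully-qualified table names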
  • better-spoon-77762

    12/13/2021, 11:35 PM
    Hello, can metadata ingestion run on Airflow with Celery as the executor backend?
  • boundless-student-48844

    12/14/2021, 3:17 AM
    Hi team, we are looking into ingesting ML metadata into DataHub. Can I check the rationale for why DataHub decided to model feature groups (FeatureTable) as the first-class entity instead of features (Features), which was proposed in this RFC? Without features as first-class entities, how could we enable dataset-to-feature and feature-to-model lineage?
  • hallowed-article-64840

    12/14/2021, 8:09 AM
    Hi guys, we are using Airflow for ingestion and it's working fine, but we need a mechanism for detecting changes so we can inform other teams: we need to be notified about newly added tables and about metadata changes to current tables. I was looking at Kafka topics and thought MetadataChangeEvent_v4 might help, but it is filled on every ingestion interval for every table, regardless of whether it changed or not. How can we do that in DataHub?
  • high-hospital-85984

    12/14/2021, 1:56 PM
    In the MongoDB source, when NOT using randomSampling, would it make sense to actually pull the N latest documents rather than the oldest (or at least have an option to reverse the order)? My thinking is that in long-lived collections the later documents might be more representative of the "schema" at this moment. And there might also be a tiny performance advantage for huge collections (in-memory vs. on disk)?
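    For reference, the sampling behaviour being discussed maps to config flags roughly like these (a sketch; option names and defaults are recalled from the MongoDB source docs and may differ by version):

    source:
      type: mongodb
      config:
        connect_uri: "mongodb://localhost"
        enableSchemaInference: true
        useRandomSampling: true  # when false, documents are read in natural (oldest-first) order
        schemaSamplingSize: 1000  # N documents inspected to infer the schema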
  • orange-flag-48535

    12/14/2021, 2:40 PM
    Is there any plan to support JSON Schema as an ingestion format for DataHub?
  • gentle-florist-49869

    12/14/2021, 4:07 PM
    type: "mysql"
    config:
      # Credentials
      username: datahub
      password: datahub
      # Coordinates
      host_port: localhost:3306
      database: datahub
      table_pattern.allow: newteste
  • gentle-florist-49869

    12/14/2021, 4:08 PM
    Running datahub ingest -c /home/fabiocastro/datahub/metadata-ingestion/examples/recipes/mysql_to_datahub.yml gives the error:

    1 validation error for MySQLConfig
    table_pattern.allow
      extra fields not permitted (type=value_error.extra)
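    The validation error suggests the flat table_pattern.allow key is parsed as an unknown extra field; the nested form below is likely what MySQLConfig expects:

    source:
      type: mysql
      config:
        username: datahub
        password: datahub
        host_port: localhost:3306
        database: datahub
        table_pattern:
          allow:
            - "newteste"  # a list of regexes, not a single dotted scalar key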
  • late-father-17108

    12/14/2021, 5:52 PM
    From what I can tell, DataHub doesn't currently support OpenLineage ingestion for lineage. Is that true, or is it in upcoming plans? I see DataHub is part of the OpenLineage docs here: https://openlineage.io/getting-started/
  • best-planet-6756

    12/14/2021, 7:06 PM
    Hello all, has anyone added to the dev.sh script to also start/call the Airflow docker-compose file? I would like to start Airflow with my DataHub dev.sh script if possible.
  • miniature-eve-89383

    12/14/2021, 7:16 PM
    How is the ingestion performed? With regular JDBC drivers? Does that mean it would support any authentication method supported by the drivers? I'm looking mostly at SSL certificate authentication (available with many DBMSs) or PostgreSQL's scram-sha-256 auth.
  • brief-wolf-70822

    12/15/2021, 2:51 PM
    Hey, quick question: are column comments ingested, say for the Glue source or the generic SQLAlchemy source?
  • limited-cricket-18852

    12/15/2021, 5:39 PM
    Hi! Has anyone managed to make the Spark lineage work on Databricks? I added the dependency to my cluster, but I get the following when running a job:
    Caused by: java.lang.ClassNotFoundException: com.linkedin.datahub.lineage.spark.interceptor.DatahubLineageEmitter
    Does anyone know if there is anything special to do when running on Databricks?
  • red-window-75368

    12/15/2021, 5:45 PM
    I'm having an error when running:
    pip install 'acryl-datahub[hive]'
  • late-father-17108

    12/15/2021, 10:45 PM
    In the lineage docs (https://datahubproject.io/docs/lineage/sample_code), why is a MetadataChangeProposalWrapper preferable over a MetadataChangeEvent?
  • melodic-helmet-78607

    12/16/2021, 2:33 AM
    Hi, has anyone been able to successfully use the acryl-datahub pipeline dynamically, handling permission errors/operational errors without using a denylist?
  • few-air-56117

    12/16/2021, 8:57 AM
    Hi, I tried to ingest data from BigQuery using this recipe:
    source:
      type: bigquery-usage
      config:
        # Coordinates
        projects:
          - project
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    but it gives me just 1 dataset and only 2 tables. Does anyone know what I am doing wrong?
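    One possible explanation (an assumption, not confirmed in the thread): bigquery-usage only emits usage statistics, so full table/schema metadata comes from the separate bigquery source, roughly:

    source:
      type: bigquery
      config:
        project_id: project  # placeholder id; one recipe per project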
  • few-air-56117

    12/16/2021, 12:55 PM
    Hi, is it possible to have lineage from BigQuery (multiple projects)?
  • brief-wolf-70822

    12/16/2021, 6:45 PM
    Hey, should we be able to profile Glue tables if we connect to Glue through the Hive interface?
  • full-leather-27343

    12/16/2021, 9:26 PM
    hey, from what I saw in this lineage video: https://www.youtube.com/watch?v=rONGpsndzRw , for BigQuery the lineage (dataset to dataset) should be created automatically. When does this happen? When you ingest data? How is it refreshed? Thanks
  • mysterious-lamp-91034

    12/17/2021, 12:11 AM
    Do we support Hive 1.2 ingestion? It looks like the current Hive ingestion uses PyHive, which is only compatible with Hive 2.
  • cool-painting-92220

    12/17/2021, 12:32 AM
    Hey everyone! For Snowflake metadata ingestion jobs, if I were to create a Snowflake user to access data through, what would be the bare minimum access privileges that I would need to grant the user (without any need for query stats or table lineage)?