# ingestion
  • bumpy-journalist-41369 (08/18/2022, 6:48 AM)
    Hello. I am setting up DataHub on a Kubernetes cluster. My use case is ingesting data from an S3 data lake. In the documentation (https://datahubproject.io/docs/generated/ingestion/sources/s3) I can see at the top of the page a capability for extracting S3 bucket/object tags. However, in the config details I can see a field `use_s3_bucket_tags` with the description "Whether or not to create tags in datahub from the s3 bucket", and I cannot find a field for specifying tags as a search criterion. I also looked through the GitHub repo examples (https://github.com/datahub-project/datahub/tree/master/metadata-ingestion/examples/recipes) and couldn't find an example similar to my use case. Can anyone help?
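    For reference, a programmatic sketch of an s3 recipe with tag extraction enabled (bucket, prefix, region, and server below are placeholders; note that per the docs quoted above, `use_s3_bucket_tags`/`use_s3_object_tags` only copy existing S3 tags into DataHub, and there is no documented field for filtering objects by tag):
    ```python
    # Equivalent to a YAML recipe passed to `datahub ingest -c ...`.
    # Placeholders: bucket, prefix, region, GMS URL. Field names follow the
    # s3 source docs; verify against your CLI version.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "s3",
                "config": {
                    "path_specs": [
                        {"include": "s3://my-datalake-bucket/data/*/*.parquet"}
                    ],
                    "aws_config": {"aws_region": "us-east-1"},
                    # copy existing S3 tags onto the DataHub datasets
                    "use_s3_bucket_tags": True,
                    "use_s3_object_tags": True,
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()
    ```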
  • sparse-forest-98608 (08/18/2022, 6:49 AM)
    Kindly guide me on extracting metadata from a CSV file. I don't have Kubernetes or Amazon S3 knowledge. I have some random / frictionless CSV files, with and without headers, and I want to extract their metadata and publish/ingest it into DataHub.
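    A rough sketch of doing this with the Python emitter (no Kubernetes or S3 involved): infer the columns of a local CSV with pandas and emit a minimal schema aspect over REST. The file path, dataset name, and GMS URL are placeholders, and typing every column is simplified:
    ```python
    # Sketch: pandas infers the columns, then a SchemaMetadata aspect is emitted.
    # Placeholders: file path, dataset name, GMS URL.
    import pandas as pd
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    df = pd.read_csv("my_file.csv")  # pass header=None for headerless files

    schema = SchemaMetadataClass(
        schemaName="my_file",
        platform="urn:li:dataPlatform:file",
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
        lastModified=AuditStampClass(time=0, actor="urn:li:corpuser:ingestion"),
        fields=[
            SchemaFieldClass(
                fieldPath=str(col),
                # simplification: every column is modeled as a string
                type=SchemaFieldDataTypeClass(type=StringTypeClass()),
                nativeDataType=str(df[col].dtype),
            )
            for col in df.columns
        ],
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=make_dataset_urn(platform="file", name="my_file.csv"),
            aspect=schema,
        )
    )
    ```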
  • sparse-forest-98608 (08/18/2022, 6:53 AM)
    Any idea whether the Python emitter comes into the picture for this, or whether S3 Lambda functions and event notifications have to be studied?
  • famous-florist-7218 (08/18/2022, 8:52 AM)
    Hi guys, I had a problem with S3 profiling data. In my S3 bucket, all the files have the `.json.gz` extension. These are gzipped newline-delimited JSON, but unfortunately the data parser doesn't support that right now. Is there any workaround? Thanks!
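    One possible workaround (an unofficial sketch with placeholder bucket and prefixes): stage decompressed copies of the `.json.gz` objects under a separate prefix and point the profiler at that prefix instead:
    ```python
    # Sketch: copy each .json.gz object to a "staged/" prefix, decompressed,
    # so the profiler sees plain newline-delimited JSON. Placeholders: bucket
    # and prefixes. Loads each object into memory, so mind very large files.
    import gzip

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-bucket"

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix="raw/"):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".json.gz"):
                continue
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            staged_key = "staged/" + key[len("raw/"):-len(".gz")]  # drop .gz
            s3.put_object(Bucket=bucket, Key=staged_key, Body=gzip.decompress(body))
    ```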
  • square-hair-99480 (08/18/2022, 9:24 AM)
    Is there a CLI command I can use to list all my urns?
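    There does not appear to be one dedicated command for this; a common trick is a dry-run delete (e.g. `datahub delete --entity_type dataset --platform <platform> --dry-run`), which prints matching URNs without deleting anything. Another option is paging through the GMS search endpoint, sketched below; the URL, entity type, and response shape are assumptions to verify against your GMS version:
    ```python
    # Sketch: page through GMS full-text search and print dataset URNs.
    # Assumptions: GMS URL, the /entities?action=search endpoint and its
    # response shape; add an Authorization header if token auth is enabled.
    import requests

    GMS = "http://localhost:8080"
    start, count = 0, 100
    while True:
        resp = requests.post(
            f"{GMS}/entities?action=search",
            json={"input": "*", "entity": "dataset", "start": start, "count": count},
        )
        resp.raise_for_status()
        result = resp.json()["value"]
        for hit in result["entities"]:
            print(hit["entity"])  # the URN
        start += count
        if start >= result["numEntities"]:
            break
    ```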
  • thankful-vr-12699 (08/18/2022, 9:32 AM)
    Hi everyone! I would like to know if it's possible to add some sort of container for dashboards? For example, a Power BI workspace as a container (like the schema name for tables). I know the Power BI recipe doesn't do it, but maybe with a JSON file or a transformer? And can we change the browse path to get PLATFORM/WORKSPACE/DASHBOARD? If it doesn't exist yet, how can we implement it?
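    One low-tech approach while waiting for native support (a sketch; the URN id, workspace, and dashboard names are placeholders): overwrite the dashboard's `browsePaths` aspect after ingestion with the Python emitter:
    ```python
    # Sketch: overwrite browsePaths for one dashboard. Placeholders: URN id,
    # workspace/dashboard names, GMS URL. A later ingestion run may overwrite
    # this aspect again.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import BrowsePathsClass

    dashboard_urn = "urn:li:dashboard:(powerbi,my-dashboard-id)"

    DatahubRestEmitter(gms_server="http://localhost:8080").emit(
        MetadataChangeProposalWrapper(
            entityUrn=dashboard_urn,
            aspect=BrowsePathsClass(paths=["/powerbi/my-workspace/my-dashboard"]),
        )
    )
    ```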
  • alert-fall-82501 (08/18/2022, 12:32 PM)
    ```
    [2022-08-18, 10:41:51 UTC] {pipeline.py:112} ERROR - failed to write record with workunit s3://xx.xxxx.xxxx.dev/aggregations/data/PartialPayload_daily/date=2021-07-28/part-00117-c19e20ab-b088-4e48-8d44-9855e3188256.c000.snappy.parquet with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422] com.linkedin.metadata.entity.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR ::
    ```
  • alert-fall-82501 (08/18/2022, 12:32 PM)
    Can anybody suggest what to do about this type of error?
  • alert-fall-82501 (08/18/2022, 12:32 PM)
    ?
  • clean-monkey-7245 (08/18/2022, 3:16 PM)
    Team,
  • clean-monkey-7245 (08/18/2022, 3:17 PM)
    We are getting this error on Kafka ingestion and have no idea how to fix it: Validation error of type FieldUndefined: Field 'latestVersion' in type 'GetSchemaBlameResult' is undefined @ 'getSchemaBlame/latestVersion' (code undefined)
  • ancient-apartment-23316 (08/18/2022, 4:21 PM)
    Hi, I’m trying to use the new source type `snowflake-beta` (https://datahubproject.io/docs/generated/ingestion/sources/snowflake/#module-snowflake-beta) and it didn’t work; I get the error `Failed to create source due to 'Did not find a registered class for snowflake-beta'`:
    ```
    ingest-recipes kk$ python3 -m datahub ingest -c qweert.dhub.yaml
    [2022-08-18 19:13:16,184] INFO     {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.43.1
    [2022-08-18 19:13:16,192] INFO     {datahub.ingestion.run.pipeline:163} - Sink configured successfully.
    [2022-08-18 19:13:16,192] ERROR    {datahub.ingestion.run.pipeline:127} - 'Did not find a registered class for snowflake-beta'
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 172, in __init__
        source_class = source_registry.get(source_type)
      File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 123, in get
        raise KeyError(f"Did not find a registered class for {key}")
    KeyError: 'Did not find a registered class for snowflake-beta'
    [2022-08-18 19:13:16,198] INFO     {datahub.cli.ingest_cli:119} - Starting metadata ingestion
    [2022-08-18 19:13:16,198] INFO     {datahub.cli.ingest_cli:137} - Finished metadata ingestion

    Failed to create source due to 'Did not find a registered class for snowflake-beta'
    ```
    Output of `python3 -m datahub check plugins --verbose`:
    ```
    ...
    snowflake      SnowflakeSource
    snowflake-usage (disabled)          ModuleNotFoundError("No module named 'more_itertools'")
    ...
    ```
    There is no `snowflake-beta` plugin listed. I tried to install the plugin with `pip3 install 'acryl-datahub[snowflake-beta]'` and nothing changed.
  • rapid-house-76230 (08/18/2022, 6:07 PM)
    Hello, I have a Hive/Databricks recipe that ran fine with CLI ingestion but failed with scheduled UI ingestion. 🧵
  • wooden-dress-49520 (08/18/2022, 9:51 PM)
    Is there a way to add multiple databases to a single mssql source?
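    If your version's mssql source only accepts a single `database` (check the source docs), one workaround is to run one pipeline per database. A sketch with placeholder host, credentials, and database names:
    ```python
    # Sketch: one ingestion pipeline run per database.
    # Placeholders: host, credentials, database names, GMS URL.
    from datahub.ingestion.run.pipeline import Pipeline

    for db in ["sales_db", "marketing_db"]:
        pipeline = Pipeline.create(
            {
                "source": {
                    "type": "mssql",
                    "config": {
                        "host_port": "mssql-host:1433",
                        "username": "datahub",
                        "password": "...",
                        "database": db,
                    },
                },
                "sink": {
                    "type": "datahub-rest",
                    "config": {"server": "http://localhost:8080"},
                },
            }
        )
        pipeline.run()
        pipeline.raise_from_status()
    ```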
  • alert-fall-82501 (08/19/2022, 6:11 AM)
    ```
    'failures': [{'error': 'Unable to emit metadata to DataHub GMS', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422] '
    ```
  • alert-fall-82501 (08/19/2022, 6:12 AM)
    Please advise on the above error. I am trying to send metadata from an S3 data lake to our own DataHub server as the sink. I am getting only the path; the data itself is not coming through.
  • damp-ambulance-34232 (08/19/2022, 6:47 AM)
    Hello guys, how can I delete all the datasets that came from a .yml file?
  • cuddly-arm-8412 (08/19/2022, 8:05 AM)
    Hi team. In the DataHub GMS project, I implemented custom authentication by inheriting from the authenticator. I found that after my deployment, my ingestion module also reported an error when calling the ingest interface, saying 401 Unauthorized. Is there a good solution to this problem? Can ingestion be set to skip custom authentication? The error: ('Unable to emit metadata to DataHub GMS', {'message': '401 Client Error: Unauthorized to perform this action. for url: http:/xxxxxx/aspects?action=ingestProposal'}), cause: 401
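    For the standard token-based metadata service auth, the REST emitter (and the `datahub-rest` sink config) accepts a `token`; whether this helps here depends on how the custom authenticator chain is configured. A sketch with placeholder values:
    ```python
    # Sketch: pass a personal access token to the REST emitter; the
    # datahub-rest sink's config accepts the same `token` field.
    # Placeholders: GMS URL and token.
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter(
        gms_server="http://localhost:8080",
        token="<personal-access-token>",
    )
    emitter.test_connection()  # raises if the connection/auth check fails
    ```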
  • alert-fall-82501 (08/19/2022, 11:13 AM)
    Hi Team - I have a data table in an S3 data lake as parquet files. I am giving a hard-coded base path up to that folder, but with the hard-coded path the file name also gets converted into a folder in the resulting path. I only want the exact table path in the output. Please suggest on this?
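    If this is the s3 source, the `{table}` placeholder in a path spec marks the folder that should become the dataset, so the files beneath it roll up into one table instead of file names leaking into the path. A sketch with placeholder bucket, layout, and region:
    ```python
    # Sketch: `{table}` marks the dataset folder; files below it are treated
    # as that table's data. Placeholders: bucket, layout, region, GMS URL.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "s3",
                "config": {
                    "path_specs": [
                        {
                            # everything under each <table> folder becomes one dataset
                            "include": "s3://my-bucket/data/{table}/*.parquet",
                            "table_name": "{table}",
                        }
                    ],
                    "aws_config": {"aws_region": "us-east-1"},
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    ```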
  • gifted-bird-57147 (08/19/2022, 11:57 AM)
    Hi team, since upgrading to v43 I get a warning in my ingest recipes from the YAML schema checker in VS Code. Does anyone know what's going on?
  • aloof-ram-72401 (08/19/2022, 4:07 PM)
    Hi, if a source emits globalTags for a dataset on ingestion, is there a way to make that not override tags that may have been added via the UI? I'm looking for something similar to how editableSchemaMetadata works, I think.
  • miniature-plastic-43224 (08/19/2022, 5:21 PM)
    Hello, team! I have just found that LDAP ingestion appears not to support stateful ingestion. I discovered this when I added a "stateful_ingestion" block to my LDAP source config and it failed with the error "stateful_ingestion extra fields not permitted". I am using a configuration similar to what I used for data lake ingestion, so the format I am using should be correct. Could you please confirm this is indeed the case? I see no notes in the documentation about this. I am on version 8.32 and my GMS configuration has statefulIngestionCapable set to True.
  • mammoth-bear-12532 (08/19/2022, 8:54 PM)
    <!here> Happy Friday! `datahub` PyPI version 0.8.43.2 just got released earlier today. 🆕 `snowflake-beta` connector: try it out and give us early feedback! 🎉 Plus a host of improvements to other connectors; read the Release Notes for more info!
  • melodic-monitor-75886 (08/19/2022, 9:00 PM)
    I am trying to set up a demo for a client without live connections to their data sources. I have sample data and schemas from their Kafka system and one of their Mongo DBs right now. Can I upload these using the Python emitter? If so, are there any complete examples I can work off of?
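    The Python emitter does work without any live connection; sample aspects can be emitted directly. A tiny sketch (server and names are placeholders) that registers a Kafka topic as a dataset; the same pattern applies to the Mongo collections:
    ```python
    # Sketch: emit a dataset with properties for a Kafka topic, no live
    # connection required. Placeholders: topic name, GMS URL.
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=make_dataset_urn(platform="kafka", name="orders-topic", env="DEV"),
            aspect=DatasetPropertiesClass(
                description="Sample topic loaded for the demo, no live connection",
                customProperties={"demo": "true"},
            ),
        )
    )
    ```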
  • famous-window-75800 (08/20/2022, 12:41 AM)
    Hi guys, I am not sure this is the right channel. I am having a hard time ingesting Superset as a source. I deployed Superset locally and DataHub locally as well, and I see the pipeline is failing. Can someone help me? Thanks in advance!
  • bland-orange-13353 (08/22/2022, 7:36 AM)
    This message was deleted.
  • crooked-optician-38646 (08/22/2022, 8:19 AM)
    Hi guys! My DataHub runs in Kubernetes. I have the METADATA_SERVICE_AUTH_ENABLED = True setting and it works fine. Now I need to remove some metadata from DataHub. When I run `datahub ingest list-runs` from the acryl CLI, it tells me "Error 401 Unauthorized to perform this action", which sounds logical. I didn't manage to find in the docs how to pass an access token to that command. Could you give me a hint?
  • sparse-forest-98608 (08/22/2022, 9:28 AM)
    From a local path to DataHub.
  • bland-orange-13353 (08/22/2022, 11:58 AM)
    This message was deleted.
  • brave-pencil-21289 (08/22/2022, 8:39 AM)
    I am trying to ingest DB2 for i Series into DataHub using the SQLAlchemy dialect, but I'm getting an error like: ibm_db_dbi::ProgrammingError: IBM SQL1598N An attempt to connect to the database server failed because of a licensing problem.