# ingestion
  • w

    white-horse-97256

    02/13/2023, 7:14 PM
    Hi, I am getting the following error when trying to create a column-level lineage:
    Copy code
    The datum UpstreamLineageClass({'upstreams': [UpstreamClass({'auditStamp': AuditStampClass({'time': 0, 'actor': 'urn:li:corpuser:unknown', 'impersonator': None, 'message': None}), 'created': None, 'dataset': 'urn:li:dataset:(urn:li:dataPlatform:neo4j,labels.Asset,STG)', 'type': 'TRANSFORMED', 'properties': None})], 'fineGrainedLineages': [FineGrainedLineageClass({'upstreamType': 'FIELD_SET', 'upstreams': ['urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:neo4j,labels.Asset,STG),account_id)'], 'downstreamType': 'NONE', 'downstreams': [], 'transformOperation': None, 'confidenceScore': 1.0})]}) is not an example of the schema.
    ✅ 1
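    A minimal sketch, assuming a recent datahub Python SDK, of the documented fine-grained lineage pattern the message above is attempting; the downstream dataset, downstream field, and server URL are hypothetical placeholders, not a diagnosis of the error.

    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        FineGrainedLineageClass,
        FineGrainedLineageDownstreamTypeClass,
        FineGrainedLineageUpstreamTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    upstream_ds = builder.make_dataset_urn("neo4j", "labels.Asset", "STG")
    downstream_ds = builder.make_dataset_urn("neo4j", "labels.Account", "STG")  # hypothetical downstream

    lineage = UpstreamLineageClass(
        upstreams=[UpstreamClass(dataset=upstream_ds, type=DatasetLineageTypeClass.TRANSFORMED)],
        fineGrainedLineages=[
            FineGrainedLineageClass(
                upstreamType=FineGrainedLineageUpstreamTypeClass.FIELD_SET,
                upstreams=[builder.make_schema_field_urn(upstream_ds, "account_id")],
                # Pair the upstream field with at least one downstream field URN.
                downstreamType=FineGrainedLineageDownstreamTypeClass.FIELD,
                downstreams=[builder.make_schema_field_urn(downstream_ds, "account_id")],
                confidenceScore=1.0,
            )
        ],
    )

    # Attach the lineage aspect to the downstream dataset (server URL is an assumption).
    DatahubRestEmitter("http://localhost:8080").emit_mcp(
        MetadataChangeProposalWrapper(entityUrn=downstream_ds, aspect=lineage)
    )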
  • p

    powerful-telephone-2424

    02/13/2023, 9:32 PM
    Hi folks, I’m trying to understand executors in DataHub Ingestion with the goal of writing my own executor. I couldn’t find any documentation on how to do something like this. Are there any pointers from the community on how I can get started?
  • c

    cold-airport-17919

    02/13/2023, 9:57 PM
    Hi, I have metadata details (field name, type, description) on a spreadsheet for a dataset. Can I read this data into DataHub? Is there an option? I was thinking of the csv-enricher module, but I believe it only enriches existing datasets. Thank you, Ballu
    ✅ 1
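    A minimal sketch, assuming a recent datahub Python SDK, of reading field name/type/description rows from a CSV export of such a spreadsheet and emitting them as a SchemaMetadata aspect; the dataset URN, file name, column headers, and server URL are hypothetical.

    import csv

    from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    dataset_urn = make_dataset_urn("hive", "my_db.my_table", "PROD")  # hypothetical dataset

    fields = []
    with open("fields.csv") as f:  # columns assumed: name,type,description
        for row in csv.DictReader(f):
            fields.append(
                SchemaFieldClass(
                    fieldPath=row["name"],
                    nativeDataType=row["type"],
                    # Every native type mapped to StringType here just to keep the sketch short.
                    type=SchemaFieldDataTypeClass(type=StringTypeClass()),
                    description=row.get("description"),
                )
            )

    schema = SchemaMetadataClass(
        schemaName="my_table",
        platform=make_data_platform_urn("hive"),
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
        fields=fields,
    )

    DatahubRestEmitter("http://localhost:8080").emit_mcp(
        MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=schema)
    )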
  • b

    bland-lighter-26751

    02/14/2023, 12:03 AM
    Hey everyone, I updated to v0.10.0, did a reingest, and it looks like lineage between Metabase and BigQuery doesn't work at all now? All the mappings are gone. Is anyone else who uses the two seeing this?
  • a

    ambitious-notebook-45027

    02/14/2023, 2:11 AM
    Hello, I want to ingest a Hive DB and get an error like this:
    Copy code
    FAILED: SemanticException [Error 10056]:
        Queries against partitioned tables without a partition filter are disabled for safety reasons.
        If you know what you are doing, please set hive.strict.checks.no.partition.filter to false and make sure that hive.mapred.mode is not set to 'strict' to proceed.
        Note that you may get errors or incorrect results if you make a mistake while using some of the unsafe features.
        No partition predicate for Alias "lubian" Table "lubian"
    How can I fix this? @Mayuri N
    ✅ 1
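    A minimal sketch, assuming the hive source forwards SQLAlchemy `options` through to PyHive (whose connection accepts a `configuration` dict of Hive session settings via connect_args); host, database, and server URL are hypothetical and this is not a verified recipe.

    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "hive",
                "config": {
                    "host_port": "hive-server:10000",  # hypothetical
                    "database": "my_db",               # hypothetical
                    "options": {
                        "connect_args": {
                            # Relax the strict partition checks for the session (assumption:
                            # these keys are passed straight to the Hive connection).
                            "configuration": {
                                "hive.strict.checks.no.partition.filter": "false",
                                "hive.mapred.mode": "nonstrict",
                            }
                        }
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()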
  • h

    hallowed-shampoo-52722

    02/14/2023, 5:10 AM
    https://datahubspace.slack.com/archives/C033H1QJ28Y/p1676323639995389
  • p

    plain-cricket-83456

    02/14/2023, 7:26 AM
    @hundreds-photographer-13496 Hello, Keycloak is used to implement single sign-on (SSO). If an application performs a Keycloak logout (this application shares the Keycloak service with DataHub), how does DataHub implement the corresponding linked (single) logout?
  • s

    shy-hairdresser-85182

    02/14/2023, 9:18 AM
    Hi guys, I have made code changes to the mssql plugin to read the platform from the recipe file and set the platform to synapse to ingest Synapse data. It is able to ingest as expected in the UI, but it is not reflecting the default browse path derived from the dataset URN. How can I enable that default behaviour without using a transformer at the recipe level?!
  • c

    colossal-smartphone-90274

    02/14/2023, 12:24 PM
    Hi all, one of the data sources I am using is powerbi-report-server; however, due to a recent upgrade of our on-premise Power BI system, the ingest appears to no longer work. The issue is with this function in the report_server.py file (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/powerbi_report_server/report_server.py). I am using this version -> Version 1.15.8377.1837 (September 2022)
    Copy code
    def get_all_reports(self) -> List[Any]:
            """
            Fetch all Reports from PowerBI Report Server
            """
            report_types_mapping: Dict[str, Any] = {
                Constant.REPORTS: Report,
                Constant.MOBILE_REPORTS: MobileReport,
                Constant.LINKED_REPORTS: LinkedReport,
                Constant.POWERBI_REPORTS: PowerBiReport,
            }
    On the PoC version of DataHub, I removed the MOBILE_REPORTS line of the code snippet and the ingest worked again; however, I will need a different strategy for my OpenShift deployment. Has anyone else had this issue with the ingest?
  • r

    rich-pager-68736

    02/14/2023, 2:43 PM
    Hi guys, during ingestion, we also extracted usage stats including the top users for our assets. However, due to some internal regulations we have to remove those. I already changed the recipe to not ingest that information anymore, but how can I delete those already ingested top users? Rolling back everything seems a bit crude... I have not found any way to do this - any advice?
    ✅ 1
  • d

    dazzling-microphone-98929

    02/14/2023, 2:47 PM
    Hi everyone, I have a question about dataset type mapping. My Power BI data source is Redshift; can I ingest the data?
  • l

    lemon-scooter-69730

    02/14/2023, 5:22 PM
    Suddenly the BigQuery ingest is failing with this error:
    Copy code
    ('Failed to load service account credentials from /tmp/tmpuvp2cqms', ValueError('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [_OpenSSLErrorWithText(code=503841036, lib=60, reason=524556, reason_text=b'error:1E08010C:DECODER routines::unsupported')]))
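    The ValueError above comes from the key material itself failing to parse, so a quick local check of the same service-account JSON with google-auth can confirm whether the key the recipe writes to the temp file is intact (a mangled private_key, e.g. with escaped newlines, is a common culprit). The file path below is a placeholder.

    from google.oauth2 import service_account

    # Will raise the same kind of deserialization error if the private_key block is corrupted.
    creds = service_account.Credentials.from_service_account_file(
        "/path/to/service-account.json",  # the same key material used in the recipe/secret
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
    print(creds.service_account_email)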
  • t

    tall-caravan-42586

    02/14/2023, 5:53 PM
    Hi Team
  • t

    tall-caravan-42586

    02/14/2023, 5:54 PM
    The metadata-ingestion build is failing with this error, please help me.
  • f

    fancy-crayon-39356

    02/14/2023, 6:57 PM
    Hello team! Here at my company we recently rolled out DataHub to production and we are becoming heavy users of it ❤️ However, I've recently noticed a problem when ingesting both dbt and Snowflake sources. The bundled resource (dbt+snowflake) has duplicated columns. This is due to a lowercase URN being ingested alongside an uppercase URN. I'm running the datahub cli on version v0.10.0. Digging into this problem I've found this PR: https://github.com/datahub-project/datahub/pull/7063/files that changed the DBTColumn name from catalog_column["name"].lower() to catalog_column["name"], essentially making the column URN the same as in the catalog (which comes from Snowflake and, in that case, is uppercase). The problem is that in the Snowflake recipe we are lowercasing URNs by default (convert_urns_to_lowercase=True), causing the mismatch. What is the standard going forward here? Are we sticking to lowercase URNs to ensure cross-platform compatibility, or will dbt use whatever is defined in the catalog? I'm happy to submit a PR to maybe introduce a convert_urns_to_lowercase flag to the dbt recipe as well, if that's the standard going forward.
  • b

    bland-barista-59197

    02/14/2023, 7:00 PM
    Hi Team, I have the following questions: 1. Can DataHub integrate with a third-party secret manager? 2. Is there any way to trigger a third-party web API / process after ingestion, like Google DLP or Microsoft Purview?
  • e

    enough-lamp-79907

    02/14/2023, 7:21 PM
    Hello Team, I have multiple files in an S3 bucket which contain the same kind of data but are partitioned by date folder. I created an s3 ingestion and it creates a dataset for each parquet file (like 1000s of them). Is it possible to have a single metadata entry for all those files, since the metadata would be the same, while also keeping the count and other info about the files?
    source:
      type: s3
      config:
        path_specs:
          - include: "s3://test/dumps/kafka/order/daily/{partition_key[0]}={partition[0]}/*.parquet"
        aws_config:
          aws_profile: dev
          aws_region: eu-central-1
        env: "dev"
        profiling:
          enabled: false
    ✅ 1
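    One approach, assuming the s3 source's {table} path token (which groups everything below that folder level into a single dataset), is a path_spec along these lines; the bucket layout, env, and server URL are placeholders based on the recipe above. A programmatic sketch:

    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "s3",
                "config": {
                    "path_specs": [
                        {
                            # {table} marks the folder that becomes the dataset; the daily
                            # partition files underneath are grouped into that one dataset.
                            "include": "s3://test/dumps/kafka/{table}/daily/{partition_key[0]}={partition[0]}/*.parquet"
                        }
                    ],
                    "aws_config": {"aws_profile": "dev", "aws_region": "eu-central-1"},
                    "env": "DEV",
                    "profiling": {"enabled": False},
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()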
  • w

    white-horse-97256

    02/14/2023, 9:39 PM
    Hi again, I have created a mysql recipe file and executed it through the datahub cli. I later created and ran a new recipe file with another host name and credentials... I see that it replaced my first file's config in the UI. The question is: is there a way to not override existing recipe files and instead create a new entry/job for the new recipe file in the UI?
  • p

    polite-actor-701

    02/15/2023, 1:55 AM
    Hi all. I have a question about ingestion. When I ingest metadata from Tableau, some entities (Workbooks/Dashboards/Charts) are missing. If I ingest the same Project again, some of the missing entities are ingested, and others are still missing. But there is no error in the gms or ingest logs. What's the problem? Is this a bug?
  • c

    calm-jewelry-98911

    02/15/2023, 3:21 AM
    Hey guys, I seem to have (soft) broken my gms service locally. Essentially I am trying to mark a user as inactive/suspended. I set the corpUser's CorpUserStatus to SUSPENDED using the MCP below (and subsequently emitted it) ->
    Copy code
    import time

    from datahub.emitter.mce_builder import make_user_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        ChangeTypeClass,
        CorpUserStatusClass,
    )

    mcp2 = MetadataChangeProposalWrapper(
        entityType="corpuser",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_user_urn('apanwar'),
        aspectName=CorpUserStatusClass.get_aspect_name(),
        aspect=CorpUserStatusClass(
            status='SUSPENDED',
            lastModified=AuditStampClass(
                time=int(time.time() * 1000),
                actor='urn:li:corpuser:datahub',
            ),
        ),
    )
    # Subsequently emitted (server URL is an assumption):
    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp2)
    The GMS logs have captured an error related to this -
    Copy code
    Caused by: java.lang.IllegalArgumentException: No enum constant com.linkedin.datahub.graphql.generated.CorpUserStatus.SUSPENDED
            at java.base/java.lang.Enum.valueOf(Enum.java:240)
            at com.linkedin.datahub.graphql.generated.CorpUserStatus.valueOf(CorpUserStatus.java:6)
            at com.linkedin.datahub.graphql.types.corpuser.mappers.CorpUserStatusMapper.apply(CorpUserStatusMapper.java:19)
            at com.linkedin.datahub.graphql.types.corpuser.mappers.CorpUserStatusMapper.map(CorpUserStatusMapper.java:13)
            at com.linkedin.datahub.graphql.types.corpuser.mappers.CorpUserMapper.lambda$apply$3(CorpUserMapper.java:64)
            at com.linkedin.datahub.graphql.types.common.mappers.util.MappingHelper.mapToResult(MappingHelper.java:22)
            at com.linkedin.datahub.graphql.types.corpuser.mappers.CorpUserMapper.apply(CorpUserMapper.java:63)
            at com.linkedin.datahub.graphql.types.corpuser.mappers.CorpUserMapper.map(CorpUserMapper.java:46)
            at com.linkedin.datahub.graphql.types.corpuser.CorpUserType.lambda$batchLoad$0(CorpUserType.java:95)
            at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
            at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
            at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
            at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
            at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
            at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
            at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
            at com.linkedin.datahub.graphql.types.corpuser.CorpUserType.batchLoad(CorpUserType.java:96)
            ... 18 common frames omitted
    I was wondering if I missed something here? I thought SUSPENDED would be a valid value for CorpUserStatus.status, as mentioned in the schema class's getter and setter ->
  • p

    plain-nest-12882

    02/15/2023, 5:29 AM
    Howdy, does DataHub support a custom action defined by Great Expectations (GX)? If so, can I have the DataHub POST API that is used to push the results to DataHub when using non-SQLAlchemy engines?
    ✅ 1
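    For reference, DataHub ships a Great Expectations checkpoint action (DataHubValidationAction); a minimal sketch of the action_list entry in Python dict form, assuming the documented module/class names and a placeholder server_url. Note the documented integration targets SQLAlchemy-backed datasources, which is relevant to the non-SQLAlchemy part of the question.

    # Hypothetical checkpoint action_list entry pointing GE validation results at DataHub.
    action_list = [
        {
            "name": "datahub_action",
            "action": {
                "module_name": "datahub.integrations.great_expectations.action",
                "class_name": "DataHubValidationAction",
                "server_url": "http://localhost:8080",  # placeholder GMS URL
            },
        }
    ]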
  • n

    numerous-account-62719

    02/15/2023, 8:46 AM
    Hi Team, is InfluxDB supported in DataHub? If yes, then how do I ingest the data from InfluxDB?
  • r

    rich-policeman-92383

    02/15/2023, 10:05 AM
    Hello, the LDAP source creates users based on the sAMAccountName AD attribute https://github.com/datahub-project/datahub/blob/v0.9.5/metadata-ingestion/src/datahub/ingestion/source/ldap.py#L47 Is there a way to use a filter like the one we use in datahub-frontend: "AUTH_OIDC_USER_NAME_CLAIM=email; AUTH_OIDC_USER_NAME_CLAIM_REGEX=([^@]+)"? The problem is that the user created by the LDAP source is different from the one created by the frontend. datahub version: v0.9.5
  • b

    broad-wire-76841

    02/15/2023, 10:49 AM
    Hello team, I am creating a pipeline (emitter) which tags an owner to an entity. Now I wanted to know what to do in scenarios where the said user does not exist in DataHub yet. Is there a way to identify if a user exists and, if not, create it first programmatically?
    ✅ 1
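    A minimal sketch, assuming a reasonably recent datahub SDK, of checking for an existing corpuser over the REST API and creating a bare-bones one if the lookup comes back empty; the server URL, username, and profile fields are hypothetical.

    from datahub.emitter.mce_builder import make_user_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import CorpUserInfoClass

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
    user_urn = make_user_urn("jdoe")  # hypothetical username

    # Returns None when the aspect (and effectively the user) is not present.
    existing = graph.get_aspect(entity_urn=user_urn, aspect_type=CorpUserInfoClass)

    if existing is None:
        # Create a minimal corpuser record before assigning ownership to it.
        graph.emit_mcp(
            MetadataChangeProposalWrapper(
                entityUrn=user_urn,
                aspect=CorpUserInfoClass(
                    active=True,
                    displayName="J. Doe",
                    email="jdoe@example.com",
                ),
            )
        )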
  • r

    ripe-eye-60209

    02/15/2023, 6:31 PM
    Hello Team, regarding the powerbi ingestion: the source is connected successfully but no events are generated. What could be the issue here? Any idea?
  • l

    limited-forest-73733

    02/15/2023, 6:50 PM
    Hey team, I ingested my Snowflake tables and enabled profiling, but it's showing unknown in each column. I am attaching my recipe and UI table stats. Can anyone please help me out?
    ✅ 1
  • w

    white-horse-97256

    02/15/2023, 7:34 PM
    Hi Team, a question regarding ingestion: is there any difference between scheduling a Python function to run the mysql ingestion via the Python emitter vs scheduling a recipe YAML script in the DataHub tool for ingestion?
  • c

    calm-jewelry-98911

    02/15/2023, 7:55 PM
    Hey team, still looking for answers to my question. Basically, what is the best way to mark a CorpUser as SUSPENDED/inactive? (I used CorpUserStatus but faced issues as described in the original question) - https://datahubspace.slack.com/archives/CUMUWQU66/p1676431303025729
  • s

    silly-dog-87292

    02/15/2023, 7:57 PM
    Hello Team, I am trying to ingest my Spark application lineage into DataHub (running on a separate EC2 machine with the docker quickstart). I followed the steps in the documentation; however, on running the Spark application, I still don't see any metadata/lineage getting captured in my DataHub system (EC2). Any thoughts on what could be the problem?
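    A minimal PySpark sketch of the documented spark-lineage listener settings, pointing spark.datahub.rest.server at the quickstart GMS on the EC2 host; the version placeholder and host are assumptions to fill in (the same settings can be passed as --conf flags to spark-submit).

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("my-job")
        # Pull in the DataHub Spark lineage agent (replace <version> with the release you use).
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:<version>")
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
        # GMS endpoint on the EC2 quickstart host (placeholder).
        .config("spark.datahub.rest.server", "http://<ec2-host>:8080")
        .getOrCreate()
    )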
  • w

    white-horse-97256

    02/15/2023, 10:28 PM
    Hi Team, is there a Java package to create a pipeline to ingest from a mysql source, similar to this Python script: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/programatic_pipeline.py
    ✅ 1