# getting-started
  • busy-computer-98970
    08/30/2022, 6:36 PM
    Hey guys, has anyone already deployed DataHub on AWS Fargate Spot?
  • kind-whale-32412
    08/31/2022, 4:44 PM
    Hello, I read the field path spec v2: https://github.com/datahub-project/datahub/blob/master/docs/advanced/field-path-spec-v2.md Is there a utility class for Java to convert the v2 format into v1?
  • future-queen-2284
    08/31/2022, 6:38 PM
    Good afternoon, folks. I'm quite new to DataHub and looking forward to exploring this intriguing tool a bit more. During setup I ran into a small issue: port 8081 (the default for schema-registry) is used by my office VPN, which I can't kill. I tried
    datahub docker quickstart --schema-registry-port 8090
    but it still binds to 8081 and fails with an error. Any ideas how I can get around this? Thank you in advance!
  • mysterious-lamp-91034
    09/01/2022, 4:58 AM
    Hello. I tried to create a glossary term and associate it with a column, but I cannot find it. Here are the details.
  • late-truck-7887
    09/01/2022, 6:53 AM
    Hi DataHub team, new to DataHub here. I want to store a dataset name that is distinct from the URN: in my use case the URN is a path on S3, but the goal is to have the dataset name in DataHub be a nice human-readable name. What I have tried so far (in Python):
    aspects = [
            DatasetPropertiesClass(
                name=nice_human_readable_name,
                customProperties=properties,
                description=description,
                externalUrl=url
            ),
        ]
    or alternatively:
    aspects = [
            DatasetPropertiesClass(
                qualifiedName=nice_human_readable_name,
                customProperties=properties,
                description=description,
                externalUrl=url
            ),
        ]
    The MCPs that are generated have the proper format when submitted via connection.submit_change_proposals:
    [MetadataChangeProposalWrapper(entityType='dataset', changeType='UPSERT', entityUrn='urn:li:dataset:(urn:li:dataPlatform:s3,test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b,PROD)', entityKeyAspect=None, auditHeader=None, aspectName='ownership', aspect=OwnershipClass({'owners': [OwnerClass({'owner': 'urn:li:corpuser:etl', 'type': 'DATAOWNER', 'source': OwnershipSourceClass({'type': 'SERVICE', 'url': None})})], 'lastModified': AuditStampClass({'time': 1661399154, 'actor': 'urn:li:corpuser:etl', 'impersonator': None, 'message': None})}), systemMetadata=None), MetadataChangeProposalWrapper(entityType='dataset', changeType='UPSERT', entityUrn='urn:li:dataset:(urn:li:dataPlatform:s3,test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b,PROD)', entityKeyAspect=None, auditHeader=None, aspectName='datasetProperties', aspect=DatasetPropertiesClass({'customProperties': {'here3567c322-fd92-4417-98f0-90a66e32101b': 'are some fake properties', 'that_are': 'used_for_testing'}, 'externalUrl': None, 'name': 'test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b', 'qualifiedName': None, 'description': 'This is a fake description of a dataset', 'uri': None, 'tags': []}), systemMetadata=None), MetadataChangeProposalWrapper(entityType='dataset', changeType='UPSERT', entityUrn='urn:li:dataset:(urn:li:dataPlatform:s3,test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b,PROD)', entityKeyAspect=None, auditHeader=None, aspectName='institutionalMemory', aspect=InstitutionalMemoryClass({'elements': [InstitutionalMemoryMetadataClass({'url': '<https://www.google.com/>', 'description': 'link3567c322-fd92-4417-98f0-90a66e32101b', 'createStamp': AuditStampClass({'time': 1661399154, 'actor': 'urn:li:corpuser:etl', 'impersonator': None, 'message': None})})]}), systemMetadata=None), MetadataChangeProposalWrapper(entityType='dataset', changeType='UPSERT', entityUrn='urn:li:dataset:(urn:li:dataPlatform:s3,test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b,PROD)', entityKeyAspect=None, auditHeader=None, aspectName='globalTags', aspect=GlobalTagsClass({'tags': [TagAssociationClass({'tag': 'urn:li:tag:tag13567c322-fd92-4417-98f0-90a66e32101b', 'context': None}), TagAssociationClass({'tag': 'urn:li:tag:tag_23567c322-fd92-4417-98f0-90a66e32101b', 'context': None})]}), systemMetadata=None)]
    but then I get this rather cryptic error message (see attached screenshot). Any advice appreciated! Thanks!
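    (As a minimal sketch of the intended flow rather than a diagnosis of the error, assuming the standard Python SDK REST emitter; the GMS URL, URN, and names below are placeholders, not taken from the thread:)
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # The URN stays the S3 path; only the displayed name comes from datasetProperties.name.
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:s3,my/s3/path,PROD)",
        aspectName="datasetProperties",
        aspect=DatasetPropertiesClass(
            name="Nice human-readable name",
            customProperties={},
            description="A short description",
        ),
    )
    emitter.emit_mcp(mcp)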
  • breezy-shoe-41523
    09/02/2022, 8:09 AM
    Hello team, I'm looking to add
    "index.routing.allocation.require.boxtype": "hot"
    to all indices that DataHub creates in Elasticsearch. I'm deploying DataHub with the helm chart; is there an easy way to do this?
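    (One generic workaround, sketched outside of the helm chart values and not an official DataHub recipe: apply an Elasticsearch index template so that newly created indices pick up the setting. The cluster URL, template name, and index pattern below are assumptions:)
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://elasticsearch-master:9200"])
    es.indices.put_template(
        name="datahub-hot-routing",
        body={
            # Narrow this pattern to the indices DataHub creates in your cluster;
            # "*" matches every index, including non-DataHub ones.
            "index_patterns": ["*"],
            "settings": {"index.routing.allocation.require.boxtype": "hot"},
        },
    )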
  • melodic-beach-18239
    09/02/2022, 8:24 AM
    Hi all, I am new to DataHub, and I started it with Docker using the command "datahub docker quickstart --mysql-port 53306".
  • melodic-beach-18239
    09/02/2022, 8:25 AM
    But I cannot add a MySQL ingestion source; it fails with "ModuleNotFoundError: No module named 'pymysql'".
  • melodic-beach-18239
    09/02/2022, 8:25 AM
    What can I do?
  • helpful-london-56362
    09/02/2022, 10:52 AM
    Hello, I'm a new platform engineer at Acrotrend. I'm learning the ropes and trying to build a datahub-frontend Docker image with Docker BuildKit, and I've consistently run into this error. Here is the BuildKit command I use, straight from the docs. I'm on an M1.
    DOCKER_BUILDKIT=1 docker build -t my_frontend -f ./docker/datahub-frontend/Dockerfile .
  • victorious-spoon-76468
    09/02/2022, 5:27 PM
    Hello! I'm currently looking for user activity logs (who accessed what, who ingested what, etc.) but couldn't find anything about it. Is it possible to get this kind of information?
  • delightful-zebra-4875
    09/05/2022, 9:30 AM
    I want to replace Elasticsearch 7.9.3 with Elasticsearch 6.8. Can this be achieved? When I try it I get an error.
  • best-fireman-42901
    09/05/2022, 2:01 PM
    Hi. We have a deployment in AWS. It deployed OK and we can access the UI, however my colleague (who has used DataHub himself, but via Docker) says we need to be able to access GMS by adding :8080 to the end of the URL. This doesn't work for us and times out. We can see the GMS service listed when running 'kubectl get services' and we can find the load balancer in the console, but I can't see anywhere in the documentation that explains how to configure this so we can access it; the documentation appears to stop at the frontend instructions. I thought this would just work since we're deploying into AWS/EKS using the default configuration. Can anyone help please? Thanks.
  • shy-kitchen-7972
    09/05/2022, 2:19 PM
    Hi all, when we add a related term such that term A contains term B, I would expect that on the page of term B we would see term A in the inherits section. Is this indeed the expected behavior, and can you confirm that you are also unable to do this? I never see the relation in both directions.
  • breezy-shoe-41523
    09/06/2022, 4:25 AM
    Hi team, I have a question about Elasticsearch and MySQL. Is Elasticsearch used for searching in DataHub? If so, what happens if I delete all indices in Elasticsearch and keep MySQL? Is there any logic in DataHub to get Elasticsearch and MySQL back in sync? Thanks.
  • flaky-soccer-57765
    09/06/2022, 3:28 PM
    Hello all, newbie here. I have a MSQL source which I have ingested into DataHub. I also have a YAML file for each table with more metadata about that table as key: value pairs. What would be the best approach to load those details into the Properties page in the DataHub UI?
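    (One possible approach, as a sketch rather than an official pattern: read each table's YAML file and emit its key/value pairs as customProperties on the matching dataset. The file name, platform, dataset name, and GMS URL below are assumptions:)
    import yaml

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # Expecting a flat key: value mapping in the per-table YAML file.
    with open("my_table.yaml") as f:
        props = yaml.safe_load(f)

    # Note: UPSERT replaces the whole datasetProperties aspect, so any existing
    # name/description set on that aspect would be overwritten by this call.
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn(platform="mssql", name="mydb.dbo.my_table", env="PROD"),
        aspectName="datasetProperties",
        aspect=DatasetPropertiesClass(customProperties={k: str(v) for k, v in props.items()}),
    )
    emitter.emit_mcp(mcp)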
  • wonderful-egg-79350
    09/07/2022, 11:14 AM
    Hi all. Is there any way to back up data (column descriptions, table documentation, owners, tags, etc.)?
  • modern-electrician-34675
    09/07/2022, 11:42 PM
    Hi. I'm looking into metadata service auth here and it seems like the REST sink supports auth while the Kafka sink does not. How is this separation implemented from an API perspective? Does Kafka hit the metadata service directly on some endpoint that doesn't require auth?
  • rhythmic-nest-54679
    09/08/2022, 2:40 AM
    Is this just a tool for scraping things like DDL from MySQL? Could I ingest all the real data from MySQL instances distributed across different systems?
  • salmon-rose-54694
    09/08/2022, 5:58 AM
    Hi team, Akka changed its license; will this affect DataHub? https://www.lightbend.com/blog/why-we-are-changing-the-license-for-akka
  • ripe-tiger-90198
    09/08/2022, 11:10 AM
    Hello team, I'm trying to set up Google Auth OIDC while using the DataHub quickstart command. I'm not sure how to update the docker.env file to apply the new OIDC settings. Please help with the required steps.
  • dry-cpu-22075
    09/08/2022, 8:29 PM
    Hey all, I'm trying to get DataHub running locally with Airflow, using Astronomer (runtime 5.0.5), which is also running locally. I'm running into the same error that some other users have posted relating to connection issues:
    datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'message': "HTTPSConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /aspects?action=ingestProposal (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2263a568e0>: Failed to establish a new connection: [Errno 111] Connection refused'))"})
    with my connection set in Airflow as:
    conn_id: datahub_rest_default conn_type: datahub_rest host: <http://localhost:8080>
    I'm not sure what's going on with the connection here, as I don't typically have issues connecting to other providers locally (e.g. my Snowflake connection works fine). Thanks in advance!
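    (One thing to check, sketched below: when Airflow itself runs in a container, as with Astronomer, localhost resolves to the Airflow container rather than the machine running DataHub, so a host such as host.docker.internal is usually needed on a Mac. The traceback above also shows HTTPSConnectionPool even though the connection host is http, so the scheme is worth double-checking. The URL below is an assumption:)
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Quick connectivity check from inside the Airflow container/worker.
    emitter = DatahubRestEmitter(gms_server="http://host.docker.internal:8080")
    emitter.test_connection()  # raises if the GMS endpoint is unreachable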
  • jolly-library-86177
    09/09/2022, 8:40 AM
    👋 Hi everyone!
  • famous-fall-59477
    09/09/2022, 12:44 PM
    Hello! I am looking for some help with building Docker images from source. I have cloned a fork of the repository locally and am running this to build the required images:
    docker compose build mysql-setup elasticsearch-setup kafka-setup datahub-gms datahub-frontend-react
    However, I am running into an error; the error log is pasted in the attached file. In particular, I think the issue is caused by Gradle 6.9.2 being used instead of Gradle 7. Can anyone shed any light on this issue? P.S. Also let me know if I should take this question to #contribute instead.
    datahub_docker_compose_error.txt
  • bland-sundown-49496
    09/09/2022, 7:24 PM
    Hello, I have started setting up DataHub on my local Mac and am trying to configure S3 as a data source. I am getting the error message below. I am able to list the bucket from the AWS CLI successfully. Would someone please help me? Here is my S3 source YAML:
    source:
      type: "s3"
      config:
        platform: s3
        path_spec:
          include: "s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset"
        aws_config:
          aws_access_key_id: XXX
          aws_secret_access_key: XXX
          aws_region: us-east-1
        env: "PROD"
        profiling:
          enabled: false
    # see https://datahubproject.io/docs/metadata-ingestion/sink_docs/file for complete documentation
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    This is the error message I got when I ran: datahub --debug ingest -c s3-datahub.yaml
    File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 196, in run
      pipeline = Pipeline.create(
    File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create
      return cls(
    File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__
      self._record_initialization_failure(
    File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure
      raise PipelineInitError(msg) from e
    PipelineInitError: Failed to configure source (s3)
    [2022-09-09 142150,735] DEBUG {datahub.entrypoints:198} - DataHub CLI version: 0.8.44.1 at /Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/__init__.py
    [2022-09-09 142150,735] DEBUG {datahub.entrypoints:201} - Python version: 3.8.8 (default, Apr 13 2021, 125945) [Clang 10.0.0 ] at /Users/hgopu/opt/anaconda3/bin/python3 on macOS-10.16-x86_64-i386-64bit
    [2022-09-09 142150,735] DEBUG {datahub.entrypoints:204} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.44', 'commit': '2115d5bf1dc4dcfd73dbff6d41aaa08a279b62c0'}}, 'managedIngestion': {'defaultCliVersion': '0.8.42', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
  • breezy-shoe-41523
    09/12/2022, 3:58 PM
    Hello team, I have a question about the LDAP login module with jaas.conf. I have a jaas.conf with an LDAP module like this:
    WHZ-Authentication {
    com.sun.security.auth.module.LdapLoginModule sufficient
    java.naming.security.authentication="simple"
    userProvider="<ldap://ldap-server.mycompany.com>"
    authIdentity="cn={USERNAME},OU={GROUP1},OU=user,OU=COMPANY,DC=mycompanyname,DC=com"
    useSSL=false
    debug=true;
    };
    and I need to add another group, like:
    cn={USERNAME},OU={GROUP2},OU=user,OU=COMPANY,DC=mycompanyname,DC=com
    Is there any way to do this? Please help.
  • bland-sundown-49496
    09/12/2022, 10:53 PM
    Hello, I am setting up DataHub locally (on my Mac) and I am getting this error: ({"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)). I see the docker container "gms" is up and running on port 8080. Can someone please help me?
  • breezy-shoe-41523
    09/13/2022, 7:39 AM
    Hi team, I have a little question: how can I edit the Team field in Edit Profile through the Rest.li API? I found this, but I cannot find how to edit the team here:
    curl '<http://localhost:8080/entities?action=ingest>' -X POST --data '{
      "entity": {
        "value": {
          "com.linkedin.metadata.snapshot.CorpUserSnapshot": {
            "urn": "urn:li:corpuser:aseem.bansal",
            "aspects": [{
              "com.linkedin.identity.CorpUserInfo": {
                "active": true,
                "displayName": "Aseem Bansal",
                "email": "<mailto:aseem+example@acryl.io|aseem+example@acryl.io>",
                "title": "Software Engineer",
                "fullName": "Aseem Bansal"
              }
            }]
          }
        }
      }
    }'
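    (A sketch of one possibility, based on my reading that the Edit Profile fields such as Teams and Skills live on the corpUserEditableInfo aspect rather than corpUserInfo; the field names, team value, and GMS URL below are assumptions and should be checked against your SDK version:)
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, CorpUserEditableInfoClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # Note: UPSERT replaces the whole corpUserEditableInfo aspect for this user.
    mcp = MetadataChangeProposalWrapper(
        entityType="corpuser",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:corpuser:aseem.bansal",
        aspectName="corpUserEditableInfo",
        aspect=CorpUserEditableInfoClass(teams=["Data Platform"]),
    )
    emitter.emit_mcp(mcp)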
  • fast-potato-13714
    09/13/2022, 10:25 AM
    Hello everybody, I'm new to DataHub. We are trying out the whole suite, and we've seen that it is possible to see all the Airflow runs for each table in DataHub. We don't use Airflow, but our own tool developed years ago. Is it possible to integrate something similar with DataHub in order to see the status of all the loads for each table? Thanks in advance.
  • powerful-jewelry-60069
    09/13/2022, 2:15 PM
    👋 Hi everyone! I've newly joined DataHub. I am trying to create lineage for a MySQL connection but am not getting it to work. If it is possible, please help me with that. Thanks in advance.