# getting-started

    brave-tomato-16287

    08/02/2022, 1:30 PM
    Hello All. If I decide to delete some objects using the API, what permissions should I have?

    bright-diamond-60933

    08/02/2022, 2:32 PM
    Is anyone using an IDE other than IntelliJ for working with the DataHub code?

    busy-analyst-8258

    08/02/2022, 3:32 PM
    Hello everyone, I am using emitter = DatahubRestEmitter("http://datahub-datahub-gms:8080") and it was working fine until we turned on Authorization=true. Now when I use the DatahubRestEmitter with the token, it says syntax error. How do I pass the token value to the DatahubRestEmitter?
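    A minimal sketch of passing the token as a constructor argument — the server URL is taken from the message above, and the token value is a placeholder:
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # The token goes in as a keyword argument, not into the URL string.
    emitter = DatahubRestEmitter(
        gms_server="http://datahub-datahub-gms:8080",
        token="<personal-access-token>",  # placeholder: e.g. a token generated in the DataHub UI
    )
    emitter.test_connection()  # fails fast if the server rejects the token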

    thousands-secretary-3263

    08/02/2022, 4:41 PM
    Hello! I am trying to deploy datahub on a virtual machine, but when I run 'datahub docker quickstart' I get this error message.

    astonishing-lizard-90580

    08/02/2022, 5:48 PM
    Hey folks, I just wanted to check -- has anyone been able to successfully run the quickstart locally on a Windows machine recently? (meaning the newest version of quickstart)
    • Following the quickstart guide exactly: https://datahubproject.io/docs/quickstart/
    • With no modifications whatsoever
    • Meeting all the requirements (8gb ram, etc.)

    bumpy-pilot-52145

    08/02/2022, 7:50 PM
    Hi DataHub community! I’m sure this has been asked before, but is there a way to know what the status of the RBAC RFC is? cc @big-carpet-38439 @wonderful-author-3020

    quiet-wolf-56299

    08/02/2022, 8:12 PM
    Running into an issue with the quickstart script where I am getting "Java library initialization failed - unable to allocate file descriptor table - out of memory ===> ENV Variables ..." and similar from ZooKeeper, Broker, and Schema Registry. Obviously this is preventing the app from launching properly.

    Setup: Fedora in VMware Fusion, 12gb ram, 40gb HDD, 4 cpu cores. From what I understand, Docker should have all the resources it needs available to it, since I’m running on a Linux host.

    I did try a couple of things. I was able to work around this once by setting the default ulimit nofile for docker.service to something like 65535; however, on a fresh new VM that workaround no longer works. I’ve also tried editing the compose file and using a custom version to provide more memory to the JVM via the -Xmx parameter for the relevant containers. Neither has solved my issue this time around.

    alert-ram-30868

    08/03/2022, 6:12 AM
    class SchemaFieldClass(DictWrapper):
        """SchemaField to describe metadata related to dataset schema."""
        
        RECORD_SCHEMA = get_schema_type("com.linkedin.pegasus2avro.schema.SchemaField")
        def __init__(self,
            fieldPath: str,
            type: "SchemaFieldDataTypeClass",
            nativeDataType: str,
            jsonPath: Union[None, str]=None,
            nullable: Optional[bool]=None,
            description: Union[None, str]=None,
            recursive: Optional[bool]=None,
            globalTags: Union[None, "GlobalTagsClass"]=None,
            glossaryTerms: Union[None, "GlossaryTermsClass"]=None,
            isPartOfKey: Optional[bool]=None,
            jsonProps: Union[None, str]=None,
        ):
            super().__init__()
            
            self.fieldPath = fieldPath
            self.jsonPath = jsonPath
            if nullable is None:
                # default: False
                self.nullable = self.RECORD_SCHEMA.fields_dict["nullable"].default
            else:
                self.nullable = nullable
            self.description = description
            self.type = type
            self.nativeDataType = nativeDataType
            if recursive is None:
                # default: False
                self.recursive = self.RECORD_SCHEMA.fields_dict["recursive"].default
            else:
                self.recursive = recursive
            self.globalTags = globalTags
            self.glossaryTerms = glossaryTerms
            if isPartOfKey is None:
                # default: False
                self.isPartOfKey = self.RECORD_SCHEMA.fields_dict["isPartOfKey"].default
            else:
                self.isPartOfKey = isPartOfKey
            self.jsonProps = jsonProps
    In SchemaFieldClass, what is recursive? Is it related to nested fields in the dataset?
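    If it helps future readers: recursive appears to mark a field whose type refers back to an ancestor type (a self-referential schema), defaulting to False; ordinary nesting is instead expressed through dotted fieldPath values. A sketch of describing a nested field, with made-up names:
    from datahub.metadata.schema_classes import (
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        StringTypeClass,
    )

    # A nested column is addressed by its dotted fieldPath; recursive stays False.
    zip_field = SchemaFieldClass(
        fieldPath="address.zipcode",
        type=SchemaFieldDataTypeClass(type=StringTypeClass()),
        nativeDataType="VARCHAR(10)",
        nullable=True,
    )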

    lemon-engine-23512

    08/03/2022, 8:18 AM
    Hello all, Glue properties have information like base tables and pipeline name. Can DataHub use this to create lineage?

    full-chef-85630

    08/03/2022, 8:34 AM
    Hi all, my k8s install fails with what looks like a missing file/value:
    Error: INSTALLATION FAILED: template: datahub/charts/datahub-mce-consumer/templates/deployment.yaml:145:12: executing "datahub/charts/datahub-mce-consumer/templates/deployment.yaml" at <include "datahub-jmxexporter.container" .>: error calling include: template: datahub/charts/datahub-jmxexporter/templates/_container.tpl:4:28: executing "datahub-jmxexporter.container" at <.Values.exporters.jmx.image.repository>: nil pointer evaluating interface {}.repository

    breezy-shoe-41523

    08/03/2022, 9:25 AM
    Hello dears, I need some help. I’ve deployed DataHub with LDAP authentication, and I ingested LDAP user data with the LDAP ingestion source — it works fine. But now I need to find a way to auto-provision users who log in via LDAP. Is there any way to do this? Any comment is welcome.

    billions-football-16062

    08/03/2022, 10:41 AM
    Hello everyone, I just ingested MySQL metadata into DataHub, but the lineage doesn't show up. What else do I need to do to see the lineage?

    mysterious-pager-59554

    08/03/2022, 3:14 PM
    Hi team, can the source "data lake files" be used to ingest data from Azure Data Lake, or is it limited to AWS?

    flaky-parrot-48828

    08/04/2022, 9:18 AM
    Hello, hope you are doing well. I have a query: for the DB we are using Redshift, but for pipelines we are not using Airflow — we are using Lambda. Can we still ingest our lineage?

    flaky-parrot-48828

    08/04/2022, 9:18 AM
    If yes, then can you share the relevant documentation?

    damp-businessperson-33501

    08/04/2022, 11:49 AM
    Hi folks, does anyone have an example of how to add our DataHub Glossary Terms into our dbt files so they'll show up at the column level? I'm not sure where they should be added. Or is this something that needs to be configured in meta mapping? TY in advance :)

    quiet-wolf-56299

    08/04/2022, 1:34 PM
    Not necessarily a DH issue, but it seems the MySQL 5.7 container has some form of memory leak on RHEL-based systems (RHEL, Fedora, CentOS) tied to the default ulimit. Watching container stats, I saw MySQL jump immediately to 100% memory usage, consuming all the memory in the machine; setting ulimits in the compose file for the mysql container tamed it down to about 200mb instead of 12gb. The more you know.

    quiet-wolf-56299

    08/04/2022, 2:53 PM
    Am I reading correctly that there are no standard amd64 packages for docker on RHEL?

    quiet-jewelry-72419

    08/05/2022, 1:29 AM
    I have a question: if JSON is used in the Kafka layer and integration is happening to DataHub, will the schema information and cataloguing still happen? What's the limitation of JSON vs Avro here?

    tall-butcher-30509

    08/05/2022, 2:58 AM
    How can I map OIDC-authenticated users to policies? Is it required to set this for every user? Is there a way to assign a policy based on an external IAM, e.g. in Google Cloud?

    kind-whale-32412

    08/05/2022, 7:16 PM
    Hello, I realized this page exists for tags: https://datahubproject.io/docs/tags and this page for schema history: https://datahubproject.io/docs/schema-history. If my tags change over time, can I see their history somewhere? Are they stored anywhere?

    kind-whale-32412

    08/05/2022, 7:51 PM
    ^^^ I also noticed that adding tags keeps historic data in the metadata_aspect_v2 table, with an aspect row entry called editableSchemaMetadata. Are these entries persisted forever? I also noticed the ordering that is kept: version 0 is the current one, version 1 is the oldest, and subsequent versions are later additions.
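    For anyone wanting to inspect that ordering directly, a sketch that queries the backing MySQL store — the connection details and urn are placeholders, and this assumes direct read access to the datahub database:
    import pymysql  # assumes network access to DataHub's MySQL backing store

    conn = pymysql.connect(
        host="localhost", user="datahub", password="<password>", database="datahub"
    )
    with conn.cursor() as cur:
        # version 0 holds the latest value; versions 1..N are the history.
        cur.execute(
            "SELECT version, createdon FROM metadata_aspect_v2 "
            "WHERE urn = %s AND aspect = 'editableSchemaMetadata' ORDER BY version",
            ("urn:li:dataset:(urn:li:dataPlatform:mysql,db.table,PROD)",),  # placeholder urn
        )
        for version, createdon in cur.fetchall():
            print(version, createdon)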

    lemon-engine-23512

    08/08/2022, 9:28 AM
    Hello, I would like to understand more about the lineage format — how to build custom lineage for a source, for example from Glue, based on the base tables. Apart from GitHub, is there a place I can read more?
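    A sketch of emitting custom table-level lineage with the Python SDK — the platform and table names here are made up:
    from datahub.emitter.mce_builder import make_dataset_urn, make_lineage_mce
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Upstream Glue base tables -> downstream derived table.
    lineage_mce = make_lineage_mce(
        upstream_urns=[
            make_dataset_urn("glue", "db.base_table_a", "PROD"),
            make_dataset_urn("glue", "db.base_table_b", "PROD"),
        ],
        downstream_urn=make_dataset_urn("glue", "db.derived_table", "PROD"),
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit_mce(lineage_mce)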

    full-chef-85630

    08/08/2022, 1:30 PM
    Hi all. If Kafka and ZooKeeper are not containerized, what should we pay attention to regarding versions? Likewise, if MySQL, ES, and Neo4j are not containerized, what should we pay attention to?

    famous-fireman-41042

    08/08/2022, 4:12 PM
    Hello everyone, I want to start using DataHub in our company, and I need to come up with some answers before I start. Can you help me?
    1. I want the bare-minimum instance of DataHub, to start showing its value and grow as needed. Is the helm setup (on EKS) considered the bare minimum? Is there an EC2 setup/guide? (AWS)
    2. With that said, is there any predictability on the resource costs? (if EKS, how much is needed; if EC2, how much is needed? Obviously it depends on the answer to question 1)
    3. Are all of the requirements for DataHub on EKS/EC2 containerized? If I start the cluster/EC2 instance and then delete it, will nothing have changed in my AWS subscription, with none of the services managed?
    Thank you!

    helpful-greece-26038

    08/08/2022, 8:41 PM
    I was able to get DataHub installed with a lot of effort, and I am able to log in to the UI using the default credentials. However, it appears that the default user is not authorized to do anything. For example, I tried to create a domain and got an unauthorized error, and when I try to view my profile I also get an unauthorized error. Does anyone have suggestions for how to resolve this?

    square-hair-99480

    08/09/2022, 9:25 AM
    Hi friends, any tips on best practices for creating users and groups and managing privileges? I was reading this (https://datahubproject.io/docs/authorization/policies) and although it is helpful it does not say a lot. For instance, I am deploying DataHub with docker-compose, and it is not clear to me where the policies.json file lives, or whether I can manage it under git control. If you are aware of any further documentation, webinars, or blogs on it, please share!

    agreeable-army-26750

    08/09/2022, 10:05 AM
    Hi guys! I have an M1 Mac and tried to quickstart the application with the following config file: docker-compose -f docker-compose-without-neo4j-m1.quickstart.yml up -d. But unfortunately the gms service stops after a few seconds. Can you please help me resolve the issue? Thanks in advance! The docker log is this:
    + echo
    + grep -q ://
    + NEO4J_HOST=http://
    + [[ ! -z '' ]]
    + [[ -z '' ]]
    + ELASTICSEARCH_AUTH_HEADER='Accept: */*'
    + [[ '' == true ]]
    + ELASTICSEARCH_PROTOCOL=http
    + WAIT_FOR_EBEAN=
    + [[ '' != true ]]
    + [[ '' == ebean ]]
    + [[ -z '' ]]
    + WAIT_FOR_EBEAN=' -wait <tcp://mysql:3306> '
    + WAIT_FOR_CASSANDRA=
    + [[ '' == cassandra ]]
    + WAIT_FOR_KAFKA=
    + [[ '' != true ]]
    ++ echo broker:29092
    ++ sed 's/,/ -wait tcp:\/\//g'
    + WAIT_FOR_KAFKA=' -wait <tcp://broker:29092> '
    + WAIT_FOR_NEO4J=
    + [[ elasticsearch != elasticsearch ]]
    + OTEL_AGENT=
    + [[ '' == true ]]
    + PROMETHEUS_AGENT=
    + [[ '' == true ]]
    + auth_resource_dir=/etc/datahub/plugins/auth/resources
    + COMMON='
         -wait <tcp://mysql:3306>            -wait <tcp://broker:29092>           -timeout 240s     java -Xms1g -Xmx1g                -jar /jetty-runner.jar     --jar jetty-util.jar     --jar jetty-jmx.jar --classes /etc/datahub/plugins/auth/resources     --config /datahub/datahub-gms/scripts/jetty.xml     /datahub/datahub-gms/bin/war.war'
    + [[ '' != true ]]
    + exec dockerize -wait <http://elasticsearch:9200> -wait-http-header 'Accept: */*' -wait <tcp://mysql:3306> -wait <tcp://broker:29092> -timeout 240s java -Xms1g -Xmx1g -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --classes /etc/datahub/plugins/auth/resources --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
    2022/08/09 10:00:12 Waiting for: <http://elasticsearch:9200>
    2022/08/09 10:00:12 Waiting for: <tcp://mysql:3306>
    2022/08/09 10:00:12 Waiting for: <tcp://broker:29092>
    2022/08/09 10:00:12 Connected to <tcp://mysql:3306>
    2022/08/09 10:00:12 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:12 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:13 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:13 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:14 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:14 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:15 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:15 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:16 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:16 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:17 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:17 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:18 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:18 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:19 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:19 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:20 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:20 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:21 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:21 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:22 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:22 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:23 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:23 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:24 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:24 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:25 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:25 Problem with dial: dial tcp 172.21.0.8:29092: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:26 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:26 Connected to <tcp://broker:29092>
    2022/08/09 10:00:27 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:28 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:29 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:30 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:31 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:32 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:33 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:34 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:35 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:36 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:37 Problem with request: Get "<http://elasticsearch:9200>": dial tcp 172.21.0.3:9200: connect: connection refused. Sleeping 1s
    2022/08/09 10:00:38 Received 200 from <http://elasticsearch:9200>
    2022-08-09 10:00:38.917:INFO::main: Logging initialized @546ms to org.eclipse.jetty.util.log.StdErrLog
    WARNING: jetty-runner is deprecated.
             See Jetty Documentation for startup options
             <https://www.eclipse.org/jetty/documentation/>
    ERROR: No such classes directory file:///etc/datahub/plugins/auth/resources
    Usage: java [-Djetty.home=dir] -jar jetty-runner.jar [--help|--version] [ server opts] [[ context opts] context ...] 
    Server opts:
     --version                           - display version and exit
     --log file                          - request log filename (with optional 'yyyy_mm_dd' wildcard
     --out file                          - info/warn/debug log filename (with optional 'yyyy_mm_dd' wildcard
     --host name|ip                      - interface to listen on (default is all interfaces)
     --port n                            - port to listen on (default 8080)
     --stop-port n                       - port to listen for stop command (or -DSTOP.PORT=n)
     --stop-key n                        - security string for stop command (required if --stop-port is present) (or -DSTOP.KEY=n)
     [--jar file]*n                      - each tuple specifies an extra jar to be added to the classloader
     [--lib dir]*n                       - each tuple specifies an extra directory of jars to be added to the classloader
     [--classes dir]*n                   - each tuple specifies an extra directory of classes to be added to the classloader
     --stats [unsecure|realm.properties] - enable stats gathering servlet context
     [--config file]*n                   - each tuple specifies the name of a jetty xml config file to apply (in the order defined)
    Context opts:
     [[--path /path] context]*n          - WAR file, web app dir or context xml file, optionally with a context path
    2022/08/09 10:00:38 Command exited with error: exit status 1

    famous-fireman-41042

    08/09/2022, 10:18 AM
    Hello everyone, we are thinking of starting out with an EC2 instance (4 cores, 16GB) as @incalculable-ocean-74010 advised. What if later on we would like to migrate to the K8s solution — would we be able to do so without losing data? How difficult would that process be? Has anyone ever done this kind of migration? It would make sense that, just like other data sources, one DataHub instance could be the source and another DataHub the sink — does this feature exist? Thank you!

    agreeable-army-26750

    08/09/2022, 1:29 PM
    Hi guys! Is there any way to create glossary terms with custom properties via the OpenAPI? (If there is, can you provide an example?) Thanks in advance!
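    Not the OpenAPI route, but a sketch of the same operation through the Python SDK, which writes to the same GMS — the term name and properties are placeholders:
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, GlossaryTermInfoClass

    # Glossary term aspect with arbitrary string-valued custom properties.
    term_info = GlossaryTermInfoClass(
        definition="Yearly revenue recognized per customer.",
        termSource="INTERNAL",
        customProperties={"owner_team": "finance", "pii": "false"},
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityType="glossaryTerm",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn="urn:li:glossaryTerm:AnnualRevenue",  # placeholder term urn
            aspectName="glossaryTermInfo",
            aspect=term_info,
        )
    )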