# all-things-deployment
  • magnificent-honey-40185 (05/30/2023, 8:49 PM)
    Trying to create Redshift as a source using Python. Getting the error below:
    ```
    Traceback (most recent call last):
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 120, in _add_init_error_context
        yield
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 220, in __init__
        source_class = source_registry.get(source_type)
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/api/registry.py", line 183, in get
        tp = self._ensure_not_lazy(key)
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/api/registry.py", line 127, in _ensure_not_lazy
        plugin_class = import_path(path)
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/api/registry.py", line 57, in import_path
        item = importlib.import_module(module_name)
      File "/usr/bin/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 843, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/source/redshift/redshift.py", line 41, in <module>
        from datahub.ingestion.source.redshift.lineage import RedshiftLineageExtractor
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/source/redshift/lineage.py", line 11, in <module>
        from sqllineage.runner import LineageRunner
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/sqllineage/__init__.py", line 41, in <module>
        _monkey_patch()
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/sqllineage/__init__.py", line 35, in _monkey_patch
        _patch_updating_lateral_view_lexeme()
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/sqllineage/__init__.py", line 24, in _patch_updating_lateral_view_lexeme
        if regex("LATERAL VIEW EXPLODE(col)"):
    TypeError: 'str' object is not callable
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/tmp/interpreter-input-ba4237f7-4df7-4ccb-8375-0d65c6af6170.tmp", line 4, in <module>
        pipeline = Pipeline.create(
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 334, in create
        return cls(
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 220, in __init__
        source_class = source_registry.get(source_type)
      File "/usr/bin/lib/python3.8/contextlib.py", line 131, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/share/tomcat8/.local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 122, in _add_init_error_context
        raise PipelineInitError(f"Failed to {step}: {e}") from e
    datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type redshift: 'str' object is not callable
    Script failed with status: 1
    ```
    Below is the code:
    ```python
    from datahub.ingestion.run.pipeline import Pipeline
    
    # The pipeline configuration is similar to the recipe YAML files provided to the CLI tool.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "redshift",
                "config": {
                    "username": "username",
                    "password": "password",
                    "database": "db",
                    "host_port": "host:5439",
                    "default_schema":"schema"
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {
                  "server": "<http://host/api/gms>",
                  "token" : "token"
                },
            },
        }
    )
    
    # Run the pipeline and report the results.
    pipeline.run()
    pipeline.pretty_print_summary()
    ```
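    The traceback bottoms out in sqllineage's import-time monkey patch, which suggests a version mismatch between the installed sqllineage and the sqlparse it patches, rather than a problem in the recipe itself. A minimal diagnostic sketch (package names taken from the traceback) to compare installed versions:
    ```python
    # Print the versions of the packages in the failing import chain.
    # A sqlparse release newer than what the installed sqllineage expects
    # is a plausible cause of the "'str' object is not callable" TypeError.
    from importlib.metadata import PackageNotFoundError, version

    for pkg in ("acryl-datahub", "sqllineage", "sqlparse"):
        try:
            print(pkg, version(pkg))
        except PackageNotFoundError:
            print(pkg, "not installed")
    ```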
  • nutritious-salesclerk-57675 (05/31/2023, 6:40 AM)
    Good day. I am trying to use a shared ES instance for my DataHub deployment. Before we proceed, I would like to check the list of indices that DataHub uses, so we can make sure we don't already have a conflicting index in our current ES instance. Can someone point me to the resource?
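    Until someone shares the authoritative list, a practical sketch is to stand up DataHub against an empty ES cluster and dump the index names it creates, then compare them against the shared cluster (endpoint is a placeholder; add auth as needed):
    ```python
    # List index names via Elasticsearch's _cat API so they can be checked
    # against the shared cluster for collisions.
    import requests

    resp = requests.get("http://localhost:9200/_cat/indices?format=json", timeout=10)
    resp.raise_for_status()
    for name in sorted(row["index"] for row in resp.json()):
        print(name)
    ```
    If a collision does show up, the setup job's index-prefix option (INDEX_PREFIX in the quickstart/helm environment, if your version supports it) is the usual way to namespace DataHub's indices.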
  • rich-policeman-92383 (05/31/2023, 11:57 AM)
    Hello team. We are using Postgres as the datastore for DataHub. Recently we have started seeing the error below. From searching online, we found that we either need to reindex the pg_toast table or delete the corrupted rows. We tried reindexing pg_toast, but it did not help. Is there a way to restore the Postgres data from the ES data? DataHub version: v0.9.6.1. Error:
    ```
    ERROR: missing chunk number 0 for toast value 734921 in pg_toast_83651
    ```
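    As far as I know the stock restore utilities go the other direction (rebuilding ES from SQL), not SQL from ES. For the "delete the corrupted rows" route, a common approach is to force de-TOASTing row by row and record which rows error out; a hedged sketch (the DSN is a placeholder, and metadata_aspect_v2 is DataHub's main aspect table):
    ```python
    # Probe each row of metadata_aspect_v2 individually; rows whose TOAST
    # data is corrupted will raise the "missing chunk number" error.
    import psycopg2

    conn = psycopg2.connect("dbname=datahub user=datahub")  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute("SELECT ctid FROM metadata_aspect_v2")  # ctid alone doesn't touch TOAST
        ctids = [row[0] for row in cur.fetchall()]

    bad = []
    for ctid in ctids:
        try:
            with conn.cursor() as probe:
                probe.execute("SELECT * FROM metadata_aspect_v2 WHERE ctid = %s", (ctid,))
                probe.fetchall()  # SELECT * forces de-TOASTing of every column
        except psycopg2.Error:
            conn.rollback()
            bad.append(ctid)

    print("corrupted rows:", bad)
    ```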
  • wide-kilobyte-73035 (05/31/2023, 11:59 AM)
    Hello! When I connect to the UI (the datahub-frontend URL), the connection is intermittent and slow: each "Loading ..." step (e.g. "Loading tokens...") takes about 20 s. From checking with the browser developer tools, what could cause this delay on the initial connection? The frontend is currently exposed as a NodePort behind an NLB. The frontend logs (kubectl logs {datahub-frontend pod name}) show no record of the slow requests. If you need more information, please leave a comment. Please help, thanks!
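    One way to narrow this down is to time the same request from inside the cluster (bypassing the NLB) and from outside; if the in-cluster call is fast, the delay is in the NLB/NodePort path rather than in the frontend itself. A minimal sketch (service name and port are assumptions):
    ```python
    # Time a bare request to the frontend to separate network latency
    # from application latency.
    import time

    import requests

    start = time.monotonic()
    resp = requests.get("http://datahub-datahub-frontend:9002", timeout=60)
    print(resp.status_code, f"{time.monotonic() - start:.2f}s")
    ```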
  • elegant-nightfall-29115 (05/31/2023, 11:30 PM)
    Hey team, just want to bring this back to your attention. The first solution came with the unwanted consequence of not being able to log in lol, so I'm hoping we can find another solution: https://datahubspace.slack.com/archives/CV2UVAPPG/p1684274714333849
  • chilly-boots-22585 (06/01/2023, 9:36 AM)
    Hi DataHub support team. I am using the YAML below for the source and sink of a data ingestion, but the run fails.
    ```yaml
    source:
        type: starburst-trino-usage
        config:
            host_port: 'datamesh.conest.com:443'
            database: tpch
            username: ds-starburst
            include_views: true
            include_tables: true
            profiling:
                enabled: true
                profile_table_level_only: true
            stateful_ingestion:
                enabled: true
            password: '${starburst-trino-cred}'
    sink:
        type: datahub-rest
        config:
            server: 'datahub-datahub-gms:8080'
    ```
    I am receiving this error: `datahub.ingestion.run.pipeline.PipelineInitError: Failed to set up framework context: Failed to instantiate a valid DataHub Graph instance`. One more thing: I have GMS running behind an ALB endpoint, so what should the GMS value in the sink be? Should it be like the one above, or "http://datahub-datahub-gms.svc.cluster.local:8080"? The service looks like this:
    ```
    datahub-datahub-gms  LoadBalancer  10.100.85.45  a6af33dc651074b21-608523110.eu-west-1.elb.amazonaws.com  808030858/TCP,431830170/TCP  37h
    ```
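    Whichever form you pick, the sink's `server` must include the scheme and be reachable from wherever the recipe runs. A quick hedged check against GMS's health endpoint (candidate URLs taken from the question, namespace assumed, `/health` being the usual GMS liveness path):
    ```python
    # Try each candidate GMS URL; the one that answers /health with 200
    # is a reasonable value for sink.config.server.
    import requests

    candidates = [
        "http://datahub-datahub-gms:8080",
        "http://datahub-datahub-gms.default.svc.cluster.local:8080",
    ]
    for url in candidates:
        try:
            r = requests.get(f"{url}/health", timeout=5)
            print(url, "->", r.status_code)
        except requests.RequestException as exc:
            print(url, "-> unreachable:", exc)
    ```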
  • proud-dusk-671 (06/01/2023, 11:04 AM)
    Hey team, I am trying to deploy DataHub through helm on K8s. I wanted to know how I can disable DataHub's default admin user. It is a security concern, after all.
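    If the goal is to stop the built-in `datahub` user from logging in at all, one hedged option (assuming SSO is configured and that your chart version exposes the frontend's JAAS toggle) is to disable the user.props-based login entirely:
    ```yaml
    # values.yaml sketch: turn off the frontend's JAAS username/password
    # login, which is what the default 'datahub' user relies on.
    datahub-frontend:
      extraEnvs:
        - name: AUTH_JAAS_ENABLED
          value: "false"
    ```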
  • purple-salesmen-12745 (06/01/2023, 4:02 PM)
    What is your actual cost of deploying DataHub on AWS?
  • important-vegetable-80842 (06/01/2023, 5:40 PM)
    https://github.com/datahub-project/datahub/issues/8133 About a month ago I was able to quickstart and test ingestion successfully as part of a broader project. I moved on and had to repurpose the original VM. Last week I tried to reproduce my steps and quickstart is consistently failing for me (see screenshot)
  • important-vegetable-80842 (06/01/2023, 5:41 PM)
    As you can see, I specified the version, and the container “Creating” phase just sits and spins indefinitely.
  • important-vegetable-80842 (06/01/2023, 5:41 PM)
    Any advice would be greatly appreciated!
  • future-table-91845 (06/01/2023, 7:02 PM)
    Hello support team - is there a way to enforce SSL for connections originating from DataHub?
  • future-table-91845 (06/01/2023, 7:04 PM)
    I want all ODBC/JDBC connections from DataHub to data sources (MySQL, Snowflake, etc.) to be secure. My InfoSec team is asking us to enforce this policy at the DataHub platform level (the source DB connections already enforce it).
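    For the SQLAlchemy-based sources, this is usually done per recipe by passing driver TLS options through the source's `options.connect_args`. A hedged sketch for MySQL (option names are the pymysql-style ones, host and CA path are placeholders; other drivers spell this differently):
    ```yaml
    source:
      type: mysql
      config:
        host_port: mysql.example.com:3306   # placeholder
        username: user
        password: '${MYSQL_PASSWORD}'
        options:
          connect_args:
            ssl:
              ca: /etc/ssl/certs/company-ca.pem   # placeholder CA bundle
    ```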
  • creamy-ram-28134 (06/02/2023, 4:34 PM)
    Hi team - I am getting this error (repeated) while deploying the new version of DataHub:
    ```
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
    ```
    Does anyone know how to fix this?
  • creamy-van-28626 (06/02/2023, 5:13 PM)
    Hey team, I have created a custom action that runs through an actions pipeline (a YAML file, mounted as a ConfigMap at /etc/datahub/actions/conf). Do we need to run the `datahub actions -c config.yaml` command for this action to run, or should it run automatically as mentioned in start.sh?
  • mysterious-table-75773 (06/04/2023, 7:46 PM)
    Hey team, we are using DataHub but I have a few questions:
    • Is there a way to deploy DataHub without Elasticsearch? From the architecture, I don't think we use that feature; all we do is connect to the DB to see the fetched data, which is SELECT-only.
    • Are Elasticsearch, Kafka, Postgres, the system-update job, no-code-migrations, the cleanup job, and the restore-indices job all mandatory, and why?
    • With version v0.10.3 I have removed schema-registry from the deployment and am using Kafka configs instead; is there anything I should know about this move?
  • icy-caravan-72551 (06/05/2023, 1:53 PM)
    Trying to deploy a basic demo of DataHub using Docker and the quickstart. Which version of Docker should I use, Desktop or Engine? Is there a recommended Linux OS for Docker + DataHub? The quickstart links to a Docker Desktop page: https://datahubproject.io/docs/quickstart/
  • steep-doctor-17127 (06/05/2023, 8:55 PM)
    Hello team, I want to add the following to values.yaml. How can I do this? Thanks in advance.
    ```
    security.protocol=SASL_SSL
    sasl.mechanism=AWS_MSK_IAM
    sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
    sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
    ```
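    In the helm chart, Kafka client properties like these usually belong under `global.springKafkaConfigurationOverrides` (key name as in recent chart versions; verify against yours), roughly:
    ```yaml
    # values.yaml sketch: pass the MSK IAM client properties through to the
    # Spring Kafka clients used by GMS and the consumers.
    global:
      springKafkaConfigurationOverrides:
        security.protocol: SASL_SSL
        sasl.mechanism: AWS_MSK_IAM
        sasl.jaas.config: software.amazon.msk.auth.iam.IAMLoginModule required;
        sasl.client.callback.handler.class: software.amazon.msk.auth.iam.IAMClientCallbackHandler
    ```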
  • steep-doctor-17127 (06/05/2023, 9:43 PM)
    ```yaml
    kafka:
      # ====== I want to add this part =====
      # client:
      #   properties:
      #     security.protocol: "SASL_SSL"
      #     sasl.mechanism: "AWS_MSK_IAM"
      #     sasl.jaas.config: software.amazon.msk.auth.iam.IAMLoginModule required;
      #     sasl.client.callback.handler.class: software.amazon.msk.auth.iam.IAMClientCallbackHandler
      bootstrap:
        server: "b-2.elantestdatahubnew.q54zfo.c21.kafka.us-east-1.amazonaws.com:9098,b-1.elantestdatahubnew.q54zfo.c21.kafka.us-east-1.amazonaws.com:9098,b-3.elantestdatahubnew.q54zfo.c21.kafka.us-east-1.amazonaws.com:9098"
      zookeeper:
        server: "z-1.elantestdatahubnew.q54zfo.c21.kafka.us-east-1.amazonaws.com:2181,z-2.elantestdatahubnew.q54zfo.c21.kafka.us-east-1.amazonaws.com:2181,z-3.elantestdatahubnew.q54zfo.c21.kafka.us-east-1.amazonaws.com:2181"
      topics:
        # metadata_change_event_name: "MetadataChangeEvent_v4"
        # failed_metadata_change_event_name: "FailedMetadataChangeEvent_v4"
        # metadata_audit_event_name: "MetadataAuditEvent_v4"
        # datahub_usage_event_name: "DataHubUsageEvent_v1"
        # metadata_change_proposal_topic_name: "MetadataChangeProposal_v1"
        # failed_metadata_change_proposal_topic_name: "FailedMetadataChangeProposal_v1"
        # metadata_change_log_versioned_topic_name: "MetadataChangeLog_Versioned_v1"
        # metadata_change_log_timeseries_topic_name: "MetadataChangeLog_Timeseries_v1"
        # platform_event_topic_name: "PlatformEvent_v1"
        # datahub_upgrade_history_topic_name: "DataHubUpgradeHistory_v1"
        metadata_change_event_name: "_schemas"
      schemaregistry:
        type: AWS_GLUE
        glue:
          region: "us-east-1"
          registry: noor-new
      partitions: 3
      replicationFactor: 3
    ```
  • steep-doctor-17127 (06/05/2023, 9:43 PM)
    image.png
  • stocky-plumber-3084 (06/06/2023, 4:47 AM)
    Hi team, I keep getting a "Network is unreachable" error when trying to ingest BigQuery. My company uses a proxy for outbound connections. I've set the proxy in both the Linux OS and Docker and tested the connection fine, but DataHub still can't reach BQ. Does the BQ connector support the use of a proxy? I've even included this line in the docker-compose file for the datahub-frontend-react container, and it still doesn't work:
    ```
    "JAVA_OPTS=-Xms512m -Xmx512m -Dhttp.port=9002 -Dhttp.proxyHost=<IP> -Dhttp.proxyPort=<port> -Dhttps.proxyHost=<IP> -Dhttps.proxyPort=<port> -Dhttp.nonProxyHosts=localhost -Dconfig.file=datahub-frontend/conf/application.conf -Djava.security.auth.login.config=datahub-frontend/conf/jaas.conf -Dlogback.configurationFile=datahub-frontend/conf/logback.xml -Dlogback.debug=false -Dpidfile.path=/dev/null"
    ```
    BQ ingestion error:
    ```
    The error was: Deadline of 600.0s exceeded while calling target function, last exception: HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3e9cbd7310>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
    Stacktrace:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
        conn = connection.create_connection(
      File "/usr/local/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
        raise err
      File "/usr/local/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
        sock.connect(sa)
    OSError: [Errno 101] Network is unreachable

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
        httplib_response = self._make_request(
      File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
        self._validate_conn(conn)
      File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
        conn.connect()
      File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 363, in connect
        self.sock = conn = self._new_conn()
      File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
        raise NewConnectionError(
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f3e9cbd7310>: Failed to establish a new connection: [Errno 101] Network is unreachable
    ```
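    Note that BigQuery ingestion runs in Python, so the frontend's JAVA_OPTS proxy flags never apply to it; the Python stack in the traceback (google-auth over requests/urllib3) honors the standard proxy environment variables instead. A hedged compose-override sketch, assuming the ingestion executes in the actions container and with a placeholder proxy address:
    ```yaml
    # docker-compose override: proxy env vars on the container that runs
    # the ingestion (service name is an assumption; adjust to your setup).
    services:
      datahub-actions:
        environment:
          - HTTP_PROXY=http://proxy.internal:3128
          - HTTPS_PROXY=http://proxy.internal:3128
          - NO_PROXY=localhost,datahub-gms
    ```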
  • brief-afternoon-9651 (06/06/2023, 4:58 AM)
    Hello, can I deploy DataHub with Docker if OpenMetadata (another data cataloging tool) is already deployed with Docker? OpenMetadata has four containers (MySQL, Elasticsearch, Apache Airflow, OpenMetadata UI). Will there be any problem deploying DataHub, as it also runs MySQL and Elasticsearch containers?
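    The containers themselves won't clash (each compose project gets its own network and container names), but published host ports can. If OpenMetadata already binds 3306 and 9200, a hedged override remapping DataHub's host ports might look like this (port values are assumptions):
    ```yaml
    # docker-compose override sketch: move DataHub's MySQL and Elasticsearch
    # to free host ports so they don't collide with OpenMetadata's.
    services:
      mysql:
        ports:
          - "3307:3306"
      elasticsearch:
        ports:
          - "9201:9200"
    ```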
  • shy-dog-84302 (06/06/2023, 5:29 AM)
    Hi! I am experiencing the following problem with the DataHub cleanup job (logs in 🧵), which is deployed in k8s through Helm charts on version 0.10.3.
  • calm-scientist-99377 (06/06/2023, 6:53 AM)
    Hi team, I am configuring OAuth for DataHub. The frontend routes to the SSO page, but after authentication it's not redirecting back to the frontend. I am setting this up in k8s. I used these extra envs:
    ```yaml
    extraEnvs: # []
      # - name: MY_ENVIRONMENT_VAR
      #   value: the_value_goes_here
      - name: AUTH_OIDC_ENABLED
        value: "true"
      - name: AUTH_OIDC_CLIENT_ID
        value: "****"
      - name: AUTH_OIDC_CLIENT_SECRET
        value: "***"
      - name: AUTH_OIDC_DISCOVERY_URI
        value: "https://***/v1/.well-known/openid-configuration"
      - name: AUTH_OIDC_BASE_URL
        value: "<https://localhost:9092>"
        # your-datahub-url
      - name: AUTH_OIDC_SCOPE
        value: "openid profile email groups"
    After login, I end up here -
    Copy code
    <https://localhost:9092/callback/oidc?code=rl2nxdhcoasxpbwdw4na2gqwb&state=c56dbb3953>
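    One thing that stands out: AUTH_OIDC_BASE_URL has to be the URL users actually reach the frontend on, since the IdP sends the browser back to {base url}/callback/oidc; with it set to https://localhost:9092, the redirect can only resolve from the frontend host itself. A hedged sketch (hostname is a placeholder):
    ```yaml
      - name: AUTH_OIDC_BASE_URL
        value: "https://datahub.mycompany.com"  # externally reachable frontend URL (placeholder)
    ```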
  • best-river-568 (06/06/2023, 9:46 AM)
    Hi team, while setting up DataHub we didn't enable the datahub-upgrade jobs. Now we are planning to upgrade DataHub from version 0.9.2 to 0.10.2, and it seems the upgrade job is mandatory now. Can someone help us understand how to provision/deploy these components?
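    In recent chart versions the upgrade job is just a values.yaml toggle (key name from the public chart; verify against the version you deploy):
    ```yaml
    # values.yaml sketch: enable the datahub-upgrade (system update) job so
    # it runs before the new GMS pods roll out.
    datahubUpgrade:
      enabled: true
    ```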
  • magnificent-honey-40185 (06/06/2023, 12:58 PM)
    Deployed DataHub, but getting the errors below. Would it be possible to point me in the right direction?
  • calm-scientist-99377 (06/06/2023, 5:35 PM)
    Hi team, I have deployed DataHub using helm charts with OIDC integration. How can we specify initialAdmins via the helm chart?
  • alert-traffic-45034 (06/07/2023, 1:53 AM)
    Hi team, may I know whether it's possible to disable Kafka as a component in the helm chart at deployment time? In the use case I am working on, low-latency updates / async requests are not necessary, so I wonder whether it's possible to drop the entire Kafka part. Cheers
  • fierce-agent-11572 (06/07/2023, 9:00 AM)
    Hello, how can I deploy DataHub with Docker Compose (not with the datahub quickstart command, please)? When I use the command:
    ```
    datahub docker quickstart --quickstart-compose-file quickstart/docker-compose.quickstart.yml
    ```
    the frontend doesn't start.
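    As a hedged fallback that skips the CLI entirely, the compose file from the question can be run with docker compose directly, then the frontend logs inspected (service name as it appears in the quickstart compose file):
    ```
    docker compose -f quickstart/docker-compose.quickstart.yml pull
    docker compose -f quickstart/docker-compose.quickstart.yml up -d
    docker compose -f quickstart/docker-compose.quickstart.yml logs -f datahub-frontend-react
    ```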
  • lemon-scooter-69730 (06/07/2023, 1:00 PM)
    Can we please add a concurrencyPolicy to this cron template? e.g.
    ```yaml
    spec:
      ...
      concurrencyPolicy: Replace
    ```