# troubleshoot
  • microscopic-mechanic-13766 (05/04/2022, 10:03 AM)
    Hi, I am trying to create the timeline of a dataset called "linaje". First I ingested the dataset, then manually changed its schema. Then I executed the following command but got this error:
    datahub timeline --urn "urn:li:dataset:(urn:li:dataPlatform:hive,nasalogs.linaje,PROD)" --category documentation --start 10daysago
    Using DataHub v0.8.33 and CLI v0.8.32.1.
    Timeline log.txt
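    A version mismatch is worth ruling out first: the timeline API was still evolving across 0.8.x releases, so a CLI older than the server can trip on response changes. A quick check (a sketch, not a confirmed fix):
    ```sh
    # Confirm the CLI and server versions line up before digging deeper.
    datahub version
    # Keep the URN in double quotes so the shell does not split on the parentheses.
    datahub timeline \
      --urn "urn:li:dataset:(urn:li:dataPlatform:hive,nasalogs.linaje,PROD)" \
      --category documentation --start 10daysago
    ```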
  • icy-portugal-26250 (05/04/2022, 10:40 AM)
    Hello! I've been trying to deploy a DataHub instance locally using sh docker/quickstart.sh, but I couldn't log in on the front end with the standard 'datahub' user (invalid credentials). When checking docker ps, I noticed that the linkedin/datahub-gms container was unhealthy. I sshed into the container to run the entrypoint command manually and got this error message:
    /datahub/datahub-gms/scripts $ ./start.sh
    + grep -q ://
    + echo
    + NEO4J_HOST=http://
    + [[ ! -z '' ]]
    + [[ -z '' ]]
    + ELASTICSEARCH_AUTH_HEADER='Accept: */*'
    + [[ '' == true ]]
    + ELASTICSEARCH_PROTOCOL=http
    + WAIT_FOR_EBEAN=
    + [[ '' != true ]]
    + [[ '' == ebean ]]
    + [[ -z '' ]]
    + WAIT_FOR_EBEAN=' -wait tcp://mysql:3306 '
    + WAIT_FOR_CASSANDRA=
    + [[ '' == cassandra ]]
    + WAIT_FOR_KAFKA=
    + [[ '' != true ]]
    ++ echo broker:29092
    ++ sed 's/,/ -wait tcp:\/\//g'
    + WAIT_FOR_KAFKA=' -wait tcp://broker:29092 '
    + WAIT_FOR_NEO4J=
    + [[ elasticsearch != elasticsearch ]]
    + OTEL_AGENT=
    + [[ '' == true ]]
    + PROMETHEUS_AGENT=
    + [[ '' == true ]]
    + COMMON='
         -wait tcp://mysql:3306            -wait tcp://broker:29092           -timeout 240s     java -Xms1g -Xmx1g                -jar /jetty-runner.jar     --jar jetty-util.jar     --jar jetty-jmx.jar     --config /datahub/datahub-gms/scripts/jetty.xml     /datahub/datahub-gms/bin/war.war'
    + [[ '' != true ]]
    + exec dockerize -wait http://elasticsearch:9200 -wait-http-header 'Accept: */*' -wait tcp://mysql:3306 -wait tcp://broker:29092 -timeout 240s java -Xms1g -Xmx1g -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
    2022/05/04 09:48:10 Waiting for: http://elasticsearch:9200
    2022/05/04 09:48:10 Waiting for: tcp://mysql:3306
    2022/05/04 09:48:10 Waiting for: tcp://broker:29092
    2022/05/04 09:48:10 Connected to tcp://mysql:3306
    2022/05/04 09:48:10 Connected to tcp://broker:29092
    2022/05/04 09:48:10 Received 200 from http://elasticsearch:9200
    2022-05-04 09:48:13.645:INFO::main: Logging initialized @2896ms to org.eclipse.jetty.util.log.StdErrLog
    WARNING: jetty-runner is deprecated.
             See Jetty Documentation for startup options
             https://www.eclipse.org/jetty/documentation/
    ERROR: No such jar file:///datahub/datahub-gms/scripts/jetty-util.jar
    Usage: java [-Djetty.home=dir] -jar jetty-runner.jar [--help|--version] [ server opts] [[ context opts] context ...]
    Server opts:
     --version                           - display version and exit
     --log file                          - request log filename (with optional 'yyyy_mm_dd' wildcard
     --out file                          - info/warn/debug log filename (with optional 'yyyy_mm_dd' wildcard
     --host name|ip                      - interface to listen on (default is all interfaces)
     --port n                            - port to listen on (default 8080)
     --stop-port n                       - port to listen for stop command (or -DSTOP.PORT=n)
     --stop-key n                        - security string for stop command (required if --stop-port is present) (or -DSTOP.KEY=n)
     [--jar file]*n                      - each tuple specifies an extra jar to be added to the classloader
     [--lib dir]*n                       - each tuple specifies an extra directory of jars to be added to the classloader
     [--classes dir]*n                   - each tuple specifies an extra directory of classes to be added to the classloader
     --stats [unsecure|realm.properties] - enable stats gathering servlet context
     [--config file]*n                   - each tuple specifies the name of a jetty xml config file to apply (in the order defined)
    Context opts:
     [[--path /path] context]*n          - WAR file, web app dir or context xml file, optionally with a context path
    2022/05/04 09:48:13 Command exited with error: exit status 1
    Does anyone have any tips on how to troubleshoot this? PS: while writing this message, I also tried datahub docker quickstart (which most likely differs from my local version), but ran into issues with datahub-gms there too:
    Unable to run quickstart - the following issues were detected:
    - datahub-gms is still starting
    I am attaching the latter command log.
    datahub-cli-output.log
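    The "No such jar" line above suggests jetty-runner resolved the bare --jar arguments against the script's working directory. A way to confirm from inside the container (a sketch; paths are taken from the log itself):
    ```sh
    docker exec -it datahub-gms sh
    ls /datahub/datahub-gms/scripts/   # start.sh and jetty.xml live here
    ls / | grep jetty                  # the exec line shows /jetty-runner.jar at the image root
    # jetty-runner resolves bare --jar paths (jetty-util.jar, jetty-jmx.jar) against
    # the current working directory, so start.sh must run from the directory that
    # actually contains those jars, or be given absolute paths.
    ```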
  • handsome-football-66174 (05/04/2022, 9:24 PM)
    Hi everyone, I'm trying to use Great Expectations for data validation. The checkpoint runs, but the validations are not getting displayed in DataHub. Is there anything I need to take care of except adding the DataHub action? I added this in the checkpoint configuration:
    - name: datahub_action
      action:
        module_name: datahub.integrations.great_expectations.action
        class_name: DataHubValidationAction
        server_url: https://hostname
    I get this message when the checkpoint runs:
    great_expectations checkpoint run postgres_checkpoint
    Using v3 (Batch Request) API
    Calculating Metrics: 0it [00:00, ?it/s]
    WARNING: Enable parse_table_names_from_sql in DatahubValidationAction config to try to parse the tables being asserted from SQL query
    Validation succeeded!
    
    Suite Name                  Status   Expectations met
    - public.tablename.suite      ✔ Passed  0 of 0 (100 %)
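    The warning in the run output names the knob to try. A sketch of the same action block with it enabled (server_url remains a placeholder):
    ```yaml
    - name: datahub_action
      action:
        module_name: datahub.integrations.great_expectations.action
        class_name: DataHubValidationAction
        server_url: https://hostname  # placeholder
        # Named by the warning above: lets the action parse the asserted
        # table names out of SQL-based expectations.
        parse_table_names_from_sql: true
    ```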
  • kind-psychiatrist-76973 (05/05/2022, 2:00 PM)
    We have stateful ingestion enabled. However, we would like the DEV database not to show up in DataHub, but it is still showing:
    pipeline_name: "snowflake_platform"
    source:
      type: snowflake
      config:
        # Coordinates
        host_port: ${SNOWFLAKE_ACCOUNT}
        warehouse: "AGGREGATION_COMPUTE"
    
        # Credentials
        username: ${SNOWFLAKE_USERNAME}
        password: ${SNOWFLAKE_PASSWORD}
        role: "ACCOUNTADMIN"
    
        env: "PROD"
    
        profiling:
          enabled: False
    
        database_pattern:
          allow:
            - "SENNDERDWH"
            - "VISIBILITY"
            - "CARRIER_STRATEGY_AND_PLANNING"
            - "SHIPPER_STRATEGY_AND_PLANNING"
            - "NETSUITE"
            - "MARKETING"
            - "GLOBAL_OPERATIONS"
            - "CENTRAL_STRATEGY_AND_PLANNING"
            - "FINANCE"
          deny:
            - "DEV"
            - "ANALYST_DEV"
    
        table_pattern:
          ignoreCase: False
    
        include_tables: True
        include_views: True
        include_table_lineage: False
        stateful_ingestion:
          enabled: True
          remove_stale_metadata: True
    
    sink:
      type: "datahub-rest"
      config:
        server: ${DATAHUB_GMS_HOST}
    Are we doing something wrong?
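    One hedged possibility: deny patterns only stop new metadata from being ingested; datasets ingested before the pattern was added linger until stale-entity removal catches them or they are deleted by hand. A manual cleanup sketch (flags as in the 0.8.x CLI; verify against datahub delete --help):
    ```sh
    # Soft-delete previously ingested DEV datasets so the deny pattern
    # only has to keep new ones out.
    datahub delete --entity_type dataset --platform snowflake --env PROD --query "DEV" --soft
    ```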
  • lemon-terabyte-66903 (05/05/2022, 2:08 PM)
    Is there a guide on upgrading DataHub running on Kubernetes?
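    For Helm-based deployments the usual flow is a chart upgrade (a sketch; the release name, namespace, and values file are assumptions):
    ```sh
    helm repo add datahub https://helm.datahubproject.io  # no-op if already added
    helm repo update
    helm upgrade datahub datahub/datahub -n datahub --values values.yaml
    ```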
  • square-solstice-69079 (05/05/2022, 4:59 PM)
    My Glue ingestion is not working; do you have any idea what could be wrong? It worked some weeks ago. I upgraded to v0.8.33 after it worked, but I'm not sure if that is the issue. It finds some tables, but then fails with 401 Unauthorized.
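    A 401 from the REST sink often means metadata service authentication was enabled on GMS between versions, in which case the sink needs a token. A sketch (server and token are placeholders, and this assumes the 401 comes from GMS rather than from AWS):
    ```yaml
    sink:
      type: datahub-rest
      config:
        server: http://datahub-gms:8080  # placeholder
        # Needed when METADATA_SERVICE_AUTH_ENABLED is set on GMS; generate a
        # personal access token in the UI if your version supports it.
        token: <personal-access-token>   # placeholder
    ```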
  • numerous-eve-42142 (05/05/2022, 6:49 PM)
    Hi! I'm trying to ingest some metadata from Redshift to DataHub, but two things are happening: 1. When I try to allow 55 tables and their profiling from my DB, the task (on Airflow) takes several minutes and I receive various error logs like:
    ...
      'soft_deleted_stale_entities': [],
      'query_combiner': {'total_queries': 108,
                         'uncombined_queries_issued': 57,
                         'combined_queries_issued': 54,
                         'queries_combined': 54,
                         'query_exceptions': 3}}
    Sink (datahub-rest) report:
    {'records_written': 106,
     'warnings': [],
     'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
                   'info': {'message': "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))"}}],
     'downstream_total_latency_in_seconds': 9783.199817}
    Pipeline finished with failures
    2. I can't find a way to ingest profiling for exactly one table. Here's why: if I allow a pattern like:
    profile_pattern:
          allow:
            - "^db.schema.table\$"
    then column-level profiling is filtered out. If I allow a pattern like:
    profile_pattern:
          allow:
            - "db.schema.table"
    then other tables with names like "table" are also profiled, and they're not in DataHub. Can someone save me?
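    On the pattern question: allow entries are Python regexes, so an unanchored db.schema.table is a substring match and an unescaped dot matches any character, while the literal \$ in the first attempt means the pattern can never match the end of the string. A sketch of an anchored, escaped pattern (names are placeholders):
    ```yaml
    profile_pattern:
      allow:
        # Escape the dots and anchor both ends so only this exact table matches.
        - "^db\\.schema\\.table$"
    ```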
  • witty-laptop-49489 (05/06/2022, 7:30 AM)
    Hi, we have DataHub version v0.8.34 deployed on Kubernetes. Why might the “Analytics” tab not work? Everything works well except the “Analytics” tab.
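    For context, a sketch of one common check rather than a confirmed diagnosis: the Analytics tab is fed by usage events stored in an Elasticsearch index, so an empty or missing index would explain a blank tab (the service name below assumes the prerequisites chart):
    ```sh
    # A count of 0, or an index_not_found error, points at usage-event
    # ingestion rather than at the frontend.
    curl -s 'http://elasticsearch-master:9200/datahub_usage_event/_count'
    ```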
  • numerous-camera-74294 (05/06/2022, 10:23 AM)
    Hi! Is there any way to remove a particular profile run from a dataset? Maybe using the Rest.li API?
  • icy-portugal-26250 (05/06/2022, 12:29 PM)
    Hi! Some of the Gradle tasks are failing because they can't resolve a jar from LinkedIn's Artifactory (linkedin.jfrog.io). Has anyone ever experienced this kind of failure?
    > Could not resolve com.linkedin.pegasus:gradle-plugins:28.3.7.
             > Could not get resource 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/gradle-plugins/28.3.7/gradle-plugins-28.3.7.pom'.
                > Could not GET 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/gradle-plugins/28.3.7/gradle-plugins-28.3.7.pom'.
                   > Connect to linkedin.jfrog.io:443 [linkedin.jfrog.io/104.198.68.46] failed: connect timed out
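    "connect timed out" points at network reachability rather than a missing artifact. Two quick checks (a sketch; the proxy host is a placeholder):
    ```sh
    # Can the build host reach the repository at all?
    curl -vI https://linkedin.jfrog.io/artifactory/open-source/
    # If the environment requires an HTTP proxy, pass it through to Gradle:
    ./gradlew build -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080
    ```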
  • millions-notebook-72121 (05/06/2022, 3:13 PM)
    Hi all - I am updating DataHub using the Helm charts, and I am having some issues with the dependencies, as per the screenshots. I think the subchart versions may be slightly off? For example, for GMS it's 0.2.4 in the subchart but 0.2.5 in the main chart.
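    If the chart is vendored locally, re-pulling the dependency versions pinned in Chart.yaml sometimes clears this up (a sketch; the chart path is an assumption):
    ```sh
    helm dependency update ./charts/datahub
    helm dependency list ./charts/datahub   # verify the resolved subchart versions
    ```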
  • clean-coat-28016 (05/07/2022, 5:29 AM)
    I am using the GraphiQL interface on DataHub v0.8.29. Please see the query snippet in the screenshot. As expected, the query returned 3 "groups" and 0 entities, but data.browse.count in the result was 10. Is that correct? From my understanding, it should be 3, because the result contained just 3 groups.
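    One hedged reading: in the browse results type, count echoes the requested page size (default 10) while total carries the number of matches, so a count of 10 alongside 3 groups may be working as intended. A sketch of a query that surfaces both fields:
    ```graphql
    {
      browse(input: { type: DATASET, path: [], start: 0, count: 10 }) {
        start
        count   # the page size from the input, not the number of results
        total
        groups { name count }
      }
    }
    ```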
  • modern-zoo-97059 (05/09/2022, 2:15 AM)
    1. Is there any way to initialize the DataHub MySQL DB? 🤔 2. Or I'd like to know at what point the table below (datahub.metadata_aspect_v2) gets created. 🤪
    Caused by: org.mariadb.jdbc.internal.util.exceptions.MariaDbSqlException: Table 'datahub.metadata_aspect_v2' doesn't exist
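    For reference, the quickstart's MySQL container seeds this table from docker/mysql/init.sql on first boot. A sketch of the core DDL (check the file in the DataHub repo for your version's exact statement):
    ```sql
    CREATE TABLE metadata_aspect_v2 (
      urn            VARCHAR(500) NOT NULL,
      aspect         VARCHAR(200) NOT NULL,
      version        BIGINT       NOT NULL,
      metadata       LONGTEXT     NOT NULL,
      systemmetadata LONGTEXT,
      createdon      DATETIME(6)  NOT NULL,
      createdby      VARCHAR(255) NOT NULL,
      createdfor     VARCHAR(255),
      PRIMARY KEY (urn, aspect, version)
    );
    ```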
  • breezy-portugal-43538 (05/09/2022, 7:38 AM)
    Hello, I wanted to report an issue with running the datahub get --urn command and with retrieving the URN in general. I was able to ingest data into DataHub, but after clicking the icon in the web UI to copy the URN, nothing is copied and the button does not seem to have any effect. I tested this in three different web browsers. When trying to get the URN by running the datahub get --urn command, I receive the following error:
    $ datahub get --urn "urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/some_data/results/data_info.csv,DEV)"
    /home/mluser/.local/lib/python3.8/site-packages/cryptography/hazmat/backends/openssl/x509.py:14: CryptographyDeprecationWarning: This version of cryptography contains a temporary pyOpenSSL fallback path. Upgrade pyOpenSSL now.
      warnings.warn(
    [2022-05-09 10:24:11,560] ERROR    {datahub.entrypoints:152} - File "/home/mluser/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 138, in main
        135  def main(**kwargs):
        136      # This wrapper prevents click from suppressing errors.
        137      try:
    --> 138          sys.exit(datahub(standalone_mode=False, **kwargs))
        139      except click.exceptions.Abort:
        ..................................................
         kwargs = {}
         datahub = <Group datahub>
         click.exceptions.Abort = <class 'click.exceptions.Abort'>
        ..................................................
    
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
        1135  def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
     (...)
    --> 1137      return self.main(*args, **kwargs)
        ..................................................
         self = <Group datahub>
         args = ()
         t.Any = typing.Any
         kwargs = {'standalone_mode': False}
        ..................................................
    
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1062, in main
        rv = self.invoke(ctx)
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 763, in invoke
        return __callback(*args, **kwargs)
    File "/home/mluser/.local/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 304, in wrapper
        247  def wrapper(*args: Any, **kwargs: Any) -> Any:
     (...)
        300                  "status": "error",
        301                  "error": get_full_class_name(e),
        302              },
        303          )
    --> 304          raise e
        ..................................................
         args = (<click.core.Context object at 0x7f82fa591940>, )
         Any = typing.Any
         kwargs = {'urn': 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
                   ome_data/results/data_info.csv,DEV)',
                   'aspect': ()}
        ..................................................
    
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 256, in wrapper
        247  def wrapper(*args: Any, **kwargs: Any) -> Any:
     (...)
        252      telemetry_instance.ping(
        253          "function-call", {"function": function, "status": "start"}
        254      )
        255      try:
    --> 256          res = func(*args, **kwargs)
        257          telemetry_instance.ping(
        ..................................................
         args = (<click.core.Context object at 0x7f82fa591940>, )
         Any = typing.Any
         kwargs = {'urn': 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
                   ome_data/results/data_info.csv,DEV)',
                   'aspect': ()}
         telemetry_instance.ping = <method 'Telemetry.ping' of <datahub.telemetry.telemetry.Telemetry object at 0x7f82eca5eb20> telemetry.py:201>
         function = 'datahub.cli.get_cli.get'
         func = <function 'get' get_cli.py:14>
        ..................................................
    
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/get_cli.py", line 38, in get
        25   def get(ctx: Any, urn: Optional[str], aspect: List[str]) -> None:
     (...)
        34           urn = ctx.args[0]
        35           logger.debug(f"Using urn from args {urn}")
        36       click.echo(
        37           json.dumps(
    --> 38               get_aspects_for_entity(entity_urn=urn, aspects=aspect, typed=False),
        39               sort_keys=True,
        ..................................................
         get = <Command get>
         ctx = <click.core.Context object at 0x7f82fa591940>
         Any = typing.Any
         urn = 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
                   ome_data/results/data_info.csv,DEV)'
         Optional = typing.Optional
         aspect = ()
         List = typing.List
         ctx.args = []
         logger.debug = <method 'Logger.debug' of <Logger datahub.cli.get_cli (INFO)> __init__.py:1424>
         json.dumps = <function 'dumps' __init__.py:183>
        ..................................................
    
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/cli_utils.py", line 658, in get_aspects_for_entity
        648  def get_aspects_for_entity(
        649      entity_urn: str,
        650      aspects: List[str],
        651      typed: bool = False,
        652      cached_session_host: Optional[Tuple[Session, str]] = None,
        653  ) -> Dict[str, Union[dict, DictWrapper]]:
        654      # Process non-timeseries aspects
        655      non_timeseries_aspects: List[str] = [
        656          a for a in aspects if a not in timeseries_class_to_aspect_name_map.values()
        657      ]
    --> 658      entity_response = get_entity(
        659          entity_urn, non_timeseries_aspects, cached_session_host
        ..................................................
         entity_urn = 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
                   ome_data/results/data_info.csv,DEV)'
         aspects = ()
         List = typing.List
         typed = False
         cached_session_host = None
         Optional = typing.Optional
         Tuple = typing.Tuple
         Session = <class 'requests.sessions.Session'>
         Dict = typing.Dict
         Union = typing.Union
         DictWrapper = <class 'avrogen.dict_wrapper.DictWrapper'>
         non_timeseries_aspects = []
        ..................................................
    
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/cli_utils.py", line 508, in get_entity
        492  def get_entity(
        493      urn: str,
        494      aspect: Optional[List] = None,
        495      cached_session_host: Optional[Tuple[Session, str]] = None,
        496  ) -> Dict:
     (...)
        504          encoded_urn: str = urn
        505      elif urn.startswith("urn:"):
        506          encoded_urn = urllib.parse.quote(urn)
        507      else:
    --> 508          raise Exception(
        509              f"urn {urn} does not seem to be a valid raw (starts with urn:) or encoded urn (starts with urn%3A)"
        ..................................................
         urn = 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
                   ome_data/results/data_info.csv,DEV)'
         aspect = []
         Optional = typing.Optional
         List = typing.List
         cached_session_host = None
         Tuple = typing.Tuple
         Session = <class 'requests.sessions.Session'>
         Dict = typing.Dict
         urllib.parse.quote = <function 'quote' parse.py:799>
        ..................................................
    
    ---- (full traceback above) ----
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 138, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
        return self.main(*args, **kwargs)
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1062, in main
        rv = self.invoke(ctx)
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 763, in invoke
        return __callback(*args, **kwargs)
    File "/home/mluser/.local/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 304, in wrapper
        raise e
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 256, in wrapper
        res = func(*args, **kwargs)
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/get_cli.py", line 38, in get
        get_aspects_for_entity(entity_urn=urn, aspects=aspect, typed=False),
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/cli_utils.py", line 658, in get_aspects_for_entity
        entity_response = get_entity(
    File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/cli_utils.py", line 508, in get_entity
        raise Exception(
    
    Exception: urn urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/some_data/results/data_info.csv,DEV) does not seem to be a valid raw (starts with urn:) or encoded urn (starts with urn%3A)
    [2022-05-09 10:24:11,561] INFO     {datahub.entrypoints:161} - DataHub CLI version: 0.8.31.6 at /home/mluser/.local/lib/python3.8/site-packages/datahub/__init__.py
    [2022-05-09 10:24:11,561] INFO     {datahub.entrypoints:164} - Python version: 3.8.10 (default, Nov 26 2021, 20:14:08)
    [GCC 9.3.0] at /usr/bin/python3 on Linux-5.4.0-96-generic-x86_64-with-glibc2.29
    [2022-05-09 10:24:11,561] INFO     {datahub.entrypoints:167} - GMS config {}
    The GMS logs (from docker logs -f datahub-gms) do not update when I click the copy-URN button on the website or when I run the command above; there are no logs pointing to any error. This issue was mentioned earlier in my older threads, and @hundreds-photographer-13496 was investigating the server logs that I had provided (by the way, huge thanks for the help!). Could you take a look and help resolve this issue, or give some steps on how to get more info about the problem? Thanks in advance!
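    One observation from the traceback itself, not a confirmed fix: the URN string passed to the CLI starts with a stray "urn " token before urn:li:dataset, which is exactly what the final exception complains about. Re-running without it (a sketch):
    ```sh
    datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/some_data/results/data_info.csv,DEV)"
    ```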
  • worried-motherboard-80036 (05/09/2022, 4:57 PM)
    Hi, I am trying to ingest some data from an Elasticsearch 7.x cluster. I'm running DataHub locally and ingesting the following recipe:
    source:
      type: "elasticsearch"
      config:
        # Coordinates
        host: 'https://the_host:9200'
        # Credentials
        username: the-user
        password: the-pass
        ca_certs: False
        verify_certs: False
        # Options
        # url_prefix: "" # optional url_prefix
        env: "DEV"
        # index_pattern:
          # allow: [".*some_index_name_pattern*"]
          # deny: [".*skip_index_name_pattern*"]
    sink:
        type: "datahub-rest"
        config:
            server: "<http://localhost:8080>"
    Running the ingestion, I get:
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 359, in _extract_mcps
        340  def _extract_mcps(self, index: str) -> Iterable[MetadataChangeProposalWrapper]:
     (...)
        355      # 1.1 Generate the schema fields from ES mappings.
        356      index_mappings = raw_index_metadata["mappings"]
        357      index_mappings_json_str: str = json.dumps(index_mappings)
        358      md5_hash = md5(index_mappings_json_str.encode()).hexdigest()
    --> 359      schema_fields = list(
        360          ElasticToSchemaFieldConverter.get_schema_fields(index_mappings)
        ..................................................
         self = ElasticsearchSource(ctx=<datahub.ingestion.api.common.PipelineContext object at 0x13026d280>)
         index = '.signals_watches_trigger_state'
         Iterable = typing.Iterable
         MetadataChangeProposalWrapper = <class 'datahub.emitter.mcp.MetadataChangeProposalWrapper'>
         index_mappings = {}
         raw_index_metadata = {'aliases': {},
                               'mappings': {},
                               'settings': {'index': {...}}}
         index_mappings_json_str = '{}'
         json.dumps = <function 'dumps' __init__.py:183>
         md5_hash = '99914b932bd37a50b983c5e7c90ae93b'
        ..................................................
    
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 158, in get_schema_fields
        152  def get_schema_fields(
        153      cls, elastic_mappings: Dict[str, Any]
        154  ) -> Generator[SchemaField, None, None]:
        155      converter = cls()
        156      properties = elastic_mappings.get("properties")
        157      if not properties:
    --> 158          raise ValueError(
        159              f"Missing 'properties' in elastic search mappings={json.dumps(elastic_mappings)}!"
        ..................................................
         cls = <class 'datahub.ingestion.source.elastic_search.ElasticToSchemaFieldConverter'>
         elastic_mappings = {}
         Dict = typing.Dict
         Any = typing.Any
         Generator = typing.Generator
         SchemaField = <class 'datahub.metadata.schema_classes.SchemaFieldClass'>
         converter = <datahub.ingestion.source.elastic_search.ElasticToSchemaFieldConverter object at 0x134858940>
         properties = None
        ..................................................
    
    ---- (full traceback above) ----
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 149, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
    File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
    File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
    File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 317, in wrapper
        raise e
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 269, in wrapper
        res = func(*args, **kwargs)
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
        res = func(*args, **kwargs)
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/cli/ingest_cli.py", line 128, in run
        raise e
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/cli/ingest_cli.py", line 114, in run
        pipeline.run()
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 214, in run
        for wu in itertools.islice(
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 308, in get_workunits
        for mcp in self._extract_mcps(index):
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 359, in _extract_mcps
        schema_fields = list(
    File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 158, in get_schema_fields
        raise ValueError(
    
    ValueError: Missing 'properties' in elastic search mappings={}!
    [2022-05-09 17:25:49,344] INFO     {datahub.entrypoints:176} - DataHub CLI version: 0.0.0.dev0 at /Users/cristicalugaru/Development/data-hub/datahub/metadata-ingestion/src/datahub/__init__.py
    [2022-05-09 17:25:49,344] INFO     {datahub.entrypoints:179} - Python version: 3.8.12 (default, Jan 31 2022, 11:27:11)
    [Clang 13.0.0 (clang-1300.0.27.3)] at /Users/cristicalugaru/.pyenv/versions/my-env/bin/python on macOS-12.0.1-arm64-arm-64bit
    [2022-05-09 17:25:49,344] INFO     {datahub.entrypoints:182} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.34', 'commit': '9422578e419a30231bdb83bd5f4cd42607781942'}}, 'managedIngestion': {'defaultCliVersion': '0.8.34.1', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'noCode': 'true'}
    I see some indexes ingested, but not the main ones:
    [2022-05-09 17:25:49,375] INFO     {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.searchguard_resource_owner
    [2022-05-09 17:25:49,424] INFO     {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.searchguard_resource_owner
    [2022-05-09 17:25:49,808] INFO     {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.kibana-event-log-7.13.2
    [2022-05-09 17:25:49,820] INFO     {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.kibana-event-log-7.13.2
    [2022-05-09 17:25:49,835] INFO     {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.kibana-event-log-7.13.2
    [2022-05-09 17:25:49,888] INFO     {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.ds-ilm-history-5-2022.03.10-000001
    [2022-05-09 17:25:49,911] INFO     {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.ds-ilm-history-5-2022.03.10-000001
    [2022-05-09 17:25:49,926] INFO     {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.ds-ilm-history-5-2022.03.10-000001
    Any idea if I'm doing something wrong here?
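    From the trace, the index that fails is .signals_watches_trigger_state, a system index with an empty mappings block. One workaround sketch is to deny dot-prefixed system indices via the index_pattern the recipe already stubs out:
    ```yaml
    # Skip system indices (names starting with "."), whose empty mappings
    # trigger the "Missing 'properties'" ValueError above.
    index_pattern:
      deny: ["^\\..*"]
    ```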
  • mammoth-fall-12031 (05/09/2022, 4:59 PM)
    I'm trying to run the DataHub GMS service for local development. The build succeeded with ./gradlew :metadata-service:war:build, but when I try to run it using ./gradlew :metadata-service:war:run, the build gets stuck at 99%. Below are the last few lines of the logs:
    2022-05-09 21:30:59.653:INFO:oejs.AbstractConnector:main: Started ServerConnector@77afea7d{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
    2022-05-09 21:30:59.666:INFO:oejs.Server:main: Started @12086ms
    2022-05-09 21:34:00.979:WARN:oejs.HttpChannel:qtp580024961-11: /auth/generateSessionTokenForUser
    java.lang.NullPointerException
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
            at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
            at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
            at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
            at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1700)
            at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
            at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
            at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
            at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
            at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1667)
            at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
            at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
            at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
            at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
            at org.eclipse.jetty.server.Server.handle(Server.java:505)
            at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
            at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
            at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
            at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
            at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
            at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
            at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
            at java.lang.Thread.run(Thread.java:748)
    Can anyone help me resolve this?
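    Two hedged observations: a Gradle run task normally sits at 99% for as long as the server process stays alive, and the "Started ServerConnector ... 0.0.0.0:8080" line suggests GMS did come up; the NullPointerException is from a later request to /auth/generateSessionTokenForUser. A quick probe of the running service (a sketch):
    ```sh
    # If GMS is actually up, its config endpoint should answer with JSON.
    curl -s http://localhost:8080/config
    ```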
  • wide-dawn-46249 (05/09/2022, 5:19 PM)
    👋 Hoping someone can help me debug this error message when I try to post an MCP event to GMS 🧵
  • lively-jackal-83760 (05/10/2022, 10:56 AM)
    Hi guys. I'm working with your Java lib datahub-client:0.8.34. I want to create and push an MCP event with a dataset schema, where the schema is stored in Avro .avsc files. I can read these files as strings, or convert them to org.apache.avro.Schema objects, but I don't know how to convert them to DataHub's schema. I found your avro-to-pegasus translator (https://linkedin.github.io/rest.li/avro_translation), tried to use it, and it seems to work, but I got conflicts with the datahub-client library: io.acryl:datahub-client and com.linkedin.pegasus:data-avro both have a com.linkedin.data module with a DataSchema class. They look similar, but are slightly different. How can I convert an Avro schema to the correct DataHub schema and push it using your Java lib?
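    One route that sidesteps the Pegasus translator, and therefore the conflicting com.linkedin.data module, is to build the SchemaMetadata aspect by hand and emit it as an MCP. A sketch against the documented Java emitter API (the URN is a placeholder and fromAvsc is a hypothetical helper you would write):
    ```java
    import com.linkedin.schema.SchemaMetadata;
    import datahub.client.rest.RestEmitter;
    import datahub.event.MetadataChangeProposalWrapper;

    public class EmitSchema {
        public static void main(String[] args) throws Exception {
            RestEmitter emitter = RestEmitter.createWithDefaults(); // defaults to http://localhost:8080
            // Hypothetical helper: map org.apache.avro.Schema fields onto SchemaMetadata.
            SchemaMetadata schemaMetadata = MySchemaMapper.fromAvsc("/path/to/schema.avsc");

            MetadataChangeProposalWrapper mcpw = MetadataChangeProposalWrapper.builder()
                    .entityType("dataset")
                    .entityUrn("urn:li:dataset:(urn:li:dataPlatform:kafka,my.dataset,PROD)") // placeholder
                    .upsert()
                    .aspect(schemaMetadata)
                    .build();
            emitter.emit(mcpw, null).get();
        }
    }
    ```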
  • early-vr-295 (05/10/2022, 12:02 PM)
    I set up Google auth, but users cannot log out. When a user presses the Logout button, they are always redirected back in a logged-in state. And when a user logs in for the first time, they end up in another user's account.
  • prehistoric-knife-90526 (05/10/2022, 2:51 PM)
    Hi all 👋. New member here and new to DataHub. This looks like an amazing community. I have DataHub deployed in AWS EKS following the Quickstart Guide. It all works great in my dev environment; however, I can't get Redshift metadata ingestion to work in production. After consuming metadata from one table, it just hangs without any errors in the UI or in any of the pod logs (including the prerequisites). Can this happen if it is lacking sufficient resources? I'm not certain how to debug this, but I have ruled out MySQL and Elasticsearch by moving both to AWS managed services.
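    On the resource question, two generic checks (a sketch; the namespace and pod name are assumptions):
    ```sh
    # Look for CPU/memory pressure and prior OOM kills on the ingestion pods.
    kubectl top pods -n datahub
    kubectl describe pod <datahub-actions-pod> -n datahub | grep -A5 'Last State'
    ```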
  • adorable-receptionist-20059 (05/10/2022, 6:27 PM)
    I am trying to run an AWS Glue ingestion job from the UI and am getting an error halfway through, but I don't really understand what's happening. I will attach the error in the thread.
  • modern-zoo-97059 (05/11/2022, 6:58 AM)
    Hello everyone. 🤣 Which file should I edit to set the MySQL and Elasticsearch hosts for DataHub GMS?
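    In the Docker deployment these are environment variables rather than a config file; in the repo they live in docker/datahub-gms/env/docker.env. A sketch (the exact variable set varies by version):
    ```sh
    EBEAN_DATASOURCE_HOST=mysql:3306
    EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub  # JDBC options omitted here
    ELASTICSEARCH_HOST=elasticsearch
    ELASTICSEARCH_PORT=9200
    ```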
  • rich-policeman-92383 (05/11/2022, 8:40 AM)
    With release v0.8.34, how can we disable Cassandra? MAE & MCE consumers are continuously failing while trying to connect to Cassandra.
  • busy-dusk-4970 (05/11/2022, 1:01 PM)
    Good morning, I'm trying to run ./gradlew build locally on an M1 Mac and I'm running into this error.
  • fresh-napkin-5247 (05/11/2022, 1:38 PM)
    Hello. What would be the GraphQL query to get all the Published Datasources already ingested from Tableau? I'm having a hard time getting this from the docs 😄. Thank you! I am using the localhost:8080/api/graphiql endpoint on my local installation to try out queries.
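    A possible starting point, sketched with a platform filter whose field and value should be verified in GraphiQL's schema explorer:
    ```graphql
    {
      search(input: {
        type: DATASET, query: "*", start: 0, count: 10,
        filters: [{ field: "platform", value: "urn:li:dataPlatform:tableau" }]
      }) {
        total
        searchResults { entity { urn type } }
      }
    }
    ```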
  • adventurous-apple-98365 (05/11/2022, 3:20 PM)
    Hi all - I'm trying to add the domains aspect to a custom entity. We are using snapshots, so I think I need to associate it via the union of aspects. It works for all the other types (ownership, globalTags, etc.), but adding domains fails. I can use an MCP to add the domain just fine, but I can't seem to get it onto the snapshot, so it fails in my snapshot's GraphQL mapper. The validateModels Gradle task fails with “found invalid relationship with name AssociatedWith at path /domains/*. Invalid entityTypes(s) provided”. I see that the domains aspect has that relationship in the Pegasus files. Is there something I'm doing wrong in adding the domain to my snapshot?
  • gorgeous-telephone-63628 (05/11/2022, 6:42 PM)
    I am trying to build DataHub locally, but I keep running into an issue. I'm hoping someone might be able to offer some suggestions. I am building on a Mac with an x86 chip, using Node 16.15.0, npm 8.5.5, and Yarn 1.22.15:
    > Task :datahub-web-react:yarnGenerate FAILED
    yarn run v1.22.0
    $ graphql-codegen --config codegen.yml
    node:internal/modules/cjs/loader:936
      throw err;
      ^
    
    Error: Cannot find module './_baseClone'
    Require stack:
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/lodash/clone.js
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/builder.js
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/generated/index.js
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/utils/react/cleanJSXElementLiteralChild.js
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/react/buildChildren.js
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/index.js
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/index.cjs.js
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/code-file-loader/index.cjs.js
    - /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-codegen/cli/bin.js
        at Function.Module._resolveFilename (node:internal/modules/cjs/loader:933:15)
        at Function.Module._load (node:internal/modules/cjs/loader:778:27)
        at Module.require (node:internal/modules/cjs/loader:1005:19)
        at require (node:internal/modules/cjs/helpers:94:18)
        at Object.<anonymous> (/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/lodash/clone.js:1:17)
        at Module._compile (node:internal/modules/cjs/loader:1101:14)
        at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
        at Module.load (node:internal/modules/cjs/loader:981:32)
        at Function.Module._load (node:internal/modules/cjs/loader:822:12)
        at Module.require (node:internal/modules/cjs/loader:1005:19) {
      code: 'MODULE_NOT_FOUND',
      requireStack: [
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/lodash/clone.js',
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/builder.js',
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/generated/index.js',
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/utils/react/cleanJSXElementLiteralChild.js',
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/react/buildChildren.js',
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/index.js',
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/index.cjs.js',
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/code-file-loader/index.cjs.js',
        '/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-codegen/cli/bin.js'
      ]
    }
    error Command failed with exit code 1.
    info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
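    "Cannot find module './_baseClone'" coming from inside node_modules/lodash usually points at a corrupted or partial install rather than at the DataHub code. A common first step (a sketch):
    ```sh
    cd datahub-web-react
    rm -rf node_modules
    yarn install --force
    ```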
  • wonderful-egg-79350 (05/12/2022, 5:57 AM)
    Hello everyone. How can I change DataHub's database from MySQL to another DB? Is there a configuration path for this? I deployed DataHub using Docker containers.
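    For what it's worth, GMS reaches its store through Ebean, so pointing it at, say, Postgres is mostly a matter of the datasource variables in the GMS env file. A sketch (values are placeholders; check your version's docker env files for the full set):
    ```sh
    EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
    EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub  # placeholder
    EBEAN_DATASOURCE_HOST=postgres:5432
    ```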
  • most-plumber-32123 (05/12/2022, 5:59 AM)
    Hi all, I'm a newbie to the DataHub world. I'm trying to use DataHub to read metadata from Snowflake, and I'm facing a connection refused error when I trigger the ingest command. When I check docker logs datahub-frontend-react, I get the error below.