# getting-started
  • helpful-london-56362 (09/22/2022, 10:57 AM)
    Hello, I keep running into this error when trying to build the frontend docker image. I'm using an M1 machine. Any help please?
    #12 632.3 FAILURE: Build failed with an exception.
    #12 632.3
    #12 632.3 * What went wrong:
    #12 632.3 Execution failed for task ':li-utils:generateDataTemplate'.
    #12 632.3 > Could not resolve all files for configuration ':li-utils:pegasusPlugin'.
    #12 632.3    > Could not resolve com.linkedin.pegasus:data-avro-generator:29.22.16.
    #12 632.3      Required by:
    #12 632.3          project :li-utils
    #12 632.3       > Could not resolve com.linkedin.pegasus:data-avro-generator:29.22.16.
    #12 632.3          > Could not get resource 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/data-avro-generator/29.22.16/data-avro-generator-29.22.16.pom'.
    #12 632.3             > Could not GET 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/data-avro-generator/29.22.16/data-avro-generator-29.22.16.pom'.
    #12 632.3                > Connect to linkedin.jfrog.io:443 [linkedin.jfrog.io/104.198.68.46] failed: connect timed out
    #12 632.3    > Could not resolve com.linkedin.pegasus:generator:29.22.16.
    #12 632.3      Required by:
    #12 632.3          project :li-utils
    #12 632.3       > Could not resolve com.linkedin.pegasus:generator:29.22.16.
    #12 632.3          > Could not get resource 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/generator/29.22.16/generator-29.22.16.pom'.
    #12 632.3             > Could not GET 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/generator/29.22.16/generator-29.22.16.pom'.
    #12 632.3                > Connect to linkedin.jfrog.io:443 [linkedin.jfrog.io/104.198.68.46] failed: connect timed out
    #12 632.3    > Could not resolve com.linkedin.pegasus:restli-tools:29.22.16.
    #12 632.3      Required by:
    #12 632.3          project :li-utils
    #12 632.3       > Could not resolve com.linkedin.pegasus:restli-tools:29.22.16.
    #12 632.3          > Could not get resource 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/restli-tools/29.22.16/restli-tools-29.22.16.pom'.
    #12 632.3             > Could not GET 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/restli-tools/29.22.16/restli-tools-29.22.16.pom'.
    #12 632.3                > Connect to linkedin.jfrog.io:443 [linkedin.jfrog.io/104.198.68.46] failed: connect timed out
    #12 632.3
    #12 632.3 * Try:
    #12 632.3 Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    #12 632.3
    #12 632.3 * Get more help at https://help.gradle.org
    #12 632.3
    #12 632.3 BUILD FAILED in 1m 51s
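    The failures above are "connect timed out" errors reaching linkedin.jfrog.io, i.e. a network problem, not an M1-specific build error. A minimal sketch, using only the Python standard library, to check whether the Artifactory host is reachable from the build environment (host and port are taken from the log above):

    ```python
    import socket

    def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
        """Return True if a TCP connection to host:port succeeds within timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts alike.
            return False

    # A False here points at a proxy/firewall/DNS issue on the machine
    # running Docker, rather than a Gradle or DataHub bug.
    print(can_connect("linkedin.jfrog.io", 443, timeout=3.0))
    ```

    If this returns False, check corporate proxy settings (and pass them through to the Docker build) before digging into Gradle itself.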
  • astonishing-lizard-90580 (09/22/2022, 1:55 PM)
    Hey folks, just wondering if there is a paid/SaaS version via Acryl available yet? Thanks!
  • steep-advantage-66572 (09/22/2022, 3:18 PM)
    hi folks, this is probably a very dumb question. When I am browsing a specific dataset, where do I see the details on how to actually access that dataset? For example, if I am on the demo page at https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:bigquery,calm[…]ustomers,PROD)/Schema?is_lineage_mode=false&schemaFilter=, my first question would probably be: ok, I'd like to use this dataset now. Where is that info? I see a property called, e.g., dbt_file_path, but if I am a new user, how do I know where that path is?
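    Dataset details like these can also be pulled programmatically from DataHub's GraphQL API. A sketch of the kind of query involved; the field names are assumptions based on DataHub's GraphQL schema, and the URN is a hypothetical placeholder, not the demo one:

    ```python
    import json

    # Hypothetical URN for illustration; substitute the real dataset URN.
    DATASET_URN = "urn:li:dataset:(urn:li:dataPlatform:bigquery,project.dataset.customers,PROD)"

    # Query shape assumed from DataHub's GraphQL API; the exact fields
    # available depend on the DataHub version.
    query = """
    query getDataset($urn: String!) {
      dataset(urn: $urn) {
        name
        properties {
          description
          customProperties { key value }
        }
      }
    }
    """

    payload = json.dumps({"query": query, "variables": {"urn": DATASET_URN}})
    # POST this payload to <your-datahub-host>/api/graphql with a bearer
    # token, e.g. via urllib.request, to read custom properties such as
    # dbt_file_path without clicking through the UI.
    print(payload[:40])
    ```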
  • bland-sundown-49496 (09/22/2022, 4:02 PM)
    [ec2-user@ip-10-110-112-21 ~]$ datahub --debug docker quickstart
    [2022-09-22 16:04:46,456] DEBUG {datahub.telemetry.telemetry:210} - Sending init Telemetry
    --- Logging error ---
    Traceback (most recent call last):
      File "/home/ec2-user/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
        return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
      File "/usr/lib64/python3.7/asyncio/base_events.py", line 962, in create_connection
        raise exceptions[0]
      File "/usr/lib64/python3.7/asyncio/base_events.py", line 949, in create_connection
        await self.sock_connect(sock, address)
      File "/usr/lib64/python3.7/asyncio/selector_events.py", line 473, in sock_connect
        return await fut
      File "/usr/lib64/python3.7/asyncio/selector_events.py", line 503, in _sock_connect_cb
        raise OSError(err, f'Connect call failed {address}')
    ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 8080)

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/upgrade/upgrade.py", line 123, in get_server_version_stats
        server_config = await get_server_config(host, token)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/upgrade/upgrade.py", line 110, in get_server_config
        async with session.get(config_endpoint) as dh_response:
      File "/home/ec2-user/.local/lib/python3.7/site-packages/aiohttp/client.py", line 1141, in __aenter__
        self._resp = await self._coro
      File "/home/ec2-user/.local/lib/python3.7/site-packages/aiohttp/client.py", line 537, in _request
        req, traces=traces, timeout=real_timeout
      File "/home/ec2-user/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 540, in connect
        proto = await self._create_connection(req, traces, timeout)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 901, in _create_connection
        _, proto = await self._create_direct_connection(req, traces, timeout)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
        raise last_exc
      File "/home/ec2-user/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 1187, in _create_direct_connection
        client_error=client_error,
      File "/home/ec2-user/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 988, in _wrap_create_connection
        raise client_error(req.connection_key, exc) from exc
    aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host localhost:8080 ssl:default [Connect call failed ('127.0.0.1', 8080)]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib64/python3.7/logging/__init__.py", line 1025, in emit
        msg = self.format(record)
      File "/usr/lib64/python3.7/logging/__init__.py", line 869, in format
        return fmt.format(record)
      File "/usr/lib64/python3.7/logging/__init__.py", line 608, in format
        record.message = record.getMessage()
      File "/usr/lib64/python3.7/logging/__init__.py", line 369, in getMessage
        msg = msg % self.args
    TypeError: not all arguments converted during string formatting

    Call stack:
      File "/home/ec2-user/.local/bin/datahub", line 8, in <module>
        sys.exit(main())
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/entrypoints.py", line 149, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/upgrade/upgrade.py", line 386, in async_wrapper
        loop.run_until_complete(run_func_check_upgrade())
      File "/usr/lib64/python3.7/asyncio/base_events.py", line 574, in run_until_complete
        self.run_forever()
      File "/usr/lib64/python3.7/asyncio/base_events.py", line 541, in run_forever
        self._run_once()
      File "/usr/lib64/python3.7/asyncio/base_events.py", line 1786, in _run_once
        handle._run()
      File "/usr/lib64/python3.7/asyncio/events.py", line 88, in _run
        self._context.run(self._callback, *self._args)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/upgrade/upgrade.py", line 126, in get_server_version_stats
        log.debug("Failed to get a valid server", e)
    Message: 'Failed to get a valid server'
    Arguments: (ClientConnectorError(ConnectionKey(host='localhost', port=8080, is_ssl=False, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=129552675203528702), ConnectionRefusedError(111, "Connect call failed ('127.0.0.1', 8080)")),)
    [2022-09-22 16:04:46,811] DEBUG {datahub.telemetry.telemetry:243} - Sending Telemetry
    No Datahub Neo4j volume found, starting with elasticsearch as graph service.
    To use neo4j as a graph backend, run
        datahub docker quickstart --quickstart-compose-file ./docker/quickstart/docker-compose.quickstart.yml
    from the root of the datahub repo
    Fetching docker-compose file https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml from GitHub
    [2022-09-22 16:04:47,001] DEBUG {datahub.cli.docker_cli:610} - Copied to /home/ec2-user/.datahub/quickstart/docker-compose.yml
    Pulling docker images...
    unknown shorthand flag: 'f' in -f
    See 'docker --help'.
  • fierce-baker-1392 (09/23/2022, 7:53 AM)
    Hi folks, does DataHub support proto2 files? I loaded a .proto file (syntax = proto2) into DataHub, but it doesn't show the fields. Is there a reference document for this question?
  • astonishing-lizard-90580 (09/23/2022, 2:09 PM)
    Just a heads up, the demo project https://demo.datahubproject.io/ is in an infinite reloading loop of doom 🙂
  • broad-summer-63536 (09/23/2022, 3:12 PM)
    hey folks, happy friday! I am trying out DataHub using this doc - and I finished all the steps under Deploying DataHub. I would like to try to ingest the metadata (doc here) and I didn't find any info about which localhost I should go to with Docker up & running - could someone guide me through this?
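    For reference, the Docker quickstart serves the UI on localhost:9002, while ingestion recipes point their sink at the GMS backend on localhost:8080 (the same port visible in the traceback earlier in this thread). A minimal sketch of such a recipe expressed as a Python dict; the mysql source section is purely illustrative, and exact config keys should be checked against the ingestion docs:

    ```python
    # Quickstart defaults: UI at http://localhost:9002, GMS at http://localhost:8080.
    # This dict mirrors the YAML recipe that `datahub ingest -c recipe.yml`
    # consumes; the source block is a placeholder, not a working config.
    recipe = {
        "source": {
            "type": "mysql",  # illustrative source type
            "config": {"host_port": "localhost:3306", "database": "mydb"},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # GMS, not the UI port
        },
    }

    print(recipe["sink"]["config"]["server"])
    ```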
  • billowy-alarm-46123 (09/23/2022, 4:43 PM)
    👋 Happy Friday everyone - sorry if this question was already asked; I wasn't able to find an answer to my question regarding search. You can easily do: search: "platform:my_platform and upstreams:upstream_platform", and this gives me a list of all datasets in my platform which have upstream_platform as an upstream - and this is amazing. However, I would like to search for datasets which are in my_platform but do not have any upstream set. Basically I would like to do: search: "platform:my_platform and upstreams:none", or search: "platform:my_platform and not upstreams", or some other logic. Any suggestions? Thank you very much, Arturas
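    When the search language has no direct "has no upstreams" predicate, a common workaround is to fetch the candidate datasets and filter client-side. A sketch over already-fetched results; the dict shape here is an assumption for illustration, not the actual API response format:

    ```python
    # Hypothetical, already-fetched search results: each entry carries its
    # URN plus the list of upstream dataset URNs from a lineage lookup.
    datasets = [
        {"urn": "urn:li:dataset:(urn:li:dataPlatform:my_platform,a,PROD)",
         "upstreams": ["urn:li:dataset:(urn:li:dataPlatform:upstream_platform,x,PROD)"]},
        {"urn": "urn:li:dataset:(urn:li:dataPlatform:my_platform,b,PROD)",
         "upstreams": []},
    ]

    def roots(items):
        """Datasets with no upstream lineage at all."""
        return [d["urn"] for d in items if not d["upstreams"]]

    print(roots(datasets))
    ```

    The same shape works whether upstreams come from a GraphQL lineage query or a bulk export.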
  • shy-kitchen-7972 (09/27/2022, 3:46 PM)
    Hi all, I was wondering about something I noticed in screenshots of the DataHub UI in demos and blog posts. I saw that at the top of the DataHub UI, left of the user menu and settings, there are several menus to navigate to. In our environment, we can see Govern, Ingestion and Analytics. However, in blog posts & demos I sometimes also see ones for 'My Requests' and 'Tests'. Is this because these screenshots are from a future version of DataHub that is not yet released? I do have administrator rights, so all the permissions are granted. See for example this article: https://blog.datahubproject.io/datahub-workflows-for-data-platform-governance-leads-796ae3110418
  • wonderful-notebook-20086 (09/27/2022, 11:44 PM)
    I don't really know what channel this would be best suited for, but have there been any discussions around surfacing metrics around GetDataAccess events emitted by AWS Lake Formation in CloudTrail? It's a pretty narrow use case, I guess, but I was wondering if there might be a way to present this on the "Analytics" portal in the UI?
  • narrow-cat-3364 (09/28/2022, 7:50 AM)
    Hello, guys. Does anyone know how to consume DataHub data from Tableau Desktop? I didn't find any tutorial on this part.
  • agreeable-river-32119 (09/28/2022, 8:26 AM)
    Hello, guys. I tried to use lineage_emitter_kafka.py - emits simple dataset-to-dataset lineage via Kafka as MetadataChangeEvent - but it threw the following errors:
    %3|1664353555.956|FAIL|rdkafka#producer-1| [thrd:localhost:9092/1]: localhost:9092/1: Connect to ipv6#[::1]:9092 failed: Connection refused (after 0ms in state CONNECT)
    %3|1664353556.239|FAIL|rdkafka#producer-1| [thrd:localhost:9092/1]: localhost:9092/1: Connect to ipv4#127.0.0.1:9092 failed: Connection refused (after 0ms in state CONNECT)
    %3|1664353556.660|FAIL|rdkafka#producer-1| [thrd:localhost:9092/1]: localhost:9092/1: Connect to ipv6#[::1]:9092 failed: Connection refused (after 0ms in state CONNECT)
    %3|1664353558.394|FAIL|rdkafka#producer-1| [thrd:localhost:9092/1]: localhost:9092/1: Connect to ipv6#[::1]:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
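    "Connection refused" on localhost:9092 means nothing is listening on that port: either the Kafka broker isn't running, or the emitter's bootstrap address is wrong. The relevant config fragment, sketched as a plain dict; the key names are assumptions modeled on the datahub Kafka emitter example and should be verified against the installed version:

    ```python
    # The emitter reads its broker address from the connection config.
    # Key names here mirror the lineage_emitter_kafka.py example shape and
    # are an assumption; check them against your datahub version's docs.
    kafka_emitter_config = {
        "connection": {
            "bootstrap": "localhost:9092",  # point this at your actual broker
            "schema_registry_url": "http://localhost:8081",
        }
    }

    print(kafka_emitter_config["connection"]["bootstrap"])
    ```

    With the quickstart, the broker runs inside Docker, so confirm the broker container is up and its port is published before changing the config.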
  • strong-australia-51849 (09/29/2022, 8:09 AM)
    Hi guys, I'm exploring DataHub and I'm just curious: can we export metadata from one DataHub instance and import it into another?
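    One commonly used pattern for moving metadata between instances is a file-based round trip: write metadata to a file on the source side, then ingest that file into the target with a datahub-rest sink. A sketch of the two recipes as dicts; the source type and key names are assumptions modeled on DataHub ingestion recipes, and the hostnames are placeholders:

    ```python
    # Step 1 (on/for instance A): export metadata to a file.
    # "datahub" as a source type is an assumption; verify against your
    # version's ingestion docs.
    export_recipe = {
        "source": {"type": "datahub", "config": {}},
        "sink": {"type": "file", "config": {"filename": "metadata.json"}},
    }

    # Step 2 (pointed at instance B): replay the file into the target GMS.
    import_recipe = {
        "source": {"type": "file", "config": {"filename": "metadata.json"}},
        "sink": {"type": "datahub-rest", "config": {"server": "http://instance-b:8080"}},
    }

    print(import_recipe["source"]["config"]["filename"])
    ```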
  • rapid-potato-4736 (09/29/2022, 8:37 AM)
    hello. I want to register owners in batches for data tables that don't have an owner. Is there a way? I am just a user on DataHub with permission to add owners; I can go into individual tables and register one via the plus button.
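    Batch ownership edits are usually done through the emitter API rather than the UI; the Ownership aspect itself is a small JSON structure. A sketch that builds one aspect per table - the aspect shape follows DataHub's Ownership aspect, while the URNs and the owner are hypothetical examples:

    ```python
    def ownership_aspect(owner_urn: str) -> dict:
        """Build an Ownership aspect body assigning one DATAOWNER."""
        return {"owners": [{"owner": owner_urn, "type": "DATAOWNER"}]}

    # Hypothetical list of tables currently missing owners.
    unowned = [
        "urn:li:dataset:(urn:li:dataPlatform:hive,db.table_a,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:hive,db.table_b,PROD)",
    ]

    batch = [(urn, ownership_aspect("urn:li:corpuser:jdoe")) for urn in unowned]
    # Each (urn, aspect) pair would then be emitted to GMS, e.g. via the
    # datahub Python emitter - subject to the same permissions as the UI.
    print(len(batch))
    ```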
  • important-night-50346 (09/29/2022, 2:58 PM)
    Hello. Is there an option to capture only certain tags from Airflow into DataHub when capture_tags_info is enabled? We have lots of tags in Airflow and some of them are technical, so we'd like to omit them from ingestion…
  • rapid-potato-4736 (09/30/2022, 1:33 AM)
    hello. Is it possible to manage (or edit) the tables displayed in DataHub with Excel or other tools? For example, first download the data tables and related information managed by DataHub into Excel, then modify the values (description, owner, etc.) in Excel. Is it then possible to register the modified Excel data back to DataHub and have it reflected there?
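    The spreadsheet round trip itself can be sketched with the standard csv module: export urn/description/owner rows to a file Excel opens directly, edit, and read the rows back for re-ingestion. The column names and sample row are made up for illustration; actual re-ingestion would go through something like the CSV enricher source:

    ```python
    import csv
    import io

    # Hypothetical exported metadata row.
    rows = [
        {"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,db.table_a,PROD)",
         "description": "orders fact table", "owner": "urn:li:corpuser:jdoe"},
    ]

    # Export: write editable metadata as CSV (an in-memory buffer here;
    # a real script would write a .csv file for Excel).
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["urn", "description", "owner"])
    writer.writeheader()
    writer.writerows(rows)

    # ...edit in Excel, save, then read the edited file back:
    edited = list(csv.DictReader(io.StringIO(buf.getvalue())))
    print(edited[0]["description"])
    ```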
  • rich-pager-68736 (09/30/2022, 7:48 AM)
    Hi everyone. We are currently evaluating DataHub in our company, and I wonder whether there is already support, or planned support, for JSON Schema ingestion? I've seen you support inferring schema information from JSON files, but we already have our schemas defined.
  • glamorous-microphone-33484 (09/30/2022, 9:36 AM)
    Hi, I was trying to load GraphiQL in an offline environment. However, the code points to a public CDN to retrieve the required JS files (graphiql, react-dom, etc.). Has anyone been able to load local copies of these files in their offline environment and get it to work (not thinking of setting up an internal CDN)? The code I am referring to is https://github.com/datahub-project/datahub/blob/master/metadata-service/auth-impl/src/main/resources/graphiql/index.html
  • rapid-potato-4736 (10/04/2022, 6:47 AM)
    Hi, I'm wondering how to list the tables that don't have owners. How can I figure that out? Is there a "no owner" filter or sort function I can use?
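    One way to approach this programmatically is a filtered search request. A sketch of a GraphQL search payload; the "hasOwners" filter field and the input shape are assumptions about DataHub's search API and should be checked against your version (the UI's left-hand filter panel reflects what is actually available):

    ```python
    import json

    # Hypothetical search request: datasets with no owners assigned.
    payload = {
        "query": """
    query search($input: SearchInput!) {
      search(input: $input) { searchResults { entity { urn } } }
    }
    """,
        "variables": {
            "input": {
                "type": "DATASET",
                "query": "*",
                # "hasOwners" is an assumed filter field name.
                "filters": [{"field": "hasOwners", "value": "false"}],
            }
        },
    }

    print(json.dumps(payload["variables"]["input"]["filters"]))
    ```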
  • better-orange-49102 (10/04/2022, 8:04 AM)
    is there a lag time before a revoked token is no longer valid? I am still able to create a graph object in Python and emit information to GMS after the token is revoked. (admittedly, this is older code, but I don't see a fix that addresses revoked tokens)
  • chilly-library-82062 (10/04/2022, 1:12 PM)
    Hello all, I have found (and it is confusing to me) a lot of @Deprecated annotations inside the EntityService class. Since those methods don't look like they will be deleted soon - there is no replacement for them - does anybody know why they are @Deprecated?
  • white-knife-12883 (10/04/2022, 7:51 PM)
    I found https://datahubproject.io/docs/generated/ingestion/sources/s3/ Am I reading this correctly that the S3 ingestion tool can describe "data lake" style tables in S3, but that "object store buckets" are not a first-class DataHub entity? That is, there is no allow: '.*' to just import all of the "buckets" as DataHub entities so I can give them e.g. owners?
  • gentle-camera-33498 (10/04/2022, 8:25 PM)
    Hello everyone! Are there plans to tweak GraphQL to return 5xx error codes? I'm getting 200 for every call, even when there are errors. This makes tracking errors difficult.
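    Returning 200 with an "errors" array in the body is the usual GraphQL convention, so clients generally key error tracking off the payload rather than the HTTP status. A small stdlib-only helper sketch:

    ```python
    import json

    def raise_on_graphql_errors(response_body: str) -> dict:
        """Parse a GraphQL response and raise if the body carries errors,
        even though the HTTP status was 200 (the GraphQL convention)."""
        body = json.loads(response_body)
        if body.get("errors"):
            messages = "; ".join(e.get("message", "?") for e in body["errors"])
            raise RuntimeError(f"GraphQL errors: {messages}")
        return body.get("data", {})

    ok = raise_on_graphql_errors('{"data": {"health": "ok"}}')
    print(ok)  # {'health': 'ok'}

    try:
        raise_on_graphql_errors('{"errors": [{"message": "unauthorized"}], "data": null}')
    except RuntimeError as exc:
        print(exc)
    ```

    Wiring such a check into the HTTP client makes GraphQL failures visible to monitoring despite the 200 status.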
  • rapid-potato-4736 (10/05/2022, 6:00 AM)
    Hello! Q1: Can I register terms on the fields of a table, in a similar way to the CSV enricher? Q2: The CSV enricher can only handle table-level units, right?
  • microscopic-tailor-94417 (10/06/2022, 2:44 PM)
    Hello everyone, I am working as a data engineer and am very new to DataHub. My team and I are looking for a data governance and quality tool for our new project. We've been searching for the right tool for a while and we're stuck between two options: DataHub and Dataplex. As you may know, Dataplex is a Google Cloud tool, and we are working on Google BigQuery, but we're a bit confused about which tool to choose 🧐 DataHub also seems like a tool we could use. Could you please give some details about the advantages/disadvantages of DataHub over Dataplex? I wish you all a great day!
  • white-knife-12883 (10/06/2022, 7:21 PM)
    I'm working with a series of tables that contain multiple time series in the same table. The schema of an example table, time_series_forecasts, is along the lines of: | ts_series_id | objectid | timestamp | value |. There are >1000 different series inside this one table. I'd like to be able to extend lineage all the way to the individual "series" inside (since just saying "we pull from the giant table" isn't super interesting), but I'm not sure how to express this "series inside a table" with DataHub.
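    One way to model this is to mint one dataset per series, with a name derived from the parent table, and draw lineage to those. This is a modeling choice rather than a built-in DataHub feature; the platform and naming scheme below are illustrative:

    ```python
    def series_urn(platform: str, table: str, series_id: str, env: str = "PROD") -> str:
        """Mint a per-series dataset URN nested under the parent table's name.
        The table.series_id naming scheme is a modeling convention we invent
        here, not a DataHub rule."""
        return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table}.{series_id},{env})"

    parent = "time_series_forecasts"
    urns = [series_urn("postgres", parent, sid) for sid in ["1001", "1002"]]
    print(urns[0])
    ```

    Lineage edges can then target the per-series URNs, while the parent table remains the physical dataset. The cost is ~1000 extra entities per table, so consider whether per-series granularity is worth it.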
  • miniature-dentist-23033 (10/07/2022, 2:15 AM)
    Hello, I am getting an error when executing datahub version. I need help, please. Thanks!
  • average-dinner-25106 (10/07/2022, 9:23 AM)
    Hello. I'm now running docker quickstart. To navigate to the DataHub UI, I tried to access localhost:9002. However, a page-not-found error occurs. Since my work must stay confidential, I can't connect to the internet due to our network-segmentation policy. Does this policy cause the page-not-found error? Also, I found the error message "Error while pulling images. Going go attempt to move on to docker compose up assuming ~". Is this related to the page-not-found error as well?
  • salmon-jackal-36326 (10/07/2022, 12:20 PM)
    Hello! I'd like to know if it's possible to see a "preview data" view in the tool? If so, where exactly? Thx 🙂
  • better-orange-49102 (10/10/2022, 6:35 AM)
    is there a reason why the docker-compose version of DataHub uses MySQL 5.7 versus MySQL 8.0.29 for the helm chart version?