Thread
#getting-started

    narrow-painting-12219

    1 year ago
    Hello, I just ran DataHub with docker-compose and opened the frontend. How do I learn how to "load data" into DataHub? It's still kind of foggy to me.

    big-carpet-38439

    1 year ago
    Hi Cesar, to load sample data, you can use the ./ingestion.sh script under docker/ingestion. If you want to start loading your own metadata, you can use the Python ingestion framework 🙂 cc @gray-shoe-75895

    narrow-painting-12219

    1 year ago
    I see, I just loaded the sample data. My main issue now is figuring out how to create metadata about, let's say, the PostgreSQL/Greenplum or Oracle schemas/tables in my company.
    Is there a step-by-step guide for that?

    gray-shoe-75895

    1 year ago
    Yes, the Python ingestion framework is perfect for that; see https://github.com/linkedin/datahub/tree/master/metadata-ingestion. We already support Postgres as a metadata source, and adding other databases is fairly straightforward as well.
    Let me know if you need any help with it! It'd also be helpful to know if you find anything confusing, so that I can improve the docs.

    narrow-painting-12219

    1 year ago
    I'll check that, thanks
    Where do I run datahub ingest -c examples/recipes/file_to_file.yml?

    incalculable-ocean-74010

    1 year ago
    In the metadata-ingestion folder of the project.
    Make sure to read the README there; you will have to compile and install a few things.
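    A minimal Python sketch of why the location matters: examples/recipes/file_to_file.yml is a relative path, so a program receiving it resolves it against the current working directory. The helper name find_recipe is hypothetical and not part of DataHub; it only mimics that resolution and assumes metadata-ingestion is the folder that contains examples/.

```python
from pathlib import Path


def find_recipe(recipe: str, cwd: Path) -> Path:
    """Resolve a recipe path against cwd, the way a relative path argument is resolved.

    Hypothetical helper: raises FileNotFoundError when cwd is not the folder
    that contains examples/ (assumed here to be metadata-ingestion).
    """
    path = (Path(cwd) / recipe).resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"{recipe} not found under {cwd}; "
            "try running the command from the metadata-ingestion folder"
        )
    return path
```

    Called as find_recipe("examples/recipes/file_to_file.yml", Path.cwd()) from the metadata-ingestion folder, it returns the full path; from any other folder it raises.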

    narrow-painting-12219

    1 year ago
    What if I see things like "Failed building wheel for avro-python3" or "Failed building wheel for avro-gen", among others, when running pip install -e .?
    error: invalid command 'bdist_wheel'

    mammoth-bear-12532

    1 year ago
    @gray-shoe-75895: ^^

    gray-shoe-75895

    1 year ago
    Huh, that's odd. Can you try pip install wheel?
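    The bdist_wheel command is provided by the wheel package, so that error usually means wheel is missing from the active environment. A small, hypothetical pre-flight check (the function name is illustrative, not a DataHub or pip API) that reports which build packages the current interpreter cannot import:

```python
import importlib.util


def missing_build_tools(required=("setuptools", "wheel")):
    """Return the packages from `required` that the current interpreter cannot import.

    Hypothetical pre-flight check: pip relies on the wheel package for
    bdist_wheel, so "wheel" appearing in the result matches that error.
    """
    return [name for name in required if importlib.util.find_spec(name) is None]
```

    Running it inside the venv before pip install -e . would show whether pip install wheel is still needed.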

    narrow-painting-12219

    1 year ago
    Sure, that solved it! New problem:
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    gray-shoe-75895

    1 year ago
    That means you're missing some Python headers. Did you run sudo apt install librdkafka-dev python3-dev python3-venv?
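    One way to check this from inside the virtualenv, as a sketch: look for Python.h in the include directory that sysconfig reports for the running interpreter. This assumes a standard CPython install where that directory holds the headers; python_headers_present is a hypothetical helper name.

```python
import os
import sysconfig


def python_headers_present() -> bool:
    """Return True if Python.h exists for the running interpreter.

    C extensions are compiled against these headers; on Debian/Ubuntu
    they come from the python3-dev package.
    """
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))
```

    A False result would point to missing headers; a True result means the gcc failure likely has another cause, such as the library mismatch found later in this thread.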

    narrow-painting-12219

    1 year ago
    Yep, will do again 🙂

    gray-shoe-75895

    1 year ago
    Also, what Python version are you using?
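    A tiny sketch for answering that from a script rather than by hand. The minimum of (3, 6) is an assumption for illustration, not a documented DataHub requirement; the metadata-ingestion README is the place to confirm the real one. python_version_ok is a hypothetical name.

```python
import sys


def python_version_ok(minimum=(3, 6), current=None) -> bool:
    """Compare the interpreter's (major, minor) against an assumed minimum."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(minimum)
```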

    narrow-painting-12219

    1 year ago
    (venv) me:~/git-github/datahub/metadata-ingestion$ sudo apt install librdkafka-dev python3-dev python3-venv
    [sudo] password for cesarribeiro: 
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    librdkafka-dev is already the newest version (0.11.3-1build1).
    python3-dev is already the newest version (3.6.7-1~18.04).
    python3-venv is already the newest version (3.6.7-1~18.04).
    The following packages were automatically installed and are no longer required:
      golang-docker-credential-helpers python-asn1crypto python-backports.ssl-match-hostname python-cached-property python-certifi python-cffi-backend python-chardet
      python-cryptography python-docker python-dockerpty python-dockerpycreds python-docopt python-enum34 python-funcsigs python-functools32 python-idna python-ipaddress
      python-jsonschema python-mock python-openssl python-pbr python-requests python-six python-texttable python-urllib3 python-websocket python-yaml
    Use 'sudo apt autoremove' to remove them.
    0 upgraded, 0 newly installed, 0 to remove and 128 not upgraded.
    (venv) me:~/git-github/datahub/metadata-ingestion$ python --version
    Python 3.6.9

    gray-shoe-75895

    1 year ago
    That all seems fine.
    What was the full error you got with the pip install?

    narrow-painting-12219

    1 year ago
    Several 😄 One sec...
    https://gist.github.com/carrbrpoa/f441a3e1a4bc59125e970d557c1e1070 I'm not familiar with Python, sorry if it's something trivial 😅

    gray-shoe-75895

    1 year ago
    That's a new error to me as well, definitely not trivial 🙂
    It seems there's a version mismatch between the librdkafka you have installed and the Python wrapper library. The other surprising thing is that it's trying to build from source instead of using the prebuilt packages.
    Can you try running pip install confluent_kafka==1.5.0?
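    A rough sketch of the mismatch being described. Once the package is importable, confluent_kafka.version() and confluent_kafka.libversion() report the wrapper and librdkafka versions; the compatibility rule below (librdkafka at least as new as the wrapper's major.minor) is a simplifying assumption, and the helper names are hypothetical. Prebuilt wheels bundle their own librdkafka, so a check like this mostly matters when pip falls back to building from source, as it did here.

```python
def major_minor(version: str) -> tuple:
    """'0.11.3-1build1' -> (0, 11); '1.5.0' -> (1, 5). Drops any Debian-style suffix."""
    parts = version.split("-")[0].split(".")
    return int(parts[0]), int(parts[1])


def librdkafka_compatible(client_version: str, librdkafka_version: str) -> bool:
    """Assumed, simplified rule: librdkafka must be at least the wrapper's major.minor."""
    return major_minor(librdkafka_version) >= major_minor(client_version)


def installed_versions():
    """Return (confluent_kafka version, librdkafka version), or None if not importable."""
    try:
        import confluent_kafka
    except ImportError:
        return None
    # Both functions return a (version_string, version_int) tuple.
    return confluent_kafka.version()[0], confluent_kafka.libversion()[0]
```

    With the versions from this thread, librdkafka_compatible("1.5.0", "0.11.3-1build1") returns False, which is consistent with the source build failing against the system librdkafka.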

    narrow-painting-12219

    1 year ago

    gray-shoe-75895

    1 year ago
    Great - that seems to have worked. Now can you try the original pip install command again?

    narrow-painting-12219

    1 year ago
    I tried! Line 7 in that gist 😄

    gray-shoe-75895

    1 year ago
    Ah, I didn't catch that.
    Then you should be good to go!

    narrow-painting-12219

    1 year ago
    Yep, I'll follow the next steps! Thanks for the help.

    gray-shoe-75895

    1 year ago
    Glad I could help, and I'll be updating the docs so people don't run into these issues in the future 🙂

    narrow-painting-12219

    1 year ago

    gray-shoe-75895

    1 year ago
    Yet another odd one. Can you try pip install avro-python3==1.10.0?

    narrow-painting-12219

    1 year ago
    Done. Should I just run datahub ingest again?
    I get the same error when running it again; maybe I should redo some step?

    gray-shoe-75895

    1 year ago
    Yeah, perhaps.
    Can you list your installed packages with pip freeze?

    narrow-painting-12219

    1 year ago
    avro-gen==0.3.0
    avro-python3===file-.avro-VERSION.txt
    certifi==2020.12.5
    chardet==4.0.0
    click==7.1.2
    confluent-kafka==1.5.0
    dataclasses==0.8
    -e git+https://github.com/linkedin/datahub.git@12ff330a54bf1eb69b4364a3d622464077cfac5e#egg=datahub&subdirectory=metadata-ingestion
    fastavro==1.3.2
    frozendict==1.2
    idna==2.10
    mypy-extensions==0.4.3
    pkg-resources==0.0.0
    pydantic==1.7.3
    pytz==2021.1
    PyYAML==5.4.1
    requests==2.25.1
    six==1.15.0
    SQLAlchemy==1.3.23
    toml==0.10.2
    typing-extensions==3.7.4.3
    tzlocal==2.1
    urllib3==1.26.3

    gray-shoe-75895

    1 year ago
    That avro-python3 entry looks weird; it should've been 1.10.0.
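    A hedged sketch of spotting entries like that automatically: scan pip freeze output and flag pinned versions that don't begin with a number. pip freeze writes the === form when it can't parse a version as PEP 440, which is itself a hint. The regex is a deliberately loose approximation, not full PEP 440 validation, and suspicious_pins is a hypothetical name.

```python
import re

# Loose check, not full PEP 440: a leading number, optional .number groups,
# then any mix of letters, digits, '.', '+' or '-'.
VERSION_RE = re.compile(r"^\d+(\.\d+)*([a-zA-Z0-9.+-]*)$")


def suspicious_pins(freeze_output: str) -> dict:
    """Return {package: version} for pinned lines whose version looks malformed."""
    bad = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line or line.startswith(("-e ", "#")):
            continue  # skip blank lines, editable installs and comments
        name, sep, version = line.partition("==")
        if not sep:
            continue  # not a pinned requirement
        version = version.lstrip("=")  # also handles the '===' form
        if not VERSION_RE.match(version):
            bad[name] = version
    return bad
```

    Fed the freeze output above, it would single out avro-python3 with the version file-.avro-VERSION.txt.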

    narrow-painting-12219

    1 year ago
    Here's what it showed when I tried that command to install 1.10.0:
    (venv) me@carrbrpoa:~/git-github/datahub/metadata-ingestion$ pip install avro-python3==1.10.0
    Collecting avro-python3==1.10.0
      Downloading https://files.pythonhosted.org/packages/b2/5a/819537be46d65a01f8b8c6046ed05603fb9ef88c663b8cca840263788d58/avro-python3-1.10.0.tar.gz
      Requested avro-python3==1.10.0 from https://files.pythonhosted.org/packages/b2/5a/819537be46d65a01f8b8c6046ed05603fb9ef88c663b8cca840263788d58/avro-python3-1.10.0.tar.gz#sha256=a455c215540b1fceb1823e2a918e94959b54cb363307c97869aa46b5b55bde05, but installing version file-.avro-VERSION.txt
    Building wheels for collected packages: avro-python3
      Running setup.py bdist_wheel for avro-python3 ... done
      Stored in directory: /home/me/.cache/pip/wheels/3f/15/cd/fe4ec8b88c130393464703ee8111e2cddebdc40e1b59ea85e9
    Successfully built avro-python3
    Installing collected packages: avro-python3
      Found existing installation: avro-python3 file-.avro-VERSION.txt
        Uninstalling avro-python3-file-.avro-VERSION.txt:
          Successfully uninstalled avro-python3-file-.avro-VERSION.txt
    Successfully installed avro-python3-file-.avro-VERSION.txt

    gray-shoe-75895

    1 year ago
    Hmm, it tried to build from source, which I suspect is broken. Can you try
    pip uninstall avro-python3 && pip cache purge && pip install -e .

    narrow-painting-12219

    1 year ago
    pip cache purge didn't work. Is that ok?
    ERROR: unknown command "cache" - maybe you meant "check"

    gray-shoe-75895

    1 year ago
    Do you have an older version of pip? Maybe a pip install --upgrade pip would help.
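    For context, the pip cache command first shipped in pip 20.1, so older pips reject it with exactly that error. A small sketch that compares a pip version string against that threshold; version_tuple and supports_pip_cache are hypothetical helpers, and pip.__version__ is where the installed version can be read from inside the venv.

```python
import re


def version_tuple(version: str) -> tuple:
    """'20.1b1' -> (20, 1); '9.0.1' -> (9, 0, 1). Stops at the first non-numeric suffix."""
    numbers = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if not match:
            break
        numbers.append(int(match.group()))
        if match.end() != len(part):
            break  # e.g. '1b1': keep the 1, ignore the pre-release suffix
    return tuple(numbers)


def supports_pip_cache(pip_version: str) -> bool:
    """The `pip cache` command was added in pip 20.1."""
    return version_tuple(pip_version) >= (20, 1)
```

    For example, supports_pip_cache(pip.__version__) after import pip tells you whether the upgrade is needed before running pip cache purge.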

    narrow-painting-12219

    1 year ago
    Could be! Will do.
    Rerunning ingest:
    (venv) me@carrbrpoa:~/git-github/datahub/metadata-ingestion$ datahub ingest -c examples/recipes/file_to_file.yml
    Traceback (most recent call last):
      File "/home/me/git-github/datahub/venv/bin/datahub", line 11, in <module>
        load_entry_point('datahub', 'console_scripts', 'datahub')()
      File "/home/me/git-github/datahub/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 480, in load_entry_point
        return get_distribution(dist).load_entry_point(group, name)
      File "/home/me/git-github/datahub/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2693, in load_entry_point
        return ep.load()
      File "/home/me/git-github/datahub/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in load
        return self.resolve()
      File "/home/me/git-github/datahub/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2330, in resolve
        module = __import__(self.module_name, fromlist=['__name__'], level=0)
      File "/home/me/git-github/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 13, in <module>
        from datahub.ingestion.run.pipeline import Pipeline
      File "/home/me/git-github/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 9, in <module>
        from datahub.ingestion.sink import sink_class_mapping
      File "/home/me/git-github/datahub/metadata-ingestion/src/datahub/ingestion/sink/__init__.py", line 6, in <module>
        from .datahub_kafka import DatahubKafkaSink
      File "/home/me/git-github/datahub/metadata-ingestion/src/datahub/ingestion/sink/datahub_kafka.py", line 13, in <module>
        from datahub.metadata.schema_classes import SCHEMA_JSON_STR
    ImportError: cannot import name 'SCHEMA_JSON_STR'

    gray-shoe-75895

    1 year ago
    Huh, the codegen also didn't run correctly. Can you try
    pip uninstall avro-gen && pip cache purge && pip install -e .
    My bet is that the old pip installed an old avro-gen.
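    A sketch of a quick post-install sanity check for this kind of ImportError: try importing the generated module and confirm it defines the expected name. module_exports is a hypothetical helper; the module and attribute names in the usage note come from the traceback above.

```python
import importlib


def module_exports(module_name: str, attribute: str) -> bool:
    """Return True if module_name can be imported and defines attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # module missing or failing to import
    return hasattr(module, attribute)
```

    After a reinstall, module_exports("datahub.metadata.schema_classes", "SCHEMA_JSON_STR") returning False would suggest the generated code is still stale or missing.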

    narrow-painting-12219

    1 year ago
    Hello, resuming today: it worked! Thanks a lot. Now I'll try to follow the Postgres recipe.
    It's me again in this thread 😄 Today I tried to install things in another environment (Windows 10, at the ./gradlew :metadata-events:mxe-schemas:build step) and got several errors: https://gist.github.com/carrbrpoa/04e3c5bb5fe9c92b596089375b1f4c1c (Python 3.8.2)
    (This step doesn't depend on the DataHub services running, right?)

    gray-shoe-75895

    1 year ago
    It shouldn’t depend on anything else. That step is actually part of the normal build, so this points to a broader issue.
    If you run a plain ./gradlew build, what happens?

    narrow-painting-12219

    1 year ago