narrow-painting-12219
02/18/2021, 7:08 PM
big-carpet-38439
02/18/2021, 7:21 PM
./ingestion.sh script under docker/ingestion. If you want to start loading in your own metadata, you can use the Python Ingestion framework 🙂 cc @gray-shoe-75895
narrow-painting-12219
02/18/2021, 7:24 PM
narrow-painting-12219
02/18/2021, 7:25 PM
gray-shoe-75895
02/18/2021, 7:27 PM
gray-shoe-75895
02/18/2021, 7:28 PM
narrow-painting-12219
02/18/2021, 7:29 PM
narrow-painting-12219
02/18/2021, 7:36 PM
datahub ingest -c examples/recipes/file_to_file.yml?
incalculable-ocean-74010
02/18/2021, 7:36 PM
narrow-painting-12219
02/18/2021, 7:49 PM
Failed building wheel for avro-python3 or Failed building wheel for avro-gen and others? (pip install -e .)
narrow-painting-12219
02/18/2021, 7:53 PM
error: invalid command 'bdist_wheel'
mammoth-bear-12532
gray-shoe-75895
02/18/2021, 7:56 PM
pip install wheel?
narrow-painting-12219
02/18/2021, 7:58 PM
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
gray-shoe-75895
02/18/2021, 7:59 PM
sudo apt install librdkafka-dev python3-dev python3-venv?
narrow-painting-12219
02/18/2021, 8:00 PM
gray-shoe-75895
02/18/2021, 8:00 PM
narrow-painting-12219
02/18/2021, 8:02 PM
(venv) me:~/git-github/datahub/metadata-ingestion$ sudo apt install librdkafka-dev python3-dev python3-venv
[sudo] password for cesarribeiro:
Reading package lists... Done
Building dependency tree
Reading state information... Done
librdkafka-dev is already the newest version (0.11.3-1build1).
python3-dev is already the newest version (3.6.7-1~18.04).
python3-venv is already the newest version (3.6.7-1~18.04).
The following packages were automatically installed and are no longer required:
golang-docker-credential-helpers python-asn1crypto python-backports.ssl-match-hostname python-cached-property python-certifi python-cffi-backend python-chardet
python-cryptography python-docker python-dockerpty python-dockerpycreds python-docopt python-enum34 python-funcsigs python-functools32 python-idna python-ipaddress
python-jsonschema python-mock python-openssl python-pbr python-requests python-six python-texttable python-urllib3 python-websocket python-yaml
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 128 not upgraded.
(venv) me:~/git-github/datahub/metadata-ingestion$ python --version
Python 3.6.9
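Editor's sketch: the `invalid command 'bdist_wheel'` error earlier in the thread typically appears when the `wheel` package is not installed in the active virtualenv, which is what the `pip install wheel` suggestion addresses. A minimal way to check this from inside the venv might look like the following. The helper name `missing_modules` is hypothetical and not part of DataHub or pip.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # setuptools and wheel are what `setup.py bdist_wheel` relies on.
    missing = missing_modules(["setuptools", "wheel"])
    if missing:
        print("Missing in this environment:", ", ".join(missing))
    else:
        print("Build tooling looks importable.")
```

This only confirms that the Python-side build tooling is present. It says nothing about system headers such as librdkafka-dev or python3-dev, which the gcc failure points at.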
gray-shoe-75895
02/18/2021, 8:02 PM
gray-shoe-75895
02/18/2021, 8:02 PM
narrow-painting-12219
02/18/2021, 8:03 PM
narrow-painting-12219
02/18/2021, 8:05 PM
gray-shoe-75895
02/18/2021, 8:07 PM
gray-shoe-75895
02/18/2021, 8:10 PM
gray-shoe-75895
02/18/2021, 8:11 PM
pip install confluent_kafka==1.5.0?
narrow-painting-12219
02/18/2021, 8:15 PM
gray-shoe-75895
02/18/2021, 8:15 PM
narrow-painting-12219
02/18/2021, 8:18 PM
gray-shoe-75895
02/18/2021, 8:18 PM
gray-shoe-75895
02/18/2021, 8:19 PM
narrow-painting-12219
02/18/2021, 8:20 PM
gray-shoe-75895
02/18/2021, 8:21 PM
narrow-painting-12219
02/18/2021, 8:24 PM
gray-shoe-75895
02/18/2021, 8:36 PM
pip install avro-python3==1.10.0
narrow-painting-12219
02/18/2021, 8:38 PM
narrow-painting-12219
02/18/2021, 8:41 PM
gray-shoe-75895
02/18/2021, 8:48 PM
gray-shoe-75895
02/18/2021, 8:48 PM
pip freeze
narrow-painting-12219
02/18/2021, 8:48 PM
avro-gen==0.3.0
avro-python3===file-.avro-VERSION.txt
certifi==2020.12.5
chardet==4.0.0
click==7.1.2
confluent-kafka==1.5.0
dataclasses==0.8
-e git+https://github.com/linkedin/datahub.git@12ff330a54bf1eb69b4364a3d622464077cfac5e#egg=datahub&subdirectory=metadata-ingestion
fastavro==1.3.2
frozendict==1.2
idna==2.10
mypy-extensions==0.4.3
pkg-resources==0.0.0
pydantic==1.7.3
pytz==2021.1
PyYAML==5.4.1
requests==2.25.1
six==1.15.0
SQLAlchemy==1.3.23
toml==0.10.2
typing-extensions==3.7.4.3
tzlocal==2.1
urllib3==1.26.3
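Editor's sketch: the `avro-python3===file-.avro-VERSION.txt` line in the freeze output above stands out because pip writes `===` instead of `==` when a version string is not a valid PEP 440 version. One plausible cause, not confirmed in the thread, is an older setuptools that does not understand a `version = file: ...` entry in the package's setup.cfg. The hypothetical helper below scans freeze output for such pins.

```python
from typing import List, Tuple


def non_standard_pins(freeze_output: str) -> List[Tuple[str, str]]:
    """Return (name, version) pairs for lines pinned with '===' in pip freeze output."""
    flagged = []
    for raw_line in freeze_output.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and editable installs such as "-e git+...".
        if not line or line.startswith("#") or line.startswith("-e "):
            continue
        if "===" in line:
            name, version = line.split("===", 1)
            flagged.append((name, version))
    return flagged


if __name__ == "__main__":
    sample = "avro-gen==0.3.0\navro-python3===file-.avro-VERSION.txt\ncertifi==2020.12.5"
    for name, version in non_standard_pins(sample):
        print(f"{name} has a non-standard version string: {version}")
```

A flagged package is a candidate for uninstalling and reinstalling after upgrading pip and setuptools in the venv.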
gray-shoe-75895
02/18/2021, 8:50 PM
gray-shoe-75895
02/18/2021, 8:50 PM
narrow-painting-12219
02/18/2021, 8:51 PM
(venv) me@carrbrpoa:~/git-github/datahub/metadata-ingestion$ pip install avro-python3==1.10.0
Collecting avro-python3==1.10.0
Downloading https://files.pythonhosted.org/packages/b2/5a/819537be46d65a01f8b8c6046ed05603fb9ef88c663b8cca840263788d58/avro-python3-1.10.0.tar.gz
Requested avro-python3==1.10.0 from https://files.pythonhosted.org/packages/b2/5a/819537be46d65a01f8b8c6046ed05603fb9ef88c663b8cca840263788d58/avro-python3-1.10.0.tar.gz#sha256=a455c215540b1fceb1823e2a918e94959b54cb363307c97869aa46b5b55bde05, but installing version file-.avro-VERSION.txt
Building wheels for collected packages: avro-python3
Running setup.py bdist_wheel for avro-python3 ... done
Stored in directory: /home/me/.cache/pip/wheels/3f/15/cd/fe4ec8b88c130393464703ee8111e2cddebdc40e1b59ea85e9
Successfully built avro-python3
Installing collected packages: avro-python3
Found existing installation: avro-python3 file-.avro-VERSION.txt
Uninstalling avro-python3-file-.avro-VERSION.txt:
Successfully uninstalled avro-python3-file-.avro-VERSION.txt
Successfully installed avro-python3-file-.avro-VERSION.txt
gray-shoe-75895
02/18/2021, 8:53 PM
pip uninstall avro-python3 && pip cache purge && pip install -e .
narrow-painting-12219
02/18/2021, 8:53 PM
narrow-painting-12219
02/18/2021, 8:54 PM
ERROR: unknown command "cache" - maybe you meant "check"
gray-shoe-75895
02/18/2021, 8:55 PM
pip install --upgrade pip would help
narrow-painting-12219
02/18/2021, 8:55 PM
narrow-painting-12219
02/18/2021, 8:58 PM
(venv) me@carrbrpoa:~/git-github/datahub/metadata-ingestion$ datahub ingest -c examples/recipes/file_to_file.yml
Traceback (most recent call last):
  File "/home/me/git-github/datahub/venv/bin/datahub", line 11, in <module>
    load_entry_point('datahub', 'console_scripts', 'datahub')()
  File "/home/me/git-github/datahub/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 480, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/me/git-github/datahub/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2693, in load_entry_point
    return ep.load()
  File "/home/me/git-github/datahub/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in load
    return self.resolve()
  File "/home/me/git-github/datahub/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2330, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/me/git-github/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 13, in <module>
    from datahub.ingestion.run.pipeline import Pipeline
  File "/home/me/git-github/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 9, in <module>
    from datahub.ingestion.sink import sink_class_mapping
  File "/home/me/git-github/datahub/metadata-ingestion/src/datahub/ingestion/sink/__init__.py", line 6, in <module>
    from .datahub_kafka import DatahubKafkaSink
  File "/home/me/git-github/datahub/metadata-ingestion/src/datahub/ingestion/sink/datahub_kafka.py", line 13, in <module>
    from datahub.metadata.schema_classes import SCHEMA_JSON_STR
ImportError: cannot import name 'SCHEMA_JSON_STR'
gray-shoe-75895
02/18/2021, 9:00 PM
pip uninstall avro-gen && pip cache purge && pip install -e . - my bet is that the old pip installed an old avro-gen
narrow-painting-12219
02/19/2021, 11:39 AM
narrow-painting-12219
02/22/2021, 6:34 PM
./gradlew :metadata-events:mxe-schemas:build step) and got several errors: https://gist.github.com/carrbrpoa/04e3c5bb5fe9c92b596089375b1f4c1c (Python 3.8.2)
narrow-painting-12219
02/22/2021, 6:51 PM
gray-shoe-75895
02/22/2021, 6:59 PM
gray-shoe-75895
02/22/2021, 6:59 PM
narrow-painting-12219
02/22/2021, 7:10 PM