# ingestion
dazzling-book-76108:
Hey guys! I'm trying to reproduce the "No Code Modeling" guide from the docs. I ran `./gradlew clean build` and then `./docker/dev.sh`, and everything seemed fine. But when I try to create a new "Service" entity I get this error:
```
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Parameters of method 'ingest' failed validation with error 'ERROR :: /entity/value :: \"com.linkedin.metadata.snapshot.ServiceSnapshot\" is not a member type of union [ { \"type\" : \"record\", \"name\" : \"ChartSnapshot\", \"namespace\" : \"com.linkedin.metadata.snapshot\", \"doc\" : \"A metadata snapshot for a specific Chart entity.\", \"fields\" : [ { \"name\" : \"urn\", \"type\" : { \"type\" : \"typeref\", \"name\" : \"ChartUrn\", \"namespace\" : \"com.linkedin.common\", \"doc\" : \"Standardized chart identifier\", \"ref\" : \"string\", \"java\" : { \"class\" : \"com.linkedin.common.urn.ChartUrn\" }, \"validate\" : { \"com.linkedin.common.validator.TypedUrnValidator\" : { \"accessible\" : true, \"constructable\" : true, \"doc\" : \"Standardized chart identifier\", \"entityType\" : \"chart\", \"fields\" : [ { \"doc\" : \"The name of the dashboard tool such as looker, redash etc.\", \"maxLength\" : 20, \"name\" : \"dashboardTool\", \"type\" : \"string\" }, { \"doc\" : \"Unique id for the chart. This id should be globally unique for a dashboarding tool even when there are multiple deployments of it. As an example, chart URL could be used here for Looker such as '<http://looker.linkedin.com/looks/1234|looker.linkedin.com/looks/1234>'\", \"maxLength\" : 200, \"name\" : \"chartId\", \"type\" : \"string\" } ], \"maxLength\" : 236, \"name\" : \"Chart\", \"namespace\" : \"li\", \"owners\" : [ \"urn:li:corpuser:fbar\", \"urn:li:corpuser:bfoo\" ], \"owningTeam\" : \"urn:li:internalTeam:datahub\" } } }, \"doc\" : \"URN for the entity the metadata snapshot is associated with.\" }, { \"name\" : \"aspects\", \"type\" : { \"type\" : \"array\", \"items\" : { \"type\" : \"typeref\", \"name\" : \"ChartAspect\", \"namespace\" : \"com.linkedin.metadata.aspect\", \"doc\" : \"A union of all supported metadata aspects for a Chart\", \"ref\" : [ { \"type\" : \"record\", \"name\" : \"ChartKey\", \"namespace\" : \"com.linkedin.metadata.key\", \"doc\" : \"Key for a Chart\", \"fields\" : [ { \"name\" : \"dashboardTool\", \"type\" : \"string\", \"doc\" : \"The name of the dashboard tool such as looker, redash etc.\", \"Searchable\" : { \"addToFilters\" : true, \"boostScore\" : 4.0, \"fieldName\" : \"tool\", \"fieldType\" : \"TEXT_PARTIAL\" } }, { \"name\" : \"chartId\", \"type\" : \"string\", \"doc\" : \"Unique id for the chart. This id should be globally unique for a dashboarding tool even when there are multiple deployments of it. As an example, chart URL could be used here for Looker such as '<http://looker.linkedin.com/looks/1234|looker.linkedin.com/looks/1234>'\" } ], \"Aspect\" : { \"name\" : \"chartKey\" } }, { \"type\" : \"record\", \"name\" : \"ChartInfo\", \"namespace\" : \"com.linkedin.chart\", \"doc\" : \"Information about a chart\", \"include\" : [ { \"type\" : \"record\", \"name\" : \"CustomProperties\", \"namespace\" : \"com.linkedin.common\", \"doc\" : \"Misc. properties about an entity.\",
[...]
```
I noticed that `ServiceSnapshot.pdl` (described in the docs) is not in master at `/metadata/snapshot/` (link). Also, `Snapshot.pdl` is not up to date, since it does not include `ServiceSnapshot` in its union (link). Any ideas? Am I forgetting something?
early-lamp-41924:
Hey! Those were examples of how to add a new entity, so they were not pushed to master. You will have to create the .pdl files from the example yourself to add a new entity!
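(For anyone reproducing this: a sketch of the files the example expects you to create locally. The `metadata-models/src/main/pegasus/...` paths are an assumption based on where the existing snapshot models live; adjust them to match your checkout and the docs.)

```bash
# Hypothetical layout; the file contents come from the No Code Modeling docs, not from master.
MODELS=metadata-models/src/main/pegasus/com/linkedin
mkdir -p "$MODELS/metadata/snapshot" "$MODELS/metadata/aspect" "$MODELS/metadata/key" "$MODELS/service"
touch "$MODELS/metadata/snapshot/ServiceSnapshot.pdl"   # snapshot record for the new entity
touch "$MODELS/metadata/aspect/ServiceAspect.pdl"       # union of the entity's aspects
touch "$MODELS/metadata/key/ServiceKey.pdl"             # key aspect
touch "$MODELS/service/ServiceInfo.pdl"                 # example info aspect
# ...then reference ServiceSnapshot inside the union in $MODELS/metadata/snapshot/Snapshot.pdl
```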
big-carpet-38439:
So in your local checkout, do you have `ServiceSnapshot.pdl`? And is it referenced within `Snapshot.pdl`?
dazzling-book-76108:
@early-lamp-41924 So... what about `ServiceKey.pdl`, `ServiceAspect.pdl` and `ServiceInfo.pdl`? Were they pushed to master by mistake? 😄 But OK, I think I should create the remaining files on my own... Thank you!
big-carpet-38439:
Didn't realize those were in the repo... They were definitely not supposed to be 😞
I can remove them. Thanks for raising this!
dazzling-book-76108:
```
[...]
+ mypy src/ tests/
tests/unit/test_packaging.py:8: error: Library stubs not installed for "pkg_resources" (or incompatible with Python 3.8)
tests/unit/test_packaging.py:8: note: Hint: "python3 -m pip install types-pkg_resources"
src/datahub/metadata/schema_classes.py:7: error: Library stubs not installed for "six" (or incompatible with Python 3.8)
src/datahub/metadata/schema_classes.py:7: note: Hint: "python3 -m pip install types-six"
src/datahub/ingestion/source/superset.py:5: error: Library stubs not installed for "dateutil.parser" (or incompatible with Python 3.8)
src/datahub/ingestion/source/superset.py:5: note: Hint: "python3 -m pip install types-python-dateutil"
src/datahub/ingestion/source/superset.py:5: error: Library stubs not installed for "dateutil" (or incompatible with Python 3.8)
src/datahub/ingestion/source/superset.py:6: error: Library stubs not installed for "requests" (or incompatible with Python 3.8)
src/datahub/ingestion/source/kafka_connect.py:7: error: Library stubs not installed for "requests" (or incompatible with Python 3.8)
src/datahub/emitter/rest_emitter.py:8: error: Library stubs not installed for "requests" (or incompatible with Python 3.8)
src/datahub/emitter/rest_emitter.py:8: note: Hint: "python3 -m pip install types-requests"
src/datahub/emitter/rest_emitter.py:9: error: Library stubs not installed for "requests.exceptions" (or incompatible with Python 3.8)
src/datahub/configuration/toml.py:3: error: Library stubs not installed for "toml" (or incompatible with Python 3.8)
src/datahub/configuration/toml.py:3: note: Hint: "python3 -m pip install types-toml"
tests/unit/test_rest_sink.py:4: error: Library stubs not installed for "requests" (or incompatible with Python 3.8)
src/datahub/ingestion/source/mysql.py:2: error: Library stubs not installed for "pymysql" (or incompatible with Python 3.8)
src/datahub/ingestion/source/mysql.py:2: note: Hint: "python3 -m pip install types-PyMySQL"
src/datahub/configuration/yaml.py:3: error: Library stubs not installed for "yaml" (or incompatible with Python 3.8)
src/datahub/configuration/yaml.py:3: note: Hint: "python3 -m pip install types-PyYAML"
tests/unit/test_glue_source.py:5: error: Library stubs not installed for "freezegun" (or incompatible with Python 3.8)
tests/unit/test_glue_source.py:5: note: Hint: "python3 -m pip install types-freezegun"
src/datahub/ingestion/run/pipeline.py:5: error: Library stubs not installed for "click" (or incompatible with Python 3.8)
src/datahub/check/check_cli.py:3: error: Library stubs not installed for "click" (or incompatible with Python 3.8)
src/datahub/check/check_cli.py:3: note: Hint: "python3 -m pip install types-click"
src/datahub/check/check_cli.py:3: note: (or run "mypy --install-types" to install all missing stub packages)
src/datahub/check/check_cli.py:3: note: See <https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports>
src/datahub/check/check_cli.py:12: error: Untyped decorator makes function "check" untyped
src/datahub/check/check_cli.py:17: error: Untyped decorator makes function "mce_file" untyped
src/datahub/check/check_cli.py:18: error: Untyped decorator makes function "mce_file" untyped
src/datahub/check/check_cli.py:26: error: Untyped decorator makes function "local_docker" untyped
src/datahub/check/check_cli.py:40: error: Untyped decorator makes function "plugins" untyped
src/datahub/check/check_cli.py:41: error: Untyped decorator makes function "plugins" untyped
src/datahub_provider/_lineage_core.py:4: error: Library stubs not installed for "dateutil.parser" (or incompatible with Python 3.8)
src/datahub_provider/_lineage_core.py:4: error: Library stubs not installed for "dateutil" (or incompatible with Python 3.8)
src/datahub/entrypoints.py:6: error: Library stubs not installed for "click" (or incompatible with Python 3.8)
src/datahub/entrypoints.py:28: error: Untyped decorator makes function "datahub" untyped
src/datahub/entrypoints.py:29: error: Untyped decorator makes function "datahub" untyped
src/datahub/entrypoints.py:30: error: Untyped decorator makes function "datahub" untyped
src/datahub/entrypoints.py:46: error: Untyped decorator makes function "version" untyped
src/datahub/entrypoints.py:53: error: Untyped decorator makes function "ingest" untyped
src/datahub/entrypoints.py:54: error: Untyped decorator makes function "ingest" untyped
tests/unit/test_plugin_system.py:2: error: Library stubs not installed for "click.testing" (or incompatible with Python 3.8)
tests/unit/test_check.py:1: error: Library stubs not installed for "click.testing" (or incompatible with Python 3.8)
tests/unit/serde/test_serde.py:8: error: Library stubs not installed for "click.testing" (or incompatible with Python 3.8)
tests/integration/sql_server/test_sql_server.py:5: error: Library stubs not installed for "click.testing" (or incompatible with Python 3.8)
tests/integration/mysql/test_mysql.py:2: error: Library stubs not installed for "click.testing" (or incompatible with Python 3.8)
Found 35 errors in 19 files (checked 144 source files)

> Task :metadata-ingestion:lint FAILED
```
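(Side note for anyone hitting the same mypy errors locally: the hints in the log above already name the fix; a minimal sketch, assuming a Python 3.8 environment with pip on the PATH:)

```bash
# Install the missing stub packages that mypy listed above in one go...
python3 -m pip install types-pkg_resources types-six types-python-dateutil \
  types-requests types-toml types-PyMySQL types-PyYAML types-freezegun types-click
# ...or let mypy fetch whatever it thinks is missing:
mypy --install-types
```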
After adding the new .pdl files, I ran `./gradlew :gms:impl:build -Prest.model.compatibility=ignore` and then `./gradlew build`. As you can see, the `:metadata-ingestion:lint` task failed. Running `./docker/dev.sh` then gives:
```
Creating mysql         ... done
Creating neo4j         ... done
Creating elasticsearch ... done
Creating zookeeper           ... done
Creating elasticsearch-setup ... done
Creating kibana              ... done
Creating broker              ... done
Creating schema-registry     ... done
Creating kafka-rest-proxy     ... done
Creating kafka-setup         ... done
Creating schema-registry-ui   ... done
Creating datahub-mae-consumer ... done
Creating datahub-gms          ... done
Creating kafka-topics-ui        ... done
Creating datahub-mce-consumer   ... done
Creating datahub-frontend-react ... error

ERROR: for datahub-frontend-react  Cannot start service datahub-frontend-react: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "datahub-frontend/bin/playBinary": stat datahub-frontend/bin/playBinary: no such file or directory: unknown

ERROR: for datahub-frontend-react  Cannot start service datahub-frontend-react: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "datahub-frontend/bin/playBinary": stat datahub-frontend/bin/playBinary: no such file or directory: unknown
ERROR: Encountered errors while bringing up the project.
```
How can I fix this? 😆
big-carpet-38439:
cc @gray-shoe-75895 not sure where this lint failure is coming from ^
dazzling-book-76108:
For some reason `./gradlew clean` didn't work for me in this case. After I removed the `build` folders by hand, everything worked according to the No Code Metadata Modeling guide. So weird... 😄
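(A sketch of the manual cleanup being described; exactly which directories were removed wasn't specified, so treat this as an assumption rather than the precise steps used.)

```bash
# From the repo root: remove every build/ directory Gradle has produced.
# This is more aggressive than `./gradlew clean` and also deletes generated sources.
find . -type d -name build -prune -exec rm -rf {} +
```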
big-carpet-38439:
Oh wow! Maybe `clean` is misconfigured for some of these modules...
gray-shoe-75895:
The Python lint failures should be fixed on latest, since this PR got merged: https://github.com/linkedin/datahub/pull/2666
colossal-account-65055:
@dazzling-book-76108 I came across your message because I'm seeing a similar error when trying to run `./docker/dev.sh`. Which `build` folders did you have to remove in order to get this working?
Any help or ideas from the rest of the team would be most welcome 🙏 Error details below. Commands I ran:
```
git checkout master  # up to date with upstream
./gradlew clean build  # output says "Task :metadata-ingestion:test FAILED", but otherwise seems OK
datahub docker nuke
./docker/dev.sh
```
Output from `dev.sh`, with a similar error to the one Matheus mentioned above:
```
Creating mysql                 ... done
Creating zookeeper             ... done
Creating elasticsearch ... done
Creating neo4j                 ... done
Recreating elasticsearch-setup ... done
Creating broker                ... done
Recreating mysql-setup         ... done
Creating schema-registry       ... done
Recreating kafka-setup         ... done
Creating datahub-gms           ... done
Recreating 16d2c085f9f3_datahub-frontend-react ... error

ERROR: for 16d2c085f9f3_datahub-frontend-react  Cannot start service datahub-frontend-react: OCI runtime create failed: container_linux.go:367: starting container process caused: exec: "datahub-frontend/bin/playBinary": stat datahub-frontend/bin/playBinary: no such file or directory: unknown

ERROR: for datahub-frontend-react  Cannot start service datahub-frontend-react: OCI runtime create failed: container_linux.go:367: starting container process caused: exec: "datahub-frontend/bin/playBinary": stat datahub-frontend/bin/playBinary: no such file or directory: unknown
ERROR: Encountered errors while bringing up the project.
```
big-carpet-38439:
@colossal-account-65055 Seems that the ingestion test failure may have terminated the build... Do you mind trying `./gradlew datahub-frontend:build` and then `dev.sh` again?
cc @gray-shoe-75895 for the ingestion tests
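(Spelled out as a sketch, the sequence being suggested here:)

```bash
./gradlew datahub-frontend:build   # produce the Play binary the frontend image is complaining about
./docker/dev.sh                    # then bring the dev stack up again
```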
gray-shoe-75895:
@colossal-account-65055 it'd be great if you could send me the logs from the `metadata-ingestion:test` failure
colossal-account-65055:
Thank you both! @big-carpet-38439 yep you're right, seems obvious now, but building the frontend individually fixed the issue. @gray-shoe-75895 here are the logs
I'm able to reproduce the problem on a fresh clone of the `master` branch.
gray-shoe-75895:
@colossal-account-65055 - by chance, have you ever installed Airflow on that machine? Seems like (1) the airflow test was not properly isolated from the rest of the system and (2) it picked up an old config version that was lying around on your machine
colossal-account-65055:
@gray-shoe-75895 yep you're right, thanks for the diagnosis assist! Found https://stackoverflow.com/questions/65385500/valueerror-invalid-literal-for-int-with-base-10-30-0-when-running-unittest and that steered me to just `rm ~/airflow/airflow.cfg`.
Still not seeing a successful gradle build, however 😞 Any ideas on the following error?
```
$ docusaurus build

[en] Creating an optimized production build...
Loading of version failed for version "current"
Unable to build website for locale "en".
Error: Invalid sidebar file at "sidebars.js".
These sidebar document ids do not exist:
- releases

Available document ids are:
- README
- datahub-frontend/README
...
```
gray-shoe-75895:
That's pretty odd - what commit are you on (this is likely related to https://github.com/linkedin/datahub/pull/2776)? Also, if you try running a clean first, does it resolve the issue?
colossal-account-65055:
Yep that's it, updating to the latest commit and running clean first fixes it 👍
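(As a sketch, the sequence that fixed it; `upstream` is assumed to be the remote pointing at linkedin/datahub, as in the commands earlier in the thread:)

```bash
git fetch upstream && git merge upstream/master   # pick up the latest commits, including the sidebar change referenced above
./gradlew clean
./gradlew build
```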
I see a failure in a unit test though, does this look familiar? The test is `test_package_discovery()`. Thanks for your patience 🙏 let me know if there is a better venue for these questions.
gray-shoe-75895:
Ah, that's my bad - I believe running `find . -type f -name '*.py[co]' -delete -o -type d -name __pycache__ -delete` will fix that issue
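(The same command, commented, for anyone copy-pasting:)

```bash
# Delete stale compiled Python files (*.pyc / *.pyo) and __pycache__ directories;
# leftovers from moved or renamed modules can confuse the package-discovery test.
find . -type f -name '*.py[co]' -delete -o -type d -name __pycache__ -delete
```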
colossal-account-65055:
Hm, I'm still seeing the failure
Er, no, not the same one, but the docs website failure is back. This feels a bit like whack-a-mole 😅 sorry for needing so much help! I'm running:
```
./gradlew clean  # output is green
find . -type f -name '*.py[co]' -delete -o -type d -name __pycache__ -delete
./gradlew build  # Task :docs-website:yarnBuild fails with the same sidebar document ids error I sent before
```
big-carpet-38439:
It's complaining about the `releases` file?
Let me know if you see a file `releases.md` under `docs-website/genDocs/`
colossal-account-65055:
```
ls docs-website/genDocs
README.md                   datahub-graphql-core        docker                      metadata-ingestion
datahub-frontend            datahub-kubernetes          docs                        metadata-jobs
datahub-gms-graphql-service datahub-web-react           gms
```
I think the most recent error I'm seeing is still in the `docs-website:yarnBuild` step, but it's not related to `releases` anymore. I'm not 100% sure I'm parsing the output right, but it looks like the `yarnBuild` failure may be related to an earlier `yarnGenerate` error that has to do with throttling from a rate limit? Let me send you the whole output:
gray-shoe-75895:
This looks like a bug - I believe https://github.com/linkedin/datahub/pull/2847 should fix it
colossal-account-65055:
@gray-shoe-75895, this is cool, thank you for the bug fix! 🙏 Are you able to run the full build on your branch? I checked out your branch and the docs generation step now succeeds for me (yay!), but the `:metadata-ingestion:testQuick` step fails, even after running the `find` command you sent earlier, which initially seemed to fix the problem.
gray-shoe-75895:
It seems to work on my end. Maybe try running `git clean -fdx` from the `metadata-ingestion` directory to restore it to a "clean checkout"?
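(If you try this, a dry run first is a cheap safety check, since `-fdx` also deletes ignored files such as the local venv and build output:)

```bash
cd metadata-ingestion
git clean -ndx   # dry run: list what would be removed
git clean -fdx   # actually remove untracked and ignored files
```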
ambitious-lifeguard-64025:
I am also failing the airflow unit test, although it doesn't seem to show a specific error message even with `--stacktrace`. My `docker/dev.sh` also fails, not sure if it's related:
```
👉  docker/dev.sh
[+] Running 5/11
 ⠿ broker Pulled                                                                                                                                                             0.8s
 ⠿ elasticsearch-setup Error                                                                                                                                                 0.8s
 ⠿ elasticsearch Pulled                                                                                                                                                      0.8s
 ⠿ schema-registry Pulled                                                                                                                                                    0.7s
 ⠿ mysql-setup Pulled                                                                                                                                                        0.7s
 ⠿ mysql Error                                                                                                                                                               0.8s
 ⠿ kafka-setup Error                                                                                                                                                         0.8s
 ⠿ datahub-gms Error                                                                                                                                                         0.8s
 ⠿ datahub-frontend-react Error                                                                                                                                              0.8s
 ⠿ zookeeper Pulled                                                                                                                                                          0.8s
 ⠇ neo4j Pulling                                                                                                                                                             0.8s
Error response from daemon: manifest for linkedin/datahub-gms:debug not found: manifest unknown: manifest unknown
```
I have run `./gradlew build -x check`, which succeeds, but `docker/dev.sh` still fails with the same error as above.
gray-shoe-75895:
@colossal-account-65055 I've updated the "clean" command to run that find-and-delete command
@ambitious-lifeguard-64025 the error you’re facing is quite weird, and not one that I’ve seen before
```
tests/unit/test_airflow.py:7: in <module>
    import airflow.configuration
venv/lib/python3.8/site-packages/airflow/__init__.py:50: in <module>
    from airflow.models import DAG  # noqa: E402
venv/lib/python3.8/site-packages/airflow/models/__init__.py:21: in <module>
    from airflow.models.baseoperator import BaseOperator, BaseOperatorLink  # noqa: F401
venv/lib/python3.8/site-packages/airflow/models/baseoperator.py:43: in <module>
    from airflow.models.dag import DAG
venv/lib/python3.8/site-packages/airflow/models/dag.py:47: in <module>
    from airflow.executors import LocalExecutor, get_default_executor
venv/lib/python3.8/site-packages/airflow/executors/__init__.py:23: in <module>
    from airflow.executors.base_executor import BaseExecutor # noqa
venv/lib/python3.8/site-packages/airflow/executors/base_executor.py:24: in <module>
    import airflow.utils.dag_processing
venv/lib/python3.8/site-packages/airflow/utils/dag_processing.py:40: in <module>
    from setproctitle import setproctitle
E   ImportError: dlopen(/Users/szeng/Documents/Learn/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/setproctitle.cpython-38-darwin.so, 2): Symbol not found: _Py_GetArgcArgv
E     Referenced from: /Users/szeng/Documents/Learn/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/setproctitle.cpython-38-darwin.so
E     Expected in: flat namespace
E    in /Users/szeng/Documents/Learn/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/setproctitle.cpython-38-darwin.so
```
This seems to be related to this stackoverflow question: https://stackoverflow.com/questions/60684146/airflow-initdb-undefined-symbol-py-getargcargv
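(A sketch of the usual remedy for this missing-symbol error, assuming the venv was built from an interpreter that doesn't export `_Py_GetArgcArgv`, e.g. a conda or system build: recreate the venv from a pyenv-installed Python, which is what resolves it further down in the thread. The version number and the editable-install extras are illustrative assumptions.)

```bash
pyenv install 3.8.10          # any pyenv-built 3.8.x should do
cd metadata-ingestion
rm -rf venv                   # throw away the venv built against the old interpreter
pyenv local 3.8.10
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"       # hypothetical extras name; use whatever metadata-ingestion's docs specify
```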
colossal-account-65055:
@gray-shoe-75895 finally got a green build 🥳 thank you very much for all your help and troubleshooting!!
gray-shoe-75895:
Woohoo!!
ambitious-lifeguard-64025:
Thanks @gray-shoe-75895, running it in `pyenv` fixed the airflow tests; however, I still get the "manifest unknown" error when I run `dev.sh`:
```
👉  ./dev.sh
[+] Running 2/11
 ⠿ broker Error                                                                    0.8s
 ⠿ elasticsearch-setup Error                                                       0.8s
 ⠿ elasticsearch Error                                                             0.8s
 ⠇ mysql Pulling                                                                   0.8s
 ⠿ datahub-frontend-react Error                                                    0.8s
 ⠇ neo4j Pulling                                                                   0.8s
 ⠿ mysql-setup Pulled                                                              0.8s
 ⠿ datahub-gms Error                                                               0.8s
 ⠿ zookeeper Pulled                                                                0.8s
 ⠿ schema-registry Error                                                           0.8s
 ⠿ kafka-setup Error                                                               0.8s
Error response from daemon: manifest for linkedin/datahub-frontend-react:debug not found: manifest unknown: manifest unknown
```