# troubleshoot

crooked-holiday-47153

09/28/2022, 9:42 AM
Hi all, I am trying to upgrade my local DataHub getting-started setup. This has worked for me in the past, but now it fails. This is the command I run:
datahub docker quickstart --quickstart-compose-file docker/quickstart/docker-compose-without-neo4j.quickstart.yml
and this is the output I get:
Pulling docker images...
unknown shorthand flag: 'f' in -f
See 'docker --help'.

Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Options:
      --config string      Location of client config files (default "/home/ssm-user/.docker")
  -c, --context string     Name of the context to use to connect to the daemon (overrides DOCKER_HOST env var and default context set with "docker context use")
  -D, --debug              Enable debug mode
  -H, --host list          Daemon socket(s) to connect to
  -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
      --tls                Use TLS; implied by --tlsverify
      --tlscacert string   Trust certs signed only by this CA (default "/home/ssm-user/.docker/ca.pem")
      --tlscert string     Path to TLS certificate file (default "/home/ssm-user/.docker/cert.pem")
      --tlskey string      Path to TLS key file (default "/home/ssm-user/.docker/key.pem")
      --tlsverify          Use TLS and verify the remote
  -v, --version            Print version information and quit

Management Commands:
  builder     Manage builds
  config      Manage Docker configs
  container   Manage containers
  context     Manage contexts
  image       Manage images
  manifest    Manage Docker image manifests and manifest lists
  network     Manage networks
  node        Manage Swarm nodes
  plugin      Manage plugins
  secret      Manage Docker secrets
  service     Manage services
  stack       Manage Docker stacks
  swarm       Manage Swarm
  system      Manage Docker
  trust       Manage trust on Docker images
  volume      Manage volumes

Commands:
  attach      Attach local standard input, output, and error streams to a running container
  build       Build an image from a Dockerfile
  commit      Create a new image from a container's changes
  cp          Copy files/folders between a container and the local filesystem
  create      Create a new container
  diff        Inspect changes to files or directories on a container's filesystem
  events      Get real time events from the server
  exec        Run a command in a running container
  export      Export a container's filesystem as a tar archive
  history     Show the history of an image
  images      List images
  import      Import the contents from a tarball to create a filesystem image
  info        Display system-wide information
  inspect     Return low-level information on Docker objects
  kill        Kill one or more running containers
  load        Load an image from a tar archive or STDIN
  login       Log in to a Docker registry
  logout      Log out from a Docker registry
  logs        Fetch the logs of a container
  pause       Pause all processes within one or more containers
  port        List port mappings or a specific mapping for the container
  ps          List containers
  pull        Pull an image or a repository from a registry
  push        Push an image or a repository to a registry
  rename      Rename a container
  restart     Restart one or more containers
  rm          Remove one or more containers
  rmi         Remove one or more images
  run         Run a command in a new container
  save        Save one or more images to a tar archive (streamed to STDOUT by default)
  search      Search the Docker Hub for images
  start       Start one or more stopped containers
  stats       Display a live stream of container(s) resource usage statistics
  stop        Stop one or more running containers
  tag         Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
  top         Display the running processes of a container
  unpause     Unpause all processes within one or more containers
  update      Update configuration of one or more containers
  version     Show the Docker version information
  wait        Block until one or more containers stop, then print their exit codes

Run 'docker COMMAND --help' for more information on a command.

To get more help with docker, check out our guides at <https://docs.docker.com/go/guides/>

Error while pulling images. Going to attempt to move on to docker compose up assuming the images have been built locally
Any help would be appreciated. Thanks, Eyal

dazzling-judge-80093

09/28/2022, 10:00 AM
Why don’t you just run `datahub docker quickstart` without the compose file?

better-orange-49102

09/28/2022, 10:05 AM
I think an announcement should be made about the new Docker Compose versus the old docker-compose that DataHub has been using.

crooked-holiday-47153

09/28/2022, 3:11 PM
@dazzling-judge-80093 - same behaviour with plain `datahub docker quickstart`.
@better-orange-49102 do you think they moved the repo the images are pulled from, and that this is why the command fails to pull them?

better-orange-49102

09/28/2022, 3:23 PM
I haven't tested this myself to confirm, but I think Docker released a new version of Compose (in Aug 2022) and the quickstart code changed to use Compose v2. If `docker-compose --version` reports a version below v2, then you're using the old docker-compose. Compose v2 is invoked as `docker compose`, without the `-`, hence quickstart will fail under the hood for you. You probably need to change your version of Compose. @incalculable-ocean-74010 pls correct me if I'm wrong, since your commit changed the behavior 😅
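A minimal sketch of the check described above, assuming a POSIX shell. The function name `compose_flavour` and its three output labels are made up for illustration; only `docker compose version` and `docker-compose` come from the thread.

```shell
# Report which Compose entry point is usable on this machine.
#   plugin     -> `docker compose` works (what quickstart now calls)
#   standalone -> only a separate `docker-compose` binary is on PATH
#   none       -> neither was found
compose_flavour() {
  if docker compose version >/dev/null 2>&1; then
    echo "plugin"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "standalone"
  else
    echo "none"
  fi
}

compose_flavour
```

If this prints `standalone` or `none`, the `unknown shorthand flag: 'f'` error from quickstart is the expected symptom, because `docker` does not recognise the `compose` sub-command.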

incalculable-ocean-74010

09/28/2022, 3:24 PM
Hey folks, @better-orange-49102 is 100% right. @crooked-holiday-47153 what is the output of running `docker compose version`?
This was an unfortunate breaking change we had to make to support ARM machines correctly running standalone consumers in the DataHub CLI.

crooked-holiday-47153

09/28/2022, 6:14 PM
Hi @incalculable-ocean-74010 and @better-orange-49102, the output of `docker-compose --version` is:
docker-compose version 1.29.2, build unknown
What should I do to resolve this?

incalculable-ocean-74010

09/28/2022, 6:14 PM
Not `docker-compose`, `docker compose`. Note the whitespace between the commands.

crooked-holiday-47153

09/28/2022, 6:15 PM
The output of `docker compose` is:
docker: 'compose' is not a docker command.
See 'docker --help'

incalculable-ocean-74010

09/28/2022, 6:17 PM
That is the issue. You have an old version of docker which does not include the compose sub-command
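One way to confirm that the `compose` sub-command is missing is to look for a `docker-compose` CLI plugin file. The sketch below assumes the plugin directories listed in Docker's documentation; the function name `find_compose_plugin` is hypothetical, and the directory list may differ on your distribution.

```shell
# Print the first executable docker-compose CLI plugin found, or "not found".
# The directory list is an assumption based on Docker's documented search paths.
find_compose_plugin() {
  for dir in "$HOME/.docker/cli-plugins" \
             /usr/local/lib/docker/cli-plugins \
             /usr/local/libexec/docker/cli-plugins \
             /usr/lib/docker/cli-plugins \
             /usr/libexec/docker/cli-plugins; do
    # -x also checks the execute bit, which matters for a freshly downloaded file.
    if [ -x "$dir/docker-compose" ]; then
      echo "$dir/docker-compose"
      return 0
    fi
  done
  echo "not found"
  return 1
}

find_compose_plugin || true
```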

crooked-holiday-47153

09/28/2022, 6:23 PM
I am running on Amazon Linux. Is there a quick guide for upgrading Docker, and to which version?
@blue-boots-43993 helped with the upgrade instructions and the problem seems solved. I am waiting for the DataHub upgrade to finish.

blue-boots-43993

09/28/2022, 6:35 PM
@thankful-vr-12699 run this and you should be okay:
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL https://github.com/docker/compose/releases/download/v2.11.2/docker-compose-linux-x86_64 -o /usr/local/lib/docker/cli-plugins/docker-compose

crooked-holiday-47153

09/28/2022, 7:06 PM
@incalculable-ocean-74010 @better-orange-49102 now I am facing a new issue: my snowflake-usage ingestion fails with the following:
File "/tmp/datahub/ingest/venv-snowflake-usage-0.8.45/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 140, in '
           'get\n'
           '    raise KeyError(f"Did not find a registered class for {key}")\n'
           "KeyError: 'Did not find a registered class for snowflake-usage'\n"
           '\n'
           'The above exception was the direct cause of the following exception:\n'
any ideas?

better-orange-49102

09/29/2022, 5:50 AM
I assume this is CLI ingestion. What does `datahub check plugins --verbose` say?

thankful-vr-12699

09/29/2022, 7:58 AM
Hi @blue-boots-43993, the upgrade command is not working:
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Warning: Failed to create the file
Warning: /usr/local/lib/docker/cli-plugins/docker-compose: No such file or
Warning: directory
  0 42.4M    0   866    0     0   2231      0  5:32:36 --:--:--  5:32:36  2231
curl: (23) Failed writing body (0 != 866)

blue-boots-43993

09/29/2022, 8:29 AM
@thankful-vr-12699 do you get this error on the first or the second command? Make sure you run `sudo mkdir -p /usr/local/lib/docker/cli-plugins` before the curl call.

crooked-holiday-47153

09/29/2022, 8:31 AM
@better-orange-49102 to your question, this is the output:

Sources:
athena (disabled) ModuleNotFoundError("No module named 'pyathena'")
azure-ad AzureADSource
bigquery (disabled) ModuleNotFoundError("No module named 'sqlalchemy_bigquery'")
bigquery-beta (disabled) ModuleNotFoundError("No module named 'google'")
bigquery-usage (disabled) ModuleNotFoundError("No module named 'cachetools'")
clickhouse (disabled) ModuleNotFoundError("No module named 'clickhouse_driver'")
clickhouse-usage (disabled) ModuleNotFoundError("No module named 'sqlalchemy'")
csv-enricher CSVEnricherSource
datahub-business-glossary BusinessGlossaryFileSource
datahub-lineage-file LineageFileSource
dbt (disabled) ModuleNotFoundError("No module named 'boto3'")
delta-lake (disabled) ModuleNotFoundError("No module named 'deltalake'")
druid (disabled) ModuleNotFoundError("No module named 'pydruid'")
elasticsearch (disabled) ModuleNotFoundError("No module named 'elasticsearch'")
feast (disabled) ModuleNotFoundError("No module named 'feast'")
feast-legacy FeastSource
file GenericFileSource
glue (disabled) ModuleNotFoundError("No module named 'botocore'")
hana (disabled) ModuleNotFoundError("No module named 'sqlalchemy'")
hive (disabled) ModuleNotFoundError("No module named 'pyhive'")
iceberg (disabled) ModuleNotFoundError("No module named 'iceberg'")
kafka (disabled) ModuleNotFoundError("No module named 'confluent_kafka'")
kafka-connect (disabled) ModuleNotFoundError("No module named 'jpype'")
ldap (disabled) ModuleNotFoundError("No module named 'ldap'")
looker (disabled) ModuleNotFoundError("No module named 'looker_sdk'")
lookml (disabled) ModuleNotFoundError("No module named 'looker_sdk'")
mariadb (disabled) ModuleNotFoundError("No module named 'pymysql'")
metabase (disabled) ModuleNotFoundError("No module named 'sqllineage'")
mode (disabled) ModuleNotFoundError("No module named 'tenacity'")
mongodb (disabled) ModuleNotFoundError("No module named 'bson'")
mssql (disabled) ModuleNotFoundError("No module named 'sqlalchemy_pytds'")
mysql (disabled) ModuleNotFoundError("No module named 'pymysql'")
nifi NifiSource
okta (disabled) ModuleNotFoundError("No module named 'okta'")
openapi OpenApiSource
oracle (disabled) ModuleNotFoundError("No module named 'cx_Oracle'")
postgres (disabled) ModuleNotFoundError("No module named 'psycopg2'")
powerbi (disabled) ModuleNotFoundError("No module named 'msal'")
presto-on-hive (disabled) ModuleNotFoundError("No module named 'pyhive'")
pulsar PulsarSource
redash (disabled) ModuleNotFoundError("No module named 'redash_toolbelt'")
redshift (disabled) ModuleNotFoundError("No module named 'psycopg2'")
redshift-usage (disabled) ModuleNotFoundError("No module named 'sqlalchemy'")
s3 (disabled) ModuleNotFoundError("No module named 'pydeequ'")
sagemaker (disabled) ModuleNotFoundError("No module named 'boto3'")
salesforce (disabled) ModuleNotFoundError("No module named 'simple_salesforce'")
snowflake (disabled) ModuleNotFoundError("No module named 'snowflake'")
snowflake-legacy (disabled) ModuleNotFoundError("No module named 'snowflake'")
snowflake-usage-legacy (disabled) ModuleNotFoundError("No module named 'more_itertools'")
sqlalchemy (disabled) ModuleNotFoundError("No module named 'sqlalchemy'")
starburst-trino-usage (disabled) ModuleNotFoundError("No module named 'sqlalchemy'")
superset (disabled) ModuleNotFoundError("No module named 'sqlalchemy'")
tableau (disabled) ModuleNotFoundError("No module named 'tableauserverclient'")
trino (disabled) ModuleNotFoundError("No module named 'sqlalchemy'")
vertica (disabled) ModuleNotFoundError("No module named 'sqlalchemy'")

Sinks:
console ConsoleSink
datahub-kafka (disabled) ModuleNotFoundError("No module named 'confluent_kafka'")
datahub-rest DatahubRestSink
file FileSink

Transformers:
add_dataset_domain AddDatasetDomain
add_dataset_ownership AddDatasetOwnership
add_dataset_properties AddDatasetProperties
add_dataset_tags AddDatasetTags
add_dataset_terms AddDatasetTerms
mark_dataset_status MarkDatasetStatus
pattern_add_dataset_domain PatternAddDatasetDomain
pattern_add_dataset_ownership PatternAddDatasetOwnership
pattern_add_dataset_schema_tags PatternAddDatasetSchemaTags
pattern_add_dataset_schema_terms PatternAddDatasetSchemaTerms
pattern_add_dataset_tags PatternAddDatasetTags
pattern_add_dataset_terms PatternAddDatasetTerms
set_dataset_browse_path AddDatasetBrowsePathTransformer
simple_add_dataset_domain SimpleAddDatasetDomain
simple_add_dataset_ownership SimpleAddDatasetOwnership
simple_add_dataset_properties SimpleAddDatasetProperties
simple_add_dataset_tags SimpleAddDatasetTags
simple_add_dataset_terms SimpleAddDatasetTerms
simple_remove_dataset_ownership SimpleRemoveDatasetOwnership

If a plugin is disabled, try running: pip install 'acryl-datahub[<plugin>]'
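With a listing this long it can help to filter for the plugin in question. The sketch below is only an illustration: `plugin_rows` is a hypothetical helper, and the sample rows are copied from the output above so the snippet runs without DataHub installed.

```shell
# Keep only the rows that mention a given plugin name (case-insensitive).
# Reads the plugin listing on stdin, so it works on live or saved output.
plugin_rows() {
  grep -i -- "$1"
}

# Sample rows copied from the listing above.
plugin_rows snowflake <<'EOF'
pulsar PulsarSource
snowflake (disabled) ModuleNotFoundError("No module named 'snowflake'")
snowflake-legacy (disabled) ModuleNotFoundError("No module named 'snowflake'")
snowflake-usage-legacy (disabled) ModuleNotFoundError("No module named 'more_itertools'")
EOF
```

Against a real installation the same helper would be used as `datahub check plugins --verbose | plugin_rows snowflake`. Note that the listing has no `snowflake-usage` row at all, only `snowflake`, `snowflake-legacy` and `snowflake-usage-legacy`, which matches the `KeyError` in the traceback.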

thankful-vr-12699

09/29/2022, 8:39 AM
@blue-boots-43993, yes, I ran `sudo mkdir -p /usr/local/lib/docker/cli-plugins` first with no error. I get this error with the second command.

better-orange-49102

09/29/2022, 8:45 AM
`datahub --version`? @crooked-holiday-47153

crooked-holiday-47153

09/29/2022, 8:46 AM
@better-orange-49102
acryl-datahub, version 0.8.45

better-orange-49102

09/29/2022, 8:53 AM
@dazzling-judge-80093 can you advise on why snowflake-usage is not present in 0.8.45? (I'm out of ideas)

dazzling-judge-80093

09/29/2022, 8:55 AM
@better-orange-49102 the new Snowflake source does the usage extraction as well, so there is no need for a separate recipe for usage.

crooked-holiday-47153

09/29/2022, 9:21 AM
@dazzling-judge-80093 So will keeping the regular snowflake source also give me usage as is, or do I need to update the recipe?

dazzling-judge-80093

09/29/2022, 9:24 AM
Yes, you can enable or disable usage in the recipe with the `include_usage_stats` config property. We are going to merge usage into other sources as well, as the separate usage sources caused a lot of confusion earlier.
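As a rough illustration of that answer, the snippet below writes a recipe file that uses the combined `snowflake` source with `include_usage_stats` switched on. Only the source type and `include_usage_stats` come from the thread. The file name, the connection fields (`account_id`, `username`, `password`), their placeholder values and the sink URL are assumptions, so check them against the Snowflake source documentation for your installed version.

```shell
# Write a hypothetical recipe; every value in angle brackets is a placeholder.
cat > snowflake_recipe.yml <<'EOF'
source:
  type: snowflake              # combined source, replaces the snowflake-usage recipe
  config:
    account_id: "<account>"    # assumed field name
    username: "<user>"         # assumed field name
    password: "<password>"     # assumed field name
    include_usage_stats: true  # property named in the thread
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"  # assumed local quickstart address
EOF

echo "wrote snowflake_recipe.yml"
```

After filling in real values, the recipe would be run with `datahub ingest -c snowflake_recipe.yml`.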

blue-boots-43993

09/29/2022, 10:40 AM
We managed to solve @thankful-vr-12699’s problem. The issue was resolved with:
1. sudo rm -rf /usr/local/lib/docker/cli-plugins/
2. sudo mkdir -p /usr/local/lib/docker/cli-plugins
3. sudo curl -SL https://github.com/docker/compose/releases/download/v2.11.2/docker-compose-linux-x86_64 -o /usr/local/lib/docker/cli-plugins/docker-compose
4. export PATH=/usr/local/lib/docker/cli-plugins:$PATH
5. sudo chmod +x /home/moustlant/.docker/cli-plugins/docker-compose
Note that this is on Windows using WSL with Ubuntu.
The key problem was making sure that the new docker compose is used (another docker compose was found by running `which docker-compose`) and that there are execute permissions on the newly installed/downloaded binary.
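The steps above can be sketched as one small script. This is not an official installer: the function names are hypothetical, the version and directory are the ones quoted in the thread, and `-f` is added to `curl` as an assumption so that an HTTP error stops the install instead of saving an error page. Unlike step 5 above, the sketch applies `chmod` to the same file it downloads.

```shell
# Values taken from the thread; adjust for your machine.
COMPOSE_VERSION="v2.11.2"
PLUGIN_DIR="/usr/local/lib/docker/cli-plugins"

# Build the release download URL for a version, OS and CPU architecture.
compose_url() {
  printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' "$1" "$2" "$3"
}

# Download the plugin, make it executable, then confirm docker can see it.
install_compose() {
  sudo mkdir -p "$PLUGIN_DIR" &&
  sudo curl -fSL "$(compose_url "$COMPOSE_VERSION" linux "$(uname -m)")" \
    -o "$PLUGIN_DIR/docker-compose" &&
  sudo chmod +x "$PLUGIN_DIR/docker-compose" &&
  docker compose version
}

# Only print the URL here; run install_compose yourself to actually install.
compose_url "$COMPOSE_VERSION" linux "$(uname -m)"
```

If `docker compose version` still fails afterwards, `command -v docker-compose` shows whether an older standalone binary is being picked up from `PATH` instead.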

thankful-vr-12699

09/29/2022, 11:56 AM
Thank you @blue-boots-43993 for your help!