limited-agent-54038 (04/29/2022, 3:10 AM)
[2022-04-29 02:44:40,288] ERROR {logger:26} - Please set env variable SPARK_VERSION
I am just having trouble figuring out where this env variable is or how to change it. Thanks
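For context: this error appears to come from pydeequ, which the data-lake source uses for profiling; pydeequ reads a SPARK_VERSION environment variable to pick a matching Deequ jar. A minimal sketch of setting it for a local CLI run follows; the value 3.0 and the recipe filename are assumptions to adapt to your setup.

# pydeequ (used by the data-lake profiler) looks up SPARK_VERSION; match it
# to the Spark version installed in the ingestion environment (3.0 assumed here).
export SPARK_VERSION=3.0
echo "$SPARK_VERSION"   # confirm it is visible to the shell running ingestion

# Then re-run the recipe (recipe.yml is a placeholder name):
datahub ingest -c recipe.yml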
echoing-airport-49548 (04/29/2022, 3:36 PM)
echoing-airport-49548 (04/29/2022, 3:36 PM)
limited-agent-54038 (04/29/2022, 4:11 PM)
echoing-airport-49548 (04/29/2022, 4:16 PM)
Could you try running datahub check plugins and telling me what the output of that is?
limited-agent-54038 (04/29/2022, 4:18 PM)
Sources:
[2022-04-29 09:17:51,592] ERROR {logger:26} - Please set env variable SPARK_VERSION
athena (disabled)
azure-ad
bigquery (disabled)
bigquery-usage (disabled)
clickhouse (disabled)
clickhouse-usage (disabled)
data-lake
datahub-business-glossary
datahub-lineage-file
dbt
druid (disabled)
elasticsearch (disabled)
feast
file
glue
hive (disabled)
kafka (disabled)
kafka-connect (disabled)
ldap (disabled)
looker (disabled)
lookml (disabled)
mariadb (disabled)
metabase (disabled)
mode (disabled)
mongodb (disabled)
mssql (disabled)
mysql (disabled)
nifi (disabled)
okta (disabled)
openapi
oracle (disabled)
postgres (disabled)
powerbi (disabled)
redash (disabled)
redshift (disabled)
redshift-usage (disabled)
sagemaker
snowflake (disabled)
snowflake-usage (disabled)
sqlalchemy (disabled)
starburst-trino-usage (disabled)
superset (disabled)
tableau (disabled)
trino (disabled)
Sinks:
console
datahub-kafka (disabled)
datahub-rest
file
Transformers:
add_dataset_ownership
add_dataset_properties
add_dataset_tags
add_dataset_terms
mark_dataset_status
pattern_add_dataset_ownership
pattern_add_dataset_tags
pattern_add_dataset_terms
set_dataset_browse_path
simple_add_dataset_ownership
simple_add_dataset_properties
simple_add_dataset_tags
simple_add_dataset_terms
simple_remove_dataset_ownership
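As an aside on reading that output: "(disabled)" in datahub check plugins generally means the plugin's extra dependencies are not installed in that environment, and installing the matching pip extra enables it. A hedged example, with the data-lake extra name assumed from this thread:

# Install the extra that matches the plugin name, then re-check:
pip install 'acryl-datahub[data-lake]'
datahub check plugins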
echoing-airport-49548 (04/29/2022, 4:21 PM)
You're using the data-lake connector, right? I think that error is likely a red herring
echoing-airport-49548 (04/29/2022, 4:22 PM)
limited-agent-54038 (04/29/2022, 4:24 PM)
echoing-airport-49548 (04/29/2022, 4:28 PM)
limited-agent-54038 (04/29/2022, 5:10 PM)
[2022-04-29 17:10:00,004] ERROR {logger:26} - Please set env variable SPARK_VERSION
limited-agent-54038 (04/29/2022, 5:11 PM)
limited-agent-54038 (04/29/2022, 5:12 PM)
echoing-airport-49548 (04/29/2022, 5:19 PM)
echoing-airport-49548 (04/29/2022, 5:19 PM)
echoing-airport-49548 (04/29/2022, 5:19 PM)
limited-agent-54038 (04/29/2022, 5:22 PM)
echoing-airport-49548 (04/29/2022, 5:23 PM)
echoing-airport-49548 (04/29/2022, 5:23 PM)
limited-agent-54038 (04/29/2022, 5:43 PM)
limited-agent-54038 (04/29/2022, 6:51 PM)
echoing-airport-49548 (04/29/2022, 6:52 PM)
limited-agent-54038 (04/29/2022, 6:53 PM)
echoing-airport-49548 (04/29/2022, 6:53 PM)
limited-agent-54038 (04/29/2022, 6:53 PM)
limited-agent-54038 (04/29/2022, 6:53 PM)
limited-agent-54038 (04/29/2022, 7:46 PM)
ERROR {datahub.entrypoints:152} - File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/entrypoints.py", line 138, in main
and there are no clear instructions or a guide on how to fix this
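One way to see past that truncated frame: the DataHub CLI has a global --debug flag that prints the full stack trace, which usually names the underlying cause. A sketch, with the recipe path as a placeholder:

# Re-run with verbose logging to get the complete traceback:
datahub --debug ingest -c recipe.yml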
echoing-airport-49548 (04/29/2022, 7:47 PM)
limited-agent-54038 (04/29/2022, 7:50 PM)
limited-agent-54038 (04/29/2022, 7:51 PM)
source:
  type: "s3"
  config:
    platform: s3
    path_spec:
      include: 's3://***********/SDATA-1511/test_scenario_03/*.*'
    aws_config:
      aws_access_key_id: **********
      aws_secret_access_key: *******
      aws_region: us-west-2
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:9002/api/gms'
    # see https://datahubproject.io/docs/metadata-ingestion/sink_docs/file for complete documentation
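A quick way to tell whether that sink address is right (assuming the standard quickstart layout, where port 9002 is the frontend and GMS serves REST on 8080): GMS answers on its /config endpoint with the same version block that shows up in the ingestion logs further down this thread.

# Should return a JSON config block if this is really GMS:
curl http://localhost:8080/config

# When ingestion runs inside the docker network (e.g. UI-based ingestion),
# the hostname is typically datahub-gms rather than localhost:
curl http://datahub-gms:8080/config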
limited-agent-54038 (04/29/2022, 7:52 PM)
echoing-airport-49548 (04/29/2022, 7:54 PM)
echoing-airport-49548 (04/29/2022, 7:54 PM)
limited-agent-54038 (04/29/2022, 7:54 PM)
echoing-airport-49548 (04/29/2022, 7:55 PM)
limited-agent-54038 (04/29/2022, 7:55 PM)
echoing-airport-49548 (04/29/2022, 7:55 PM)
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
echoing-airport-49548 (04/29/2022, 7:55 PM)
limited-agent-54038 (04/29/2022, 7:56 PM)
echoing-airport-49548 (04/29/2022, 7:58 PM)
limited-agent-54038 (04/29/2022, 8:00 PM)
limited-agent-54038 (04/29/2022, 8:02 PM)
echoing-airport-49548 (04/29/2022, 8:10 PM)
limited-agent-54038 (04/29/2022, 8:13 PM)
limited-agent-54038 (04/29/2022, 8:13 PM)
echoing-airport-49548 (04/29/2022, 8:14 PM)
echoing-airport-49548 (04/29/2022, 8:14 PM)
limited-agent-54038 (04/29/2022, 8:14 PM)
limited-agent-54038 (04/29/2022, 8:22 PM)
limited-agent-54038 (04/29/2022, 8:26 PM)
echoing-airport-49548 (04/29/2022, 8:52 PM)
v0.8.33?
limited-agent-54038 (04/30/2022, 12:07 AM)
[2022-04-29 23:53:10,704] INFO {datahub.entrypoints:161} - DataHub CLI version: 0.8.32.1 at /tmp/datahub/ingest/venv-1d251598-bc7e-4509-9629-67c3faae0601/lib/python3.9/site-packages/datahub/__init__.py
[2022-04-29 23:53:10,705] INFO {datahub.entrypoints:164} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) [GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-1d251598-bc7e-4509-9629-67c3faae0601/bin/python3 on Linux-5.10.76-linuxkit-x86_64-with-glibc2.31
[2022-04-29 23:53:10,705] INFO {datahub.entrypoints:167} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.33', 'commit': 'c34a1ba73520a9f646b21540b046d1a38441b2a2'}}, 'managedIngestion': {'defaultCliVersion': '0.8.32.1', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'noCode': 'true'}
2022-04-29 23:53:11.390812 [exec_id=1d251598-bc7e-4509-9629-67c3faae0601] INFO: Failed to execute 'datahub ingest'
2022-04-29 23:53:11.392091 [exec_id=1d251598-bc7e-4509-9629-67c3faae0601] INFO: Caught exception EXECUTING task_id=1d251598-bc7e-4509-9629-67c3faae0601, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.
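Worth noting in that dump: the server reports v0.8.33 while the CLI in the ingestion venv is 0.8.32.1 (the managed-ingestion default). Comparing the two is cheap; both commands below are standard, with localhost assumed for the GMS address:

# Local CLI version:
datahub version

# Server version, straight from GMS:
curl -s http://localhost:8080/config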
limited-agent-54038 (04/30/2022, 12:12 AM)
echoing-airport-49548 (05/01/2022, 9:41 PM)
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms:8080'
echoing-airport-49548 (05/01/2022, 9:42 PM)
localhost:8080 but just want to eliminate any variables here!
limited-agent-54038 (05/04/2022, 4:01 AM)
limited-agent-54038 (05/04/2022, 4:01 AM)
---- (full traceback above) ----
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 95, in run
    pipeline = Pipeline.create(pipeline_config, dry_run, preview, preview_workunits)
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 184, in create
    return cls(
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 132, in __init__
    self.source: Source = source_class.create(
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 252, in create
    return cls(config, ctx)
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 176, in __init__
    self.init_spark()
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 246, in init_spark
    self.spark = SparkSession.builder.config(conf=conf).getOrCreate()
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/pyspark/sql/session.py", line 186, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/pyspark/context.py", line 378, in getOrCreate
    SparkContext(conf=conf or SparkConf())
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/pyspark/context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/pyspark/context.py", line 327, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
File "/tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/pyspark/java_gateway.py", line 105, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")

Exception: Java gateway process exited before sending its port number
[2022-05-04 03:37:52,868] INFO {datahub.entrypoints:161} - DataHub CLI version: 0.8.32.1 at /tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/lib/python3.9/site-packages/datahub/__init__.py
[2022-05-04 03:37:52,868] INFO {datahub.entrypoints:164} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) [GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-f5791ce2-9915-424a-8798-3860928ab87a/bin/python3 on Linux-5.10.76-linuxkit-x86_64-with-glibc2.31
[2022-05-04 03:37:52,868] INFO {datahub.entrypoints:167} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.33', 'commit': 'c34a1ba73520a9f646b21540b046d1a38441b2a2'}}, 'managedIngestion': {'defaultCliVersion': '0.8.32.1', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'noCode': 'true'}
2022-05-04 03:37:53.532891 [exec_id=f5791ce2-9915-424a-8798-3860928ab87a] INFO: Failed to execute 'datahub ingest'
2022-05-04 03:37:53.536504 [exec_id=f5791ce2-9915-424a-8798-3860928ab87a] INFO: Caught exception EXECUTING task_id=f5791ce2-9915-424a-8798-3860928ab87a, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.
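The decisive line here is "Java gateway process exited before sending its port number": PySpark launches a JVM under the hood, so this typically means there is no usable Java runtime (or a broken JAVA_HOME) in the environment that executes datahub ingest, in this case the actions container. A hedged check, with the JDK path as an example only:

# Confirm a Java runtime is present where `datahub ingest` actually runs:
java -version

# If not, install a JDK and point JAVA_HOME at it (path is illustrative):
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"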
gray-cpu-75769 (08/25/2022, 1:20 PM)
limited-agent-54038 (08/25/2022, 5:05 PM)
gray-cpu-75769 (08/26/2022, 4:37 AM)
limited-agent-54038 (09/01/2022, 11:00 PM)