microscopic-mechanic-13766 (10/28/2022, 8:52 AM):
readHiveTable(<databaseName>, <tableName>)
Thanks in advance!!

high-hospital-85984 (10/28/2022, 9:51 AM):
docker run -it --user root -v /my/path/recipes:/temp linkedin/datahub-ingestion:v0.8.41 ingest run --dry-run -c /temp/kafka-connect-to-datahub-kafka.yml
[2022-10-28 09:47:42,927] INFO {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.41+docker
[2022-10-28 09:47:42,952] INFO {datahub.ingestion.run.pipeline:160} - Sink configured successfully.
[2022-10-28 09:47:44,455] INFO {datahub.ingestion.source.kafka_connect:866} - Connection to <address> is ok
[2022-10-28 09:47:44,456] ERROR {datahub.ingestion.run.pipeline:126} - No JVM shared library file (libjvm.so) found. Try setting up the JAVA_HOME environment variable properly.
[2022-10-28 09:47:44,456] INFO {datahub.cli.ingest_cli:115} - Starting metadata ingestion
[2022-10-28 09:47:44,456] INFO {datahub.cli.ingest_cli:133} - Finished metadata pipeline
Failed to configure source (kafka-connect) due to No JVM shared library file (libjvm.so) found. Try setting up the JAVA_HOME environment variable properly.
No ~/.datahubenv file found, generating one for you...
Possibly related issue: https://github.com/datahub-project/datahub/issues/4741
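A possible fix (a sketch, not from the thread): the kafka-connect source pulls in JPype1, and this error is JPype failing to locate libjvm.so, which it resolves via JAVA_HOME. Pointing JAVA_HOME at a JDK inside the container before the pipeline starts should clear it, assuming the image actually ships a JDK; the path below is hypothetical (locate the real one with: find / -name libjvm.so).

import os

# Hypothetical JDK location inside the container; JPype reads JAVA_HOME
# when it goes looking for libjvm.so, so set it before the pipeline runs.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"

from datahub.configuration.config_loader import load_config_file
from datahub.ingestion.run.pipeline import Pipeline

config = load_config_file("/temp/kafka-connect-to-datahub-kafka.yml")
pipeline = Pipeline.create(config)
pipeline.run()
pipeline.raise_from_status()

Equivalently, export the variable when launching the container: docker run -e JAVA_HOME=/path/to/jdk ...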

full-chef-85630 (10/28/2022, 1:23 PM):
"""dag name: social-insights"""
from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago

from datahub.configuration.config_loader import load_config_file
from datahub.ingestion.run.pipeline import Pipeline

try:
    from airflow.operators.python import PythonOperator
except ModuleNotFoundError:
    from airflow.operators.python_operator import PythonOperator

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "email": "xxxx.com",
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=120),
}


def template():
    """Run ingestion job: social-insights"""
    config = load_config_file("/social_insights.yaml")
    pipeline = Pipeline.create(config)
    pipeline.run()
    pipeline.raise_from_status()


with DAG(
    dag_id="social-insights",
    default_args=default_args,
    schedule_interval=timedelta(hours=1),
    start_date=days_ago(2),
) as dag:
    PythonOperator(
        task_id="social-insights",
        python_callable=template,
    )
Airflow error info:
"... It will be skipped from lineage. The error was daemonic processes are not allowed to have children."
On a single-node Airflow deployment this runs normally.
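One possible workaround (a sketch, not from the thread): the error is raised by Python's multiprocessing module, which forbids a daemonized worker process from spawning multiprocessing children. Launching the recipe through the DataHub CLI in a child process sidesteps this, since subprocess is not subject to that restriction and the CLI process starts out non-daemonic.

import subprocess

from airflow.operators.python import PythonOperator


def run_ingestion_via_cli():
    # The CLI runs in a fresh, non-daemonic process, so any multiprocessing
    # the pipeline does there is permitted.
    subprocess.run(
        ["datahub", "ingest", "-c", "/social_insights.yaml"],
        check=True,
    )


# Inside the `with DAG(...)` block from the DAG above:
PythonOperator(
    task_id="social-insights",
    python_callable=run_ingestion_via_cli,
)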

gifted-knife-16120 (10/31/2022, 10:35 AM):
... ['Platform was not found in DataHub. Using postgres name as is'],
" 'metabase-dbname-2': ['Cannot determine database name for platform: postgres'],\n"
" 'metabase-platform-3': ['Platform was not found in DataHub. Using postgres name as is'],\n"
" 'metabase-dbname-3': ['Cannot determine database name for platform: postgres'],\n"
" 'metabase-platform-1': ['Platform was not found in DataHub. Using h2 name as is'],\n"
" 'metabase-dbname-1': ['Cannot determine database name for platform: h2']},\n"
anyone can help?careful-action-61962
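These warnings mean the Metabase source could not map Metabase's database engines (postgres, h2) onto DataHub platform/database names when building dataset URNs. A minimal sketch of one way to address it, assuming the metabase source in your CLI version supports database_alias_map (check the source docs for your version); all connection details and database names below are placeholders:

from datahub.ingestion.run.pipeline import Pipeline

# database_alias_map keys are the platform names from the warnings above;
# values are the database names to use when constructing dataset URNs.
recipe = {
    "source": {
        "type": "metabase",
        "config": {
            "connect_uri": "http://localhost:3000",
            "username": "user@example.com",
            "password": "...",
            "database_alias_map": {
                "postgres": "my_postgres_db",
                "h2": "my_h2_db",
            },
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}

pipeline = Pipeline.create(recipe)
pipeline.run()
pipeline.raise_from_status()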

careful-action-61962 (10/31/2022, 10:58 AM):
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '2aa01dec-ff3a-4093-af01-9538d1fae92c',
'infos': ['2022-10-30 18:30:00.193001 [exec_id=2aa01dec-ff3a-4093-af01-9538d1fae92c] INFO: Starting execution for task with name=RUN_INGEST',
'2022-10-30 18:30:00.193303 [exec_id=2aa01dec-ff3a-4093-af01-9538d1fae92c] INFO: Caught exception EXECUTING '
'task_id=2aa01dec-ff3a-4093-af01-9538d1fae92c, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 113, in execute_task\n'
' task_event_loop = asyncio.new_event_loop()\n'
' File "/usr/local/lib/python3.10/asyncio/events.py", line 782, in new_event_loop\n'
' return get_event_loop_policy().new_event_loop()\n'
' File "/usr/local/lib/python3.10/asyncio/events.py", line 673, in new_event_loop\n'
' return self._loop_factory()\n'
' File "/usr/local/lib/python3.10/asyncio/unix_events.py", line 64, in __init__\n'
' super().__init__(selector)\n'
' File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 53, in __init__\n'
' selector = selectors.DefaultSelector()\n'
' File "/usr/local/lib/python3.10/selectors.py", line 350, in __init__\n'
' self._selector = self._selector_cls()\n'
'OSError: [Errno 24] Too many open files\n']}
Execution finished with errors.
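"OSError: [Errno 24] Too many open files" means the executor process hit its file-descriptor limit; the usual remedy is raising the ulimit on the container running the actions/executor. A standard-library snippet to check (and, within the hard limit, raise) what the process actually sees; a diagnostic sketch only:

import resource

# Current soft/hard caps on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")

# An unprivileged process may raise its soft limit up to the hard limit;
# raising the hard limit itself requires root/CAP_SYS_RESOURCE.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))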

alert-fall-82501 (10/31/2022, 1:06 PM):
%4|1667221196.244|FAIL|rdkafka#consumer-1| [thrd:datahub-sbx2-frontend.amer-dev.XXXX.com:9092/bootstra]: datahub-sbx2-frontend.amer-dev.XXXX.com:9092/bootstrap: Connection setup timed out in state CONNECT (after 30090ms in state CONNECT)
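The consumer timed out during TCP connection setup to the bootstrap address, which here points at the DataHub frontend host rather than an obvious Kafka broker, so the bootstrap server value itself is worth double-checking. A quick reachability probe from the machine running the consumer (standard library only; host and port copied from the log line above):

import socket

host, port = "datahub-sbx2-frontend.amer-dev.XXXX.com", 9092
try:
    # Plain TCP connect with roughly the same budget rdkafka gave up after.
    with socket.create_connection((host, port), timeout=30):
        print(f"TCP connect to {host}:{port} succeeded")
except OSError as exc:
    print(f"TCP connect to {host}:{port} failed: {exc}")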

astonishing-pager-27015 (10/31/2022, 3:37 PM):
profiling:
  enabled: true
  profile_table_level_only: true
but when I set up ingestion in the UI with Enable Profiling checked, I only see this in the YAML view:
profiling:
  enabled: true
When I run it, no profiling seems to occur, though everything else works fine.
edit: I needed to loosen the profiling settings that limit profiling by table change recency, row count, and size.
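For reference, a sketch of the limit options that edit refers to, assuming a source whose profiler exposes them (profile_if_updated_since_days, profile_table_size_limit, and profile_table_row_limit exist for the Snowflake/BigQuery-style profilers in recent CLI versions; the exact set is version- and source-dependent). Shown as the Python dict fragment a recipe's source config would contain:

# None disables the corresponding gate, so every selected table is profiled.
profiling_config = {
    "profiling": {
        "enabled": True,
        "profile_table_level_only": False,
        # Only profile tables changed within the last N days; None = no gate.
        "profile_if_updated_since_days": None,
        # Skip tables above these size/row-count thresholds; None = no gate.
        "profile_table_size_limit": None,
        "profile_table_row_limit": None,
    }
}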