# ingestion
s
Hello, I am trying to configure lineage data. I executed this command: "airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://localhost:8080'" but I am receiving the following error.
Copy code
[2021-09-01 15:32:29,340] {cli_action_loggers.py:105} WARNING - Failed to log action with (sqlite3.OperationalError) no such table: log
[SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?, ?, ?, ?, ?, ?, ?)]
[parameters: ('2021-09-01 07:32:29.337103', None, None, 'cli_connections_add', None, 'hadoop', '{"host_name": "localhost", "full_command": "[\'/home/hadoop/.local/bin/airflow\', \'connections\', \'add\', \'--conn-type\', \'datahub_rest\', \'datahub_rest_default\', \'--conn-host\', \'http://localhost:8080\']"}')]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Traceback (most recent call last):
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    cursor, statement, parameters, context
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: connection

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hadoop/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/hadoop/.local/lib/python3.6/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/hadoop/.local/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/hadoop/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 91, in wrapper
    return f(*args, **kwargs)
  File "/home/hadoop/.local/lib/python3.6/site-packages/airflow/cli/commands/connection_command.py", line 196, in connections_add
    if not session.query(Connection).filter(Connection.conn_id == new_conn.conn_id).first():
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3429, in first
    ret = list(self[0:1])
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
    return list(res)
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
    distilled_params,
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    cursor, statement, parameters, context
  File "/home/hadoop/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection
[SQL: SELECT connection.password AS connection_password, connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.description AS connection_description, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted
FROM connection
WHERE connection.conn_id = ? LIMIT ? OFFSET ?]
[parameters: ('datahub_rest_default', 1, 0)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
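(Note for anyone landing on the same error: both missing tables, log and connection, point to an Airflow metadata database that was never initialized. A minimal sketch of the fix, assuming a standalone Airflow 2.x install with the default SQLite backend rather than the dockerized setup discussed below, would be:)
Copy code
# Create the Airflow metadata tables first
airflow db init
# Then retry registering the DataHub REST connection
airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://localhost:8080'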
m
Hi @silly-dress-39732, it seems like your airflow system isn't very happy with that command. Which version are you running, and are you running it locally or on some centrally deployed system?
@silly-dress-39732 can you follow the instructions here? https://datahubproject.io/docs/docker/airflow/local_airflow
And let us know if that works for you
s
@mammoth-bear-12532 I used this command to install airflow: "pip install acryl-datahub[airflow]"
m
@silly-dress-39732 to install airflow you would need to follow the instructions I pasted above.
Would that not be possible?
s
@mammoth-bear-12532 I used this document to install airflow: https://datahubproject.io/docs/docker/airflow/local_airflow, but I'm having some problems. Should I install Python 3.8? I have Python 3.6.5 installed. Is that the issue?
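(Side note: the quickstart runs Airflow inside Docker containers that ship their own Python 3.8, so the host's Python 3.6.5 shouldn't matter. Assuming the scheduler service from the compose file is up, one way to confirm which interpreter is actually in use is:)
Copy code
docker-compose exec airflow-scheduler python --version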
@mammoth-bear-12532 Running docker-compose up gives the following error:
Copy code
airflow_install_airflow-worker_1 exited with code 1
airflow-scheduler_1 | ....................
airflow-scheduler_1 | ERROR! Maximum number of retries (20) reached.
airflow-scheduler_1 |
airflow-scheduler_1 | Last check result:
airflow-scheduler_1 | $ airflow db check
airflow-scheduler_1 | Unable to load the config, contains a configuration error.
airflow-scheduler_1 | Traceback (most recent call last):
airflow-scheduler_1 |   File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
airflow-scheduler_1 |     self._accessor.mkdir(self, mode)
airflow-scheduler_1 | FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/logs/scheduler/2021-09-02'
airflow-scheduler_1 |
airflow-scheduler_1 | During handling of the above exception, another exception occurred:
airflow-scheduler_1 |
airflow-scheduler_1 | Traceback (most recent call last):
airflow-scheduler_1 |   File "/usr/local/lib/python3.8/logging/config.py", line 563, in configure
airflow-scheduler_1 |     handler = self.configure_handler(handlers[name])
airflow-scheduler_1 |   File "/usr/local/lib/python3.8/logging/config.py", line 744, in configure_handler
airflow-scheduler_1 |     result = factory(**kwargs)
airflow-scheduler_1 |   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/log/file_processor_handler.py", line 46, in __init__
airflow-scheduler_1 |     Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True)
airflow-scheduler_1 |   File "/usr/local/lib/python3.8/pathlib.py", line 1292, in mkdir
airflow-scheduler_1 |     self.parent.mkdir(parents=True, exist_ok=True)
airflow-scheduler_1 |   File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
airflow-scheduler_1 |     self._accessor.mkdir(self, mode)
airflow-scheduler_1 | PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'
airflow-scheduler_1 |
airflow-scheduler_1 | The above exception was the direct cause of the following exception:
airflow-scheduler_1 |
airflow-scheduler_1 | Traceback (most recent call last):
airflow-scheduler_1 |   File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-scheduler_1 |     from airflow.__main__ import main
airflow-scheduler_1 |   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__init__.py", line 46, in <module>
airflow-scheduler_1 |     settings.initialize()
airflow-scheduler_1 |   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/settings.py", line 444, in initialize
airflow-scheduler_1 |     LOGGING_CLASS_PATH = configure_logging()
airflow-scheduler_1 |   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/logging_config.py", line 73, in configure_logging
airflow-scheduler_1 |     raise e
airflow-scheduler_1 |   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/logging_config.py", line 68, in configure_logging
airflow-scheduler_1 |     dictConfig(logging_config)
airflow-scheduler_1 |   File "/usr/local/lib/python3.8/logging/config.py", line 808, in dictConfig
airflow-scheduler_1 |     dictConfigClass(config).configure()
airflow-scheduler_1 |   File "/usr/local/lib/python3.8/logging/config.py", line 570, in configure
airflow-scheduler_1 |     raise ValueError('Unable to configure handler '
airflow-scheduler_1 | ValueError: Unable to configure handler 'processor'
airflow-scheduler_1 |
airflow_install_airflow-scheduler_1 exited with code 1
m
@silly-dress-39732: are you on Linux?
s
@mammoth-bear-12532 yes
m
quick hack would be:
Copy code
chmod -R 777 dags/
chmod -R 777 logs/
the more correct answer seems to be:
Copy code
mkdir ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
followed by (once you have matched the file permissions):
Copy code
docker-compose up airflow-init
docker-compose up
let me know if that fixes the issue @silly-dress-39732
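(A quick sanity check after the init step, assuming dags/, logs/ and plugins/ sit next to the docker-compose.yaml, is to confirm the folders are owned by the UID written into .env:)
Copy code
ls -ln ./dags ./logs ./plugins   # the numeric owner should match AIRFLOW_UID
cat .env                         # should contain AIRFLOW_UID=<your uid> and AIRFLOW_GID=0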
s
@mammoth-bear-12532 docker-compose up airflow-init is ok.
Copy code
docker-compose up
still has a problem, like this:
 Traceback (most recent call last):
airflow-scheduler_1  |   File "/usr/local/lib/python3.8/logging/config.py", line 563, in configure
airflow-scheduler_1  |     handler = self.configure_handler(handlers[name])
airflow-scheduler_1  |   File "/usr/local/lib/python3.8/logging/config.py", line 744, in configure_handler
airflow-scheduler_1  |     result = factory(**kwargs)
airflow-scheduler_1  |   File "/usr/local/lib/python3.8/logging/handlers.py", line 148, in __init__
airflow-scheduler_1  |     BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
airflow-scheduler_1  |   File "/usr/local/lib/python3.8/logging/handlers.py", line 55, in __init__
airflow-scheduler_1  |     logging.FileHandler.__init__(self, filename, mode, encoding, delay)
airflow-scheduler_1  |   File "/usr/local/lib/python3.8/logging/__init__.py", line 1147, in __init__
airflow-scheduler_1  |     StreamHandler.__init__(self, self._open())
airflow-scheduler_1  |   File "/usr/local/lib/python3.8/logging/__init__.py", line 1176, in _open
airflow-scheduler_1  |     return open(self.baseFilename, self.mode, encoding=self.encoding)
airflow-scheduler_1  | PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/dag_processor_manager/dag_processor_manager.log'
airflow-scheduler_1  |
airflow-scheduler_1  | The above exception was the direct cause of the following exception:
@mammoth-bear-12532 sudo chmod -R 777 /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log
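(A less permissive alternative to chmod 777, assuming the bind-mounted folders live next to your docker-compose.yaml, is to hand them to the UID recorded in .env:)
Copy code
# Give the mounted folders to the host user the containers run as,
# instead of opening them up to everyone
sudo chown -R "$(id -u):0" ./dags ./logs ./plugins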
m
looks like the logs directory is still not happy
you might want to delete the docker images first before restarting them
docker compose rm
in the airflow_install directory
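(If removing the stopped containers is not enough, a fuller reset, run from the same directory and assuming you can afford to lose the containers' volumes, would be:)
Copy code
docker-compose down --volumes --remove-orphans   # stop everything and drop the volumes
docker-compose up airflow-init                   # re-run the one-off init job
docker-compose up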
s
Executing docker-compose rm has no effect. I still get this error: PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/dag_processor_manager/dag_processor_manager.log' @mammoth-bear-12532
m
what does
ls -lR ./logs
give you?
s
@mammoth-bear-12532 like this:
Copy code
[hadoop@172 dag_processor_manager]$ ls -R ./dag_processor_manager.log
./dag_processor_manager.log
m
are you inside the container?
can you do this
ls -lR ./logs
from outside the container?
from the host machine
s
@mammoth-bear-12532 /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log is a local directory on my Linux machine, not inside a docker container.
m
hmm... if you followed the instructions I provided, you should have installed airflow inside docker
not on your linux box
Maybe this is from a previous install?
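(One way to tell which Airflow is actually being invoked, the pip-installed one on the host or the dockerized one, is to check where the binary lives and which containers are running:)
Copy code
which airflow            # /home/hadoop/.local/bin/airflow would be the earlier pip install
pip show apache-airflow  # shows the host-installed version, if any
docker ps                # the docker-compose Airflow services should be listed here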