microscopic-mechanic-13766
05/04/2022, 10:03 AM
datahub timeline --urn "urn:li:dataset:(urn:li:dataPlatform:hive,nasalogs.linaje,PROD)" --category documentation --start 10daysago
Using DH v0.8.33 and CLI v0.8.32.1.
icy-portugal-26250
05/04/2022, 10:40 AM
I ran sh docker/quickstart.sh, but I couldn't log in on the front end with the standard 'datahub' user (invalid credentials). When checking docker ps, I noticed that the linkedin/datahub-gms container was unhealthy. I sshed into the container to run the entrypoint command manually and got this error message:
/datahub/datahub-gms/scripts $ ./start.sh
+ grep -q ://
+ echo
+ NEO4J_HOST=http://
+ [[ ! -z '' ]]
+ [[ -z '' ]]
+ ELASTICSEARCH_AUTH_HEADER='Accept: */*'
+ [[ '' == true ]]
+ ELASTICSEARCH_PROTOCOL=http
+ WAIT_FOR_EBEAN=
+ [[ '' != true ]]
+ [[ '' == ebean ]]
+ [[ -z '' ]]
+ WAIT_FOR_EBEAN=' -wait tcp://mysql:3306 '
+ WAIT_FOR_CASSANDRA=
+ [[ '' == cassandra ]]
+ WAIT_FOR_KAFKA=
+ [[ '' != true ]]
++ echo broker:29092
++ sed 's/,/ -wait tcp:\/\//g'
+ WAIT_FOR_KAFKA=' -wait tcp://broker:29092 '
+ WAIT_FOR_NEO4J=
+ [[ elasticsearch != elasticsearch ]]
+ OTEL_AGENT=
+ [[ '' == true ]]
+ PROMETHEUS_AGENT=
+ [[ '' == true ]]
+ COMMON='
-wait tcp://mysql:3306 -wait tcp://broker:29092 -timeout 240s java -Xms1g -Xmx1g -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war'
+ [[ '' != true ]]
+ exec dockerize -wait http://elasticsearch:9200 -wait-http-header 'Accept: */*' -wait tcp://mysql:3306 -wait tcp://broker:29092 -timeout 240s java -Xms1g -Xmx1g -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
2022/05/04 09:48:10 Waiting for: http://elasticsearch:9200
2022/05/04 09:48:10 Waiting for: tcp://mysql:3306
2022/05/04 09:48:10 Waiting for: tcp://broker:29092
2022/05/04 09:48:10 Connected to tcp://mysql:3306
2022/05/04 09:48:10 Connected to tcp://broker:29092
2022/05/04 09:48:10 Received 200 from http://elasticsearch:9200
2022-05-04 09:48:13.645:INFO::main: Logging initialized @2896ms to org.eclipse.jetty.util.log.StdErrLog
WARNING: jetty-runner is deprecated.
See Jetty Documentation for startup options
https://www.eclipse.org/jetty/documentation/
ERROR: No such jar file:///datahub/datahub-gms/scripts/jetty-util.jar
Usage: java [-Djetty.home=dir] -jar jetty-runner.jar [--help|--version] [ server opts] [[ context opts] context ...]
Server opts:
--version - display version and exit
--log file - request log filename (with optional 'yyyy_mm_dd' wildcard
--out file - info/warn/debug log filename (with optional 'yyyy_mm_dd' wildcard
--host name|ip - interface to listen on (default is all interfaces)
--port n - port to listen on (default 8080)
--stop-port n - port to listen for stop command (or -DSTOP.PORT=n)
--stop-key n - security string for stop command (required if --stop-port is present) (or -DSTOP.KEY=n)
[--jar file]*n - each tuple specifies an extra jar to be added to the classloader
[--lib dir]*n - each tuple specifies an extra directory of jars to be added to the classloader
[--classes dir]*n - each tuple specifies an extra directory of classes to be added to the classloader
--stats [unsecure|realm.properties] - enable stats gathering servlet context
[--config file]*n - each tuple specifies the name of a jetty xml config file to apply (in the order defined)
Context opts:
[[--path /path] context]*n - WAR file, web app dir or context xml file, optionally with a context path
2022/05/04 09:48:13 Command exited with error: exit status 1
Does anyone have any tips on how to troubleshoot that?
PS. While writing this message, I also tried the datahub docker quickstart command (which most likely differs from my local version), but ran into issues with datahub-gms too:
Unable to run quickstart - the following issues were detected:
- datahub-gms is still starting
I am attaching the log of the latter command.
handsome-football-66174
05/04/2022, 9:24 PM
- name: datahub_action
  action:
    module_name: datahub.integrations.great_expectations.action
    class_name: DataHubValidationAction
    server_url: https://hostname
Getting this message when the checkpoint runs:
great_expectations checkpoint run postgres_checkpoint
Using v3 (Batch Request) API
Calculating Metrics: 0it [00:00, ?it/s]
WARNING: Enable parse_table_names_from_sql in DatahubValidationAction config to try to parse the tables being asserted from SQL query
Validation succeeded!
Suite Name Status Expectations met
- public.tablename.suite ✔ Passed 0 of 0 (100 %)
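For reference, the warning above names a parse_table_names_from_sql option on the action; a minimal sketch of the action block with that option enabled (server_url is the placeholder from the message above, and the exact behaviour is worth checking against your CLI version):
- name: datahub_action
  action:
    module_name: datahub.integrations.great_expectations.action
    class_name: DataHubValidationAction
    server_url: https://hostname
    # Option named in the warning above; lets the action try to parse
    # the tables being asserted out of the batch's SQL query.
    parse_table_names_from_sql: true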
bland-orange-13353
05/05/2022, 12:17 PM
kind-psychiatrist-76973
05/05/2022, 2:00 PM
We have a deny pattern for the DEV database on DataHub, but it is still showing:
pipeline_name: "snowflake_platform"
source:
  type: snowflake
  config:
    # Coordinates
    host_port: ${SNOWFLAKE_ACCOUNT}
    warehouse: "AGGREGATION_COMPUTE"
    # Credentials
    username: ${SNOWFLAKE_USERNAME}
    password: ${SNOWFLAKE_PASSWORD}
    role: "ACCOUNTADMIN"
    env: "PROD"
    profiling:
      enabled: False
    database_pattern:
      allow:
        - "SENNDERDWH"
        - "VISIBILITY"
        - "CARRIER_STRATEGY_AND_PLANNING"
        - "SHIPPER_STRATEGY_AND_PLANNING"
        - "NETSUITE"
        - "MARKETING"
        - "GLOBAL_OPERATIONS"
        - "CENTRAL_STRATEGY_AND_PLANNING"
        - "FINANCE"
      deny:
        - "DEV"
        - "ANALYST_DEV"
    table_pattern:
      ignoreCase: False
    include_tables: True
    include_views: True
    include_table_lineage: False
    stateful_ingestion:
      enabled: True
      remove_stale_metadata: True
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_GMS_HOST}
Are we doing something wrong?
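For what it's worth: since DEV is not in the allow list either, one likely suspect (an assumption, not a certainty) is metadata left over from earlier runs; a new pattern does not by itself remove entities already ingested, though stateful ingestion with the same pipeline_name should soft-delete them as stale on a later successful run. A hedged sketch of an anchored deny block, since the allow/deny entries are regular expressions:
    database_pattern:
      deny:
        # Anchored regexes; an unanchored "DEV" behaves as a prefix
        # match, so this mainly guards against surprises.
        - "^DEV$"
        - "^ANALYST_DEV$"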
lemon-terabyte-66903
05/05/2022, 2:08 PM
square-solstice-69079
05/05/2022, 4:59 PM
numerous-eve-42142
05/05/2022, 6:49 PM
...
'soft_deleted_stale_entities': [],
'query_combiner': {'total_queries': 108,
                   'uncombined_queries_issued': 57,
                   'combined_queries_issued': 54,
                   'queries_combined': 54,
                   'query_exceptions': 3}}
Sink (datahub-rest) report:
{'records_written': 106,
 'warnings': [],
 'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
               'info': {'message': "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))"}}],
 'downstream_total_latency_in_seconds': 9783.199817}
Pipeline finished with failures
2. I can't find a way to ingest profiling for exactly one table. Here's why:
With an allow pattern like:
profile_pattern:
  allow:
    - "^db.schema.table$"
column-level profiling is filtered out.
With an allow pattern like:
profile_pattern:
  allow:
    - "db.schema.table"
other tables with names like "table" are profiled, and they're not in DataHub.
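For what it's worth, a hedged sketch that may thread the needle, assuming the profiler also matches the pattern against db.schema.table.column paths for column-level profiling (which would explain why the fully anchored pattern filters the columns out): escape the dots and allow either end-of-string or a trailing dot:
profile_pattern:
  allow:
    # Matches "db.schema.table" itself and "db.schema.table.<column>",
    # but not e.g. "db.schema.table2"; dots are escaped so "." is literal.
    - "^db\\.schema\\.table(\\.|$)"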
Can someone save me?
witty-laptop-49489
05/06/2022, 7:30 AM
numerous-camera-74294
05/06/2022, 10:23 AM
icy-portugal-26250
05/06/2022, 12:29 PM
> Could not resolve com.linkedin.pegasus:gradle-plugins:28.3.7.
> Could not get resource 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/gradle-plugins/28.3.7/gradle-plugins-28.3.7.pom'.
> Could not GET 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/gradle-plugins/28.3.7/gradle-plugins-28.3.7.pom'.
> Connect to linkedin.jfrog.io:443 [linkedin.jfrog.io/104.198.68.46] failed: connect timed out
millions-notebook-72121
05/06/2022, 3:13 PM
clean-coat-28016
05/07/2022, 5:29 AM
modern-zoo-97059
05/09/2022, 2:15 AM
Caused by: org.mariadb.jdbc.internal.util.exceptions.MariaDbSqlException: Table 'datahub.metadata_aspect_v2' doesn't exist
breezy-portugal-43538
05/09/2022, 7:38 AM
I have a problem with the datahub get --urn command and with retrieving the urn in general.
I was able to ingest data into DataHub, but after clicking the copy-urn icon in the web UI, nothing is copied and the button does not seem to have any effect. I tested this in three different web browsers.
When I try to get the urn by running the datahub get --urn command, I receive the following error:
$ datahub get --urn "urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/some_data/results/data_info.csv,DEV)"
/home/mluser/.local/lib/python3.8/site-packages/cryptography/hazmat/backends/openssl/x509.py:14: CryptographyDeprecationWarning: This version of cryptography contains a temporary pyOpenSSL fallback path. Upgrade pyOpenSSL now.
warnings.warn(
[2022-05-09 10:24:11,560] ERROR {datahub.entrypoints:152} - File "/home/mluser/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 138, in main
135 def main(**kwargs):
136 # This wrapper prevents click from suppressing errors.
137 try:
--> 138 sys.exit(datahub(standalone_mode=False, **kwargs))
139 except click.exceptions.Abort:
..................................................
kwargs = {}
datahub = <Group datahub>
click.exceptions.Abort = <class 'click.exceptions.Abort'>
..................................................
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
1135 def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
(...)
--> 1137 return self.main(*args, **kwargs)
..................................................
self = <Group datahub>
args = ()
t.Any = typing.Any
kwargs = {'standalone_mode': False}
..................................................
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/home/mluser/.local/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 304, in wrapper
247 def wrapper(*args: Any, **kwargs: Any) -> Any:
(...)
300 "status": "error",
301 "error": get_full_class_name(e),
302 },
303 )
--> 304 raise e
..................................................
args = (<click.core.Context object at 0x7f82fa591940>, )
Any = typing.Any
kwargs = {'urn': 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
ome_data/results/data_info.csv,DEV)',
'aspect': ()}
..................................................
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 256, in wrapper
247 def wrapper(*args: Any, **kwargs: Any) -> Any:
(...)
252 telemetry_instance.ping(
253 "function-call", {"function": function, "status": "start"}
254 )
255 try:
--> 256 res = func(*args, **kwargs)
257 telemetry_instance.ping(
..................................................
args = (<click.core.Context object at 0x7f82fa591940>, )
Any = typing.Any
kwargs = {'urn': 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
ome_data/results/data_info.csv,DEV)',
'aspect': ()}
telemetry_instance.ping = <method 'Telemetry.ping' of <datahub.telemetry.telemetry.Telemetry object at 0x7f82eca5eb20> telemetry.py:201>
function = 'datahub.cli.get_cli.get'
func = <function 'get' get_cli.py:14>
..................................................
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/get_cli.py", line 38, in get
25 def get(ctx: Any, urn: Optional[str], aspect: List[str]) -> None:
(...)
34 urn = ctx.args[0]
35 logger.debug(f"Using urn from args {urn}")
36 click.echo(
37 json.dumps(
--> 38 get_aspects_for_entity(entity_urn=urn, aspects=aspect, typed=False),
39 sort_keys=True,
..................................................
get = <Command get>
ctx = <click.core.Context object at 0x7f82fa591940>
Any = typing.Any
urn = 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
ome_data/results/data_info.csv,DEV)'
Optional = typing.Optional
aspect = ()
List = typing.List
ctx.args = []
logger.debug = <method 'Logger.debug' of <Logger datahub.cli.get_cli (INFO)> __init__.py:1424>
json.dumps = <function 'dumps' __init__.py:183>
..................................................
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/cli_utils.py", line 658, in get_aspects_for_entity
648 def get_aspects_for_entity(
649 entity_urn: str,
650 aspects: List[str],
651 typed: bool = False,
652 cached_session_host: Optional[Tuple[Session, str]] = None,
653 ) -> Dict[str, Union[dict, DictWrapper]]:
654 # Process non-timeseries aspects
655 non_timeseries_aspects: List[str] = [
656 a for a in aspects if a not in timeseries_class_to_aspect_name_map.values()
657 ]
--> 658 entity_response = get_entity(
659 entity_urn, non_timeseries_aspects, cached_session_host
..................................................
entity_urn = 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
ome_data/results/data_info.csv,DEV)'
aspects = ()
List = typing.List
typed = False
cached_session_host = None
Optional = typing.Optional
Tuple = typing.Tuple
Session = <class 'requests.sessions.Session'>
Dict = typing.Dict
Union = typing.Union
DictWrapper = <class 'avrogen.dict_wrapper.DictWrapper'>
non_timeseries_aspects = []
..................................................
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/cli_utils.py", line 508, in get_entity
492 def get_entity(
493 urn: str,
494 aspect: Optional[List] = None,
495 cached_session_host: Optional[Tuple[Session, str]] = None,
496 ) -> Dict:
(...)
504 encoded_urn: str = urn
505 elif urn.startswith("urn:"):
506 encoded_urn = urllib.parse.quote(urn)
507 else:
--> 508 raise Exception(
509 f"urn {urn} does not seem to be a valid raw (starts with urn:) or encoded urn (starts with urn%3A)"
..................................................
urn = 'urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/s
ome_data/results/data_info.csv,DEV)'
aspect = []
Optional = typing.Optional
List = typing.List
cached_session_host = None
Tuple = typing.Tuple
Session = <class 'requests.sessions.Session'>
Dict = typing.Dict
urllib.parse.quote = <function 'quote' parse.py:799>
..................................................
---- (full traceback above) ----
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 138, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/mluser/.local/lib/python3.8/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/home/mluser/.local/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 304, in wrapper
raise e
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 256, in wrapper
res = func(*args, **kwargs)
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/get_cli.py", line 38, in get
get_aspects_for_entity(entity_urn=urn, aspects=aspect, typed=False),
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/cli_utils.py", line 658, in get_aspects_for_entity
entity_response = get_entity(
File "/home/mluser/.local/lib/python3.8/site-packages/datahub/cli/cli_utils.py", line 508, in get_entity
raise Exception(
Exception: urn urn urn:li:dataset:(urn:li:dataPlatform:s3,incoming_data/case_1/test/2022-05-06T14-30-26Z/data_2022-05-06T14-30-26Z/some_data/results/data_info.csv,DEV) does not seem to be a valid raw (starts with urn:) or encoded urn (starts with urn%3A)
[2022-05-09 10:24:11,561] INFO {datahub.entrypoints:161} - DataHub CLI version: 0.8.31.6 at /home/mluser/.local/lib/python3.8/site-packages/datahub/__init__.py
[2022-05-09 10:24:11,561] INFO {datahub.entrypoints:164} - Python version: 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] at /usr/bin/python3 on Linux-5.4.0-96-generic-x86_64-with-glibc2.29
[2022-05-09 10:24:11,561] INFO {datahub.entrypoints:167} - GMS config {}
The gms log (from docker logs -f datahub-gms) does not update when I click the get-urn button on the website or when I run the command mentioned above. There are no log lines pointing to any error.
This issue was mentioned earlier in my older threads, and @hundreds-photographer-13496 was investigating the server logs that I had provided (by the way, huge thanks for the help!).
Could you take a look and help resolve this issue, or suggest some steps to get more info about the problem?
Thanks in advance!
worried-motherboard-80036
05/09/2022, 4:57 PM
source:
  type: "elasticsearch"
  config:
    # Coordinates
    host: 'https://the_host:9200'
    # Credentials
    username: the-user
    password: the-pass
    ca_certs: False
    verify_certs: False
    # Options
    # url_prefix: ""  # optional url_prefix
    env: "DEV"
    # index_pattern:
    #   allow: [".*some_index_name_pattern*"]
    #   deny: [".*skip_index_name_pattern*"]
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Running the ingestion I get:
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 359, in _extract_mcps
340 def _extract_mcps(self, index: str) -> Iterable[MetadataChangeProposalWrapper]:
(...)
355 # 1.1 Generate the schema fields from ES mappings.
356 index_mappings = raw_index_metadata["mappings"]
357 index_mappings_json_str: str = json.dumps(index_mappings)
358 md5_hash = md5(index_mappings_json_str.encode()).hexdigest()
--> 359 schema_fields = list(
360 ElasticToSchemaFieldConverter.get_schema_fields(index_mappings)
..................................................
self = ElasticsearchSource(ctx=<datahub.ingestion.api.common.PipelineContext object at 0x13026d280>)
index = '.signals_watches_trigger_state'
Iterable = typing.Iterable
MetadataChangeProposalWrapper = <class 'datahub.emitter.mcp.MetadataChangeProposalWrapper'>
index_mappings = {}
raw_index_metadata = {'aliases': {},
'mappings': {},
'settings': {'index': {...}}}
index_mappings_json_str = '{}'
json.dumps = <function 'dumps' __init__.py:183>
md5_hash = '99914b932bd37a50b983c5e7c90ae93b'
..................................................
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 158, in get_schema_fields
152 def get_schema_fields(
153 cls, elastic_mappings: Dict[str, Any]
154 ) -> Generator[SchemaField, None, None]:
155 converter = cls()
156 properties = elastic_mappings.get("properties")
157 if not properties:
--> 158 raise ValueError(
159 f"Missing 'properties' in elastic search mappings={json.dumps(elastic_mappings)}!"
..................................................
cls = <class 'datahub.ingestion.source.elastic_search.ElasticToSchemaFieldConverter'>
elastic_mappings = {}
Dict = typing.Dict
Any = typing.Any
Generator = typing.Generator
SchemaField = <class 'datahub.metadata.schema_classes.SchemaFieldClass'>
converter = <datahub.ingestion.source.elastic_search.ElasticToSchemaFieldConverter object at 0x134858940>
properties = None
..................................................
---- (full traceback above) ----
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 149, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/x/.pyenv/versions/forter-3.8.12/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 317, in wrapper
raise e
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 269, in wrapper
res = func(*args, **kwargs)
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
res = func(*args, **kwargs)
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/cli/ingest_cli.py", line 128, in run
raise e
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/cli/ingest_cli.py", line 114, in run
pipeline.run()
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 214, in run
for wu in itertools.islice(
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 308, in get_workunits
for mcp in self._extract_mcps(index):
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 359, in _extract_mcps
schema_fields = list(
File "/Users/x/Development/data-hub/datahub/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py", line 158, in get_schema_fields
raise ValueError(
ValueError: Missing 'properties' in elastic search mappings={}!
[2022-05-09 17:25:49,344] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.0.0.dev0 at /Users/cristicalugaru/Development/data-hub/datahub/metadata-ingestion/src/datahub/__init__.py
[2022-05-09 17:25:49,344] INFO {datahub.entrypoints:179} - Python version: 3.8.12 (default, Jan 31 2022, 11:27:11)
[Clang 13.0.0 (clang-1300.0.27.3)] at /Users/cristicalugaru/.pyenv/versions/my-env/bin/python on macOS-12.0.1-arm64-arm-64bit
[2022-05-09 17:25:49,344] INFO {datahub.entrypoints:182} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.34', 'commit': '9422578e419a30231bdb83bd5f4cd42607781942'}}, 'managedIngestion': {'defaultCliVersion': '0.8.34.1', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'noCode': 'true'}
I see some indexes ingested but not the main ones:
[2022-05-09 17:25:49,375] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.searchguard_resource_owner
[2022-05-09 17:25:49,424] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.searchguard_resource_owner
[2022-05-09 17:25:49,808] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.kibana-event-log-7.13.2
[2022-05-09 17:25:49,820] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.kibana-event-log-7.13.2
[2022-05-09 17:25:49,835] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.kibana-event-log-7.13.2
[2022-05-09 17:25:49,888] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.ds-ilm-history-5-2022.03.10-000001
[2022-05-09 17:25:49,911] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.ds-ilm-history-5-2022.03.10-000001
[2022-05-09 17:25:49,926] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit index-.ds-ilm-history-5-2022.03.10-000001
Any idea if I'm doing something wrong here?
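A hedged reading of the traceback above: the failing index ('.signals_watches_trigger_state') is an internal index with empty mappings, and the resulting ValueError aborts the pipeline before the main indices are reached. One workaround may be to deny dot-prefixed system indices via the commented-out index_pattern in the recipe (regex syntax assumed for the allow/deny entries):
    index_pattern:
      # Internal/system indices start with a dot and often have empty
      # mappings, which is what triggers the "Missing 'properties'" error.
      deny: ["^\\..*"]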
mammoth-fall-12031
05/09/2022, 4:59 PM
I can build the war using
./gradlew :metadata-service:war:build
But when I try to run it using
./gradlew :metadata-service:war:run
The build gets stuck at 99%. Below are the last few lines of the log:
2022-05-09 21:30:59.653:INFO:oejs.AbstractConnector:main: Started ServerConnector@77afea7d{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2022-05-09 21:30:59.666:INFO:oejs.Server:main: Started @12086ms
2022-05-09 21:34:00.979:WARN:oejs.HttpChannel:qtp580024961-11: /auth/generateSessionTokenForUser
java.lang.NullPointerException
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1700)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1667)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:505)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
at java.lang.Thread.run(Thread.java:748)
Can anyone help me resolve this?
wide-dawn-46249
05/09/2022, 5:19 PM
lively-jackal-83760
05/10/2022, 10:56 AM
early-vr-295
05/10/2022, 12:02 PM
prehistoric-knife-90526
05/10/2022, 2:51 PM
adorable-receptionist-20059
05/10/2022, 6:27 PM
modern-zoo-97059
05/11/2022, 6:58 AM
rich-policeman-92383
05/11/2022, 8:40 AM
busy-dusk-4970
05/11/2022, 1:01 PM
I'm running ./gradlew build locally on an M1 Mac and I'm running into this error.
fresh-napkin-5247
05/11/2022, 1:38 PM
localhost:8080/api/graphiql
adventurous-apple-98365
05/11/2022, 3:20 PM
The validateModels gradle task fails with "found invalid relationship with name AssociatedWith at path /domains/*. Invalid entityTypes(s) provided".
I see that the domains aspect has that relationship in the Pegasus files. Is there something I'm doing wrong in adding the domain to my snapshot?
gorgeous-telephone-63628
05/11/2022, 6:42 PM
> Task :datahub-web-react:yarnGenerate FAILED
yarn run v1.22.0
$ graphql-codegen --config codegen.yml
node:internal/modules/cjs/loader:936
throw err;
^
Error: Cannot find module './_baseClone'
Require stack:
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/lodash/clone.js
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/builder.js
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/generated/index.js
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/utils/react/cleanJSXElementLiteralChild.js
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/react/buildChildren.js
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/index.js
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/index.cjs.js
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/code-file-loader/index.cjs.js
- /Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-codegen/cli/bin.js
at Function.Module._resolveFilename (node:internal/modules/cjs/loader:933:15)
at Function.Module._load (node:internal/modules/cjs/loader:778:27)
at Module.require (node:internal/modules/cjs/loader:1005:19)
at require (node:internal/modules/cjs/helpers:94:18)
at Object.<anonymous> (/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/lodash/clone.js:1:17)
at Module._compile (node:internal/modules/cjs/loader:1101:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Module.require (node:internal/modules/cjs/loader:1005:19) {
code: 'MODULE_NOT_FOUND',
requireStack: [
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/lodash/clone.js',
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/builder.js',
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/generated/index.js',
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/utils/react/cleanJSXElementLiteralChild.js',
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/builders/react/buildChildren.js',
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/node_modules/@babel/types/lib/index.js',
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/graphql-tag-pluck/index.cjs.js',
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-tools/code-file-loader/index.cjs.js',
'/Users/dcurran/Documents/git_projects/df-datahub/datahub-web-react/node_modules/@graphql-codegen/cli/bin.js'
]
}
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
wonderful-egg-79350
05/12/2022, 5:57 AM
most-plumber-32123
05/12/2022, 5:59 AM