# troubleshoot
b
Hello, sorry for asking what is probably a very basic question, but I'm a little stuck. I want to add an endpoint_url setting for AWS to the DataHub code so that I can set it to my own value. I saw that DataHub has started to support the S3 data lake on the beta branch, so being able to change endpoint_url is essential for me, as I am using my own S3 buckets. I tried to build DataHub with the ./gradlew build command, but I ran into the following problem:
symbol:   class Generated
  location: package javax.annotation.processing
/sharedvolume/datahub/datahub-graphql-core/src/mainGeneratedGraphQL/java/com/linkedin/datahub/graphql/generated/Filter.java:7: error: cannot find symbol
@javax.annotation.processing.Generated(
                            ^
  symbol:   class Generated
  location: package javax.annotation.processing
100 errors

> Task :datahub-graphql-core:compileJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':datahub-graphql-core:compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Exception is:
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':datahub-graphql-core:compileJava'.
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.lambda$executeIfValid$3(ExecuteActionsTaskExecuter.java:186)
        at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:268)
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeIfValid(ExecuteActionsTaskExecuter.java:184)
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:173)
        at org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:109)
        at org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:46)
        at org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:62)
        at org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)
        at org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:56)
        at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:36)
        at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:77)
        at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:55)
        at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:52)
        at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:200)
        at org.gradle.internal.operations.DefaultBuildOperationRunner$CallableBuildOperationWorker.execute(DefaultBuildOperationRunner.java:195)
        at org.gradle.internal.operations.DefaultBuildOperationRunner$3.execute(DefaultBuildOperationRunner.java:75)
        at org.gradle.internal.operations.DefaultBuildOperationRunner$3.execute(DefaultBuildOperationRunner.java:68)
        at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:153)
        at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:68)
        at org.gradle.internal.operations.DefaultBuildOperationRunner.call(DefaultBuildOperationRunner.java:62)
        at org.gradle.internal.operations.DefaultBuildOperationExecutor.lambda$call$2(DefaultBuildOperationExecutor.java:76)
        at org.gradle.internal.operations.UnmanagedBuildOperationWrapper.callWithUnmanagedSupport(UnmanagedBuildOperationWrapper.java:54)
        at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:76)
        at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:52)
        at org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:41)
        at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:411)
        at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:398)
        at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:391)
        at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:377)
        at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.lambda$run$0(DefaultPlanExecutor.java:127)
        at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:191)
        at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.executeNextNode(DefaultPlanExecutor.java:182)
        at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:124)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
Caused by: org.gradle.api.internal.tasks.compile.CompilationFailedException: Compilation failed; see the compiler error output for details.
        at org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:57)
        at org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:40)
        at org.gradle.api.internal.tasks.compile.daemon.AbstractDaemonCompiler$CompilerWorkAction.execute(AbstractDaemonCompiler.java:135)
        at org.gradle.workers.internal.DefaultWorkerServer.execute(DefaultWorkerServer.java:63)
        at org.gradle.workers.internal.AbstractClassLoaderWorker$1.create(AbstractClassLoaderWorker.java:49)
        at org.gradle.workers.internal.AbstractClassLoaderWorker$1.create(AbstractClassLoaderWorker.java:43)
        at org.gradle.internal.classloader.ClassLoaderUtils.executeInClassloader(ClassLoaderUtils.java:97)
        at org.gradle.workers.internal.AbstractClassLoaderWorker.executeInClassLoader(AbstractClassLoaderWorker.java:43)
        at org.gradle.workers.internal.FlatClassLoaderWorker.run(FlatClassLoaderWorker.java:32)
        at org.gradle.workers.internal.FlatClassLoaderWorker.run(FlatClassLoaderWorker.java:22)
        at org.gradle.workers.internal.WorkerDaemonServer.run(WorkerDaemonServer.java:85)
        at org.gradle.workers.internal.WorkerDaemonServer.run(WorkerDaemonServer.java:55)
        at org.gradle.process.internal.worker.request.WorkerAction$1.call(WorkerAction.java:138)
        at org.gradle.process.internal.worker.child.WorkerLogEventListener.withWorkerLoggingProtocol(WorkerLogEventListener.java:41)
        at org.gradle.process.internal.worker.request.WorkerAction.run(WorkerAction.java:135)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
        at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
        at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
I'm not sure where this comes from or how to handle it. I think I did everything correctly according to the instructions here: https://datahubproject.io/docs/developers/ Also, since I am not familiar with the DataHub repository, how big is the change needed to make endpoint_url for AWS configurable via the yml file? Could you give me some pointers on what to check, where to look, and which source files to change? Thank you very much for all the help you provide : )
g
I faced this error, and it was because I did not have a proper JDK 1.8 installation.
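A quick way to see which Java version a shell would pick up is to parse the output of `java -version`. The sketch below is only an illustration: the helper names `parse_java_major` and `installed_java_major` are made up for this example, and Gradle may take its JDK from JAVA_HOME rather than from the PATH, so treat the result as a hint rather than a definitive answer.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_292" means Java 8.
    # Modern scheme: "11.0.11" means Java 11.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version` (it prints to stderr) and parse the result."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # no `java` executable on the PATH
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    print("Detected Java major version:", installed_java_major())
```

If this reports something other than 8, or nothing at all, the JDK setup is worth revisiting before rerunning the build.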
e
Hey Pawan, please try @gentle-hamburger-31302’s suggestion above! The change you mentioned seems to be an ingestion fix? You don’t have to build the whole repository to work on ingestion! Please refer to this doc https://datahubproject.io/docs/metadata-ingestion/developing/ for a guide on working with the ingestion library!
b
Hi, you were right, it was a missing JDK problem, and it looks like building just the ingestion library does the trick! Thanks : ) So this change is related to ingestion: we need endpoint_url exposed in order to ingest data from our own S3 bucket, and since we can't currently specify it, we are stuck with the whole process. I'm not sure about the required scope of the changes, but it looks like the only file that needs to change is: https://github.com/datahub-project/datahub/blob/37aedfc87c4e39015bd456bf34debe5a3a[…]/metadata-ingestion/src/datahub/ingestion/source/s3/__init__.py where: file = file.replace("s3://", "s3a://") would instead use my own file path, like: "my_s3_host_addrr://some/path/to/csv/". Is it really as straightforward as I think it is?
Hi @early-lamp-41924, one more thing: I'm trying to run the beta S3 feature: https://datahubproject.io/docs/metadata-ingestion/source_docs/s3_data_lake/ and I'm receiving the following error while running ingestion via a yml file where the type is set to "S3":
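For what it's worth, the quoted replace() call only rewrites the URI scheme, so a custom endpoint would probably have to be passed as a separate setting rather than folded into the path. Below is a minimal sketch, assuming the source reads files through Spark's Hadoop S3A connector (not confirmed in this thread). The helper names and the example endpoint are hypothetical; `fs.s3a.endpoint` and `fs.s3a.path.style.access` are standard Hadoop S3A properties, but whether DataHub exposes them through the yml file is an open question here.

```python
from typing import Dict, Optional


def to_s3a(path: str) -> str:
    """Rewrite only the URI scheme, as the quoted replace() call does."""
    return path.replace("s3://", "s3a://")


def s3a_endpoint_conf(endpoint_url: Optional[str]) -> Dict[str, str]:
    """Build Spark settings for a custom S3 endpoint (hypothetical helper)."""
    if not endpoint_url:
        return {}  # nothing to override, keep the default AWS endpoint
    return {
        # Point the S3A connector at a self-hosted, S3-compatible service.
        "spark.hadoop.fs.s3a.endpoint": endpoint_url,
        # Assumption: self-hosted stores often need path-style bucket access.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


if __name__ == "__main__":
    print(to_s3a("s3://my-bucket/some/path/to/csv/"))
    print(s3a_endpoint_conf("https://s3.example.internal:9000"))
```

The bucket path keeps its usual s3:// form in the recipe, and the endpoint travels alongside it as configuration.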
File "/home/pawel/.local/lib/python3.8/site-packages/datahub/ingestion/api/registry.py", line 124, in get
    115  def get(self, key: str) -> Type[T]:
 (...)
    120          self._check_cls(MyClass)
    121          return MyClass
    122      print(f"DEBUG:{self._mapping}")
    123      if key not in self._mapping:
--> 124          raise KeyError(f"Did not find a registered class for {key}")
My _mapping looks like this:
{
'athena': 'datahub.ingestion.source.sql.athena:AthenaSource', 'azure-ad': 'datahub.ingestion.source.identity.azure_ad:AzureADSource', 'bigquery': 'datahub.ingestion.source.sql.bigquery:BigQuerySource', 'bigquery-usage': 'datahub.ingestion.source.usage.bigquery_usage:BigQueryUsageSource', 'clickhouse': 'datahub.ingestion.source.sql.clickhouse:ClickHouseSource', 'clickhouse-usage': 'datahub.ingestion.source.usage.clickhouse_usage:ClickHouseUsageSource', 'data-lake': 'datahub.ingestion.source.data_lake:DataLakeSource', 'datahub-business-glossary': 'datahub.ingestion.source.metadata.business_glossary:BusinessGlossaryFileSource', 'datahub-lineage-file': 'datahub.ingestion.source.metadata.lineage:LineageFileSource', 'dbt': 'datahub.ingestion.source.dbt:DBTSource', 'druid': 'datahub.ingestion.source.sql.druid:DruidSource', 'elasticsearch': 'datahub.ingestion.source.elastic_search:ElasticsearchSource', 'feast': 'datahub.ingestion.source.feast:FeastSource', 'file': 'datahub.ingestion.source.file:GenericFileSource', 'glue': 'datahub.ingestion.source.aws.glue:GlueSource', 'hive': 'datahub.ingestion.source.sql.hive:HiveSource', 'kafka': 'datahub.ingestion.source.kafka:KafkaSource', 'kafka-connect': 'datahub.ingestion.source.kafka_connect:KafkaConnectSource', 'ldap': 'datahub.ingestion.source.ldap:LDAPSource', 'looker': 'datahub.ingestion.source.looker:LookerDashboardSource', 'lookml': 'datahub.ingestion.source.lookml:LookMLSource', 'mariadb': 'datahub.ingestion.source.sql.mariadb.MariaDBSource', 'metabase': 'datahub.ingestion.source.metabase:MetabaseSource', 'mode': 'datahub.ingestion.source.mode:ModeSource', 'mongodb': 'datahub.ingestion.source.mongodb:MongoDBSource', 'mssql': 'datahub.ingestion.source.sql.mssql:SQLServerSource', 'mysql': 'datahub.ingestion.source.sql.mysql:MySQLSource', 'nifi': 'datahub.ingestion.source.nifi:NifiSource', 'okta': 'datahub.ingestion.source.identity.okta:OktaSource', 'openapi': 'datahub.ingestion.source.openapi:OpenApiSource', 'oracle': 
'datahub.ingestion.source.sql.oracle:OracleSource', 'postgres': 'datahub.ingestion.source.sql.postgres:PostgresSource', 'powerbi': 'datahub.ingestion.source.powerbi:PowerBiDashboardSource', 'redash': 'datahub.ingestion.source.redash:RedashSource', 'redshift': 'datahub.ingestion.source.sql.redshift:RedshiftSource', 'redshift-usage': 'datahub.ingestion.source.usage.redshift_usage:RedshiftUsageSource', 'sagemaker': 'datahub.ingestion.source.aws.sagemaker:SagemakerSource', 'snowflake': 'datahub.ingestion.source.sql.snowflake:SnowflakeSource', 'snowflake-usage': 'datahub.ingestion.source.usage.snowflake_usage:SnowflakeUsageSource', 'sqlalchemy': 'datahub.ingestion.source.sql.sql_generic:SQLAlchemyGenericSource', 'starburst-trino-usage': 'datahub.ingestion.source.usage.starburst_trino_usage:TrinoUsageSource', 'superset': 'datahub.ingestion.source.superset:SupersetSource', 'tableau': 'datahub.ingestion.source.tableau:TableauSource', 'trino': 'datahub.ingestion.source.sql.trino:TrinoSource'}
It looks like the S3 beta feature is missing or not enabled in my ingestion sources. Is there an easy way to enable it?
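One way to narrow this down is to search the printed mapping for the requested key and for related entries. The sketch below uses a small excerpt of the mapping above; the helper names `lookup` and `related` are hypothetical and not part of DataHub's registry. If the key is truly absent, one possibility (an assumption, not something confirmed in this thread) is that the installed package predates the beta source or registers it under a different name.

```python
from typing import Dict, List, Optional

# Excerpt of the _mapping printed above (not the full registry).
MAPPING: Dict[str, str] = {
    "data-lake": "datahub.ingestion.source.data_lake:DataLakeSource",
    "file": "datahub.ingestion.source.file:GenericFileSource",
    "glue": "datahub.ingestion.source.aws.glue:GlueSource",
    "sagemaker": "datahub.ingestion.source.aws.sagemaker:SagemakerSource",
}


def lookup(mapping: Dict[str, str], key: str) -> Optional[str]:
    """Return the registered key that matches `key` case-insensitively."""
    wanted = key.strip().lower()
    for registered in mapping:
        if registered.lower() == wanted:
            return registered
    return None


def related(mapping: Dict[str, str], hint: str) -> List[str]:
    """List keys whose name or class path mentions `hint`."""
    hint = hint.lower()
    return sorted(
        name
        for name, class_path in mapping.items()
        if hint in name.lower() or hint in class_path.lower()
    )


if __name__ == "__main__":
    print(lookup(MAPPING, "S3"))     # is the requested type registered?
    print(related(MAPPING, "lake"))  # sources that mention "lake"
    print(related(MAPPING, "aws"))   # sources under the aws package
```

With the excerpt above, the "S3" lookup finds nothing, while a `data-lake` entry is present and may be worth checking against the docs for the installed version.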