# troubleshoot
b
S3 ingestion error below (please assist).
```
[2022-11-14 15:21:02,184] ERROR    {logger:26} - Please set env variable SPARK_VERSION
JAVA_HOME is not set
```
I have JAVA_HOME set...
```
$ echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-11.el8.x86_64/
```
My SPARK_HOME is set...
```
$ echo $SPARK_HOME
/opt/spark
```
My pyspark version == 3.0.3
```
$ pyspark --version
22/11/14 09:23:51 WARN Utils: Your hostname, sa1x-eam-p1 resolves to a loopback address: 127.0.0.1; using 172.30.230.254 instead (on interface ens3)
22/11/14 09:23:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/

Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 1.8.0_322
Branch HEAD
Compiled by user ubuntu on 2021-06-17T04:08:22Z
Revision 65ac1e75dc468f53fc778cd2ce1ba3f21067aab8
Url https://github.com/apache/spark
Type --help for more information.
```
My SPARK_VERSION is set...
```
$ echo $SPARK_VERSION
3.0.3
```
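For these variables to reach the ingestion process, they need to be exported in the shell that runs it; a minimal sketch using the values above (paths are host-specific):

```bash
# Mirrors the values echoed above; adjust the paths for your host.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-11.el8.x86_64/
export SPARK_HOME=/opt/spark
export SPARK_VERSION=3.0.3
export PATH="$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH"
```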
a
Hi Jason, could you please paste the full stack trace from the error, as well as some logs?
b
Hi @astonishing-answer-96712, logs attached.
a
@bright-motherboard-35257 is this still an active issue? @gray-shoe-75895 can offer some insight
b
@astonishing-answer-96712, yes, still an open issue for me. Thanks.
g
It looks like you’re running ingestion from the UI, but those environment variables are set on your local machine. I believe you’ll need to run ingestion from the CLI using `datahub ingest`.
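A minimal sketch of the CLI flow, assuming a recipe file passed with the standard `-c` flag; the bucket path, server URL, and file names below are placeholders:

```bash
# Hypothetical recipe: an S3 source with profiling enabled, sending metadata
# to a local DataHub instance. All paths and URLs are placeholders.
cat > s3_recipe.yaml <<'EOF'
source:
  type: s3
  config:
    path_specs:
      - include: "s3://my-bucket/data/*.parquet"
    profiling:
      enabled: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
EOF

# Run from the shell where JAVA_HOME, SPARK_HOME, and SPARK_VERSION are exported.
datahub ingest -c s3_recipe.yaml
```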
b
@gray-shoe-75895 ah, makes sense. So the UI is running in a Docker container (the quickstart)… if I add Java and Spark to that container, the UI should run it as expected, correct?
g
Kinda - when the UI schedules ingestion, it is actually executed by the datahub-actions container. However, that container doesn’t yet have good support for setting custom environment variables or accessing Java dependencies. While a few people have had success customizing the datahub-actions image, it’s definitely less well supported than the CLI method, which is well understood.
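For anyone who does want to try customizing the image, a rough sketch; the image tag, base OS, JDK package, and runtime user are all assumptions, and this path is not officially supported:

```bash
# Hypothetical: extend the datahub-actions image with a JDK so UI-scheduled
# ingestion can find Java. Assumes a Debian-based image; adjust as needed.
cat > Dockerfile.actions <<'EOF'
FROM acryldata/datahub-actions:head
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends openjdk-11-jre-headless \
    && rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
# Assumption: the image runs as a non-root "datahub" user.
USER datahub
EOF
docker build -f Dockerfile.actions -t datahub-actions-java .
```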
b
@gray-shoe-75895 The CLI got me further along. When I run the ingest with profiling set to true, it fails (logs attached). Can you assist here? Thanks!
g
```
22/12/08 12:14:30 ERROR Executor: Exception in task 3.0 in stage 10.0 (TID 38)
java.io.IOException: Cannot run program "python": error=2, No such file or directory
```
This error message indicates Spark isn’t able to find Python.
Likely a PATH setup issue, but I’m not 100% sure.
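One common fix, sketched here as an assumption about this setup: point Spark at an explicit Python interpreter before running ingestion.

```bash
# Spark honors these environment variables when launching Python workers.
# The interpreter path is an assumption; use whichever python your
# ingestion environment actually provides.
export PYSPARK_PYTHON="$(command -v python3)"
export PYSPARK_DRIVER_PYTHON="$PYSPARK_PYTHON"
```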
b
It was user error on my part. I installed Spark but neglected to set it to start automatically when the server reboots, so the Spark service was not running. As soon as I got it running, this worked without error. Thanks for your support with this.
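For reference, a sketch of the fix described above, assuming Spark is wrapped in a systemd unit (the unit name varies by installation; Spark standalone does not ship one by default):

```bash
# Assumption: a custom systemd unit named "spark" manages the standalone
# master/worker. Enable it at boot and start it now.
sudo systemctl enable --now spark
```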