# kotlin-spark
  • davidh

    02/23/2022, 3:52 PM
    Hi, is there an alternate Maven repo that has the Spark 3.2 versions? Maven Central only has 1.0.2, which doesn't support the new 3.2.
  • Michael Livshitz

    04/13/2022, 12:05 PM
    Hi! Jumped over here just to say that I've recently finished migrating a large set of complex Lucene-Over-Spark pipelines from Scala to Kotlin and it was a blast!
  • Jolan Rensen [JetBrains]

    05/26/2022, 4:55 PM
    https://blog.jetbrains.com/big-data-tools/2022/05/26/kotlin-api-for-apache-spark-streaming-jupyter-and-more/
  • Jolan Rensen [JetBrains]

    05/31/2022, 3:33 PM
    Hi everyone, we at JetBrains are curious to find out what you use the Kotlin Spark API for. Do you have any problems with it? What would you like to see in upcoming versions? Don't hesitate to let us know 🙂
  • jaeyol.youn

    06/29/2022, 12:43 AM
    Hello, I need your help. Sample stack trace:
    Copy code
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 4
    22/06/29 01:58:58 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on lniuhwk3388.nhnjp.ism:42755 in memory (size: 48.1 KB, free: 366.3 MB)
    22/06/29 01:58:58 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on lniuhwk4040.nhnjp.ism:36704 in memory (size: 48.1 KB, free: 1458.6 MB)
    22/06/29 01:58:58 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on lniuhwk3393.nhnjp.ism:33023 in memory (size: 48.1 KB, free: 1458.6 MB)
    22/06/29 01:58:58 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on lniuhwk3870.nhnjp.ism:34781 in memory (size: 48.1 KB, free: 1458.6 MB)
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 23
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 15
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 5
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 1
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 6
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 21
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 17
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 10
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 8
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 7
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 3
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 13
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 2
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 9
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 22
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 12
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 0
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 24
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 19
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 16
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 14
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 18
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 11
    22/06/29 01:58:58 INFO spark.ContextCleaner: Cleaned accumulator 20
    22/06/29 01:59:00 ERROR yarn.ApplicationMaster: User class threw exception: kotlin.jvm.KotlinReflectionNotSupportedError: Kotlin reflection implementation is not found at runtime. Make sure you have kotlin-reflect.jar in the classpath
    kotlin.jvm.KotlinReflectionNotSupportedError: Kotlin reflection implementation is not found at runtime. Make sure you have kotlin-reflect.jar in the classpath
    	at kotlin.jvm.internal.ClassReference.error(ClassReference.kt:84)
    	at kotlin.jvm.internal.ClassReference.isData(ClassReference.kt:70)
    	at org.jetbrains.kotlinx.spark.api.ApiV1Kt.isSupportedClass(ApiV1.kt:172)
    	at org.jetbrains.kotlinx.spark.api.ApiV1Kt.generateEncoder(ApiV1.kt:167)
    	at com.zshp.agg.batch.bannedword.CatalogBannedWordSparkApplication.main(CatalogBannedWordSparkApplication.kt:159)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
    22/06/29 01:59:00 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: kotlin.jvm.KotlinReflectionNotSupportedError: Kotlin reflection implementation is not found at runtime. Make sure you have kotlin-reflect.jar in the classpath
    	at kotlin.jvm.internal.ClassReference.error(ClassReference.kt:84)
    	at kotlin.jvm.internal.ClassReference.isData(ClassReference.kt:70)
    	at org.jetbrains.kotlinx.spark.api.ApiV1Kt.isSupportedClass(ApiV1.kt:172)
    	at org.jetbrains.kotlinx.spark.api.ApiV1Kt.generateEncoder(ApiV1.kt:167)
    	at com.zshp.agg.batch.bannedword.CatalogBannedWordSparkApplication.main(CatalogBannedWordSparkApplication.kt:159)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
    )
    22/06/29 01:59:00 INFO spark.SparkContext: Invoking stop() from shutdown hook
    22/06/29 01:59:00 INFO server.AbstractConnector: Stopped Spark@52631839{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
    22/06/29 01:59:00 INFO ui.SparkUI: Stopped Spark web UI at <http://lniuhwk3388.nhnjp.ism:38055>
    22/06/29 01:59:00 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
    22/06/29 01:59:00 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
    22/06/29 01:59:00 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
    22/06/29 01:59:00 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
    (serviceOption=None,
     services=List(),
     started=false)
    22/06/29 01:59:00 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    22/06/29 01:59:00 INFO memory.MemoryStore: MemoryStore cleared
    22/06/29 01:59:00 INFO storage.BlockManager: BlockManager stopped
    22/06/29 01:59:00 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
    22/06/29 01:59:00 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    22/06/29 01:59:00 INFO spark.SparkContext: Successfully stopped SparkContext
    22/06/29 01:59:00 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: kotlin.jvm.KotlinReflectionNotSupportedError: Kotlin reflection implementation is not found at runtime. Make sure you have kotlin-reflect.jar in the classpath
    	at kotlin.jvm.internal.ClassReference.error(ClassReference.kt:84)
    	at kotlin.jvm.internal.ClassReference.isData(ClassReference.kt:70)
    	at org.jetbrains.kotlinx.spark.api.ApiV1Kt.isSupportedClass(ApiV1.kt:172)
    	at org.jetbrains.kotlinx.spark.api.ApiV1Kt.generateEncoder(ApiV1.kt:167)
    	at com.zshp.agg.batch.bannedword.CatalogBannedWordSparkApplication.main(CatalogBannedWordSparkApplication.kt:159)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
    )
    22/06/29 01:59:00 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
    22/06/29 01:59:00 INFO yarn.ApplicationMaster: Deleting staging directory <viewfs://iu/user/z_shopping_ai/.sparkStaging/application_1655162175779_1751556>
    22/06/29 01:59:00 INFO util.ShutdownHookManager: Shutdown hook called
  • jaeyol.youn

    06/29/2022, 12:45 AM
    Copy code
    # gradle version 6.8.2
    val kotlinVersion = "1.6.21"
     dependency("org.apache.spark:spark-sql_2.12:2.4.4")
     dependency("org.jetbrains.kotlinx.spark:kotlin-spark-api-2.4_2.12:1.0.2")
     // shadow jar <https://github.com/johnrengelman/shadow>
     dependency("com.github.johnrengelman.shadow:com.github.johnrengelman.shadow.gradle.plugin:6.1.0")
  • jaeyol.youn

    06/29/2022, 12:48 AM
    Copy code
    Spark Version: 2.4.4
    
    $SPARK_HOME/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster
  • Jolan Rensen [JetBrains]

    06/29/2022, 7:20 PM
    Unfortunately, we stopped supporting Spark 2.x; it became too much to maintain. If you have the possibility, upgrade to Spark 3.0, 3.1, or 3.2 🙂 Those are the newest versions we support with release 1.1.0. We still have 1.0.2 up for Spark 2.x, but then you're on your own. Although, reading through the error, try adding kotlin-reflect as a dependency to your project; that might solve it temporarily.
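    A minimal sketch of that suggestion, assuming a Gradle Kotlin DSL build; the coordinates are the standard kotlin-reflect artifact, and the version matches the Kotlin version reported earlier in the thread:
    // build.gradle.kts (illustrative)
    dependencies {
        // Kotlin reflection runtime, so the API's data-class checks can use full reflection at runtime
        implementation("org.jetbrains.kotlin:kotlin-reflect:1.6.21")
    }
    Since the job is submitted with spark-submit in cluster mode, the dependency also has to end up in the fat jar rather than only on the local classpath.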
    πŸ‘ 1
    ❀️ 1
    πŸ™‡β€β™‚οΈ 1
  • Jolan Rensen [JetBrains]

    08/04/2022, 2:48 PM
    Kotlin Spark API v1.2 is out! We now support all combinations of Scala and Spark patch versions, and we've added support for UDTs, UDFs, and more. Read the blog post below if you want to learn more or try it out 🙂 https://blog.jetbrains.com/kotlin/2022/08/kotlin-api-for-apache-spark-v1-2-udts-udfs-rdds-compatibility-and-more/ (And of course, let us know if you have any issues, suggestions, thoughts, and whatnot 🙂)
  • Michael Livshitz

    08/29/2022, 8:33 PM
    Hi! I've tried using Kotlin Spark v1.2 today, specifically kotlin-spark-api_3.0.1_2.12:1.2.1 alongside Kotlin 1.6.10 with a 1.8 JVM target, on an existing code base that already uses an older version of Kotlin Spark. Once I replaced my old dependencies with the new ones, I got: Cannot inline bytecode built with JVM target 11 into bytecode that is being built with JVM target 1.8. Please specify proper '-jvm-target' option. When I go to the sources, I do see that it is bytecode 55 (Java 11). I was wondering if there is anything I can do about it, given that I'm stuck on a 1.8 JVM. Thank you for making Spark awesome, btw :D
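    For context, a sketch of the compiler settings involved, assuming a Gradle Kotlin DSL build: inline functions in a library compiled for JVM 11 (class-file version 55) cannot be inlined into code compiled for a lower target, so the consuming module would have to target 11 as well, which only helps if the runtime JVM can be upgraded:
    // build.gradle.kts (illustrative): raise the consumer's targets to match the library
    tasks.withType<org.jetbrains.kotlin.gradle.tasks.KotlinCompile> {
        kotlinOptions.jvmTarget = "11"
    }
    java {
        sourceCompatibility = JavaVersion.VERSION_11
        targetCompatibility = JavaVersion.VERSION_11
    }
    (As it turned out later in this channel, the practical answer for Java 8 users was the backport released as v1.2.3.)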
  • Denis Ambatenne

    10/06/2022, 10:39 AM
    Hi all! 👋 📣 The Kotlin team is trying to determine how to develop the Kotlin Spark API further, and we'd love your input. If you are actively using it in your personal or professional projects, please share your experience with us in a phone interview. It will take no more than one hour, and you'll get a 1-year All Products Pack subscription or a $100 Amazon Gift Card as a reward. 👉 Please choose a time slot in my Calendly to set up a call.
    πŸ‘ 3
    πŸ‘πŸΌ 1
  • Aleksander Eskilson

    11/15/2022, 3:54 PM
    Hi folks! I'd posted in the kotlin-spark-api gitter a few days ago: https://gitter.im/JetBrains/kotlin-spark-api?at=636d86ea18f21c023ba7b373 There I'd noted that the default Spark classloader can have conflicts with kotlin-spark-api's replacement spark-sql Catalyst converters. The solution I'd initially tried was to set the spark.driver.userClassPathFirst config option, but in practice this causes a lot of other classpath issues with other supplied jars (things like S3 impl jars). Another approach I tried was to make a shaded jar of my project, ensuring I included shaded dependencies for org.jetbrains.kotlinx.spark and org.jetbrains.kotlin. This seems to work when I add my project with the spark.jars and also the spark.driver.extraClassPath option (which states it'll prepend jars to the Spark classpath, which I suppose is the reason the replacement Catalyst converters get properly picked up by Spark first), pointing to my project's physical jar. That's all a little unwieldy, though. I'd prefer to use the spark.jars.packages option for my packaged project, but that doesn't seem to work (the spark-sql overloads seem not to get discovered by the Spark classloader before its own implementation). Is anyone else experiencing odd classloader issues between Spark and the kotlin-spark-api replacement of Catalyst converters? Ideas on how to resolve these classpath conflicts?
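    For reference, the working setup described above expressed as plain Spark properties (a sketch; the jar path is illustrative):
    // the two options that let the shaded jar's converters be found before Spark's own classes
    val conf = org.apache.spark.SparkConf()
        .set("spark.jars", "/path/to/my-shaded-project.jar")
        .set("spark.driver.extraClassPath", "/path/to/my-shaded-project.jar")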
  • Jolan Rensen [JB]

    12/02/2022, 1:12 PM
    Kotlin Spark API v1.2.2 is released! https://github.com/Kotlin/kotlin-spark-api/releases/tag/1.2.2 It's just a small update this time, mostly version bumps:
    • Added BigInteger support in #182 thanks to #181
    • New Spark versions: 3.2.3, 3.3.1, 3.2.2
    • New Scala versions: 2.12.17, 2.13.10
    • Updated Kotlin to 1.7.20
    • Small bugfix regarding Map encoding
    You can get the version that works with your Spark/Scala setup using the following table: https://github.com/Kotlin/kotlin-spark-api#supported-versions-of-apache-spark (Might take a couple of hours for Maven Central to update)
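    As an illustration of how that table maps to dependency coordinates (the Spark/Scala pair below is only an example; pick the one matching your cluster):
    dependencies {
        // artifact pattern: kotlin-spark-api_[spark version]_[scala version]:[api version]
        implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.1_2.13:1.2.2")
    }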
  • holgerbrandl

    12/12/2022, 1:43 PM
    Hi, when working through https://blog.jetbrains.com/kotlin/2020/08/introducing-kotlin-for-apache-spark-preview/ I've noticed that the IDE no longer seems able to destructure the Tuple2 in
    Copy code
    citiesWithCoordinates.rightJoin(populations, citiesWithCoordinates.col("name") `===` populations.col("city"))
                .filter { (_, citiesPopulation) ->
                    citiesPopulation.population > 15_000_000L
                }
    Was this never working, or was destructuring support for Tuple2 dropped?
  • holgerbrandl

    12/12/2022, 1:43 PM
    image.png
  • Jolan Rensen [JB]

    12/12/2022, 1:46 PM
    @holgerbrandl Destructuring of tuples is definitely part of the API. Make sure to import org.jetbrains.kotlinx.spark.api.tuples.* so it's in scope 🙂 (the IDE doesn't always auto-import stuff like this correctly)
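    A minimal sketch of what that import enables (the function and values are made up for illustration): the tuples package provides componentN() operators for Scala tuples, which is what makes destructuring compile.
    import org.jetbrains.kotlinx.spark.api.tuples.*  // component1()/component2() etc. for Scala tuples
    import scala.Tuple2

    fun describe(cityAndPopulation: Tuple2<String, Long>): String {
        // destructuring resolves to the componentN() extension operators from the import above
        val (city, population) = cityAndPopulation
        return "$city has $population inhabitants"
    }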
  • Jolan Rensen [JB]

    12/12/2022, 1:54 PM
    I do see that you're referring to the blog post from over 2 years ago. There have been a lot of changes since then (and that's an understatement). I'd recommend also checking out the readme (https://github.com/Kotlin/kotlin-spark-api), the examples (https://github.com/Kotlin/kotlin-spark-api/tree/release/examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples), and the extensive wiki (https://github.com/Kotlin/kotlin-spark-api/wiki) 🙂 Lots of up-to-date stuff there.
  • holgerbrandl

    12/14/2022, 9:44 AM
    Hi, when trying to run a small example defined in https://github.com/holgerbrandl/kalasim/blob/master/modules/sparksim/src/main/kotlin/org/kalasim/exaples/spark/DistributedSim.kt#L26 it fails with
    Copy code
    Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (192.168.217.128 executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
    ... very long stacktrace  ....
    I was under the impression that serialization should be very simple in this expression, and every job just returns a simple string. What am I doing wrong?
  • Jolan Rensen [JB]

    01/09/2023, 12:48 PM
    Kotlin Spark API v1.2.3 is released! https://github.com/Kotlin/kotlin-spark-api/releases/tag/1.2.3 Due to high demand, we backported the project to support Java 8 for all modules (except Jupyter). You can get any of the supported versions https://github.com/Kotlin/kotlin-spark-api#supported-versions-of-apache-spark from Maven Central as usual 🙂 Let us know if you have any issues with it!
  • Jolan Rensen [JB]

    02/27/2023, 1:45 PM
    Thanks to Awesome Spark for including us on their list 🙂
    πŸ‘ 2
  • Jolan Rensen [JB]

    07/26/2023, 11:41 AM
    Kotlin Spark API v1.2.4 is released: https://github.com/Kotlin/kotlin-spark-api/releases/tag/v1.2.4 It now includes Java 8 support for Jupyter Notebooks as well, plus Spark support from 3.0.0 to 3.3.2. Get the version you need from Maven Central using this table, as usual: https://github.com/Kotlin/kotlin-spark-api#supported-versions-of-apache-spark If you're using Jupyter, the new version will be used automatically, or you can specify it as described here: https://github.com/Kotlin/kotlin-spark-api/wiki/Jupyter Regarding Spark 3.4+, we're in a bit of a pickle. Due to changes in Spark, the API can no longer simply be updated to support that version. It will cost a significant amount of work to rebuild the base of the Kotlin Spark API from the ground up, and we're investigating whether there's enough demand for it. If you have any ideas on how to add support for Kotlin data classes and the inferring encoder<>() function, or you simply want to show your interest in the matter, let us know in the relevant issue: https://github.com/Kotlin/kotlin-spark-api/issues/195 As usual, let us know if you have any issues, and enjoy the version bump :)
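    For context, a minimal sketch of the data-class support being discussed (the Person class is made up; withSpark, dsOf, map, and encoder<>() are the API entry points named in this channel):
    import org.jetbrains.kotlinx.spark.api.*

    data class Person(val name: String, val age: Int)

    fun main() = withSpark(master = "local") {
        // encoder<>() infers an Encoder for a Kotlin data class from the reified type parameter
        println(encoder<Person>().schema())
        dsOf(Person("Jane", 32), Person("Ali", 41))
            .map { it.name }   // map from the Kotlin API infers an encoder for the String result
            .show()
    }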
    πŸ‘ 2
    πŸŽ‰ 4
  • Tyler Kinkade

    03/05/2024, 3:26 AM
    @Jolan Rensen [JB] Hi 🙂 I'm trying to use the Kotlin DataFrame interoperability code from the project wiki to convert a Kotlin DataFrame to a Spark DataFrame in a Kotlin notebook, but it doesn't compile (error below).
    Copy code
    %use dataframe
    %use spark
    import org.apache.spark.api.java.JavaSparkContext
    import org.apache.spark.sql.*
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.*
    import org.apache.spark.unsafe.types.CalendarInterval
    import org.jetbrains.kotlinx.dataframe.*
    import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
    import org.jetbrains.kotlinx.dataframe.api.*
    import org.jetbrains.kotlinx.dataframe.columns.ColumnKind.*
    import org.jetbrains.kotlinx.dataframe.schema.*
    import org.jetbrains.kotlinx.spark.api.*
    import java.math.*
    import java.sql.*
    import java.time.*
    import kotlin.reflect.*
    
    fun DataFrame<*>.toSpark(spark: SparkSession, sc: JavaSparkContext): Dataset<Row> {
        val rows = sc.toRDD(rows().map(DataRow<*>::toSpark))
        return spark.createDataFrame(rows, schema().toSpark())
    }
    
    fun DataRow<*>.toSpark(): Row =
        RowFactory.create(
            *values().map {
                when (it) {
                    is DataRow<*> -> it.toSpark()
                    else -> it
                }
            }.toTypedArray()
        )
    
    fun DataFrameSchema.toSpark(): StructType =
        DataTypes.createStructType(
            columns.map { (name, schema) ->
                DataTypes.createStructField(name, schema.toSpark(), schema.nullable)
            }
        )
    
    fun ColumnSchema.toSpark(): DataType =
        when (this) {
            is ColumnSchema.Value -> type.toSpark() ?: error("unknown data type: $type")
            is ColumnSchema.Group -> schema.toSpark()
            is ColumnSchema.Frame -> error("nested dataframes are not supported")
            else -> error("unknown column kind: $this")
        }
    
    fun KType.toSpark(): DataType? = when(this) {
        typeOf<Byte>(), typeOf<Byte?>() -> DataTypes.ByteType
        typeOf<Short>(), typeOf<Short?>() -> DataTypes.ShortType
        typeOf<Int>(), typeOf<Int?>() -> DataTypes.IntegerType
        typeOf<Long>(), typeOf<Long?>() -> DataTypes.LongType
        typeOf<Boolean>(), typeOf<Boolean?>() -> DataTypes.BooleanType
        typeOf<Float>(), typeOf<Float?>() -> DataTypes.FloatType
        typeOf<Double>(), typeOf<Double?>() -> DataTypes.DoubleType
        typeOf<String>(), typeOf<String?>() -> DataTypes.StringType
        typeOf<LocalDate>(), typeOf<LocalDate?>() -> DataTypes.DateType
        typeOf<Date>(), typeOf<Date?>() -> DataTypes.DateType
        typeOf<Timestamp>(), typeOf<Timestamp?>() -> DataTypes.TimestampType
        typeOf<Instant>(), typeOf<Instant?>() -> DataTypes.TimestampType
        typeOf<ByteArray>(), typeOf<ByteArray?>() -> DataTypes.BinaryType
        typeOf<Decimal>(), typeOf<Decimal?>() -> DecimalType.SYSTEM_DEFAULT()
        typeOf<BigDecimal>(), typeOf<BigDecimal?>() -> DecimalType.SYSTEM_DEFAULT()
        typeOf<BigInteger>(), typeOf<BigInteger?>() -> DecimalType.SYSTEM_DEFAULT()
        typeOf<CalendarInterval>(), typeOf<CalendarInterval?>() -> DataTypes.CalendarIntervalType
        else -> null
    }
    Error:
    Copy code
    Line_31.jupyter.kts (20:48 - 55) 'toSpark' is a member and an extension at the same time. References to such elements are not allowed
    How can I fix this?
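    A possible explanation and workaround (a sketch, not a confirmed fix): in a Kotlin notebook, top-level declarations are compiled as members of the snippet class, so an extension function such as DataRow<*>.toSpark() becomes a member and an extension at the same time, and a callable reference to it is rejected. Calling it through a lambda avoids the reference:
    // replace the callable reference with a lambda in the notebook cell
    fun DataFrame<*>.toSpark(spark: SparkSession, sc: JavaSparkContext): Dataset<Row> {
        val rows = sc.toRDD(rows().map { it.toSpark() })  // was: rows().map(DataRow<*>::toSpark)
        return spark.createDataFrame(rows, schema().toSpark())
    }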
  • Eric Rodriguez

    03/12/2024, 4:32 PM
    👋 Hello, team! I'm using kotlin-spark-api_3.3.2_2.12:1.2.4 to read and write some parquet files from MinIO through hadoop-aws and aws-java-sdk. I have a standalone cluster using docker compose, and I have a Jupyter notebook on the same network as well. Everything works fine if I launch it from Jupyter or from IntelliJ, but when I create a fat jar with maven-shade-plugin and use spark-submit, it just stalls forever:
    Copy code
    24/03/12 12:09:26 INFO StandaloneSchedulerBackend: Granted executor ID app-20240312110926-0002/1 on hostPort 172.20.0.4:42795 with 1 core(s), 1024.0 MiB RAM
    24/03/12 12:09:26 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.55:58381 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.55, 58381, None)
    24/03/12 12:09:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.55, 58381, None)
    24/03/12 12:09:26 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.55, 58381, None)
    24/03/12 12:09:26 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
  • Eric Rodriguez

    03/13/2024, 11:31 AM
    Does anyone know of a Spark 3.3.2 JDK 11 image that you can recommend?
  • Eric Rodriguez

    03/13/2024, 9:41 PM
    Why is spark.sql.codegen.wholeStage set to false for Jupyter?
  • Eric Rodriguez

    03/20/2024, 6:22 AM
    So I think the issue is that my Spark images have Hadoop 3.3.2 but the kotlin-spark-api uses 3.3.6, just FYI:
    Copy code
    +- org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.2_2.12:jar:1.2.4:compile
    [INFO] |  +- org.jetbrains.kotlinx.spark:core_3.3.2_2.12:jar:1.2.4:compile
    [INFO] |  +- org.jetbrains.kotlinx.spark:scala-tuples-in-kotlin_2.12:jar:1.2.4:compile
    [INFO] |  +- org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.8.20:compile
    [INFO] |  |  \- org.jetbrains.kotlin:kotlin-stdlib-jdk7:jar:1.8.20:compile
    [INFO] |  +- org.apache.spark:spark-streaming_2.12:jar:3.3.2:runtime
    [INFO] |  \- org.apache.hadoop:hadoop-client:jar:3.3.6:runtime
    [INFO] |     +- org.apache.hadoop:hadoop-common:jar:3.3.6:runtime
    [INFO] |     |  +- org.apache.hadoop.thirdparty:hadoop-shaded-protobuf_3_7:jar:1.1.1:runtime
    [INFO] |     |  +- org.apache.hadoop.thirdparty:hadoop-shaded-guava:jar:1.1.1:runtime
    [INFO] |     |  +- commons-net:commons-net:jar:3.9.0:runtime
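    If the mismatch is indeed the problem, one way to test it is to align the Hadoop client with the version the Spark images ship. The build in this thread is Maven (an <exclusion> plus an explicit hadoop-client dependency); the same idea as a Gradle Kotlin DSL sketch:
    dependencies {
        implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.2_2.12:1.2.4") {
            // drop the transitively pulled 3.3.6 client
            exclude(group = "org.apache.hadoop", module = "hadoop-client")
        }
        // pin the version that matches the cluster images
        implementation("org.apache.hadoop:hadoop-client:3.3.2")
    }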
  • Eric Rodriguez

    03/25/2024, 8:38 AM
    So I downgraded to JDK 11 and made sure hive 3.2.2 is the only version in the deps, and everything works as expected, both in local[*] and against a cluster on docker-compose. But then I began to save in delta format instead of parquet, and now I get
    Copy code
    cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF``
    when I run against the cluster (it works in local). Any idea how I may debug this?
  • Tyler Kinkade

    04/09/2024, 11:10 AM
    I can't figure out how to solve this exception: when withSpark is invoked dynamically in a Dockerized Ktor web app, an UnsupportedFileSystemException is thrown. The GitHub issue is here; the GitHub repo is here. Broadcast.kt (from the kotlin-spark-api examples):
    Copy code
    import org.jetbrains.kotlinx.spark.api.map
    import org.jetbrains.kotlinx.spark.api.withSpark
    import java.io.Serializable
    
    object Broadcast {
        data class SomeClass(val a: IntArray, val b: Int) : Serializable
    
        fun broadcast(): MutableList<Int> {
            lateinit var result: MutableList<Int>
            withSpark(master = "local") {
                val broadcastVariable = spark.broadcast(SomeClass(a = intArrayOf(5, 6), b = 3))
                result = listOf(1, 2, 3, 4, 5)
                    .toDS()
                    .map {
                        val receivedBroadcast = broadcastVariable.value
                        it + receivedBroadcast.a.first()
                    }
                    .collectAsList()
                println(result)
            }
            return result
        }
    }
    Routing.kt
    Copy code
    fun Application.configureRouting() {
        routing {
            get("/") {
                val list = Broadcast.broadcast()
                call.respondText(list.toString())
            }
        }
    }
    Dockerfile
    Copy code
    # syntax=docker/dockerfile:1
    FROM eclipse-temurin:11-jre-jammy AS jre-jammy-spark
    RUN curl https://archive.apache.org/dist/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3-scala2.13.tgz -o spark.tgz && \
        tar -xf spark.tgz && \
        mv spark-3.3.2-bin-hadoop3-scala2.13 /opt/spark && \
        rm spark.tgz
    ENV SPARK_HOME="/opt/spark"
    ENV PATH="${PATH}:/opt/spark/bin:/opt/spark/sbin"
    
    FROM gradle:8.4-jdk11 AS gradle-build
    COPY --chown=gradle:gradle . /home/gradle/src
    WORKDIR /home/gradle/src
    RUN gradle buildFatJar --no-daemon
    
    FROM jre-jammy-spark AS app
    RUN mkdir -p /app
    COPY --from=gradle-build /home/gradle/src/build/libs/*-all.jar /app/app.jar
    ENTRYPOINT ["java","-jar","/app/app.jar"]
    compose.yaml
    Copy code
    services:
      app:
        build: .
        ports:
          - 8888:8888
    In a shell, run:
    Copy code
    $ docker compose up
    Then, open http://localhost:8888 in a browser. An org.apache.hadoop.fs.UnsupportedFileSystemException will be thrown:
    Copy code
    app-1  | 2024-04-09 10:26:26.484 [main] INFO  ktor.application - Autoreload is disabled because the development mode is off.
    app-1  | 2024-04-09 10:26:26.720 [main] INFO  ktor.application - Application started in 0.261 seconds.
    app-1  | 2024-04-09 10:26:26.816 [DefaultDispatcher-worker-1] INFO  ktor.application - Responding at <http://0.0.0.0:8888>
    app-1  | WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
    app-1  | Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
    app-1  | 2024-04-09 10:27:07.885 [eventLoopGroupProxy-4-1] WARN  o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    app-1  | WARNING: An illegal reflective access operation has occurred
    app-1  | WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/app/app.jar) to constructor java.nio.DirectByteBuffer(long,int)
    app-1  | WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
    app-1  | WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    app-1  | WARNING: All illegal access operations will be denied in a future release
    app-1  | 2024-04-09 10:27:10.066 [eventLoopGroupProxy-4-1] WARN  o.a.spark.sql.internal.SharedState - URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory
    app-1  | 2024-04-09 10:27:10.071 [eventLoopGroupProxy-4-1] WARN  o.a.spark.sql.internal.SharedState - Cannot qualify the warehouse path, leaving it unqualified.
    app-1  | org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "file"
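    One line of investigation (a sketch, not a confirmed fix): the "No FileSystem for scheme: file" error often shows up when a fat jar merges the Hadoop artifacts and their META-INF/services registrations for org.apache.hadoop.fs.FileSystem overwrite each other. Naming the local implementation explicitly is one way to test that theory; fs.file.impl is standard Hadoop configuration, passed here through Spark's spark.hadoop.* prefix via the withSpark props map:
    import org.jetbrains.kotlinx.spark.api.withSpark

    fun broadcastWithExplicitFs() =
        withSpark(
            props = mapOf(
                // assumption: if the fat jar lost the FileSystem service entries,
                // pointing Hadoop at the implementation directly sidesteps the lookup
                "spark.hadoop.fs.file.impl" to "org.apache.hadoop.fs.LocalFileSystem",
            ),
            master = "local",
        ) {
            // ... same body as Broadcast.broadcast() above
        }
    If the fat jar is built with the Shadow plugin, its mergeServiceFiles() option addresses the same loss of service registrations at build time.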
  • Yogeshvu

    07/18/2024, 11:46 PM
    Is there support for querying Apache Iceberg tables?