kotlinlang #datascience

zaleslaw

12/11/2024, 10:08 AM

🚀 Kotlin DataFrame v0.15.0 Release Announcement 🚀

We're thrilled to announce the release of *Kotlin DataFrame v0.15.0*, packed with powerful new features, performance improvements, and exciting experimental integrations!

Key Highlights
1. Experimental CSV Parser
   ◦ Introducing a new CSV parser based on Deephaven-CSV for faster and more reliable data parsing.
2. GeoDataFrame Class
   ◦ Work with geographical data in GeoJSON or Shapefile formats and visualize it using Kandy.
3. Full BigInteger Support
   ◦ Enhanced support for BigInteger, enabling parsing, conversions, statistics, and column arithmetic.
4. Custom SQL Database Registration
   ◦ Register custom SQL databases effortlessly; check the user guide for details.
5. Improved Parsing
   ◦ Faster and more flexible parsing of String columns.
   ◦ New ParserOptions.useFastDoubleParser setting for improved Double parsing performance.
6. *Compiler Plugin Improvements* (check the actual demo here)

Explore the Features
Check out the resources below to dive into the new functionality:
• New Features Example Notebook
• How to Extend DataFrame Library for Custom SQL Database Support: Example with HSQLDB

We can't wait to see what you'll build with Kotlin DataFrame v0.15.0!

🔥 9

❤️ 8

K 10

zaleslaw

12/17/2024, 11:48 AM

Hello everyone! We're excited to share the pre-release of Kandy version 0.8.0, which introduces geo-plotting capabilities! This is the very first, experimental version of geo-plotting, and while we are still polishing the final release, we invite you to explore the new features using 0.8.0-RC1. The detailed documentation and user guide are on their way, but for now, you can refer to the attached notebook, which includes examples, use cases, and demonstrations of working with geospatial data. We'd love to hear your feedback and impressions as you try out this new functionality! https://gist.github.com/AndreiKingsley/5aa25acbfa52aadbb6a3e3c641c54d57

🗺️ 8

🔥 9

📈 6

kotlin notebook 9

😃 4

🎉 3

shaktiman_droid

12/20/2024, 9:56 PM

Are there any data science / machine learning / ONNX-related libraries that are Kotlin Multiplatform with Kotlin/Native support for iOS?

altavir

12/25/2024, 6:03 AM

@Adrian Trapletti We've started a commercial project that uses Clarabel4J as one of its solvers, so thank you very much both for it and for your great tutorials. The project itself is closed source, but I think I will be able to share some of the matrix-preparation code in the KMath examples in the future. Today I ran some JMH benchmarks comparing the Clarabel and ojAlgo solvers on the same problem. I remember you said Clarabel should be faster, but I get 1349 ops/second with ojAlgo versus 400 ops/second with Clarabel4J. It is possible that I messed up the matrix preparation (I did not optimize it), but it is all linear, so I do not think it should matter.

Jens

01/09/2025, 9:50 AM

I have a list of lists of strings, basically just an Excel sheet, of roughly 60-90 MB in RAM. This data needs to be exported as Excel, and I am thinking about tinkering a little more with Kotlin DataFrame to do it. What would you say: use DataFrame or just plain old Apache POI?
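If you go the DataFrame route, the row-oriented List<List<String>> first has to become column-oriented. A minimal plain-Kotlin sketch, assuming the first inner list is the header row and every row has the same length (both are assumptions about your data; the function name is illustrative). The resulting Map<String, List<String>> can then be turned into a DataFrame with toDataFrame() and, with dataframe-excel on the classpath, written out with writeExcel(...); for very large sheets, POI's streaming SXSSFWorkbook may also be worth a look.

```kotlin
// Turn row-oriented data (first row = header) into column-oriented data.
// Assumes every data row has as many cells as the header; duplicate header names would collapse.
fun rowsToColumns(rows: List<List<String>>): Map<String, List<String>> {
    require(rows.isNotEmpty()) { "expected at least a header row" }
    val header = rows.first()
    val body = rows.drop(1)
    // associate keeps header order (LinkedHashMap)
    return header.withIndex().associate { (i, name) -> name to body.map { row -> row[i] } }
}
```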

esionecneics

01/09/2025, 11:50 PM

Can I also use Kandy fully and flawlessly in a regular .kt file? I have this function, which works fine in a Kotlin notebook but NOT in a .kt file:


import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.kandy.dsl.plot
import org.jetbrains.kotlinx.kandy.letsplot.layers.line
import org.jetbrains.kotlinx.kandy.util.color.Color


fun visualizeCustomerNumber(df: DataFrame<*>) {
    df.plot {
        line {
            x("createdAt") {
                axis {
                    name = "Date"
                }
            }
            y("customerNumber") {
                axis {
                    name = "Customer Number"
                }
            }
            color = Color.RED
        }
        layout {
            title = "Tomorrow Customer Number"
        }
    }
}

The keywords axis, name, layout, and title are recognized by IntelliJ. The import in the notebook, on the other hand, works perfectly with:

%use kandy

This is the version I use in my build.gradle.kts:

dependencies {
    implementation("org.jetbrains.kotlinx:kandy-lets-plot:0.8.0-RC1")
}

Dumitru Preguza

01/22/2025, 4:14 PM

Hi everyone, I'm using the experimental Notebook Spring Boot starter. So far it's fine, but there are some issues to resolve. The main one is converting Hibernate entities to a DataFrame: because of lazy initialization, I end up with this exception:


listOf(jobRepository.getJobById(12345)).toDataFrame()


The problem is found in one of the loaded libraries: check library renderers
org.hibernate.LazyInitializationException: failed to lazily initialize a collection of role: data.entity.JobDefinition.jobRuns: could not initialize proxy - no Session
org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryException: The problem is found in one of the loaded libraries: check library renderers
	at ...

Dumitru Preguza

01/22/2025, 4:39 PM

How do I enable the "multi-dollar interpolation" feature? I sometimes work with MongoDB queries, and I need a literal dollar sign ($) inside a string without interpolation:


$$"test $test"
$$"""test $test"""

Exception/hint appears: The feature "multi dollar interpolation" is experimental and should be enabled explicitly
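For reference: multi-dollar interpolation arrived as a preview feature in Kotlin 2.1.0 and is switched on with the compiler option -Xmulti-dollar-interpolation (in Gradle, via compilerOptions.freeCompilerArgs). Whether a given Kotlin notebook kernel lets you pass that flag is something I have not verified. A workaround that needs no flag on any Kotlin version is to write the dollar sign as ${'$'}, as in this minimal sketch (the query strings are made-up examples):

```kotlin
// ${'$'} evaluates to a literal "$", so no interpolation takes place.
// In regular (non-raw) strings, the escape \$ works as well.
val matchStage = "{ ${'$'}match: { status: \"A\" } }"

// Raw strings have no backslash escapes, so ${'$'} is the way to get a literal "$" there
val rangeFilter = """{ "age": { "${'$'}gt": 30 } }"""

// With -Xmulti-dollar-interpolation enabled (Kotlin 2.1+), the equivalent would be:
// val rangeFilter = $$"""{ "age": { "$gt": 30 } }"""
// Gradle (build.gradle.kts):
// kotlin { compilerOptions { freeCompilerArgs.add("-Xmulti-dollar-interpolation") } }
```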

patrickdelconte

02/14/2025, 7:05 PM

I am trying to create a plot that has a line on top of a bar chart, like in the image, with a separate y-axis for the bar and line charts. I tried a few different combinations of scale and axis inside y() without any luck; none of it worked. It seems like only the last usage of scale makes it into the plot.


import kotlin.math.roundToInt

val wounds = listOf(4520, 3242, 3128, 3156, 4115, 5082, 5918, 3811, 5013, 6426, 5952, 5761, 5685, 5316, 4726, 4127, 4121, 3837, 3232, 2684, 2151, 1904, 1528, 1182, 971, 679, 564, 367, 276, 194, 111, 87, 59, 42, 19, 11, 1, 2)
// Percentage of all wounds recorded after index x
val betterPercent = wounds.indices.map { x -> (wounds.filterIndexed { index, _ -> index > x }.sum() / wounds.sum().toDouble() * 100.0).roundToInt() }

plot {
    x(wounds.indices)
    bars {
        y(wounds)
    }
    line {
        y(betterPercent){
            scale = continuous(0..100)
        }
    }
}
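As a side note on the data preparation rather than the dual-axis question: the betterPercent line re-sums the tail of the list for every index, which is quadratic in the list length. A minimal plain-Kotlin sketch of the same percentages computed in one backwards pass with a running suffix sum (the function name is illustrative):

```kotlin
import kotlin.math.roundToInt

// For each index x: the share (in %) of the total found at indices greater than x.
// One backwards pass with a running tail sum instead of re-summing the tail for every x.
fun tailPercentages(values: List<Int>): List<Int> {
    val total = values.sum().toDouble()
    val result = IntArray(values.size)
    var tail = 0 // sum of the values after the current index
    for (x in values.indices.reversed()) {
        result[x] = (tail / total * 100.0).roundToInt()
        tail += values[x]
    }
    return result.toList()
}
```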

zaleslaw

02/19/2025, 2:51 PM

🌍 Kandy 0.8: Unlocking Geospatial Visualization! 🗺️ The latest Kandy 0.8 update brings powerful geospatial plotting capabilities! Effortlessly work with spatial data using GeoDataFrame and seamlessly visualize it with Kandy’s geo extensions. Explore the geo plotting guide and dive into the gallery of geo charts to see what’s possible!

🌎 4

❤️ 4

🎉 6

📈 4

K 2

Jens

03/03/2025, 5:51 PM

Hey everybody, I'm trying to read an Excel file with kotlinx.dataframe but get the following exception:


[2025-03-03T17:49:09.967Z] Caused by: java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.
[2025-03-03T17:49:09.967Z] This may indicate that the file is used to inflate memory usage and thus could pose a security risk.
[2025-03-03T17:49:09.967Z] You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
[2025-03-03T17:49:09.967Z] Uncompressed size: 847128, Raw/compressed size: 8468, ratio: 0.009996

I have the following dependencies in my build.gradle.kts, but the class ZipSecureFile can't be found. Is there another way to set the minimum inflate ratio?


implementation("org.jetbrains.kotlinx:dataframe:0.15.0")
implementation("org.jetbrains.kotlinx:dataframe-excel:0.15.0")

Jens

03/03/2025, 5:54 PM


%use dataframe

This seems to add the class to the classpath, and then it can be imported. What's the corresponding dependency when working with DataFrame in headless production code?

Jens

03/03/2025, 6:00 PM

Ah, OK. Adding the POI dependency fixed it:


implementation ("org.apache.poi:poi-ooxml:5.2.3")

Paulina Sobieszuk

03/06/2025, 12:34 PM

Hey Kotlin DataFrame users! The Kotlin team wants to learn more about what you use DataFrame for. Please vote by reacting to this post:

What do you use DataFrame for?

1️⃣ Generating reports & dashboards

2️⃣ Preparing data for ML/AI models

3️⃣ Data cleaning & enrichment

4️⃣ Working with REST APIs, files, and SQL databases

5️⃣ Processing data in business logic applications

6️⃣ Something else (tell us in the comments)

Thanks a lot for your help!

3️⃣ 10

4️⃣ 5

5️⃣ 6

6️⃣ 1

2️⃣ 7

1️⃣ 8

zaleslaw

03/07/2025, 2:44 PM

Hi, dear community, please participate in the poll above ☝️, it's important for us.

👍 1

esionecneics

03/12/2025, 1:46 PM

I really wish there were a machine learning library for Kotlin, like scikit-learn for Python. At the moment I always fall back on the Java library Smile, but its API is unfortunately so impractical and inconsistent, and it's so hard to prepare the data so that it can be fed into an SVM or a gradient boosting classifier. An ML library that harmonizes with DataFrame and Kandy would be a logical next step. It's not about competing with Python and its data science ecosystem; it's about Kotlin developers not having to switch back and forth between Kotlin and Python all the time. I know the JetBrains team can do it 💛

👌 1

Paulina Sobieszuk

03/13/2025, 2:34 PM

Hello, Kotlin DataFrame users! If you currently use Kotlin DataFrame or have used it in the past, we'd love to hear from you! We're conducting 60-minute interviews about Kotlin DataFrame use cases to learn what's working well and identify areas for improvement. The sessions will take place via Google Meet. As a thank-you for your time, you can choose one of the following rewards:
• A USD 100 Amazon Gift Card, or
• A one-year subscription to JetBrains All Products Pack.
To participate, please complete a short questionnaire. If your profile matches our study criteria, you'll be redirected to our Calendly page to schedule your session.
Kotlin Product Research Team 🙌

Marian Schubert

03/27/2025, 1:51 PM

I'm getting the following error in a Kotlin notebook after we updated our project to Kotlin 2.1.20 (from 2.0.21):


Class '...' was compiled with an incompatible version of Kotlin. The actual metadata version is 2.1.0, but the compiler version 1.9.0 can read versions up to 2.0.0.

It seems that the Kotlin Notebook plugin is using the Kotlin 1.9 compiler. Is there any way to fix this?

eenriquelopez

04/08/2025, 2:13 PM

I came across a paper discussing an experiment and tried to reproduce it. Here's a brief summary:
• Portfolio A: in a bull market, grows by 20%; in a bear market, drops by 20%.
• Portfolio B: in a bull market, grows by 25%; in a bear market, drops by 35%.
• Bull market probability: 75%.
According to the paper, both portfolios should have a one-year expected return of 10%. However, the paper claims that Portfolio A wins over Portfolio B around 90% of the time. After running a Monte Carlo simulation (code attached), I found that Portfolio A outperforms Portfolio B around 66% of the time.
Question: Am I doing something wrong in my simulation, or is the assumption in the original paper incorrect?
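The 10% figure can be checked from the numbers in the summary: the one-year expected growth factor is the probability-weighted average of the two outcomes. Equal means say little about the typical long-run outcome, though, which under compounding is governed by the expected log growth per year:

```latex
\begin{aligned}
\mathbb{E}[A] &= 0.75 \cdot 1.20 + 0.25 \cdot 0.80 = 1.10, &
\mathbb{E}[\ln A] &= 0.75 \ln 1.20 + 0.25 \ln 0.80 \approx 0.0810,\\
\mathbb{E}[B] &= 0.75 \cdot 1.25 + 0.25 \cdot 0.65 = 1.10, &
\mathbb{E}[\ln B] &= 0.75 \ln 1.25 + 0.25 \ln 0.65 \approx 0.0597.
\end{aligned}
```

So both portfolios have the same mean, but A's typical (median) path compounds faster.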


import kotlin.random.Random

// Simulation parameters
val years = 30
val simulations = 10000
val initialInvestment = 1.0

// Market probabilities: 75% bull, 25% bear
val bullProb = 0.75 // 75% for Bull markets

// Portfolio returns
val portfolioA = mapOf("bull" to 1.20, "bear" to 0.80)
val portfolioB = mapOf("bull" to 1.25, "bear" to 0.65)

// Function to simulate one portfolio run and return the accumulated return for each year
fun simulatePortfolioAccumulatedReturns(returns: Map<String, Double>, rng: Random): List<Double> {
    var value = initialInvestment
    val accumulatedReturns = mutableListOf<Double>()
    
    repeat(years) {
        val isBull = rng.nextDouble() < bullProb
        val market = if (isBull) "bull" else "bear"
        value *= returns[market]!!

        // Calculate accumulated return for the current year
        val accumulatedReturn = (value - initialInvestment) / initialInvestment * 100
        accumulatedReturns.add(accumulatedReturn)
    }
    return accumulatedReturns
}

// Running simulations and storing accumulated returns for each year (for each portfolio)
val rng = Random(System.currentTimeMillis())

val accumulatedResults = (1..simulations).map {
    val accumulatedReturnsA = simulatePortfolioAccumulatedReturns(portfolioA, rng)
    val accumulatedReturnsB = simulatePortfolioAccumulatedReturns(portfolioB, rng)
    
    mapOf("Simulation" to it, "PortfolioA" to accumulatedReturnsA, "PortfolioB" to accumulatedReturnsB)
}

// Count the number of simulations where Portfolio A outperforms Portfolio B and vice versa
var portfolioAOutperformsB = 0
var portfolioBOutperformsA = 0
accumulatedResults.forEach { result ->
    val accumulatedA = result["PortfolioA"] as List<Double>
    val accumulatedB = result["PortfolioB"] as List<Double>

    if (accumulatedA.last() > accumulatedB.last()) {
        portfolioAOutperformsB++
    } else {
        portfolioBOutperformsA++
    }
}

// Print the results
println("Number of simulations where Portfolio A outperforms Portfolio B: $portfolioAOutperformsB")
println("Number of simulations where Portfolio B outperforms Portfolio A: $portfolioBOutperformsA")
println("Portfolio A outperformed Portfolio B in ${portfolioAOutperformsB.toDouble() / simulations * 100}% of simulations.")
println("Portfolio B outperformed Portfolio A in ${portfolioBOutperformsA.toDouble() / simulations * 100}% of simulations.")
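One likely cause, assuming the paper compares both portfolios under the same sequence of bull and bear years: the loop above calls simulatePortfolioAccumulatedReturns twice with the shared rng, so A and B each get their own, independent market history. Comparing two unrelated histories answers a different question from "which portfolio does better in the same market". The sketch below (all names are illustrative, not from the original) draws one market outcome per year and applies it to both portfolios, and also computes the exact answer by summing binomial probabilities over the number of bear years. With these parameters the exact value comes out at about 0.90, in line with the paper's figure.

```kotlin
import kotlin.math.ln
import kotlin.math.pow
import kotlin.random.Random

// Yearly growth factors from the question
val bullA = 1.20
val bearA = 0.80
val bullB = 1.25
val bearB = 0.65

// Monte Carlo with ONE market draw per year, applied to both portfolios.
// Returns the fraction of runs in which A ends above B.
fun shareAWinsSameMarket(years: Int, simulations: Int, bullProb: Double, rng: Random): Double {
    var aWins = 0
    repeat(simulations) {
        var a = 1.0
        var b = 1.0
        repeat(years) {
            val isBull = rng.nextDouble() < bullProb // shared by A and B
            a *= if (isBull) bullA else bearA
            b *= if (isBull) bullB else bearB
        }
        if (a > b) aWins++
    }
    return aWins.toDouble() / simulations
}

// Binomial coefficient C(n, k) as a Double (adequate for n = 30)
fun choose(n: Int, k: Int): Double {
    var result = 1.0
    for (i in 1..k) result = result * (n - k + i) / i
    return result
}

// Exact probability that A ends above B under a shared market path:
// with k bear years, A wins iff k*ln(bearA/bearB) + (years-k)*ln(bullA/bullB) > 0.
fun exactShareAWins(years: Int, bullProb: Double): Double {
    val bearEdge = ln(bearA / bearB) // > 0: A loses less in a bear year
    val bullEdge = ln(bullA / bullB) // < 0: A gains less in a bull year
    var p = 0.0
    for (k in 0..years) {
        if (k * bearEdge + (years - k) * bullEdge > 0) {
            p += choose(years, k) * (1 - bullProb).pow(k) * bullProb.pow(years - k)
        }
    }
    return p
}
```

Feeding the same random market path to both portfolios is also the usual way to compare strategies (common random numbers): it removes the extra noise that unrelated draws add to the comparison.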

👍 2

Paulo Cereda

04/11/2025, 5:27 PM

Hi friends! DataFrame question. 🙂 I have a .csv file in which one of the columns contains a comma-separated string. I would like to split it and have the row replicated for each element. I have working code, but it's far from optimal. Code in thread. 🧵
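In Kotlin DataFrame, this "explode" pattern is what split { column }.by(",").intoRows() is meant for, so that is probably the first thing to try. To show the idea independently of the library, here is a plain-Kotlin sketch on a hypothetical Record type (the type and field names are illustrative, not from the original CSV):

```kotlin
// Hypothetical row type standing in for one CSV record
data class Record(val id: Int, val tags: String)

// One output row per comma-separated element; the other fields are copied.
// Elements are trimmed, and empty elements are dropped.
fun explodeTags(rows: List<Record>): List<Pair<Int, String>> =
    rows.flatMap { row ->
        row.tags.split(",")
            .map { it.trim() }
            .filter { it.isNotEmpty() }
            .map { tag -> row.id to tag }
    }
```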

Jason Zhao

04/14/2025, 6:57 AM

Hi, I have a question about the Kotlin notebook. Is it not possible to use kotlinx.serialization in the Kotlin notebook? I have a notebook that needs to load my application transiently to perform a task. The application loading fails with the following error when it gets to the config deserialization part. I initially thought it was a version conflict between dependencies, but I excluded the dependencies in question from loading Kotlin Serialization, and the error still occurs. Note that when I run the same application logic in a regular Kotlin main function, I don't get this error.

java.lang.AbstractMethodError: Receiver class ((my class name))$$serializer does not define or inherit an implementation of the resolved method 'abstract kotlinx.serialization.KSerializer[] typeParametersSerializers()' of interface kotlinx.serialization.internal.GeneratedSerializer.

Note: I am in K2 mode, so the issue might be related to that. (On a side note, variables are still not detected across cells in K2 mode.)

Ume Channel

04/29/2025, 11:13 PM

Good day! How do I fix large numbers being shown in E notation? For example, 1,927,384,739 becomes 1.927384739E9. I don't want it expressed in E notation.

Attachment: Global Library Data.xlsx
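The value isn't actually overflowing: 1,927,384,739 fits easily in a Double, and Kotlin on the JVM uses Java's Double.toString(), which switches to scientific notation for magnitudes of 10^7 and above. A minimal standard-library sketch of formatting such values for display (the function names are illustrative; how a particular table renderer picks up the formatting is not covered here):

```kotlin
import java.util.Locale

// Plain digits, no exponent: go through BigDecimal and print it without scientific notation
fun plainDigits(value: Double): String = value.toBigDecimal().toPlainString()

// Rounded to a whole number with thousands separators; Locale.US pins "," as the separator
fun groupedDigits(value: Double): String = String.format(Locale.US, "%,.0f", value)
```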

Ume Channel

04/30/2025, 12:55 AM

How do I keep decimals in numbers when importing an .xlsx file?

Ume Channel

04/30/2025, 10:35 PM

Temporary workaround for numbers shown in E notation: map the rows to a data class, convert the value to a String, then to a Double or null (replacing null with 0.0), then to a BigDecimal, and finally rebuild the table. (Still studying this process.)
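The conversion chain described above, sketched in plain Kotlin on a hypothetical LibraryRow class (the class and field names are assumptions, not taken from the actual spreadsheet). The second function skips the Double detour, which avoids rounding for values with more significant digits than a Double can hold:

```kotlin
import java.math.BigDecimal

// Hypothetical row type; the numeric column arrives as text
data class LibraryRow(val country: String, val expenditures: String)

// String -> Double (null becomes 0.0) -> BigDecimal, as described in the message
fun expendituresViaDouble(row: LibraryRow): BigDecimal =
    (row.expenditures.toDoubleOrNull() ?: 0.0).toBigDecimal()

// Parse the text straight into a BigDecimal; unparsable values become zero
fun expendituresDirect(row: LibraryRow): BigDecimal =
    row.expenditures.toBigDecimalOrNull() ?: BigDecimal.ZERO
```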

Ume Channel

05/01/2025, 6:21 PM

Hello! My sheet has no header row with column names such as country, region, expenditures, total libraries, etc.; the first row contains values directly (e.g. Albania, Europe, 132312, 12312). When I query the first row, this happens (see image). Does there need to be a header row with column names before the values?

Karthick

05/08/2025, 8:58 AM

Does anybody know whether we can import variables or functions from one notebook into another?

Paulo Cereda

05/30/2025, 3:49 PM

⚠️ Hic sunt leones Occasionally I have to bring data to DataFrame from an unsupported database, so I wrote the following code as "an exercise to the reader":


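import java.sql.ResultSet
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.toDataFrame

// Expose the rows of a ResultSet as a Sequence; each element is the same cursor, advanced by next()
fun ResultSet.asSequence(): Sequence<ResultSet> = sequence {
    while (next()) {
        yield(this@asSequence)
    }
}

// Column access by 1-based JDBC index
operator fun ResultSet.get(index: Int): Any? = this.getObject(index)

fun ResultSet.toDataFrame(): DataFrame<*> {
    // Column names from the metadata (JDBC indices start at 1)
    val names = List(metaData.columnCount) { metaData.getColumnName(it + 1) }
    // One list per column, created up front so an empty result still keeps its columns
    val columns = names.associateWith { mutableListOf<Any?>() }
    asSequence().forEach { row ->
        names.forEachIndexed { index, name ->
            columns.getValue(name).add(row[index + 1])
        }
    }
    return columns.toDataFrame()
}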

Hope someone can make use of this (even as a counterexample). 😁 Cheerio!

🦁 2

eenriquelopez

06/18/2025, 9:52 AM

I have two separate cells in a notebook: one does some processing and returns a DataFrame containing coordinates with some info; the other takes the GeoPolygon for a region. This is what I can draw: As you may have imagined, I want to combine the two. However, if I read the Geo Plotting documentation correctly, the only way to combine them is to transform the first DataFrame into a GeoJSON object (probably several GeoPoints) and then overlay it on the map. Is there any other way to do this without having to convert the objects? I was hoping that the geomap() function could somehow offer a lambda that returns points.

eenriquelopez

06/18/2025, 9:53 AM

(code here, if you are interested)

eenriquelopez

06/18/2025, 10:53 AM

Also, partially related: is there any library or framework like Quarto in R that would allow you to publish a report?