# general
s
Hi Team, as part of a POC, we are trying to load Pinot table data into a Spark DataFrame using the Spark JDBC option. However, when we try, we see the following error:
Exception in thread "main" java.sql.SQLFeatureNotSupportedException
	at org.apache.pinot.client.base.AbstractBaseStatement.setQueryTimeout(AbstractBaseStatement.java:167)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
k
Hi, not all features of the JDBC driver are currently supported, e.g. the setQueryTimeout method here. Can you describe your use case so that I can suggest some alternatives?
s
We are exploring the possibility of querying Apache Pinot using Spark JDBC to gather distinct column values from a table.
k
Why not get the distinct values directly from Pinot?
It will be much more efficient.
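For reference, a minimal sketch of what "directly from Pinot" could look like, assuming the broker's SQL endpoint (`POST /query/sql`) is reachable. The broker address, the table name `myTable`, and the column name `city` are placeholders, not values from this thread, and the helper functions are hypothetical names made up for the example.

```python
import json
import urllib.request

# Assumed broker address; adjust host and port to your deployment.
BROKER_URL = "http://localhost:8099/query/sql"


def build_distinct_sql(table, column, limit=1000):
    """Build a DISTINCT query for one column (identifiers are not escaped here)."""
    # An explicit LIMIT is set so the result is not cut off by a small default.
    return f"SELECT DISTINCT {column} FROM {table} LIMIT {limit}"


def extract_column(response, column):
    """Pull one column's values out of the resultTable of a broker response."""
    result = response["resultTable"]
    idx = result["dataSchema"]["columnNames"].index(column)
    return [row[idx] for row in result["rows"]]


def fetch_distinct(table, column, broker_url=BROKER_URL, limit=1000):
    """POST the query to the broker and return the distinct values as a list."""
    payload = json.dumps({"sql": build_distinct_sql(table, column, limit)}).encode("utf-8")
    request = urllib.request.Request(
        broker_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        return extract_column(json.load(resp), column)
```

Calling `fetch_distinct("myTable", "city")` would return a plain Python list, which could then drive the Postgres lookup or be turned into a Spark DataFrame, without going through the JDBC driver at all.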
s
Hi Kartik, below is the flow: Pinot table --> Spark (read, filter, transform) --> use column data to fetch data from Postgres
k
Got it. We haven't verified our JDBC driver's compatibility with Spark yet. I will need to check which methods need to be implemented here.