vishal
12/20/2022, 8:49 AM
Ashish Kumar
12/21/2022, 4:17 AM
vishal
12/21/2022, 10:03 AM
UDF - Scalar function
The steps I've followed are as below:
Created a Java project with the package name org.apache.pinot.scalar.ScalarFunc.
Pom file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache.pinot.scalar.ScalarFunc</groupId>
    <artifactId>ScalarFunc</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.pinot</groupId>
            <artifactId>pinot-common</artifactId>
            <version>0.11.0</version>
        </dependency>
    </dependencies>
</project>
Main.java
package org.apache.pinot.scalar.ScalarFunc;

import org.apache.pinot.spi.annotations.ScalarFunction;

public class Main {
    public static void main(String[] args) {
        // System.out.println("Hello world!");
    }

    @ScalarFunction
    static String getdata(String ref) {
        return "hurray testing is working";
    }
}
Created the jar file using mvn clean install, moved that jar file to pinot/plugins, and then reinstalled Pinot.
Tried to run this query: select getdata(AirTime) from airlineStats limit 10 (here AirTime is a column name).
But it's returning the error below:
[
{
"message": "QueryExecutionError:\norg.apache.pinot.spi.exception.BadQueryRequestException: Unsupported function: getdata not found\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:304)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:65)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:71)\n\tat org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:71)",
"errorCode": 200
}
]
Am I doing anything wrong?
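One thing that stands out: the Pinot docs show annotated scalar functions as public static methods, and the jar must sit in a directory Pinot actually scans for plugins at startup (e.g. via the plugins.dir system property). A hedged sketch of what the class might look like (the class name ScalarFunctions is arbitrary, not something Pinot requires):

```java
package org.apache.pinot.scalar.ScalarFunc;

import org.apache.pinot.spi.annotations.ScalarFunction;

public class ScalarFunctions {

    // The annotated method should be public static; Pinot derives the SQL
    // function name from the method name (case-insensitive), so this can be
    // invoked as getData(...) or getdata(...) in queries.
    @ScalarFunction
    public static String getData(String ref) {
        return "hurray testing is working";
    }
}
```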
reference: https://docs.pinot.apache.org/users/user-guide-query/scalar-functions#scalar-functions
Ashish Kumar
12/22/2022, 3:52 AM
Ralph Debusmann
12/22/2022, 11:00 AM
vishal
12/22/2022, 11:24 AM
id, name, position, timestamp
1,vishal,SDE1,
2,Raj, SDE2
1,vishal, SDE2
1,vishal,NULL
Here I am trying to fetch data for id=1, so the SQL query will be SELECT position FROM table WHERE id=1.
It will return 3 records, but I want only one record, which is 1,vishal,SDE2.
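(If the goal is just "latest row per id", this may be expressible without a UDF. A hedged sketch, assuming a Pinot version that has the LASTWITHTIME aggregation and that the timestamp column is populated as a valid time column; table and column names are taken from the example above:)

```sql
-- latest name/position per id, picked by the timestamp column
SELECT id,
       LASTWITHTIME(name, timestamp, 'STRING') AS name,
       LASTWITHTIME(position, timestamp, 'STRING') AS position
FROM mytable
WHERE id = 1
GROUP BY id
```

Alternatively, an upsert-enabled real-time table keeps only the latest record per primary key at query time.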
Can we achieve this through a scalar function? Let's say we pass id as an argument; the function fetches all the data for that id and runs logic that merges the rows and returns one string which is a concatenation of all columns.
EX.:
SELECT data(1) FROM table;
def data(id) {
    select all the data which has id = 1, merge those records, and return one single string
}
Is this possible? Can we query inside the function, or do we need to call an API to query data inside the function?
Shakti Singh
12/22/2022, 3:41 PM
Rostan TABET
12/23/2022, 1:29 PM
Rostan TABET
12/23/2022, 1:31 PM
"Bloom filter helps prune segments that do not contain any record matching an EQUALITY predicate." I wonder if it also includes IN predicates.
For example, can a bloom filter be useful for a predicate in the form fruit IN ('banana', 'apple', 'orange'), or do I need to change it to fruit = 'banana' OR fruit = 'apple' OR fruit = 'orange'?
Dileep Kancharla
12/23/2022, 3:00 PM
Rohit Yadav
12/26/2022, 7:20 AM
Amos Bird
12/27/2022, 7:10 AM
Amos Bird
12/28/2022, 2:45 PM
I find "Raw value forward index" very confusing. Is it just a column storage of the original data? I don't see why it's called an index.
Amos Bird
12/28/2022, 2:58 PM
Is colB reordered before doing dict-encoding in "Dictionary-encoded forward index with bit compression (default)"?
Timothy Spann
12/29/2022, 12:31 AM
Abhishek Dubey
01/02/2023, 10:45 AM
Shankar Uprety
01/04/2023, 1:17 AM
Shreyans Bhavsar
01/04/2023, 11:08 AM
Harshit
01/04/2023, 11:23 AM
Could not find index for column: gKey, type: FORWARD_INDEX, segment: /tmp/data/pinotServerData/key1_OFFLINE/key1_3_
Schema
{
  "schemaName": "key",
  "dimensionFieldSpecs": [
    {
      "name": "rootKey",
      "dataType": "STRING"
    },
    {
      "name": "gKey",
      "dataType": "STRING"
    }
  ],
  "primaryKeyColumns": [
    "gKey"
  ]
}
Table config
{
  "tableName": "key",
  "tableType": "OFFLINE",
  "isDimTable": true,
  "segmentsConfig": {
    "schemaName": "key",
    "segmentPushType": "REFRESH",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {
    "customConfigs": {}
  },
  "quota": {
    "storage": "200M"
  }
}
Harshit
01/05/2023, 6:46 AM
Shreeram Goyal
01/05/2023, 9:13 AM
Pratik Tibrewal
01/06/2023, 8:39 AM
date_trunc('week', date_parse(datestr, '%y-%m-%d')) + interval '3' day
Tim Berglund
Tim Berglund
Unmesh Vijay Kadam
01/09/2023, 8:05 AM
Prashant Korade
01/09/2023, 6:11 PM
Tim Berglund
Rostan TABET
01/10/2023, 9:18 AM
Cursor.fetchall has the following docstring:
Fetch all (remaining) rows of a query result, returning them as a
sequence of sequences (e.g. a list of tuples). Note that the cursor's
arraysize attribute can affect the performance of this operation.
However, the method's implementation is simply:
return list(self)
which basically creates a list by calling fetchone, i.e. self._results.pop(0), once for each element of the list self._results. I wonder if there is a reason for this, instead of something like:
res = self._results
self._results = []
return res
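A minimal stand-in (a hypothetical list-backed cursor, not the actual pinotdb implementation) comparing the two approaches; the pop-per-row path shifts the whole remaining list on each call, so it is quadratic overall, while handing the list over is constant-time:

```python
class ListCursor:
    """Hypothetical DB-API-style cursor backed by a plain Python list."""

    def __init__(self, rows):
        self._results = list(rows)

    def fetchone(self):
        # pop(0) shifts every remaining element left: O(n) per call
        return self._results.pop(0) if self._results else None

    def fetchall_via_fetchone(self):
        # mirrors list(self): one fetchone per row -> O(n^2) overall
        rows = []
        while self._results:
            rows.append(self.fetchone())
        return rows

    def fetchall_swap(self):
        # the suggested alternative: hand back the backing list, reset -> O(1)
        res, self._results = self._results, []
        return res
```

Both variants return the same rows and leave the cursor empty; they differ only in cost.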
My main concern is about possible performance issues when the query result contains many rows.
Sachin Mittal Consultant
01/10/2023, 5:08 PM
Sachin Mittal Consultant
01/10/2023, 8:08 PM
Caught exception while decoding row
These mostly seem to come from my transformation functions. I have used some built-in functions, and for those the stack trace indicated what the problems could be, and I have fixed them.
However, I also needed some Groovy scripts, and I executed those scripts standalone to check whether they work fine, and they do.
However, it still does not seem to be decoding rows, and I am now unable to figure out where the problem is.
The stack trace I get is something like:
org.apache.pinot.shaded.com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0x89
at [Source: (ByteArrayInputStream); line: 1, column: 3]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2337) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:710) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3607) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeCharForError(UTF8StreamJsonParser.java:3350) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3582) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2688) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:870) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:762) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.databind.ObjectReader._bindAsTree(ObjectReader.java:2058) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.databind.ObjectReader._bindAndCloseAsTree(ObjectReader.java:2044) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.shaded.com.fasterxml.jackson.databind.ObjectReader.readTree(ObjectReader.java:1739) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.spi.utils.JsonUtils.bytesToJsonNode(JsonUtils.java:211) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder.decode(JSONMessageDecoder.java:61) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder.decode(JSONMessageDecoder.java:73) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder.decode(JSONMessageDecoder.java:37) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.spi.stream.StreamDataDecoderImpl.decode(StreamDataDecoderImpl.java:47) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:549) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:434) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:629) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-782c3c2df59d2b173ba9ef595aeabd27cb00a332]
at java.lang.Thread.run(Thread.java:832) [?:?]
Is there a better way to debug this?
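(One observation on the trace above: "Invalid UTF-8 start byte 0x89" is raised before any transformation runs, so the raw message bytes reaching the JSON decoder are not valid UTF-8 JSON at all; 0x89 happens to be, for instance, the first byte of a PNG header. If you can dump a raw message from the stream to a file, a quick local check looks like this; the byte string here is purely illustrative:)

```python
import json

# hypothetical raw payload; 0x89 is never a valid UTF-8 start byte
raw = bytes([0x89, 0x50, 0x4E, 0x47])

try:
    json.loads(raw.decode("utf-8"))
    print("payload is valid UTF-8 JSON")
except UnicodeDecodeError as e:
    # same failure mode as the Jackson error in the stack trace
    print(f"not UTF-8 at byte {e.start}: 0x{raw[e.start]:02x}")
except json.JSONDecodeError as e:
    print(f"valid UTF-8 but not JSON: {e}")
```

If this fails on a real payload, the problem is upstream of Pinot (producer serialization, e.g. binary/Avro data fed to the JSON decoder), not in the transform configs.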