Apache Pinot #general

vmarchaud

11/06/2020, 10:38 AM

It seems to happen only with JDK 11 (both 0.5 and 0.6-RC); I tested with JDK 8 and it works fine.

Noah Prince

11/06/2020, 4:32 PM

@User In org/apache/pinot/spi/plugin/PluginClassLoader.java, change line 50 to:

method.invoke(this, url);

instead of

method.invoke(classLoader, url);

Fixed the issue for me.
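
For context, here is a minimal sketch of why this change plausibly helps on JDK 11, assuming `classLoader` in the original line referred to the system class loader. Since JDK 9 the system class loader is no longer a `URLClassLoader`, so reflectively invoking `URLClassLoader.addURL` on it fails, while a loader that itself extends `URLClassLoader` can add URLs to its own search path. `PluginLoaderSketch`, `addPluginPath`, and the jar path are hypothetical names for illustration, not Pinot's code.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-in for a plugin class loader; not Pinot's actual class. */
final class PluginLoaderSketch extends URLClassLoader {

    PluginLoaderSketch(ClassLoader parent) {
        super(new URL[0], parent);
    }

    /** Adds a jar or directory to this loader's own search path. */
    void addPluginPath(Path path) {
        try {
            // addURL is protected in URLClassLoader, so a subclass may call it
            // on itself directly; no reflection is needed.
            addURL(path.toUri().toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable path: " + path, e);
        }
    }

    public static void main(String[] args) {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        // On JDK 8 the system loader is a URLClassLoader; on JDK 9+ it is not.
        System.out.println("system loader is URLClassLoader: "
                + (system instanceof URLClassLoader));

        PluginLoaderSketch loader = new PluginLoaderSketch(system);
        loader.addPluginPath(Paths.get("plugins", "example.jar")); // hypothetical path
        System.out.println("URLs on plugin loader: " + loader.getURLs().length);
    }
}
```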

Noah Prince

11/08/2020, 5:20 PM

Every now and then, after several restarts of brokers, controllers, and servers, my local Pinot gets into a bad state where it shows the number of segments as 0/83. VerifySegmentState gives:

Segment: table_OFFLINE_1603953900214_1603953902314_7 idealstate: {Server_10.136.245.18_8003=ONLINE} is MISSING in external view:
Segment: table_OFFLINE_1603953900214_1603953902314_7 idealstate: {Server_10.136.245.18_8003=ONLINE} does NOT match external view: null
table_OFFLINE = ERROR

Helix doesn’t seem to be assigning segments to my server, so the whole thing is just broken. Scorched earth (nuking ZK, reloading all the segments, etc.) works, but if this were production, how would I get things working again? How do you debug something like this?
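
To make the output above easier to read: in Helix, the ideal state is the desired segment-to-server assignment and the external view is what servers actually report. A minimal sketch of that comparison over plain maps follows. `SegmentStateDiff` and `diff` are hypothetical names; this is neither Helix's API nor the VerifySegmentState implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; compares desired and reported segment states. */
final class SegmentStateDiff {

    /**
     * Both maps are keyed by segment name and hold instance -> state.
     * Returns one message per segment whose reported state differs.
     */
    static List<String> diff(Map<String, Map<String, String>> idealState,
                             Map<String, Map<String, String>> externalView) {
        List<String> problems = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order for the report.
        for (Map.Entry<String, Map<String, String>> e : new TreeMap<>(idealState).entrySet()) {
            Map<String, String> reported = externalView.get(e.getKey());
            if (reported == null) {
                problems.add(e.getKey() + " is MISSING in external view");
            } else if (!reported.equals(e.getValue())) {
                problems.add(e.getKey() + " does NOT match external view: " + reported);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> ideal = Collections.singletonMap(
                "table_OFFLINE_1603953900214_1603953902314_7",
                Collections.singletonMap("Server_10.136.245.18_8003", "ONLINE"));
        // An empty external view means no server has reported the segment.
        Map<String, Map<String, String>> external = Collections.emptyMap();
        diff(ideal, external).forEach(System.out::println);
    }
}
```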

lâm nguyễn hoàng

11/09/2020, 1:39 PM

Is it possible to create a table name and schema name according to the system date?

Noah Prince

11/09/2020, 2:55 PM

What do people generally use for dashboarding with Pinot? I see there’s no Grafana plugin

Mohammed Galalen

11/10/2020, 6:04 PM

Is there any way to convert a SQL table schema to a Pinot table schema?

Chundong Wang

11/10/2020, 9:16 PM

Just reported https://github.com/apache/incubator-pinot/issues/6253: a NullPointerException is thrown when querying with an aggregation on top of Groovy functions. Should we expect aggregation to work on top of Groovy, as it’s treated as a transform function?

Tymm

11/11/2020, 9:42 AM

Hello, I followed this guide: https://docs.pinot.apache.org/basics/getting-started/advanced-pinot-setup and have successfully set up the cluster in Docker on my Ubuntu VM. However, once I restart the Docker service or reboot the Ubuntu VM, the broker log shows this error:

No server found for request 8: select * from sitevisitordata limit 10

Please advise. Thanks.

João Comini

11/11/2020, 4:27 PM

Hello guys! I'm here to talk about the use case that my team and I are facing right now. We have a realtime data processing platform that provides general data aggregations (sums, counts, averages, etc.) to a transactional fraud detection engine. In summary, we consume data from Kafka topics, use it to increment counters for a given ID (e.g. a credit card hash), and then update them in the database. This solution performs really well because we are accessing key-value indexed columns, but we have a really tough time creating new pre-aggregation flows. So we thought: "What if we had a self-service data aggregation engine, where we didn't need to code every step of every pre-aggregation flow?" And here we are! We've been looking at Pinot for a long time now, and we're still not sure whether it is going to fit our scenario.

The major problems that we have today are:
• It takes us approximately 2 or 3 days to deliver a new pre-aggregation flow.
• Our pre-aggregation algorithm has some "imprecision", as we work with a big time window in our aggregation technique.

I want to give you some metrics here, so maybe you can help me judge whether Pinot is suitable for us:
• At its peak, our fraud detection engine handles 7~8k transactions per minute.
• For each transaction we make dozens (if not hundreds) of requests to our pre-aggregation platform, which gives us a throughput of ~100k queries per minute.
• Our pre-aggregation latency SLA is 1 second to return ALL queries.

I'm just putting this here to discuss similar use cases and understand whether it is worth the team's effort to start something new rather than maintain what already exists. I apologize if this is not the best way to introduce myself and start a discussion here. 😄

👋 3

Noah Prince

11/12/2020, 4:12 AM

Is it possible to pull data for one table from multiple Kafka topics that carry the same message type? It doesn't look like you can do a subscribePattern.

Sarabjeet

11/12/2020, 5:33 AM

Hello Team, can someone please give me write access to https://github.com/apache/incubator-pinot.git/? I'm unable to push my changes.

Matt

11/12/2020, 6:59 PM

Hi All, I am new to Pinot and have some basic questions. Compared with Elasticsearch, does Pinot create a similarly large index? Can I configure Pinot to talk to S3 or other data stores directly, without provisioning additional space for local indices? If a local index is mandatory, how big will it be? Thanks.

Matt

11/14/2020, 1:04 AM

Has anyone got a performance comparison of storing the index on HDD / SSD / EFS / NFS? I assume SSD will be the best performer, as usual?

Ken Krugler

11/16/2020, 5:49 PM

Question about hierarchical aggregations. In Elasticsearch we can (for example) group by day, and then sub-group in each day by some attribute, and sum a metric, and get the top 10 results (for that sum of that metric) per day. It doesn’t seem possible to do this with Pinot, but wanted to confirm, as my SQL skills are pretty rusty, thanks!
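
To pin down the result shape being asked about, here is a minimal sketch of the computation done client-side over plain rows: sum a metric per (day, attribute), then keep the top N attributes within each day. It illustrates the semantics only and says nothing about what Pinot supports. `TopNPerDay`, `Row`, and the field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical illustration of "top N per day by summed metric". */
final class TopNPerDay {

    static final class Row {
        final String day;
        final String attr;
        final long metric;

        Row(String day, String attr, long metric) {
            this.day = day;
            this.attr = attr;
            this.metric = metric;
        }
    }

    static Map<String, List<Map.Entry<String, Long>>> topPerDay(List<Row> rows, int n) {
        // First level: day -> attribute -> sum(metric).
        Map<String, Map<String, Long>> sums = new TreeMap<>();
        for (Row r : rows) {
            sums.computeIfAbsent(r.day, d -> new HashMap<>())
                .merge(r.attr, r.metric, Long::sum);
        }
        // Second level: within each day, order by sum descending and keep n.
        Map<String, List<Map.Entry<String, Long>>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> day : sums.entrySet()) {
            List<Map.Entry<String, Long>> ranked = new ArrayList<>(day.getValue().entrySet());
            ranked.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Long>comparingByKey()));
            result.put(day.getKey(), ranked.stream().limit(n).collect(Collectors.toList()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("2020-11-15", "a", 5),
                new Row("2020-11-15", "b", 7),
                new Row("2020-11-15", "a", 4),
                new Row("2020-11-16", "c", 1));
        System.out.println(topPerDay(rows, 10));
    }
}
```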

Matt

11/16/2020, 5:56 PM

Hello, I am sending Kafka JSON messages with the following structure:

{data: string, timestamp: yyyy-mm-dd hh:mm:ss:ms zone ,  attra: string, mainattr: {attra: string,  attrb:string, attrc: string}}

These are consumed by Pinot. I would like to check whether "timeColumnName" should match the "timestamp" field from the Kafka message. If it has to match, how do I specify the date format?

"segmentsConfig": { "timeColumnName": "mergedTimeMillis", "timeType": "MILLISECONDS",
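
If the time column is meant to hold epoch milliseconds, as the "MILLISECONDS" setting suggests, the string timestamp has to be converted somewhere along the way. Below is a minimal sketch of that conversion in plain Java, assuming "yyyy-mm-dd hh:mm:ss:ms zone" means a value such as `2020-11-16 17:56:00:123 +01:00`. The exact pattern, `TimestampToMillis`, and `toEpochMillis` are assumptions for illustration; this is not a Pinot API or configuration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical converter from the message's timestamp string to epoch millis. */
final class TimestampToMillis {

    // Assumed concrete form of "yyyy-mm-dd hh:mm:ss:ms zone":
    // 24-hour clock, three millisecond digits, and a UTC offset like +01:00.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss:SSS XXX");

    static long toEpochMillis(String timestamp) {
        return OffsetDateTime.parse(timestamp, FORMAT).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2020-11-16 17:56:00:123 +01:00"));
    }
}
```

If the zone is written as a name or abbreviation rather than an offset, the pattern would need a different zone letter.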

jiatao

11/18/2020, 12:10 AM

Hey, the Travis CI build for my PR has been stuck for a while. Does anyone have the same issue? https://travis-ci.com/github/apache/incubator-pinot/builds/202233045

Jack

11/18/2020, 11:35 PM

Hello community,

We are pleased to announce that Apache Pinot (incubating) 0.6.0 is released! Apache Pinot (incubating) is a distributed columnar storage engine that can ingest data in realtime and serve analytical queries at low latency.

The release can be downloaded at: https://pinot.apache.org/download
The release note is available at: https://docs.pinot.apache.org/basics/releases/0.6.0 <https://github.com/apache/incubator-pinot/releases/tag/release-0.6.0>

Additional resources
- Project website: https://pinot.apache.org
- Getting started: https://docs.pinot.apache.org/getting-started
- Mailing list: dev@pinot.apache.org
- Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot
- Twitter: https://twitter.com/ApachePinot

Best Regards,
Apache Pinot (incubating) Team

🥃 6

🍷 9

🎉 16

Cesar

11/19/2020, 1:12 AM

Hey folks, just wishing to get some ideas here. What are some of the hot loops in Pinot? Any idea which loop could benefit from the new Java Vector API (https://openjdk.java.net/jeps/338) ?

Ho Tien Vu

11/19/2020, 4:23 PM

hi Pinot-dev, any update on generic "Null" support? https://docs.pinot.apache.org/developers/advanced/null-value-support

Darshan

11/19/2020, 5:07 PM

Hi folks, I tried to build the Pinot project and am seeing build errors. Occasionally I see a "re-try"/"sync" type error, as it takes a while to sync from public Maven to our internal Artifactory. Unfortunately, after several retries, I still see the errors:
1) [INFO] Pinot Google Cloud Storage ......................... FAILURE [ 0.555 s]
2) [ERROR] Failed to execute goal on project pinot-gcs: Could not resolve dependencies for project org.apache.pinot:pinot-gcs:jar:0.6.0-SNAPSHOT: Failed to collect dependencies at com.google.cloud:google-cloud-storage:jar:1.102.0 -> com.google.cloud:google-cloud-core-http:jar:1.91.3 -> com.google.api:gax-httpjson:jar:0.66.1: Failed to read artifact descriptor for com.google.api:gax-httpjson:jar:0.66.1: Could not transfer artifact com.google.api:gax-httpjson:pom:0.66.1 from/to central
Any thoughts/suggestions?

Ken Krugler

11/20/2020, 8:06 PM

I was hoping to use the BaseClusterIntegrationTest class in my test code, so I could run a local integration test against data a workflow is generating. But the pinot-integration-tests jar (0.6.0, at least) doesn’t contain this class, or actually any code…just the META-INF directory. Is that intentional?

Matt

11/21/2020, 10:34 PM

I am executing the following query and getting an error:

select length(mydata) from mytable where length(mydata) > 500 LIMIT 10

Other queries are working fine. I'm wondering whether this is a query issue or a cluster setup / index issue.

ERROR SplitRunner-0-38 com.facebook.presto.execution.executor.TaskExecutor Error processing Split 20201121_222456_00039_rmrgi.1.0.0-4 PinotSplit{connectorId=pinot, splitType=SEGMENT, columnHandle=[PinotColumnHandle{columnName=mydata, dataType=varchar, type=REGULAR}], segmentPinotQuery=Optional[SELECT mydata FROM mytable_REALTIME LIMIT 2147483647], brokerPinotQuery=Optional.empty, segments=[mytable__0__7__20201121T2220Z], segmentHost=Optional[Server_test-pinot-server-1.test-pinot-server-headless_8098]} (start = 2824864.565796, wall = 3 ms, cpu = 0 ms, wait = 0 ms, calls = 1): PINOT_UNCLASSIFIED_ERROR: Error when hitting host Server_test-pinot-server-1.sb-pinot-server-headless_8098 with pinot query "SELECT mydata FROM mytable_REALTIME LIMIT 2147483647"

Siavash Mahmoudian

11/22/2020, 6:12 AM

Hi Everyone! 👋 I’m co-founder of an online community platform startup. We help companies create customizable white-labeled social networks to connect their audience together. Apache Pinot looks amazing and we want to use it mostly for our analytics and user segmentation/filtering. Using Pinot for analytics is a no-brainer. However, I’m not sure whether we should use Elasticsearch or Apache Pinot for user filtering.

To give you more context, in our platform users can take different actions such as “Creating a post”, “Liking a post”, “Commenting on a post”, “Buying an item”, etc., and they have different properties such as “Title”, “Age”, “Last Seen At”, etc. An example of user filtering is to fetch all users who have more than 5 posts and 10 comments, are older than 21, and were seen in the last 10 days. We should be able to sort the results on different columns of the user and paginate the results.

Now here are my two questions:
1. If we want to use Pinot for user filtering, we will need to set the data retention period to infinite, since the filters can be applied to any time period, including from the beginning. Does Pinot slow down as the amount of data it stores grows over time? Should we think of running cron jobs, every month for instance, to merge all the very old records into one, or is there no need for that?
2. We want to filter on the number of actions (e.g. “Buying an item”), on action fields (e.g. the amount of the item that was bought), and on user fields (there can be custom fields defined). This means each record that we insert will have many columns. For the “Buying an item” example, we need to save all the properties of the buyer, the product, and the price. For other actions, we will need to save other properties. This means the number of columns can run into the hundreds. Is Apache Pinot designed to handle that many columns in the schema?

Thanks in advance for the help!
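
For concreteness, the example filter above can be sketched over in-memory records as a predicate, a caller-supplied sort order, and offset-based pagination. This only spells out the intended semantics and is not how either Pinot or Elasticsearch would execute it. `UserFilterSketch`, `User`, and the field names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of the filter, sort, and paginate requirement. */
final class UserFilterSketch {

    static final class User {
        final String id;
        final int age;
        final int posts;
        final int comments;
        final Instant lastSeenAt;

        User(String id, int age, int posts, int comments, Instant lastSeenAt) {
            this.id = id;
            this.age = age;
            this.posts = posts;
            this.comments = comments;
            this.lastSeenAt = lastSeenAt;
        }
    }

    /** Returns one zero-indexed page of matching users in the given order. */
    static List<User> page(List<User> users, Instant now, Comparator<User> order,
                           int pageIndex, int pageSize) {
        Instant cutoff = now.minus(Duration.ofDays(10));
        return users.stream()
                // More than 5 posts, more than 10 comments, older than 21,
                // and seen within the last 10 days.
                .filter(u -> u.posts > 5 && u.comments > 10 && u.age > 21
                        && u.lastSeenAt.isAfter(cutoff))
                .sorted(order)
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-11-22T06:12:00Z");
        List<User> users = Arrays.asList(
                new User("u1", 30, 6, 11, now.minus(Duration.ofDays(1))),
                new User("u2", 19, 9, 20, now.minus(Duration.ofDays(2))),
                new User("u3", 25, 8, 12, now.minus(Duration.ofDays(3))));
        Comparator<User> byAge = Comparator.comparingInt(u -> u.age);
        for (User u : page(users, now, byAge, 0, 10)) {
            System.out.println(u.id);
        }
    }
}
```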

Joice Jacob

11/24/2020, 6:14 AM

Hi, recently I was working with Pinot GROUP BY queries. I have loaded 3 million rows of data into the table and executed a GROUP BY query with a limit. If the limit is above 1 lakh (100,000), the query returns only 1 lakh records. As per the documentation, I changed the query limit override property to fix this, but it didn't work. The issue is attached as a screenshot below. Thanks in advance for the help!

Tymm

11/24/2020, 9:58 AM

Hi, can pinot aggregate all the data periodically?

Jakob Edding

11/24/2020, 10:36 AM

Hi, does Pinot offer an implementation of Change Data Capture (CDC) for OLAP queries so that updated query results can be obtained automatically? Or an alternative way to be notified of updated indexes so that we can manually retrigger the query?

👋 2

Amit Chopra

11/24/2020, 4:25 PM

I am new to Pinot and was trying out some of the examples. One piece of documentation feedback: I saw that the controller created during the Manual cluster setup is named pinot-controller, while later pages such as the Batch import example (docker examples and /tmp/pinot-quick-start/docker-job-spec.yml) use pinot-quickstart as the name. This causes the steps to fail unless you manually fix them 🙂

Matt

11/25/2020, 6:43 PM

Hello, I tried the incubator-pinot Helm chart and set the replica count for the server to 2. I noticed that all the segments are going to one server, which is heavily used, while the other StatefulSet pod is doing almost nothing. Is this some sort of active-passive behaviour? Also, I scheduled 26G RAM with jmx 4G for the server, and sometimes the memory reaches the maximum as well. How can I spread the segments and distribute the load? Also, which metric can we use for autoscaling?

Tymm

11/26/2020, 1:18 AM

Hello, where can I find guides or documentation for pinot-minion?

sambit

11/29/2020, 2:54 PM

Does Pinot support update/upsert queries using the Pinot query language or any other API? If yes, does it support only full updates, or are partial updates also possible?