# general
  • Kishore G (02/27/2020, 4:10 PM)
    start with this https://apache-pinot.gitbook.io/apache-pinot-cookbook/getting-started/quickstart/getting-start-with-pinot-quickstart
  • Sidd (02/28/2020, 12:33 AM)
    <!here>, I have added user docs for text search -- https://apache-pinot.gitbook.io/apache-pinot-cookbook/pinot-user-guide/pinot-query-language/text-search-support
    👍 1
  • sunny19930321 (03/02/2020, 10:52 AM)
    Hi, when does the expired-data cleanup run? Every 6*3600 seconds?
  • sunny19930321 (03/02/2020, 11:01 AM)
    How do I take a Pinot server offline (decommission it)? How does the data on it get migrated?
  • Xiang Fu (03/02/2020, 12:47 PM)
    The controller has a retention manager which kicks off the retention job.
  • Xiang Fu (03/02/2020, 12:48 PM)
    default is 6 hours
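    For reference, retention is configured per table; a minimal sketch of the relevant table-config section (the table name and values here are illustrative):

    ```json
    {
      "tableName": "myTable_OFFLINE",
      "tableType": "OFFLINE",
      "segmentsConfig": {
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "30"
      }
    }
    ```

    The retention manager itself runs on a controller-side schedule; its default interval is 6 hours (the 6*3600 seconds asked about above), and the exact controller property for tuning that interval varies by Pinot version.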
  • Xiang Fu (03/02/2020, 12:48 PM)
    for server migration you can use segment rebalance tool
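    For context, a rebalance can be triggered through the controller REST API; a hedged sketch (the host, port, table name, and query parameters are illustrative, so check the controller API docs for your version):

    ```shell
    # Dry-run first to see the proposed segment assignment...
    curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=OFFLINE&dryRun=true"
    # ...then run for real once the plan looks right.
    curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=OFFLINE&dryRun=false"
    ```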
  • sunny19930321 (03/02/2020, 2:11 PM)
    In some tables (both realtime and offline), a data segment is in the OFFLINE state. How can it be transitioned back to ONLINE?
  • sunny19930321 (03/02/2020, 2:12 PM)
    (attachment: image.png)
  • Mayank (03/02/2020, 2:52 PM)
    Are you suspecting the segment is offline due to retention, or is this an issue unrelated to your previous question?
  • Mayank (03/02/2020, 2:53 PM)
    If you have access to the server log, it would likely indicate why the segment is offline (typically due to some error).
  • sunny19930321 (03/02/2020, 3:32 PM)
    It's probably because of retention, but how do I delete those OFFLINE segments?
  • sunny19930321 (03/02/2020, 3:35 PM)
    I found that rebalance depends on the /tmp/PinotController directory on the controller. What if the amount of data is too large for the disk backing /tmp/PinotController? It seems /tmp/PinotController ends up holding all the data segments of the cluster.
  • Mayank (03/02/2020, 4:15 PM)
    You shouldn't need to delete them manually.
  • Mayank (03/02/2020, 4:16 PM)
    Do you mean rebalance or retention?
  • Mayank (03/02/2020, 4:17 PM)
    Will check the code and get back
  • Mayank (03/02/2020, 5:22 PM)
    @User Are you starting the controller with default configs? You can (and should) override the `controller.data.dir` config to point at a location that has enough storage, and that can also be shared across multiple controller instances (in production). For example, in production you can use a deep-storage such as S3/HDFS/ADLS, or NFS that is mounted on the controller instances.
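    A minimal sketch of that controller setting (the paths shown are illustrative):

    ```properties
    # Local disk with enough capacity (single-controller setups)
    controller.data.dir=/var/pinot/controller/data

    # Or a shared deep store, e.g. HDFS, so multiple controllers can share it
    # controller.data.dir=hdfs://namenode:8020/pinot/controller/data
    ```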
  • Elon (03/02/2020, 7:19 PM)
    Re: new star tree index config - We wanted to make sure we have the new config. This is our current star tree config (example):
    ```json
    "starTreeIndexSpec": {
      "dimensionsSplitOrder": [
        "column1",
        "column2",
        "column3"
      ],
      "skipStarNodeCreationForDimensions": [],
      "functionColumnPairs": [
        "COUNT__columnX",
        "SUM__columnX",
        "MAX__columnX",
        "AVG__columnX",
        "COUNT__columnY",
        "SUM__columnY",
        "MAX__columnY",
        "AVG__itemQuantity",
        "COUNT__tax",
        "SUM__tax",
        "MAX__tax",
        "AVG__tax"
      ]
    }
    ```
  • Elon (03/02/2020, 7:20 PM)
    @User ^^^ is that the new config?
  • Jackie (03/02/2020, 7:25 PM)
    @User This is a mix of both old and new configs... The new configs are under the key `starTreeIndexConfigs`, which takes a list of configs (one for each tree)
  • Jackie (03/02/2020, 7:26 PM)
    ```json
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [
          "column1",
          "column2",
          "column3"
        ],
        "functionColumnPairs": [
          "COUNT__columnX",
          "SUM__columnX",
          "MAX__columnX",
          "AVG__columnX",
          "COUNT__columnY",
          "SUM__columnY",
          "MAX__columnY",
          "AVG__itemQuantity",
          "COUNT__tax",
          "SUM__tax",
          "MAX__tax",
          "AVG__tax"
        ]
      }
    ]
    ```
  • Elon (03/02/2020, 7:27 PM)
    Perfect, thanks! Also, what is the `maxLeafRecords` config?
  • Jackie (03/02/2020, 7:29 PM)
    That is the upper bound on the number of records to process for each leaf node (default 10000)
  • Jackie (03/02/2020, 7:29 PM)
    The lower the threshold, the fewer records to process per leaf at query time (higher performance), but the larger the tree.
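    Tying this back to the new-style config above, `maxLeafRecords` sits alongside the other per-tree settings; a minimal sketch (the column names and the value shown are illustrative):

    ```json
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["column1", "column2"],
        "functionColumnPairs": ["COUNT__columnX", "SUM__columnX"],
        "maxLeafRecords": 10000
      }
    ]
    ```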
  • Jackie (03/02/2020, 7:30 PM)
    You may read more about how star-tree works here: https://pinot.readthedocs.io/en/latest/star-tree/star-tree.html
    👍 1
  • sunny19930321 (03/03/2020, 3:58 AM)
    Re: "in production you can use a deep-storage such as HDFS" -- is there specific documentation on configuring/mounting HDFS? @User
  • sunny19930321 (03/03/2020, 12:00 PM)
    Or is there specific documentation on mounting NFS?
  • Mayank (03/03/2020, 1:57 PM)
    Not sure if I follow the question @User
  • Mayank (03/03/2020, 1:58 PM)
    The controller data.dir can only be one of these storages.
  • Mayank (03/03/2020, 1:59 PM)
    If you specify data.dir as an hdfs:// path, then the controller will use that to store segments, and servers will download from there.
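    A hedged sketch of what an HDFS-backed controller config can look like (property names follow the Pinot HDFS filesystem plugin docs; the paths are illustrative, so verify against the docs for your release):

    ```properties
    controller.data.dir=hdfs://namenode:8020/pinot/controller/data
    pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
    pinot.controller.storage.factory.hdfs.hadoop.conf.path=/etc/hadoop/conf
    pinot.controller.segment.fetcher.protocols=file,http,hdfs
    ```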