# general
  • Kishore G (02/27/2020, 4:10 PM)
    start with this https://apache-pinot.gitbook.io/apache-pinot-cookbook/getting-started/quickstart/getting-start-with-pinot-quickstart
  • Sidd (02/28/2020, 12:33 AM)
    <!here>, I have added user docs for text search -- https://apache-pinot.gitbook.io/apache-pinot-cookbook/pinot-user-guide/pinot-query-language/text-search-support
    👍 1
  • sunny19930321 (03/02/2020, 10:52 AM)
    Hi, when does the expired-data cleanup run? Every 6*3600 seconds?
  • sunny19930321 (03/02/2020, 11:01 AM)
    How do I take a Pinot server offline (decommission it)? How does the data on it get migrated?
  • Xiang Fu (03/02/2020, 12:47 PM)
    The controller has a retention manager which kicks off the retention job.
  • Xiang Fu (03/02/2020, 12:48 PM)
    default is 6 hours
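    For reference, retention is configured per table; a minimal sketch of the relevant table-config section (the table name and values here are illustrative):

    ```json
    {
      "tableName": "myTable_OFFLINE",
      "tableType": "OFFLINE",
      "segmentsConfig": {
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "30"
      }
    }
    ```

    The retention manager itself runs on a controller-side schedule; its default interval is 6 hours (the 6*3600 seconds asked about above), and the exact controller property for tuning that interval varies by Pinot version.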
  • Xiang Fu (03/02/2020, 12:48 PM)
    for server migration you can use segment rebalance tool
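    For context, a rebalance can be triggered through the controller REST API; a hedged sketch (the host, port, table name, and query parameters are illustrative, so check the controller API docs for your version):

    ```shell
    # Dry-run first to see the proposed segment assignment...
    curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=OFFLINE&dryRun=true"
    # ...then run for real once the plan looks right.
    curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=OFFLINE&dryRun=false"
    ```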
  • sunny19930321 (03/02/2020, 2:11 PM)
    In some tables (both realtime and offline), a data segment is in the OFFLINE state. How can it be transitioned back to ONLINE?
  • sunny19930321 (03/02/2020, 2:12 PM)
    (attachment: image.png)
  • Mayank (03/02/2020, 2:52 PM)
    Are you suspecting the segment is offline due to retention, or is this an issue unrelated to your previous question?
  • Mayank (03/02/2020, 2:53 PM)
    If you have access to the server log, it would likely indicate why the segment is offline (typically due to some error).
  • sunny19930321 (03/02/2020, 3:32 PM)
    It's probably because of retention, but how do I delete those OFFLINE segments?
  • sunny19930321 (03/02/2020, 3:35 PM)
    I found that rebalance depends on the /tmp/PinotController directory on the controller. What if the amount of data is too large for the disk backing /tmp/PinotController? It seems /tmp/PinotController ends up holding all the data segments of the cluster.
  • Mayank (03/02/2020, 4:15 PM)
    You shouldn't need to delete them manually.
  • Mayank (03/02/2020, 4:16 PM)
    Do you mean rebalance or retention?
  • Mayank (03/02/2020, 4:17 PM)
    Will check the code and get back
  • Mayank (03/02/2020, 5:22 PM)
    @User Are you starting the controller with default configs? You can (and should) override the `controller.data.dir` config to point at a location that has enough storage, and that can also be shared across multiple controller instances (in production). For example, in production you can use a deep-storage such as S3/HDFS/ADLS, or NFS that is mounted on the controller instances.
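    A minimal sketch of that controller setting (the paths shown are illustrative):

    ```properties
    # Local disk with enough capacity (single-controller setups)
    controller.data.dir=/var/pinot/controller/data

    # Or a shared deep store, e.g. HDFS, so multiple controllers can share it
    # controller.data.dir=hdfs://namenode:8020/pinot/controller/data
    ```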
  • Elon (03/02/2020, 7:19 PM)
    Re: new star tree index config - We wanted to make sure we have the new config. This is our current star tree config (example):
    ```json
    "starTreeIndexSpec": {
      "dimensionsSplitOrder": [
        "column1",
        "column2",
        "column3"
      ],
      "skipStarNodeCreationForDimensions": [],
      "functionColumnPairs": [
        "COUNT__columnX",
        "SUM__columnX",
        "MAX__columnX",
        "AVG__columnX",
        "COUNT__columnY",
        "SUM__columnY",
        "MAX__columnY",
        "AVG__itemQuantity",
        "COUNT__tax",
        "SUM__tax",
        "MAX__tax",
        "AVG__tax"
      ]
    }
    ```
  • Elon (03/02/2020, 7:20 PM)
    @User ^^^ is that the new config?
  • Jackie (03/02/2020, 7:25 PM)
    @User This is a mix of both old and new configs... The new configs are under the key `starTreeIndexConfigs`, which takes a list of configs (one for each tree)
  • Jackie (03/02/2020, 7:26 PM)
    ```json
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [
          "column1",
          "column2",
          "column3"
        ],
        "functionColumnPairs": [
          "COUNT__columnX",
          "SUM__columnX",
          "MAX__columnX",
          "AVG__columnX",
          "COUNT__columnY",
          "SUM__columnY",
          "MAX__columnY",
          "AVG__itemQuantity",
          "COUNT__tax",
          "SUM__tax",
          "MAX__tax",
          "AVG__tax"
        ]
      }
    ]
    ```
  • Elon (03/02/2020, 7:27 PM)
    Perfect, thanks! Also, what is the `maxLeafRecords` config?
  • Jackie (03/02/2020, 7:29 PM)
    That is the upper bound on the number of records to process for each leaf node (default 10000)
  • Jackie (03/02/2020, 7:29 PM)
    The lower the threshold, the fewer records to process per leaf at query time (higher performance), but the larger the tree.
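    Tying this back to the new-style config above, `maxLeafRecords` sits alongside the other per-tree settings; a minimal sketch (the column names and the value shown are illustrative):

    ```json
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["column1", "column2"],
        "functionColumnPairs": ["COUNT__columnX", "SUM__columnX"],
        "maxLeafRecords": 10000
      }
    ]
    ```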
  • Jackie (03/02/2020, 7:30 PM)
    You may read more about how star-tree works here: https://pinot.readthedocs.io/en/latest/star-tree/star-tree.html
    👍 1
  • sunny19930321 (03/03/2020, 3:58 AM)
    Re: "in production you can use a deep-storage such as HDFS" -- is there specific documentation on configuring/mounting HDFS? @User
  • sunny19930321 (03/03/2020, 12:00 PM)
    Or is there specific documentation on mounting NFS?
  • Mayank (03/03/2020, 1:57 PM)
    Not sure if I follow the question @User
  • Mayank (03/03/2020, 1:58 PM)
    The controller data.dir can only be one of these storages.
  • Mayank (03/03/2020, 1:59 PM)
    If you specify data.dir as an hdfs:// path, then the controller will use that to store segments, and servers will download from there.
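    A hedged sketch of what an HDFS-backed controller config can look like (property names follow the Pinot HDFS filesystem plugin docs; the paths are illustrative, so verify against the docs for your release):

    ```properties
    controller.data.dir=hdfs://namenode:8020/pinot/controller/data
    pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
    pinot.controller.storage.factory.hdfs.hadoop.conf.path=/etc/hadoop/conf
    pinot.controller.segment.fetcher.protocols=file,http,hdfs
    ```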