https://pinot.apache.org/ logo
Join Slack
Powered by
# general
  • c

    Chinmay Soman

    09/05/2019, 9:32 PM
    the obvious way is to move to a HDFS based filesystem of course. But we could also do the CRC comparison upfront and skip the metadata update ? Just trying to understand the implication of this change
  • s

    Seunghyun

    09/05/2019, 9:51 PM
    https://github.com/apache/incubator-pinot/blob/cf980cbe564dda983fccf2bf7eabf877a7721cf8/pinot-controller/src/main/java/org/apache/pinot/controller/api/upload/ZKOperator.java#L153
    Copy code
    if (newCrc == existingCrc) {
            <http://LOGGER.info|LOGGER.info>(
                "New segment crc '{}' is the same as existing segment crc for segment '{}'. Updating ZK metadata without refreshing the segment.",
                newCrc, segmentName);
            // NOTE: even though we don't need to refresh the segment, we should still update creation time and refresh time
            // (creation time is not included in the crc)
            existingSegmentZKMetadata.setCreationTime(segmentMetadata.getIndexCreationTime());
            existingSegmentZKMetadata.setRefreshTime(System.currentTimeMillis());
            if (!_pinotHelixResourceManager.updateZkMetadata(offlineTableName, existingSegmentZKMetadata)) {
              throw new RuntimeException(
                  "Failed to update ZK metadata for segment: " + segmentName + " of table: " + offlineTableName);
            }
  • s

    Seunghyun

    09/05/2019, 9:52 PM
    we are updating the segment metadata in order to update the refresh time
  • k

    Kishore G

    09/05/2019, 10:00 PM
    Why?
  • t

    Ting Chen

    09/05/2019, 10:39 PM
    the above code writes existingSegmentZKMetadata to zk. so it could contain more than refreshed time: in our setup, it also contains a possibly different controller's download url. And if that controller uses local disk storage and does not have the segment, it could means download failures.
  • t

    Ting Chen

    09/05/2019, 10:46 PM
    it looks like we should reset the download url to the existing znRecord's download url. I.e.,
  • t

    Ting Chen

    09/05/2019, 10:46 PM
    existingSegmentZKMetadata.setDownloadUrl(znRecord.getSimpleField(CommonConstants.Segment.Offline.DOWNLOAD_URL));
  • k

    Kishore G

    09/05/2019, 11:33 PM
    Still trying to understand why we are changing the time? Is this for GDPR?
  • s

    Subbu Subramaniam

    09/06/2019, 12:35 AM
    https://github.com/apache/incubator-pinot/pull/4112
  • s

    Subbu Subramaniam

    09/06/2019, 12:35 AM
    I believe this is for GDPR. We rely on segment refresh time to see if the segment needs to be checked for purge
  • c

    Chinmay Soman

    09/06/2019, 12:36 AM
    I see. But we're also modifying the zkDownloadUri
  • s

    Subbu Subramaniam

    09/06/2019, 12:36 AM
    Or, is it that we just flagged the segment as purged. ,I am not sure
  • s

    Subbu Subramaniam

    09/06/2019, 12:36 AM
    need to check minion coe
  • s

    Subbu Subramaniam

    09/06/2019, 12:36 AM
    code
  • c

    Chinmay Soman

    09/06/2019, 12:36 AM
    maybe we can skip that ? (if crc is the same)
  • s

    Subbu Subramaniam

    09/06/2019, 12:41 AM
    hmm. i dont see reference to refresh time in minion, only to crc. @User do you know why we did PR 4112?
  • j

    Jackie

    09/06/2019, 6:14 PM
    @User Refresh time is used to determine whether a segment is subject to be purged
  • j

    Jackie

    09/06/2019, 6:38 PM
    @User The issue you got is not caused by setting new creation time/refresh time but by setting new download URI right?
  • c

    Chinmay Soman

    09/06/2019, 6:40 PM
    yes
  • c

    Chinmay Soman

    09/06/2019, 6:40 PM
    correct
  • c

    Chinmay Soman

    09/06/2019, 6:40 PM
    It's really a transient issue (once we move to HDFS based filesystem we should be good to go). But if its easy to fix - I'd love to do it
  • j

    Jackie

    09/06/2019, 6:41 PM
    @User We upload the metadata because we lock the segment by updating metadata when pushing segment, and we need to unlock it, not because updating creation/refresh time
  • j

    Jackie

    09/06/2019, 6:43 PM
    The issue here is download URI should be modified only if segment is moved to the permanent directory
  • c

    Chinmay Soman

    09/06/2019, 6:46 PM
    not exactly, even if its moved to the permanent directory - this can still happen
  • c

    Chinmay Soman

    09/06/2019, 6:46 PM
    because we don't have a shared filesystem between the controllers today - the second controller overrides the download URI to its local path
  • c

    Chinmay Soman

    09/06/2019, 6:47 PM
    which obviously doesn't have that segment
  • c

    Chinmay Soman

    09/06/2019, 6:47 PM
    to clarify - this is very rare (but does happen every now and then)
  • j

    Jackie

    09/06/2019, 6:50 PM
    https://github.com/apache/incubator-pinot/pull/3234 This PR changed the updating download uri logic. Will submit a fix soon
  • c

    Chinmay Soman

    09/06/2019, 6:51 PM
    awesome ! thank you
  • c

    Chinmay Soman

    09/06/2019, 6:53 PM
    on a related note, it seems we're moving away from using the original download URI (eg: on HDFS) to using Pinot's "filesystem" (either local disk/NFS/HDFS). Is this because we think the original download URI is not under Pinot's control and will not be valid after some time ?
1...848586...160Latest