# troubleshooting
s
Table - gitHubEvents
x
what’s segment metadata?
can you check the download URL and modify the segment metadata to change it to hdfs?
s
How to check the download URL?
x
click on the segment name
from the UI
the download uri won't change when you change the deep store; it only applies to newly uploaded segments
for existing segments, you need to modify the download uri manually right now
We have one issue tracking this: https://github.com/apache/pinot/issues/7275
with all the detailed steps
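For reference: without the UI, the download URL can also be checked over REST. A minimal sketch, assuming a local controller on port 9000 and the controller's segment metadata endpoint (the table and segment names are the ones from this thread):

```bash
# Returns the segment's metadata; look for the download url field
# (segment.offline.download.url in the ZK record).
curl -s "http://localhost:9000/segments/githubEvents/githubEvents_OFFLINE_1514804400000_1514805700000_0/metadata"
```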
s
is there any REST API to update the download URL to hdfs...
there is no way to edit it
x
You can find it from zk path /PropertyStore/table/segments
Then update the metadata from the zk record
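A sketch of reading that record through the controller's ZooKeeper API. The znode layout below, `<cluster>/PROPERTYSTORE/SEGMENTS/<tableNameWithType>/<segmentName>`, is an assumption based on the standard Helix layout; `PinotCluster` is the cluster name from the config pasted later in this thread:

```bash
# Dump the raw ZK record for the segment, including segment.offline.download.url.
curl -s "http://localhost:9000/zk/get?path=/PinotCluster/PROPERTYSTORE/SEGMENTS/githubEvents_OFFLINE/githubEvents_OFFLINE_1514804400000_1514805700000_0"
```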
s
you mean via the UI or on the controller machine?
x
Your offline download uri is hdfs already, so you changed the hdfs path?
Yes, controller UI
s
the above one is not that table...
x
There is a zookeeper browser
Ok
s
Screenshot from 2021-08-18 11-12-20.png
this is the table... i want to migrate to hdfs
x
Then updating this download uri field should work
s
Screenshot from 2021-08-18 11-13-02.png
this is zk browser
x
You saw the PropertyStore
Go to that and find the table and segment metadata
s
yeah
in the production setup, I don't have UI access,
is there any API to do that?
i got that
x
Yes, it's using the /zk API
You can go to controller:9000/help
To find the swagger API for zookeeper
s
yeah
x
Use put API there to modify the record
s
Screenshot from 2021-08-18 11-15-51.png
x
Image from iOS.png
Yes
Your UI is also using this call
s
what are the input values we have to give for each field?
x
You can check the payload from your local setup as an example call
In your browser, right click and open the inspector
Then update the zk record and check the network call and the payload
This call uses url parameters to pass all the info
So you also need to URI-encode it
s
yeah i understand that,
here, what do path and data denote?
assume this is the json :
{ "id": "githubEvents_OFFLINE_1514804400000_1514805700000_0", "simpleFields": { "segment.crc": "1975231790", "segment.creation.time": "1629188299999", "segment.end.time": "1514805700000", "segment.index.version": "v3", "segment.name": "githubEvents_OFFLINE_1514804400000_1514805700000_0", "segment.offline.download.url": "http://172.20.16.49:9000/segments/githubEvents/githubEvents_OFFLINE_1514804400000_1514805700000_0", "segment.offline.push.time": "1629188301815", "segment.offline.refresh.time": "-9223372036854775808", "segment.start.time": "1514804400000", "segment.table.name": "githubEvents", "segment.time.unit": "MILLISECONDS", "segment.total.docs": "10000", "segment.type": "OFFLINE" }, "mapFields": { "custom.map": { "input.data.file.uri": "file:/home/sas/pinot/examples/batch/githubEvents/rawdata_json_index/githubEvents_data.json" } }, "listFields": {} }
how to put it in that zk node?
assuming zk path will be this ---> /PropertyStore/table/segments
and data field is full updated json value
x
It’s part of the url
You can try once with ui
And check the url payload the browser sent
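Putting that together, a hedged sketch of the PUT call. The parameter names path, data, expectedVersion, and accessOption are assumptions taken from the /zk swagger page mentioned above; verify them against your controller, and note that both values must be URL-encoded:

```bash
# segment-metadata.json holds the full updated JSON record from above, with
# segment.offline.download.url rewritten to the hdfs:// location.
ZK_PATH="/PinotCluster/PROPERTYSTORE/SEGMENTS/githubEvents_OFFLINE/githubEvents_OFFLINE_1514804400000_1514805700000_0"
ENC_PATH=$(printf '%s' "$ZK_PATH" | jq -sRr @uri)   # URL-encode the znode path
ENC_DATA=$(jq -sRr @uri < segment-metadata.json)    # URL-encode the JSON payload
curl -X PUT "http://localhost:9000/zk/put?path=${ENC_PATH}&data=${ENC_DATA}&expectedVersion=-1&accessOption=1"
```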
s
Using UI it looks easy
Ok
x
Check the uri the browser generated
It’s a PUT call
s
Ok
after updating, do we need to restart the controller?
after updating, I tried querying the data, and it's not working
I restarted all components, and even then it's not working
```
2021/08/18 121133.417 ERROR [BaseCombineOperator] [pqr-14] Timed out while polling results block, numBlocksMerged: 0 (query: QueryContext{_tableName='githubEvents_OFFLINE', _selectExpressions=[], _aliasList=[null], _filter=null, _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9998}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:githubEvents_OFFLINE), pinotQuery:PinotQuery(dataSource:DataSource(tableName:githubEvents_OFFLINE), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:*))], orderByList:[], limit:10, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9998}))})
2021/08/18 121133.419 INFO [QueryScheduler] [pqr-14] Processed requestId=10,table=githubEvents_OFFLINE,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=1,reqDeserMs=0,totalExecMs=9999,resSerMs=0,totalTi
```
even reloaded the segment too
@Xiang Fu
x
can you check if the pinot server loaded the segment?
s
it's not loaded, it seems
Screenshot from 2021-08-18 13-18-31.png
I initiated it an hour back today
I reloaded again just now, but no use
Screenshot from 2021-08-18 13-26-33.png
@Xiang Fu please let me know, what is the next step to solve this?
x
hmm can you try to call the reload api?
are there any exceptions?
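For reference, the reload can be triggered over REST as well; a sketch against a local controller (these endpoints show up on the swagger page):

```bash
# Reload one segment...
curl -X POST "http://localhost:9000/segments/githubEvents/githubEvents_OFFLINE_1514804400000_1514805700000_0/reload"
# ...or all segments of the offline table.
curl -X POST "http://localhost:9000/segments/githubEvents/reload?type=OFFLINE"
```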
s
Screenshot from 2021-08-18 14-33-38.png
x
another thing to try is to delete the local directory of that segment on the server and then restart the server
s
i tried, but no use
x
any exceptions from server?
s
```
ERROR [BaseCombineOperator] [pqr-5] Timed out while polling results block, numBlocksMerged: 0 (query: QueryContext{_tableName='githubEvents_OFFLINE', _selectExpressions=[], _aliasList=[null], _filter=null, _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9998}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:githubEvents_OFFLINE), pinotQuery:PinotQuery(dataSource:DataSource(tableName:githubEvents_OFFLINE), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:*))], orderByList:[], limit:10, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9998}))})
```
only the timeout error is coming from the logs
x
hmm, do you see segment loading logs ?
can you search for that segment
s
No
x
technically you should see segment loading info in the server log
s
```
2021/08/18 144101.333 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Reloaded segment: githubEvents_OFFLINE_1514804400000_1514805700000_0 in table: githubEvents_OFFLINE
2021/08/18 144101.338 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Reloaded all segments in table: githubEvents_OFFLINE
```
got this message from logs
x
can you stop the pinot server, delete the server's local data dir, then start the pinot server
and grep for the segment name in the server log
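Roughly, using the server data dirs from the config pasted later in this thread (the log file name here is hypothetical; use whatever your setup writes):

```bash
# 1. stop the pinot server process (however your setup manages it)
# 2. wipe the local copies so the server has to re-fetch from the deep store
rm -rf /home/sas/pinot/data/server/index/githubEvents_OFFLINE*
rm -rf /home/sas/pinot/data/server/segment/githubEvents_OFFLINE*
# 3. start the server again, then watch for the segment being fetched
grep "githubEvents_OFFLINE_1514804400000_1514805700000_0" pinotServer.log
```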
s
```
2021/08/18 144101.329 INFO [ImmutableSegmentLoader] [HelixTaskExecutor-message_handle_thread] Successfully loaded segment githubEvents_OFFLINE_1514804400000_1514805700000_0 with config: org.apache.pinot.spi.env.PinotConfiguration@2729606a
2021/08/18 144101.329 INFO [githubEvents_OFFLINE-OfflineTableDataManager] [HelixTaskExecutor-message_handle_thread] Adding immutable segment: githubEvents_OFFLINE_1514804400000_1514805700000_0 to table: githubEvents_OFFLINE
2021/08/18 144101.329 INFO [githubEvents_OFFLINE-OfflineTableDataManager] [HelixTaskExecutor-message_handle_thread] Replaced immutable segment: githubEvents_OFFLINE_1514804400000_1514805700000_0 of table: githubEvents_OFFLINE
2021/08/18 144101.329 INFO [githubEvents_OFFLINE-OfflineTableDataManager] [HelixTaskExecutor-message_handle_thread] Closing segment: githubEvents_OFFLINE_1514804400000_1514805700000_0 of table: githubEvents_OFFLINE
2021/08/18 144101.330 INFO [ImmutableSegmentImpl] [HelixTaskExecutor-message_handle_thread] Trying to destroy segment : githubEvents_OFFLINE_1514804400000_1514805700000_0
2021/08/18 144101.332 INFO [githubEvents_OFFLINE-OfflineTableDataManager] [HelixTaskExecutor-message_handle_thread] Closed segment: githubEvents_OFFLINE_1514804400000_1514805700000_0 of table: githubEvents_OFFLINE
```
x
hmm it says successfully loaded
s
you mean to delete the data dir from tmp?
x
yes
so the pinot server won't find anything on local disk
also try giving a longer timeout from the query console
s
how to increase timeout?
x
image.png
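For reference, the timeout can also be raised per query; a sketch against the broker, assuming the default broker port 8099 and the option() query suffix supported in 0.8.0:

```bash
# timeoutMs raises the per-query timeout; the default matches the
# timeoutMs=9998 visible in the error logs above.
curl -s -X POST "http://localhost:8099/query/sql" \
  -H 'Content-Type: application/json' \
  -d '{"sql": "select * from githubEvents limit 10 option(timeoutMs=30000)"}'
```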
s
i restarted server after removing data dir
x
cool, let’s wait on the server reloading
s
and gave 30sec as timeout
still not working
Screenshot from 2021-08-18 14-50-19.png
the segment is properly added in hdfs
Screenshot from 2021-08-18 14-54-14.png
but querying is the problem now
please refer to the above screenshots for reference
x
Hmm
Do you find the segment files under your server data directory
s
no
x
Hmm
Did you add hdfs access on server configs
s
yes
x
any logs on server when you reboot server?
s
I am able to read the other table which is configured for hdfs deep store
x
Ok
Another try is to create a new table then use this segment to do a metadata push and see if the new table can load it
s
but I can't do this in production
x
Interesting, it also shows the segment is in good status
Oh
s
yeah
x
So you cannot delete the local disk
s
i deleted that
x
Ok
s
nothing there in local disk
x
Oh, you mean create a new table ?
s
yeah
in production some tables are already there, so I need to migrate them
I can't delete the data
I have around 1TB of data which needs to be migrated
x
Ok, so you are testing the migration process
s
yes exactly
x
And your local server can access hdfs?
s
yes
x
Can you try to create a new table in your local and use metadata push to give the hdfs uri ?
So the new table should download the segment from hdfs
Wanna isolate the issue first
s
you mean to change the jobType?
x
Yes, change the job type to SegmentMetadataPush
OutputDir to your hdfs
Then run the job
s
```
# jobType: Pinot ingestion job type.
# Supported job types are:
#   'SegmentCreation'
#   'SegmentTarPush'
#   'SegmentUriPush'
#   'SegmentCreationAndTarPush'
#   'SegmentCreationAndUriPush'
```
x
Change table name as well
s
OutputDir to your hdfs ----> you mean the controller segment directory?
x
Newer Pinot versions should have SegmentMetadataPush
s
I'm using 0.8.0
x
Yes
s
recent release
x
Then it has
OutputDir is where this segment is on hdfs
Not your controller hdfs directory
It’s different
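A sketch of what the adjusted job spec could look like; the runner class name and the hdfs paths here are assumptions (modeled on the standalone batch ingestion spec format) and should be checked against the 0.8.0 docs:

```bash
cat > metadata-push-job.yaml <<'EOF'
executionFrameworkSpec:
  name: 'standalone'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
jobType: SegmentMetadataPush
# outputDirURI points at where the segment already sits on hdfs,
# not at the controller's hdfs data dir.
outputDirURI: 'hdfs://localbdaas/akram/segments/githubEventsNew/'
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
    configs:
      hadoop.conf.path: '/home/sas/hadoop/hadoop/etc/hadoop/'
tableSpec:
  tableName: 'githubEventsNew'   # the new test table
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
EOF
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile metadata-push-job.yaml
```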
s
```
controller.data.dir=hdfs://localbdaas/akram/pinotcontroller/segments
controller.local.temp.dir=/home/sas/temp/
controller.zk.str=localhost:2191
controller.enable.split.commit=true
controller.access.protocols.http.port=9000
controller.helix.cluster.name=PinotCluster
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/home/sas/hadoop/hadoop/etc/hadoop/
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
#pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
#pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
controller.vip.port=9000
controller.port=9000
pinot.set.instance.id.to.hostname=true
pinot.server.grpc.enable=true
```
above is controller.conf
```
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=/home/sas/hadoop/hadoop/etc/hadoop/
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
#pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
#pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
pinot.set.instance.id.to.hostname=true
pinot.server.instance.dataDir=/home/sas/pinot/data/server/index
pinot.server.instance.segmentTarDir=/home/sas/pinot/data/server/segment
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
```
above is server conf
x
Ok
s
which one are you referring to?
are these confs fine?
x
The download uri
I don’t see much problem, so let’s first verify the hdfs access
I’m on my phone, so cannot verify the full details
s
ohhh
x
But if you can load data from deepstore already then I assume it works. Just first verify that works on your local as well
s
only in the controller conf have I given the hdfs path,
x
Yes
Server needs hdfs configs, so it can write to it
s
I used the above doc to enable hdfs as the deep store
but in server config, no config was there
related to hdfs
x
Hmm, I see, you need to config that for server as well, so server can read segment from hdfs
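The keys in question are the hdfs storage factory and segment fetcher entries, the same ones that appear in the server conf pasted above; a minimal sketch of adding them (the conf file name is whatever your server is started with):

```bash
cat >> pinot-server.conf <<'EOF'
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=/home/sas/hadoop/hadoop/etc/hadoop/
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
EOF
```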
s
```
pinot.server.instance.dataDir=/path/in/local/filesystem/for/pinot/data/server/index
pinot.server.instance.segmentTarDir=/path/in/local/filesystem/for/pinot/data/server/segment
```
above 2 properties, right?
x
Right
That should be local
s
i have local only as of now
this is a bit confusing
x
My feeling is your local setup cannot access hdfs, and that's why the segment is not downloaded
s
I don't see any issue with hdfs access
please check the above config and confirm
if I create a new table using the hdfs deep store, it works fine... only the migration fails
if hdfs access were the problem, then the newly created table should not work either
x
ic
hmm
can you try to delete the data dir from your local server again and restart
also can you compare the download uri from both table segments ?
s
it is correct
i compared that
x
can you try to delete server data dir and restart server
s
i did that too
x
wanna check if both tables cannot download or just the one
s
only the one which was migrated
other table is fine
x
hmm interesting
s
yeah
interesting, and I'm confused as well
x
yeah, technically it should just download the segment from hdfs
s
yeah
x
can you paste two segment metadata here
s
the same is happening for the new table
one sec
x
one valid table and one problematic table
same means?
the new table cannot download?
s
https://apache-pinot.slack.com/files/U0208LD4LLA/F02BJEN7TNX/screenshot_from_2021-08-18_10-40-58.png
this table is working fine
this table does not work after migration
x
hmm the download url suffix is
.tar.gz
or just the segment name
s
just segment name
x
hmm what about the airlineStats table
s
this is airlineStats
Screenshot from 2021-08-18 15-54-19.png
x
ic
can you check the segment file name on hdfs for airlineStats
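A quick check, with the path built from the controller data dir in the conf above (the per-table subdirectory is an assumption about the deep store layout):

```bash
# List the airlineStats segment files in the deep store and look at the suffixes.
hdfs dfs -ls hdfs://localbdaas/akram/pinotcontroller/segments/airlineStats/
```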
s
I don't have the segment file in hdfs
x
does it have
.tar.gz
suffix or not
s
it has no .tar.gz suffix
x
hmmm
s
just segment name
Screenshot from 2021-08-18 15-57-25.png
x
can you try to rename githubEvents segment to have
.tar.gz
suffix then reload the table
or restart server
s
in the above mentioned hdfs path?
x
yes
s
one sec
x
on hdfs and change the segment download uri to have .tar.gz suffix as well
I will check tomorrow, need to sleep now
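A sketch of that experiment, under the same deep store layout assumption as above; the download uri then has to be updated through the /zk/put call shown earlier:

```bash
SEG=githubEvents_OFFLINE_1514804400000_1514805700000_0
DIR=hdfs://localbdaas/akram/pinotcontroller/segments/githubEvents
# Rename the segment file on hdfs to carry the .tar.gz suffix...
hdfs dfs -mv "${DIR}/${SEG}" "${DIR}/${SEG}.tar.gz"
# ...then edit segment.offline.download.url in the ZK record to end in
# .tar.gz as well, and reload the table or restart the server.
```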
s
yeah ok
thanks