# general
e
What api call would that be?
m
@Elon The dataDir in the controller conf is the deep store: it is where the controller stores a segment when the segment is uploaded.
The controller's upload api will make a copy from source uri to dest uri (they can both be Gcs).
e
I'm not sure what I did yesterday when testing but I see one segment there, what api call do I call to upload a segment and have it stored there? Is there a curl example for that?
m
The same curl for uploading segment will do that
there is no additional api requirement just to save a copy on controller's deep store
e
I'm doing this
curl -X POST -H "UPLOAD_TYPE:URI" -H "DOWNLOAD_URI:gs://mybucket/segments/archive.tgz" -H "content-type:application/json" -d '' localhost:9000/segments
And I see the segment going to the servers but not the "deep" store.
m
What is your dataDir in controller conf?
e
m
Then you should see
gs://mybucket/data/<table>/<segment>
You are not seeing that? Even though your segment upload was successful?
Maybe a bug in the GCS PinotFS implementation?
e
I hope so:) That would be easier to track down:) Would it be calling GcsPinotFS.copy(URI, URI)?
Or is it a move command?
Wanted to add extra logging to see
m
Let me check the code
👍 1
private void moveSegmentToPermanentDirectory(File currentSegmentLocation, URI finalSegmentLocationURI)
      throws Exception {
    PinotFS pinotFS = PinotFSFactory.create(finalSegmentLocationURI.getScheme());

    // Overwrites current segment file
    LOGGER.info("Copying segment from {} to {}", currentSegmentLocation.getAbsolutePath(),
        finalSegmentLocationURI.toString());
    pinotFS.copyFromLocalFile(currentSegmentLocation, finalSegmentLocationURI);
  }
e
Oh great, what class is that in?
Ah, the ZkOperator?
m
Yeah, lol
e
Nice, fingers crossed - I hope it's my bug 🙂
Added a bunch of logging
m
Yeah, that would help trace what happened
e
So it looks like uploadSegmentAsJson() skips the move
i.e. I see
Adding new segment ...
and then
Skipping segment move, keeping segment at gs://...
m
hmm
e
The other endpoints in PinotSegmentUploadDownloadRestletResource set the flag to "true" but not uploadAsJson
m
which flag?
Ok found it
e
moveSegmentToFinalLocation
m
Let me check the motivation for that
👍 1
e
I will try /v2/segments
m
There may be a uri upload command that does not require header json,
// We use this endpoint with URI upload because a request sent with the multipart content type will reject the POST
  // request if a multipart object is not sent. This endpoint does not move the segment to its final location;
  // it keeps it at the downloadURI header that is set. We will not support this endpoint going forward.
Your segment is not json, right?
e
No, it's a tgz archive
m
Ok, let me get the uri push example
e
with star tree index, etc
Thanks!
m
Can you try /v2/segments?
Recalling the history, I think the original endpoint's behavior was to not move the segment, and to avoid breaking backward compatibility we ended up creating v2.
e
Nice, that worked!
local archive.tgz -> segment
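For anyone who wants the same URI push from Java rather than curl, here is a minimal sketch. It only builds the request and does not send it. It assumes /v2/segments accepts the same UPLOAD_TYPE and DOWNLOAD_URI headers as the earlier curl (the thread does not spell that out), and the class name, method name, host, and bucket path are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SegmentUriPush {
    // Builds (but does not send) a URI-push request against /v2/segments.
    // Assumption: the headers from the curl above carry over unchanged.
    public static HttpRequest buildUriPush(String controllerBase, String downloadUri) {
        return HttpRequest.newBuilder()
            .uri(URI.create(controllerBase + "/v2/segments"))
            .header("UPLOAD_TYPE", "URI")
            .header("DOWNLOAD_URI", downloadUri)
            .header("Content-Type", "application/json")
            // Empty body, like `-d ''` in the curl command.
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        // Placeholder controller address and segment location.
        HttpRequest req = buildUriPush("http://localhost:9000",
            "gs://mybucket/segments/archive.tgz");
        System.out.println(req.method() + " " + req.uri());
        req.headers().map().forEach((name, values) ->
            System.out.println(name + ": " + values));
    }
}
```

To actually send it, pass the request to java.net.http.HttpClient (Java 11 or later) and check the response status before looking in the deep store.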
m
👍
e
Thanks!!!
m
@Jennifer Dai could you please update the doc?
🙏 1
e
Will be updating the pull request
Appreciate all the help from everyone!
👍 1
sorry about that @Elon thanks for helping us debug
e
No worries, you all are really helpful, great community, hopefully I have something valuable to contribute back:)
🤩 1
So although I see the segments in the segment api, when I go to select count(*) from the table it doesn't include these offline segments
Maybe the offline json schema was incorrect? The segments/<table> seem to list the offline segments
m
Can you query the _OFFLINE table specifically (add the suffix to the table name)?
If you get that then time boundary is probably incorrect
e
I tried and get no data
Does that mean the offline schema is incorrect?
m
I doubt that is the issue
Any way for you to run DumpClusterInfo command?
It is in pinot-admin
It will tell what ideal state and external view look like
e
Sure, is there a rest api for that also?
m
Yeah
“/idealState”
“/externalView”
e
From the controller? like <controller host>:9000/idealState/externalView ?
m
Yes
Separately
For each
e
I don't see it -
localhost:9000/idealState
Gets 404
m
Not in front of my mac
👍 1
May have table name
e
Got it!
I see that the idealState shows the offline segments
status is "ONLINE"
m
And external view?
e
Aha:) That one says status "ERROR" for all the offline segments
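The comparison being done by eye here (ideal state says ONLINE, external view says ERROR) can be sketched as a small diff. The input shape is an assumption: each map goes from segment name to a map of server instance to state, which is roughly what you would have after parsing the two responses. The class and method names are made up for illustration and are not part of Pinot.

```java
import java.util.Map;
import java.util.TreeMap;

public class SegmentStateDiff {
    // Returns "segment@instance" -> "wanted -> actual" for every replica whose
    // external-view state differs from its ideal state.
    // Assumed input shape: segment name -> (server instance -> state).
    public static Map<String, String> mismatches(
            Map<String, Map<String, String>> idealState,
            Map<String, Map<String, String>> externalView) {
        Map<String, String> out = new TreeMap<>();
        idealState.forEach((segment, wanted) -> {
            Map<String, String> actual = externalView.getOrDefault(segment, Map.of());
            wanted.forEach((instance, state) -> {
                // A replica absent from the external view is reported as MISSING.
                String seen = actual.getOrDefault(instance, "MISSING");
                if (!state.equals(seen)) {
                    out.put(segment + "@" + instance, state + " -> " + seen);
                }
            });
        });
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical segment and instance names.
        Map<String, Map<String, String>> ideal =
            Map.of("seg_0", Map.of("Server_a", "ONLINE"));
        Map<String, Map<String, String>> external =
            Map.of("seg_0", Map.of("Server_a", "ERROR"));
        mismatches(ideal, external).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Any entry this reports is a segment the server could not load, so the server log for that instance is the next place to look.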
m
Access to server log?
e
Yep, I will check
m
My guess: the segment download URI in ZK is incorrect
Bug
👍 1
e
So I should look in zk operator then?
m
No, the segment metadata
API for that as well
So sorry in the gym right now
Should look like table/segments/segmentName
e
Oh, enjoy your workout! Thanks for all the help - if you are around later lmk 🙂 I will check - so the metadata.properties is incorrect then?
m
Yes
I’ll be available in half hour
👍 1
back now
e
🙂
m
were you able to find the rest api for segment metadata?
it is not the metadata.properties
e
Ah ok, checking that now
m
"/segments/{tableName}/{segmentName}"
try ^^
there will be a zkdownloaduri
see if it matches where you expect the segment to be in Gcs
e
Ah, I think URIUtils is putting extra "/"'s - I need to handle that in the GcsPinotFS
Should have something working soon:)
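One way to handle the extra slashes, as a rough sketch: collapse runs of "/" in the path before using the URI. This is a generic java.net.URI helper, not the actual GcsPinotFS or URIUtils code, and the class name, method name, and bucket path are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriSlashes {
    // Collapses repeated '/' in the path component only; scheme, authority
    // (the bucket, for gs:// URIs), query and fragment are left untouched.
    public static URI collapse(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            return uri; // opaque URI, nothing to normalize
        }
        try {
            return new URI(uri.getScheme(), uri.getAuthority(),
                path.replaceAll("/{2,}", "/"), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder bucket, table and segment names.
        URI messy = URI.create("gs://mybucket/data//myTable//seg_0");
        System.out.println(collapse(messy)); // gs://mybucket/data/myTable/seg_0
    }
}
```

Only the path is rewritten, so the "//" after the scheme is never touched.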
m
i see
but the canonical path looks ok?
e
So now I see the file being backed up but the servers give an error of "unknown host" - they just used the hostname:
2019/12/13 00:59:18.284 ERROR [ServerSegmentCompletionProtocolHandler] [flattened_orders_hours__15__0__20191213T0049Z] Could not send request http://pinot-controller-0:9000/segmentConsumed?name=flattened_orders_hours__15__0__20191213T0049Z&offset=3188066&instance=Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098&reason=rowLimit&rowCount=100000
java.net.UnknownHostException: pinot-controller-0
	at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_232]
	at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_232]
	at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_232]
	at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[httpclient-4.5.3.jar:4.5.3]
	at org.apache.pinot.common.utils.FileUploadDownloadClient.sendRequest(FileUploadDownloadClient.java:360) ~[pinot-common-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
	at org.apache.pinot.common.utils.FileUploadDownloadClient.sendSegmentCompletionProtocolRequest(FileUploadDownloadClient.java:635) ~[pinot-common-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
	at org.apache.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.sendRequest(ServerSegmentCompletionProtocolHandler.java:184) ~[pinot-core-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
	at org.apache.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.segmentConsumed(ServerSegmentCompletionProtocolHandler.java:151) ~[pinot-core-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.postSegmentConsumedMsg(LLRealtimeSegmentDataManager.java:848) ~[pinot-core-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:526) ~[pinot-core-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
So it looks like the instances endpoint shows why the controller hostname is unqualified
Looks like I got it to work by updating the controller host to its FQDN
m
Are you able to query now?
e
Yep
I will update the pull request, thanks so much for the help!
m
Awesome!
Also, it would be good to learn from your experience and improve our documents and process. Would you be willing to share your experience?
e
Absolutely!