https://pinot.apache.org/

Neha Pawar

05/05/2020, 5:22 PM
Good to know @Ranveer Singh. Another problem is the data type for your DateTimeFieldSpecs. It should be STRING. I was able to generate sample data; the DateTimeFieldSpecs show up correctly.

Ranveer Singh

05/06/2020, 10:07 AM
Neha, I see one problem: I don't see any dateTime column in the metadata. Because of this I am not able to plot time on graphs.
If I mention these columns in segmentsConfig, I get a ClassCastException.

Neha Pawar

05/06/2020, 4:15 PM
Which metadata?
So you don't see the date_time columns when you do select *?

Ranveer Singh

05/06/2020, 4:17 PM
Yes, I can see that as a column.
I meant the left section of the Pinot Data Explorer.
When I configure the table in Superset, I don't see the columns there.

Neha Pawar

05/06/2020, 4:21 PM
You see it in the Pinot Data Explorer left section, but not in Superset?
Strange, let me check.

Ranveer Singh

05/06/2020, 4:21 PM
No, I am not able to see it in the Pinot Data Explorer either.
I was referring to that same left section as the table metadata,
but the column does come back as a query result in the Pinot Data Explorer.

Neha Pawar

05/06/2020, 4:24 PM
I think the Pinot Data Explorer hides time columns, and as long as the column is queryable I guess it doesn't matter. But regarding Superset, maybe the connector hasn't been written to understand DateTime columns.
Either way, can Superset understand time columns of the format you have?
Or does it need epoch millis?

Ranveer Singh

05/06/2020, 4:25 PM
Let me check that as well.
If I refer to the Pinot code, for dateTimeFieldSpecs this segmentsConfig is not supported: "segmentsConfig": { "timeColumnName": "StatusRecModifyTS", "schemaName": "rfp", "replication": "1" }
Setting timeColumnName to one of these columns gives a ClassCastException.

Neha Pawar

05/06/2020, 4:27 PM
Yes, a dateTimeFieldSpec will not work as the time column. We are working to fix that.
But Superset shouldn't care whether or not it is a time column in Pinot.

Ranveer Singh

05/06/2020, 4:28 PM
Yes, that's the problem.
With some of the loaded examples I can see that date and dateTime are supported in Superset.
You validated some sample data yesterday; can you check whether you are able to see those columns?

Neha Pawar

05/06/2020, 4:31 PM
sure

Neha Pawar

05/06/2020, 4:39 PM
What I mean is: Pinot has 3 types of columns - Dimension, Metric, Time. Recently, we added DateTime. I think the Superset-Pinot connector has been written for only Dimension, Metric, and Time. Maybe it is not reading DateTime.

Ranveer Singh

05/06/2020, 4:47 PM
Got it. This is what I was guessing.
Let me check out the Superset code.

Neha Pawar

05/06/2020, 5:00 PM
Can you try one thing: instead of putting the time columns in dateTimeFieldSpecs, put them as STRING in dimensions?
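For illustration, a minimal schema fragment along these lines (a sketch only; the column names are the ones from this thread, and the rest of the schema is omitted):

```json
{
  "dimensionFieldSpecs": [
    { "name": "StatusRecModifyTS", "dataType": "STRING" },
    { "name": "StatusRecCreateTS", "dataType": "STRING" }
  ]
}
```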

Ranveer Singh

05/06/2020, 5:24 PM
Initially I did that, but I need time-series charts based on those columns.

Neha Pawar

05/06/2020, 5:25 PM
They can be dimensions in Pinot; you can mark them as temporal in Superset, right?
Superset shouldn't care how the column is configured in Pinot.

Ranveer Singh

05/06/2020, 5:26 PM
Let me try what you are suggesting.

Neha Pawar

05/06/2020, 5:30 PM
I confirmed that the Pinot-Superset connector ignores dateTimeFieldSpec 🙂 I will see if I can fix that, but until then, putting them as dimensions is the only option.

Ranveer Singh

05/06/2020, 5:32 PM
Thanks Neha, thanks for the clarity.

Neha Pawar

05/06/2020, 5:33 PM
Is your data APPEND (append some data hourly/daily) or REFRESH (upload the whole snapshot at once)?

Ranveer Singh

05/06/2020, 6:06 PM
We want to do it in APPEND mode.
I am just trying batch to understand it; in reality we have data ingestion directly from Kafka.
I am blocked on the temporal column again. I am using the following date-time value and format:
Sun Apr 26 01:38:01 UTC 2020
%a %b %d %H:%M:%S %Z %Y
but it is not working.

Neha Pawar

05/06/2020, 6:12 PM
You changed it to a dimension? Can you at least see the column in Superset this time?
Something wrong with this pattern then?
%a %b %d %H:%M:%S %Z %Y

Ranveer Singh

05/06/2020, 6:20 PM
Yes, the column is showing up now.
For a given datetime the value looks correct,
but it is not working in Superset.

Neha Pawar

05/06/2020, 6:27 PM
When you say not working, what exactly is happening?
I tried it out with my sample data. It's not going to work because the format is not supported by Superset. If you see the comment below the "DateTime format" text box:
Copy code
The pattern of timestamp format. For strings use python datetime string pattern expression which needs to adhere to the ISO 8601 standard to ensure that the lexicographical ordering coincides with the chronological ordering. If the timestamp format does not adhere to the ISO 8601 standard you will need to define an expression and type for transforming the string into a date or timestamp. Note currently time zones are not supported. If time is stored in epoch format, put epoch_s or epoch_ms.
"needs to adhere to ISO 8601 standard". The format you have is not ISO standard: https://en.wikipedia.org/wiki/ISO_8601
Here's a suggestion: convert the values to millis while creating the segment, using Groovy transform functions. Here's the schema I used to do this:
Copy code
{
  "schemaName": "rfp",
  "dimensionFieldSpecs": [
    {
      "name": "status",
      "dataType": "STRING"
    },
    {
      "name": "fulfilmentType",
      "dataType": "STRING"
    },
    {
      "name": "soOrderHeaderKey",
      "dataType": "STRING"
    },
    {
      "name": "SONumber",
      "dataType": "STRING"
    },
    {
      "name": "CommsResponse",
      "dataType": "INT"
    },
    {
      "name": "extnOriginalNo",
      "dataType": "INT"
    },
    {
      "name": "messageId",
      "dataType": "STRING"
    },
    {
      "name": "orderLineKey",
      "dataType": "STRING"
    },
    {
      "name": "fulfilmentSubType",
      "dataType": "STRING"
    },
    {
      "name": "storeId",
      "dataType": "STRING"
    },
    {
      "name": "soOrderLineKey",
      "dataType": "STRING"
    },
    {
      "name": "primeLineNumber",
      "dataType": "STRING"
    },
    {
      "name": "PONumber",
      "dataType": "STRING"
    },
    {
      "name": "itemId",
      "dataType": "STRING"
    },
    {
      "name": "orderHeaderKey",
      "dataType": "STRING"
    },
    {
      "name": "releaseStatusKey",
      "dataType": "STRING"
    },
    {
      "name": "RFP",
      "dataType": "STRING"
    },
    {
      "name": "EmailAck",
      "dataType": "STRING"
    },
    {
      "name": "StatusRecModifyMillis",
      "dataType": "LONG",
      "transformFunction": "Groovy({new Date().parse('EEE MMM dd HH:mm:ss z yyyy', StatusRecModifyTS).getTime()}, StatusRecModifyTS)"
    },
    {
      "name": "StatusRecCreateMillis",
      "dataType": "LONG",
      "transformFunction": "Groovy({new Date().parse('EEE MMM dd HH:mm:ss z yyyy', StatusRecCreateTS).getTime()}, StatusRecCreateTS)"
    },
    {
      "name": "EmailSendCreateMillis",
      "dataType": "LONG",
      "transformFunction": "Groovy({new Date().parse('EEE MMM dd HH:mm:ss z yyyy', EmailSendCreate).getTime()}, EmailSendCreate)"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "TimeTaken",
      "dataType": "LONG"
    }
  ]
}
LMK if this works out!
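As a sanity check outside of Pinot, the conversion the Groovy transform performs can be sketched in plain Java with SimpleDateFormat (a standalone illustration, not Pinot code; the sample timestamp is the one from this thread):

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class DateToMillis {
    public static void main(String[] args) throws Exception {
        // Same pattern the Groovy transform uses: EEE MMM dd HH:mm:ss z yyyy
        SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss z yyyy", Locale.ENGLISH);
        // Default zone only matters if the input omits one; this input carries "UTC"
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        long millis = fmt.parse("Sun Apr 26 01:38:01 UTC 2020").getTime();
        System.out.println(millis); // 1587865081000
    }
}
```

The resulting LONG value is what Superset can consume directly as epoch_ms.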

Ranveer Singh

05/07/2020, 12:30 PM
Looks like an issue with the CSV reader. It is uploading -9223372036854776000, which is the default Long value for these columns. I tried adding this use case to ExpressionTransformerTest, and that works fine:
Copy code
@Test
public void testDateTransformFromString() {
  Schema pinotSchema = new Schema();
  DimensionFieldSpec dimensionFieldSpec = new DimensionFieldSpec("StatusRecCreateTSInMillis", FieldSpec.DataType.LONG, true);
  dimensionFieldSpec.setTransformFunction("Groovy({new Date().parse('EEE MMM dd HH:mm:ss z yyyy', StatusRecModifyTS).getTime()}, StatusRecModifyTS)");

  pinotSchema.addField(dimensionFieldSpec);
  ExpressionTransformer expressionTransformer = new ExpressionTransformer(pinotSchema);

  GenericRow genericRow = new GenericRow();
  genericRow.putValue("StatusRecModifyTS", "Sun Apr 26 01:38:01 UTC 2020");
 // genericRow.putValue("StatusRecCreateTSInMillis", "1587865081000");


  // apply the transform and verify the parsed millis
  expressionTransformer.transform(genericRow);
  Assert.assertEquals(genericRow.getValue("StatusRecCreateTSInMillis"), 1587865081000L);

 /* pinotSchema = new Schema();
  TimeFieldSpec timeFieldSpec = new TimeFieldSpec(new TimeGranularitySpec(FieldSpec.DataType.LONG, TimeUnit.MILLISECONDS, "incoming"), new TimeGranularitySpec(
          FieldSpec.DataType.INT, TimeUnit.DAYS, "outgoing"));
  pinotSchema.addField(timeFieldSpec);
  expressionTransformer = new ExpressionTransformer(pinotSchema);

  genericRow = new GenericRow();
  genericRow.putValue("incoming", "123456789");
  genericRow.putValue("outgoing", "123");

  // no transformation
  expressionTransformer.transform(genericRow);
  Assert.assertEquals(genericRow.getValue("outgoing"), "123");*/
}
My guess is that the CSV reader is ignoring the transform expression.

Neha Pawar

05/07/2020, 1:38 PM
Are you using the latest code, or the release?
This is not available in the release.
It worked for me; I was using the latest code.

Ranveer Singh

05/07/2020, 2:52 PM
I am using the latest code, deployed locally,
but I built it last week.
Let me take the latest, rebuild, and try.

Neha Pawar

05/07/2020, 3:21 PM
Hmm, last week's code should have had it. Anyway, the latest will work. I also tried using Docker, which is a week old, and it didn't work for me; when I moved to the latest code locally, it worked.

Ranveer Singh

05/07/2020, 4:05 PM
Let me check.
Is there any specific branch where the latest code is?

Neha Pawar

05/07/2020, 4:47 PM
no, just the latest master

Ranveer Singh

05/07/2020, 4:47 PM
I see, I have taken master.
OK, thanks.

Neha Pawar

05/08/2020, 4:44 PM
did it work?

Ranveer Singh

05/11/2020, 10:50 AM
Yes, it worked. Thank you, and apologies for the late response.