Xiang Fu
Yupeng Fu
03/12/2020, 5:08 PM
Xiang Fu
Yupeng Fu
03/12/2020, 5:12 PM
distinctcounthll(toByte(column)) is equivalent to fasthll(column)
Yupeng Fu
03/12/2020, 5:13 PM
distinctcounthll just expects the serialization done early in the ingestion phase?
Xiang Fu
Yupeng Fu
03/12/2020, 5:15 PM
fasthll to distinctcounthll. The query runs but they saw drastically different results,
Xiang Fu
Yupeng Fu
03/12/2020, 5:15 PM
distinctcounthll to validate the expected format would be helpful, and reduce such user confusion
Yupeng Fu
03/12/2020, 5:16 PM
fasthll would be good
Xiang Fu
Xiang Fu
private static HyperLogLog convertStringToHLL(String value) {
  char[] chars = value.toCharArray();
  int length = chars.length;
  byte[] bytes = new byte[length];
  for (int i = 0; i < length; i++) {
    bytes[i] = (byte) (chars[i] - BYTE_TO_CHAR_OFFSET);
  }
  return ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(bytes);
}
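To make the conversion above concrete, here is a minimal round-trip sketch of the char-offset encoding it undoes. The value 129 for BYTE_TO_CHAR_OFFSET is an assumption (the snippet references the constant but does not show it), and the class name and bytesToString helper are hypothetical, written only as the inverse of the loop above. The HyperLogLog deserialization step is left out so the example has no external dependencies.

```java
import java.util.Arrays;

class HllStringCodecSketch {
  // Assumed value: the chat snippet uses this constant without showing its definition.
  static final int BYTE_TO_CHAR_OFFSET = 129;

  // Hypothetical encoder: shift every signed byte into a positive char value.
  static String bytesToString(byte[] bytes) {
    char[] chars = new char[bytes.length];
    for (int i = 0; i < bytes.length; i++) {
      chars[i] = (char) (bytes[i] + BYTE_TO_CHAR_OFFSET);
    }
    return new String(chars);
  }

  // Same loop as convertStringToHLL above, stopping before deserialization.
  static byte[] stringToBytes(String value) {
    char[] chars = value.toCharArray();
    byte[] bytes = new byte[chars.length];
    for (int i = 0; i < chars.length; i++) {
      bytes[i] = (byte) (chars[i] - BYTE_TO_CHAR_OFFSET);
    }
    return bytes;
  }

  public static void main(String[] args) {
    byte[] original = {-128, -1, 0, 1, 127};
    byte[] decoded = stringToBytes(bytesToString(original));
    System.out.println(Arrays.equals(original, decoded)); // prints "true"
  }
}
```

Under this assumption a string column written this way is not a plain UTF-8 rendering of the sketch bytes, so it has to be decoded with the same offset before it can be deserialized.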
Xiang Fu
Yupeng Fu
03/12/2020, 5:59 PM
FastHLLAggregationFunction but not distinctcounthll?
Xiang Fu
Yupeng Fu
03/12/2020, 5:59 PM
distinctcounthll on string value directly
Xiang Fu
Yupeng Fu
03/12/2020, 6:00 PM
Xiang Fu
@Override
public void aggregate(int length, AggregationResultHolder aggregationResultHolder, BlockValSet... blockValSets) {
  DataType valueType = blockValSets[0].getValueType();
  if (valueType != DataType.BYTES) {
    HyperLogLog hyperLogLog = getDefaultHyperLogLog(aggregationResultHolder);
    switch (valueType) {
      case INT:
        int[] intValues = blockValSets[0].getIntValuesSV();
        for (int i = 0; i < length; i++) {
          hyperLogLog.offer(intValues[i]);
        }
        break;
      case LONG:
        long[] longValues = blockValSets[0].getLongValuesSV();
        for (int i = 0; i < length; i++) {
          hyperLogLog.offer(longValues[i]);
        }
        break;
      case FLOAT:
        float[] floatValues = blockValSets[0].getFloatValuesSV();
        for (int i = 0; i < length; i++) {
          hyperLogLog.offer(floatValues[i]);
        }
        break;
      case DOUBLE:
        double[] doubleValues = blockValSets[0].getDoubleValuesSV();
        for (int i = 0; i < length; i++) {
          hyperLogLog.offer(doubleValues[i]);
        }
        break;
      case STRING:
        String[] stringValues = blockValSets[0].getStringValuesSV();
        for (int i = 0; i < length; i++) {
          hyperLogLog.offer(stringValues[i]);
        }
        break;
      default:
        throw new IllegalStateException(
            "Illegal data type for DISTINCT_COUNT_HLL aggregation function: " + valueType);
    }
  } else {
    // Serialized HyperLogLog
    byte[][] bytesValues = blockValSets[0].getBytesValuesSV();
    try {
      HyperLogLog hyperLogLog = aggregationResultHolder.getResult();
      if (hyperLogLog != null) {
        for (int i = 0; i < length; i++) {
          hyperLogLog.addAll(ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(bytesValues[i]));
        }
      } else {
        hyperLogLog = ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(bytesValues[0]);
        aggregationResultHolder.setValue(hyperLogLog);
        for (int i = 1; i < length; i++) {
          hyperLogLog.addAll(ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.deserialize(bytesValues[i]));
        }
      }
    } catch (Exception e) {
      throw new RuntimeException("Caught exception while merging HyperLogLogs", e);
    }
  }
}
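The method above merges serialized sketches only on the BYTES branch; a STRING column is offered value by value. The toy sketch below illustrates why that can produce very different counts when a string column actually holds serialized sketches. It is not Pinot code: a plain Set stands in for HyperLogLog so the numbers are exact, and the class name, the comma-joined serialization, and both count helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class TypeDispatchSketch {
  // Stand-in serialization: a sorted, comma-joined list of the sketch's members.
  static byte[] serialize(Set<String> sketch) {
    return String.join(",", new TreeSet<>(sketch)).getBytes(StandardCharsets.UTF_8);
  }

  static Set<String> deserialize(byte[] bytes) {
    return new HashSet<>(Arrays.asList(new String(bytes, StandardCharsets.UTF_8).split(",")));
  }

  // Mirrors the BYTES branch: every row is a serialized sketch, and rows are merged.
  static int countFromBytes(byte[][] rows) {
    Set<String> merged = new HashSet<>();
    for (byte[] row : rows) {
      merged.addAll(deserialize(row));
    }
    return merged.size();
  }

  // Mirrors the STRING branch: every row is offered as one raw value.
  static int countFromStrings(String[] rows) {
    return new HashSet<>(Arrays.asList(rows)).size();
  }

  public static void main(String[] args) {
    Set<String> first = new HashSet<>(Arrays.asList("a", "b", "c"));
    Set<String> second = new HashSet<>(Arrays.asList("c", "d"));
    byte[][] asBytes = {serialize(first), serialize(second)};
    String[] asStrings = {
        new String(asBytes[0], StandardCharsets.UTF_8),
        new String(asBytes[1], StandardCharsets.UTF_8)};
    System.out.println(countFromBytes(asBytes));     // prints 4 (union of both sketches)
    System.out.println(countFromStrings(asStrings)); // prints 2 (distinct serialized rows)
  }
}
```

In this toy setup the same two rows yield 4 when treated as sketches and 2 when treated as plain strings, which is the kind of gap reported earlier in the thread.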
Xiang Fu
DistinctCountHLLAggregationFunction, type check is on bytes
Xiang Fu
Xiang Fu
Yupeng Fu
03/12/2020, 6:55 PM
distinctCountHLL and fastHLL
Xiang Fu
Yupeng Fu
03/12/2020, 8:01 PM
Xiang Fu