# dev
c
will have a look, any idea on the right way to fix this?
x
not sure, I didn't look in detail at what is causing the massive allocations
I have a feeling that some of the json parsing is getting evaluated lazily
c
yeah flattener stuff is all lazy, which is why it wasn’t working with the wrapper
i guess i did this before i did the new type-aware schema discovery stuff, so maybe i have a better way to do it now
hmm, though i sort of wonder if the json input format (and all of the other input formats that support nested data such as avro, orc, parquet) would look something like this if used directly since most of the time actually seems to be in the flattener stuff
hmm, maybe finally need to do the thing where we actually resolve all of the columns that are required from the input reader for dims/transforms/aggs and just eagerly convert them all to a plain map, will do some experiments
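something like this rough sketch is what i mean (plain Jackson, hypothetical names, not the actual code):
```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EagerFlattenSketch
{
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // resolve each column required by dims/transforms/aggs exactly once,
  // copying into a plain map so nothing downstream touches the parsed tree
  static Map<String, Object> flattenEagerly(byte[] payload, List<String> requiredColumns) throws Exception
  {
    final JsonNode root = MAPPER.readTree(payload);
    final Map<String, Object> row = new HashMap<>();
    for (String column : requiredColumns) {
      // a real flattener would evaluate a path expression here instead of a top-level get
      final JsonNode node = root.get(column);
      row.put(column, node == null || node.isNull() ? null : MAPPER.convertValue(node, Object.class));
    }
    return row; // root is no longer referenced and can be collected
  }
}
```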
thanks for sharing 👍
x
I can share the full profiles if that helps
I can send inputSpecs privately if you need to
c
i think im ok, but will ping you if needed
do you define a schema or use schema discovery?
x
we define a schema
c
cool
x
useFieldDiscovery is true on the flattenspec, but we also specify fields directly for nested ones
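roughly this shape, with made-up field names (not our actual spec):
```json
{
  "useFieldDiscovery": true,
  "fields": [
    { "type": "path", "name": "userId", "expr": "$.payload.user.id" },
    { "type": "path", "name": "country", "expr": "$.payload.geo.country" }
  ]
}
```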
c
i wonder how many events i need to see a similar profile, how big was this stream in the profile?
i suppose it probably doesn’t take that much to make the flattener stand out though, it’s so expensive
x
not that big, 20k events/s
c
cool, i think that should be enough info, thanks, will reach out if i need anything else
so, it does look like the ‘regression’ profiles the same way as using the input format directly
using json input format directly
using kafka input format in master
reverted commit in question
going to repeat the reverted measurement again soon just to make sure
repeat looks same-ish as last run
so like… the interesting thing here, wall-time-wise the reverted commit runs took like 30-40 seconds longer to process the same number of rows
so it is perhaps only a regression for some schema shapes
my flattenSpec on my generated data
and dims
it’s fairly flatten-heavy i guess, idk how that compares to your schema
my screenshots were cpu time, which makes sense that they would be different given that the reverted commit makes the flattening eager
looking at memory allocations, it doesn’t seem to be that much extra data, but it is split between parse and add methods
well i guess it’s a decent chunk extra
maybe 10-20gb or so (slight variation between runs)
reverted allocations
plain json input format:
another view of plain json format
here is what i mean about wall-time
kafka input format in master:
kafka input format with revert
i’m not sure why that is the case, could be related to my test data schema
i repeated tests to try to get same-ish results and they were pretty consistent
my test data was that above schema with 10m generated rows
and i just let them run until processed all the rows and then shut off profiler without forcing a persist
i should probably repeat the measurements but suspend the supervisor at the end of the run, and see if the lazy nature of using the flattener-backed map instead of eager copying is related to why it seemed faster without the revert
will do that experiment in a bit
anyway, so far have at least confirmed my suspicions that my changes made the kafka reader perform the same as the underlying reader
so the regression perhaps is more of a case of uncovering a performance issue/difference in the underlying stuff, so the “fix” will likely be changes to how flattening works rather than a fix specific to the kafka reader
and i think it means my change is basically “correct” since it’s delegating to the underlying reader, just the underlying reader could be better
if that makes sense
btw, suspending the task at the end of the captures results in them having approximately the same total time, just spent in different places
reverted
master
allocated size was actually smaller for some reason on master than reverted on this run
x
We saw massive GC pressure with the change compared to before. It might be an artifact of our spec but the difference was large
The flame graphs I posted in GitHub were allocation profiles, not cpu profiles.
c
is that different from the ‘memory allocations’ view when profiling with intellij?
x
No, probably the same, assuming it uses async-profiler.
c
could you share your spec so i can compare to mine and adjust my test one to try to reproduce? i didn’t really have any aggs or transforms on mine, so having those might blow up the cost of the flattener due to repeated reads of the same value
though i’m certain there is a difference between eager and lazy flattening, maybe it’s enough evidence to just always eagerly flatten
just being cautious since the same pattern is used by avro/json/orc/parquet/protobuf so the right way to do this will probably impact all of them
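to illustrate the repeated-read thing (made-up shapes, just the pattern, not the real code):
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// illustrative only: with a lazy row, every get() re-runs path evaluation,
// so a value read by a dim, a transform, and an agg is flattened three times;
// an eager copy pays the flattening cost exactly once per field
class LazyVsEagerSketch
{
  static Object lazyRead(Function<String, Object> evaluatePath, String field)
  {
    return evaluatePath.apply(field); // cost paid on every read
  }

  static Map<String, Object> eagerCopy(Function<String, Object> evaluatePath, Iterable<String> fields)
  {
    final Map<String, Object> row = new HashMap<>();
    for (String field : fields) {
      row.put(field, evaluatePath.apply(field)); // cost paid once per field
    }
    return row;
  }
}
```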
x
We have fairly large nested payloads of which we only keep a relatively small number of nested fields.
c
ah yeah deeper nesting would also probably exaggerate things compared to my run
since it has to recreate the stuff all along the path for each flattened value
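e.g. something like this, where every field evaluation starts over from the root (hypothetical helper):
```java
import com.fasterxml.jackson.databind.JsonNode;

// illustrative: each flattened field walks root-to-leaf independently, so
// N fields under a depth-D common prefix cost O(N * D) node hops instead
// of sharing the prefix traversal once
class PathWalkSketch
{
  static JsonNode walk(JsonNode root, String... pathParts)
  {
    JsonNode node = root;
    for (String part : pathParts) {
      if (node == null || node.isNull()) {
        return null;
      }
      node = node.get(part);
    }
    return node;
  }
  // walk(root, "payload", "user", "id") and walk(root, "payload", "user", "name")
  // both re-walk payload -> user
}
```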
x
Doing it lazily might also keep the full json payload longer in memory, causing the GC to have to work harder
But I’m just guessing here
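something like this is the retention pattern I’m imagining (made-up names, just a sketch):
```java
import com.fasterxml.jackson.databind.JsonNode;

import java.util.function.Supplier;

// illustrative: a lazy field captures the whole parsed tree in its closure,
// so the entire payload stays reachable until the row is fully processed;
// an eager copy would only retain the handful of extracted values
class RetentionSketch
{
  static Supplier<Object> lazyField(JsonNode root, String field)
  {
    return () -> root.get(field); // pins `root`, i.e. the full payload
  }
}
```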