# help
f
ah…! I was going to say my formula works only for linear scales 😄 You can see the code starting at line 240.
y
Thanks!
Yeah, the log scale makes this tricky (it also makes interpretation tricky, which is why we wanted to try normalizing by the dx)
Reading the code, I noticed it samples at grid centers (I had assumed corners), so my previous approach was also not right in other ways
f
"normalizing" a log scale is bonkers like dividing by the value? since dln x/dx = 1/x?
y
In this example I’m plotting a histogram heatmap/raster of counts on a log axis, which there are a few good reasons to try for this particular use case, but which means that the evenly-sized screen-space bins represent different lengths in data space. This means that, for example, if your data was uniformly distributed, it wouldn't show up as visually uniform: the longer bins would have higher counts. So the idea is to normalize each "cell" by the "width" of its log bin and plot the width-normalized counts, so that a uniform distribution would appear visually uniform (all else being equal, wider bins have more of a chance for data to fall into them, which the normalization controls for)
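To make it concrete, here's a toy version of the idea (just an illustration with made-up numbers, not the real chart):
```js
// Uniformly distributed data, binned into log-spaced bins: raw counts grow
// with each bin's data-space width, but dividing by the width gives a
// roughly flat result (up to sampling noise), which is how a uniform
// distribution should look.
const data = Array.from({length: 100000}, () => 1 + 999 * Math.random()); // uniform on [1, 1000)
const edges = Array.from({length: 31}, (_, i) => 10 ** (3 * i / 30));     // 30 log-spaced bins over [1, 1000]
for (let i = 0; i < 30; i++) {
  const [lo, hi] = [edges[i], edges[i + 1]];
  const count = data.filter((d) => d >= lo && d < hi).length;
  console.log(lo.toFixed(1), count, (count / (hi - lo)).toFixed(1)); // last column is ~constant
}
```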
the reason it's a heatmap and not a bar chart is that I'm plotting a bunch of these histograms over time, to show the time evolution of a distribution, with each time slice visualized the same way as the example above. That also got a bit weird since my time axis is ordinal (it's an array of histograms). Not sure if faceting would help, since it would probably still require multiple samples at the same latency for each histogram.
Another way to put it is that I'm using the raster mark as a way to do screen-space-dependent binning, but with a nonlinear axis the bins have varying lengths, so I want a way for the sampling function to be aware of the "area" that each sample represents, so that I can control for it. This interpretation might get very fiddly in the more abstract setting where you have other kinds of interpolation, but I've been using it with
imageRendering: pixelated
where each sample turns into a rectangular area on the screen
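Roughly the shape of what I mean, heavily simplified (the fill function below is just a placeholder, not my real sampling function):
```js
import * as Plot from "@observablehq/plot";

// The raster mark samples the fill function on a grid of cells and, with
// imageRendering: "pixelated", draws each sample as a solid rectangle.
const chart = Plot.plot({
  x: {type: "log"},
  color: {scheme: "viridis"},
  marks: [
    Plot.raster({
      x1: 1, x2: 1000, y1: 0, y2: 1,
      fill: (x, y) => Math.log10(x) * y, // placeholder sampling function
      imageRendering: "pixelated"
    })
  ]
});
```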
f
OK, so in this case it's "easy", since the width of a log bin (in data space) is proportional to its value.
y
...hah
that is a very astute observation
or is it... let me think about it a bit more
😄
reminds me of the joke about a mathematician who was lecturing in front of a class, remarking that some fact is "obvious", then ends up having to think about it for an hour before concluding that yes, it is obvious
ok, if I understood your point, then:
1. if the samples don't need to be interpretable, I can divide each sample value by log(x) before returning it. This will normalize each bin by a factor that grows at the same rate as the bin width.
2. if I want the samples to be interpretable, then I need to invert the specific log scale used by the chart, since otherwise there's still a remaining constant factor that scales all the bins after the log(x) normalization.
3. since the underlying function that I am sampling is a CDF but I want to visualize the PDF, I would still need to invert the specific x scale to get the precise bin edges in order to compute
pdf(x) = cdf(x + binwidth/2) - cdf(x - binwidth/2)
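Sketch of what I mean by point 3 (the scale and the cdf here are stand-ins; the real ones would come from the chart and the data):
```js
import * as d3 from "d3";

// Invert the chart's x scale at each screen-space cell's edges, then
// difference the CDF and divide by the cell's data-space width to get
// probability mass per data unit.
const x = d3.scaleLog().domain([1, 1000]).range([0, 600]);  // stand-in for the chart's x scale
const cdf = (v) => Math.min(Math.max((v - 1) / 999, 0), 1); // stand-in CDF (uniform on [1, 1000])
const n = 60; // screen-space cells along x
const step = (x.range()[1] - x.range()[0]) / n;

const pdf = d3.range(n).map((i) => {
  const lo = x.invert(i * step);       // left edge of the cell, in data space
  const hi = x.invert((i + 1) * step); // right edge of the cell, in data space
  return (cdf(hi) - cdf(lo)) / (hi - lo);
});
```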
on a slightly different but related note, a possible API idea would be a "postprocessing" hook that accepted a 2D raster grid and did whatever transformations it wanted after all of those counts were available. Then I could just sample the CDF at the precise sample values (though there is another subtlety to do with the 0.5 sample offset, which is that I would need to somehow ensure that I sample the CDF at its extremes, and not just in the middle of the grid cells), and compute the PDF in post
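Roughly what I'd want such a hook to be able to do, written as a plain function (the hook, its name, and its arguments are all made up here; nothing like this exists in Plot today):
```js
// Hypothetical postprocessing step: `grid` is the 2D array of sampled
// values (one row per time slice) and `xEdges` the data-space edges of the
// screen-space cells (length = row length + 1).
function postprocess(grid, xEdges) {
  return grid.map((row) =>
    row.map((v, i) => v / (xEdges[i + 1] - xEdges[i])) // width-normalize each cell
  );
}
```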
I think my density plot mark prototype had this kind of postprocessing hook, which was useful when you wanted to normalize the rows or columns of the heatmap by a property such as the total weight in that row or column. edit: though maybe there's a way to use
normalizeX
and friends instead
f
I think you should divide each sample by x, not log(x)
(I may be wrong, it's kinda difficult to think about this in the abstract)
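quick back-of-the-envelope check, assuming a d3 log scale like the one in the chart:
```js
import * as d3 from "d3";

// A fixed screen-space step always multiplies x by the same factor on a log
// scale, so the data-space width of the bin around x grows in proportion to
// x itself, not log(x).
const x = d3.scaleLog().domain([1, 10000]).range([0, 400]);
const dx = 10; // screen pixels per bin
for (const px of [50, 150, 250, 350]) {
  const center = x.invert(px);
  const width = x.invert(px + dx / 2) - x.invert(px - dx / 2);
  console.log(center.toFixed(1), (width / center).toFixed(3)); // the ratio stays constant
}
```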
y
ah, hm, I'll make an example notebook in a bit; with a uniform dataset it should be easy to tell if the normalization is off (Edit: still planning to look into this. I think you might be right, though: e.g. a square near the data value 100 on a log10 axis covers 100 data units in 1 unit of screen space, while only 1 data unit would be covered near the data value 1…)
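something like this is the check I have in mind for the notebook (the cdf is a stand-in uniform CDF and the scale mimics the chart's):
```js
import * as d3 from "d3";

// Compare the two candidate normalizations on uniform data: dividing each
// screen-space bin's probability mass by x comes out flat across bins,
// dividing by log(x) does not.
const cdf = (v) => Math.min(Math.max((v - 1) / 999, 0), 1); // uniform on [1, 1000]
const x = d3.scaleLog().domain([1, 1000]).range([0, 600]);
const step = 20; // pixels per bin
for (let px = 0; px < 600; px += step) {
  const lo = x.invert(px);
  const hi = x.invert(px + step);
  const mass = cdf(hi) - cdf(lo); // probability falling in this screen bin
  const mid = x.invert(px + step / 2);
  console.log((mass / mid).toExponential(2), (mass / Math.log(mid)).toExponential(2));
  // first column: ~constant across bins; second column: keeps growing
}
```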