410 points by rossant 18 hours ago | 34 comments
CreRecombinase 17 hours ago
Every two weeks or so I peruse github looking for something like this and I have to say this looks really promising. In statistical genetics we make really big scatterplots called Manhattan plots https://en.wikipedia.org/wiki/Manhattan_plot and we have to use all this highly specialized software to visualize at different scales (for a sense of what this looks like: https://my.locuszoom.org/gwas/236887/). Excited to try this out
clewis7 15 hours ago
Hey! This sounds like a really interesting use case. If you run into any issues or need help with the visualization, please don't hesitate to post an issue on the repo. We can also think about adding an example demo of a manhattan plot to help too!
j_bum 13 hours ago
If you’re working in R with ggplot2, you could also consider the `ggrastr` package, specifically, `ggrastr::geom_point_rast`
samstave 7 hours ago
Have you tried ManimGL?

https://github.com/3b1b/manim/releases

Super awesome, and you can make it into an MCP for Cursor.

dcl 6 hours ago
I always thought it was interesting that my modern CPU takes ages to plot 100,000 or so points in R or Python (ggplot2, seaborn, plotnine, etc) and yet somehow my 486DX 50Mhz could pump out all those pixels to play Doom interactively and smoothly.
sieste 2 hours ago
This SO thread [1] analyses how much time ggplot spends on various tasks. Not sure if a better GPU integration to produce the visual output would help speed it up significantly.

[1] https://stackoverflow.com/questions/73470828/ggplot2-is-slow...

zoogeny 11 hours ago
> powered by WGPU, a cross-platform graphics API that targets Vulkan (Linux), Metal (Mac), and DX12 (Windows).

The fact that they are using WGPU, which appears to be a Python native implementation of WebGPU, suggests there is an interesting possible extended case. As a few other comments suggest, if one knows that the data is available on a machine in a cluster rather than on the local machine of a user, it might make sense to start up a server, expose a port and pass along the data over http to be rendered in a browser. That would make it shareable across the lab. The limit would be the data bandwidth over http (e.g. for the 3 million point case) but it seems like for simpler cases it would be very useful.

That would lead to an interesting exercise of defining a protocol for transferring plot points over http in such a way they could be handed over to the browser WebGPU interface efficiently. Perhaps an even more efficient representation is possible with some pre-processing on the server side?
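To make that concrete, here's a rough sketch of what a minimal binary framing could look like on the Python side (everything here is illustrative, not an existing protocol):

```python
import struct
import numpy as np

def pack_points(points: np.ndarray) -> bytes:
    """Header (rows, cols as little-endian uint32) followed by raw float32
    vertex data the browser could hand to WebGPU as a buffer without parsing."""
    data = np.ascontiguousarray(points, dtype="<f4")
    return struct.pack("<II", *data.shape) + data.tobytes()

def unpack_points(message: bytes) -> np.ndarray:
    rows, cols = struct.unpack_from("<II", message, 0)
    return np.frombuffer(message, dtype="<f4", offset=8).reshape(rows, cols)

# ~24 MB on the wire for 3 million xy points, versus far more as JSON
message = pack_points(np.random.rand(3_000_000, 2))
```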

kushalkolar 10 hours ago
> the data is available on a machine in a cluster rather than on the local machine of a user

jupyter-rfb lets you do remote rendering for this, render to a remote frame buffer and send over a jpeg byte stream. We and a number of our scientific users use it like this. https://fastplotlib.org/ver/dev/user_guide/faq.html#what-fra...

> defining a protocol for transferring plot points

This sounds more like GSP, which Cyrille Rossant (who's made some posts here) works on; it has a slightly different kind of use case.

zoogeny 10 hours ago
What is GSP in this context? Searching Python GSP brings up Generalized Sequence Pattern (GSP) algorithm [1] and Graph Signal Processing [2], neither of which seem to be a protocol. I also found "Generic Signaling Protocol" and "Global Sequence Protocol" which also don't seem relevant. Forgive me if GSP is some well-known thing which I am just not familiar with.

1. https://github.com/jacksonpradolima/gsp-py

2. https://pygsp.readthedocs.io/en/stable/

bglazer 9 hours ago
Graphics Server Protocol

Forgive me for doing this, but I used an LLM to find that. They’re exceptionally useful for disambiguation tasks like this. Knowing what an acronym refers to is very useful for next token prediction, so they’re quite good at it. It’s usually trivial to figure out if they’re hallucinating with a search engine.

[1] https://news.ycombinator.com/item?id=43335769

kushalkolar 9 hours ago
I don't think it's ready yet and I think it might be private at the moment, Cyrille can comment more on it.

But if I understand correctly it's a protocol for serializing graphical objects, pretty neat idea.

mkl 11 hours ago
WGPU is a Rust thing more than a Python thing.
almarklein 3 hours ago
To clarify this a bit, wgpu is a Rust implementation of WebGPU, just like Dawn is a C++ implementation of WebGPU (by Google). Both projects expose a C API following webgpu.h. wgpu-py should eventually be able to work with both. (Disclaimer: I'm the author of wgpu-py)
zoogeny 11 hours ago
Fair, I was looking at the wgpu-py [1] page but only skimmed it. It does indeed look like a wrapper over wgpu-native [2] which is written in Rust.

1. https://github.com/pygfx/wgpu-py

2. https://github.com/gfx-rs/wgpu-native

Swannie 6 hours ago
What you describe sounds a bit like Graphistry:

https://pygraphistry.readthedocs.io/en/latest/performance.ht...

wodenokoto 5 hours ago
How does it compare to HoloViz? [1]

I followed one of their online workshops, and it feels really powerful, although it is a bit confusing which part of it does what (it's basically 6 or 7 projects put together under an umbrella)

[1] https://holoviz.org/

almarklein 3 hours ago
One big difference is that Fastplotlib is based on GPU tech, so it's capable of rendering much larger datasets interactively.
theLiminator 17 hours ago
Do you have any numbers for the rough number of datapoints that can be handled? I'm curious if this enables plotting many millions of datapoints in a scatterplot for example.
clewis7 15 hours ago
Yes! The number of data points can range into the millions. Quite honestly, the quality of your GPU would be the limiting factor here. I will say, however, that for most use cases, an integrated GPU is sufficient. For reference, we have plotted upwards of 3 million points on a mid-range integrated GPU from 2017.
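For reference, the kind of snippet we test with looks roughly like this (a sketch; the exact calls may differ slightly between versions):

```python
import numpy as np
import fastplotlib as fpl

# 3 million xy points; float32 keeps the GPU upload to roughly 24 MB
xy = np.random.normal(size=(3_000_000, 2)).astype(np.float32)

figure = fpl.Figure()
figure[0, 0].add_scatter(xy, sizes=1)
figure.show()  # in a notebook this renders via jupyter-rfb; on desktop it opens a window
```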

I will work on adding somewhere in our docs some metrics for this kind of thing (I think it could be helpful for many).

enriquto 12 hours ago
>I will work on adding somewhere in our docs some metrics for this kind of thing (I think it could be helpful for many).

Certainly! A comparison of performance with specialized tools for large point clouds would be very interesting (like cloudcompare and potree).

Vipitis 12 hours ago
I have watched recordings of your recent presentation and decided to finally give it a try last week. My goal is to create some interactive network visualizations - like letting you click/box select nodes and edges to highlight subgraphs, which sounds possible with the callbacks and selectors.

Haven't had the time to get very far yet, but will gladly contribute an example once I figure something out. Some of the ideas I want to eventually get to are rendering shadertoys (interactively?) into an fpl subplot (haven't looked at the code at all, but might be doable), eventually running those interactively in the browser, and doing the network layout on the GPU with compute shaders (out of scope for fpl).

kushalkolar 12 hours ago
Hi! I've seen some of your work on wgpu-py! Definitely let us know if you need help or have ideas, if you're on the main branch we recently merged a PR that allows events to be bidirectional.
crazygringo 15 hours ago
Sounds really compelling.

But it doesn't seem to answer how it works in Jupyter notebooks, or if it does at all. Is the GPU acceleration done "client-side" (JavaScript?) or "server-side" (in the kernel?) or is there an option for both?

Because I've used supposedly fast visualization libraries in Google Colab before, but instead of updating at 30 fps, it takes 2 seconds to update after a click, because after the new image is rendered it has to be transmitted via the Jupyter connector and network and that can turn out to be really slow.

ivoflipse 15 hours ago
Fastplotlib definitely works in Jupyterlab through jupyter-rfb https://github.com/vispy/jupyter_rfb

I believe the performance is pretty decent, especially if you run the kernel locally

Their docs also cover this as mentioned by @clewis7 below: https://www.fastplotlib.org/ver/dev/user_guide/faq.html#what...

kushalkolar 15 hours ago
Thanks Ivo!

Just to add on, colab is weird and not performant, this PR outlines our attempts to get jupyter-rfb working on colab: https://github.com/vispy/jupyter_rfb/pull/77

crazygringo 14 hours ago
Thanks. Yeah I've been baffled as to why just interactive Matplotlib with a Colab kernel is so slow. The Colab CPU is fast (enough), the network is fast, I haven't been able to figure out where the bottleneck is either.
paddy_m 13 hours ago
Is google colab slower than an equivalently powerful kernel running on a remote jupyter kernel? Are you running into network problems, or is it something specific to colab?
clewis7 15 hours ago
Thanks Ivo!
juliusbk 14 hours ago
This looks super cool! Looking forward to trying it.

I think a killer feature of these gpu-plotting libraries would be if they could take torch/jax cuda arrays directly and not require a (slow) transfer over cpu.

fpl-dev 14 hours ago
Thanks! That is a great question and one that we've been battling with as well. As far as we know, this is not possible due to the way different contexts are set up on the GPU https://github.com/pygfx/pygfx/issues/510

tinygrad which I haven't used seems torch-like and has a WGPU backend: https://github.com/tinygrad/tinygrad

juliusbk 14 hours ago
Yeah, I remember looking into it myself as well, and not finding any easy path. A shame.... Maybe there's a hard way to do it though :)
rossant 13 hours ago
I've been looking into this issue with Datoviz [1] following a user request. It turns out there may be a way to achieve it using Vulkan [2] (which Datoviz is based on) and CuPy's UnownedMemory [3]. I wrote a simple proof of concept using only Vulkan and CuPy.

I'm now working on a way for users to wrap a Datoviz GPU buffer as a CuPy array that directly references the Datoviz-managed GPU memory. This should, in principle, enable efficient GPU-based array operations on GPU data without any transfers.

[1] https://datoviz.org/

[2] https://registry.khronos.org/vulkan/specs/latest/man/html/VK...

[3] https://docs.cupy.dev/en/latest/reference/generated/cupy.cud...
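
For the curious, the CuPy half of the proof of concept is essentially the sketch below; obtaining the raw device pointer means allocating the Vulkan buffer with the external-memory extensions and importing it into CUDA, which is the involved part and omitted here:

```python
import cupy as cp

def wrap_external_buffer(device_ptr: int, nbytes: int, shape, dtype=cp.float32):
    # Wrap a device pointer we don't own (here: Vulkan-managed memory imported
    # into CUDA) as a CuPy array, with no copy and no CPU round trip.
    mem = cp.cuda.UnownedMemory(device_ptr, nbytes, None)
    memptr = cp.cuda.MemoryPointer(mem, 0)
    return cp.ndarray(shape, dtype=dtype, memptr=memptr)

# e.g. arr = wrap_external_buffer(ptr, n * 4, (n,)); arr *= 2.0
# would scale the buffer in place, visible to the renderer on the next frame
```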

kushalkolar 13 hours ago
This looks cool, thanks! Makes me wonder if there's any way to do that with WGPU if WGPU is interfacing with Vulkan; probably not easy even if it's possible, I'm guessing.

WGPU has security protections since it's designed for the browser so I'm guessing it's impossible.

rossant 12 hours ago
Indeed, it doesn't seem to be possible at the moment, see e.g. https://github.com/gfx-rs/wgpu/issues/4067
paddy_m 13 hours ago
Wow. So are you saying that you can have some array on the GPU that you set up with Python via CuPy, then you call out to the web browser and give it the pointer address for that GPU array, and the browser through WASM/WebGPU can access that same array? That sounds like a huge browser security hole.
kushalkolar 13 hours ago
Yea the security issue is why I'm pretty sure you can't do it on WGPU, but Vulkan and cupy can fully run locally so it doesn't have the same security concern.
rossant 12 hours ago
Exactly, this is the sort of thing you can more easily do on desktop than in a web browser.
PerryStyle 13 hours ago
Would it be possible to leverage the python array api standard? Or is that more suited for just computations?
paddy_m 17 hours ago
Really nice post introducing your library.

When would you reach for a different library instead of fastplotlib?

How does this deal with really large datasets? Are you doing any type of downsampling?

How does this work with pandas? I didn't see it as a requirement in setup.py

Does this work in Jupyter notebooks? What about marimo?

kushalkolar 15 hours ago
Thanks!

> When would you reach for a different library instead of fastplotlib?

Use the best tool for your use case; we're focused on GPU accelerated interactive visualization. Our use cases broadly are developing ML algorithms, user-end ML Ops tools, and looking at live data off of scientific instruments.

> How does this deal with really large datasets? Are you doing any type of downsampling?

Depends on your hardware, see https://fastplotlib.org/ver/dev/user_guide/faq.html#do-i-nee...

> How does this work with pandas? I didn't see it as a requirement in setup.py

If you pass in numpy-like types that use the buffer protocol it should work, we also want to support direct dataframe input in the future: https://github.com/fastplotlib/fastplotlib/issues/395

There are more low-level priorities in the meantime.
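
In the meantime, passing columns directly works; concretely, something like this (a sketch with made-up column names):

```python
import numpy as np
import pandas as pd
import fastplotlib as fpl

df = pd.DataFrame({"x": np.random.rand(10_000), "y": np.random.rand(10_000)})

# pass a numpy view of the columns (buffer protocol) rather than the DataFrame itself
figure = fpl.Figure()
figure[0, 0].add_scatter(df[["x", "y"]].to_numpy(dtype=np.float32))
figure.show()
```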

> Does this work in Jupyter notebooks? What about marimo?

Jupyter yes via jupyter-rfb, see our repo: https://github.com/fastplotlib/fastplotlib?tab=readme-ov-fil...

applied_heat 7 hours ago
Looking forward to checking out your library, thanks for sharing it with the world.

I’ve been using kst-plot for live streaming data from instruments and interactive plots. It’s fast and I haven’t found any limit for the amount of data it can plot. Development has basically stopped - the product is done, feature complete, and works perfectly! It is used by European and Canadian space agencies. Maybe it will be interesting to you to see how they have solved or approached some of the same problems you have solved or will also solve !

vegabook 1 hour ago
Syntax looks matplotlib-ish so we’re going back to 2003
jampekka 1 hour ago
I'd prefer even more matplotlib-ish. Don't fix what's not broken.

A major reason why other plotting libraries don't take off is their use of complicated APIs. But data analysis doesn't need Application Programming Interfaces, it needs User Interfaces.

fransje26 59 minutes ago
> Don't fix what's not broken.

I would argue that the Matplotlib syntax is horribly broken (or rather, the Matlab syntax it historically tried to emulate, and had to stick with for better or worse..)

jampekka 56 minutes ago
What are your issues with the matplotlib API more specifically?
menaerus 1 hour ago
> of complicated APIs

of which matplotlib is the embodiment. Terrible API with terrible, terrible performance.

jampekka 1 hour ago
Not sure what you mean by complicated API. The (pylab) API is a very straightforward, (mostly) immediate-rendering-style interface, with a lot of convenient shortcuts for operations used a lot in data analysis.

For architecture astronauts there's also the OOP API over which the pylab API is a wrapper.
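
To illustrate, the same plot in both styles (plain matplotlib, nothing exotic):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 500)

# pylab-style: implicit current figure/axes, minimal ceremony
plt.plot(x, np.sin(x))
plt.xlabel("t")
plt.show()

# OOP style: explicit objects, more verbose but easier to compose
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlabel("t")
plt.show()
```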

Of course there are also a lot of all sorts of declarative APIs, which are popular with people copy-pasting code from cookbooks. These become very painful very fast if you do something that's not in the cookbook.

Matplotlib does struggle with performance in some/many cases, but it has little to do with the API.

menaerus 12 minutes ago
Just my personal experience from using the library for at least 7-8 years. So many things and concepts are glued onto each other, making the API very unintuitive whenever you try to do anything more sophisticated that isn't a 1:1 match for examples found in the cookbook. It's really a PITA, and the performance, I have to say this again, is really, really bad. If this had been part of my daily job I would certainly try to switch to something else.
rossant 44 minutes ago
I feel like everyone has different expectations for a scientific plotting API. The tension between ease of use and expressivity is so strong that a one-size-fits-all solution is unlikely ever to exist.
disgruntledphd2 1 hour ago
Yeah, I'm not sure why anyone likes matplotlib, but then I guess I liked base-R which is even more niche, so :shrug:.
doright 10 hours ago
Sometimes I wish these plotting libraries were more portable beyond Python only. I was looking for something similar for Ruby just a while ago but the install instructions seemed out of date and unsupported on Windows.
noosphr 10 hours ago
Any sufficiently advanced plotting library with an api that can be called externally becomes indistinguishable from a GUI toolkit: https://www.gnu.org/software/guile/docs/guile-tut/tutorial.h...

Not sure if that is the right tutorial, but many years ago in the guile 1.x days I wrote a local visualizer for the data from a particle physics accelerator entirely in Guile and Gnuplot. It was very MVC and used guile as the controller and Gnuplot as the viewer.

Was it stupid? Yes. Did it work better than all the other tools I had at the time? Also yes.

kushalkolar 10 hours ago
I do not know ruby but sometimes that's an opportunity to try and make one which others will also find useful :)
lagrange77 15 hours ago
Nice, I'd be interested to know which method for drawing lines (which is hard [0]) it uses.

[0] https://mattdesl.svbtle.com/drawing-lines-is-hard

kushalkolar 14 hours ago
Almar made blog posts about the line shader he wrote!

https://almarklein.org/triangletricks.html

https://almarklein.org/line_rendering.html

A big shader refactor was done in this PR: https://github.com/pygfx/pygfx/pull/628

lagrange77 12 hours ago
Thank you!
pama 17 hours ago
I know 3D is in the roadmap. Once the basic functionality is in place, it would be great to also consider integrating molecular visualization or at least provide enough fast primitives to simplify the integration of molecular visualization tools with this library.
clewis7 15 hours ago
We are definitely looking forward to adding more 3D graphics in the future, and this sounds really cool. Would you mind posting an issue on the repo? I think this is something we would want to have on the roadmap or at least an open issue to plan out how we could do this. Thanks!
enriquto 12 hours ago
That would be preposterous if it wasn't so hilariously false:

> These days, having a GPU is practically a prerequisite to doing science, and visualization is no exception.

It becomes really funny when they go on to this, as if it was a big deal:

> Depicted below is an example of plotting 3 million points

Anybody who has ever used C or fortran knows that a modern CPU can easily churn through "3 million points" at more than 30 frames per second, using just one thread. It's not a particularly impressive feat, three million points is the size of a mid-resolution picture, and you can zoom-in and out those trivially in real-time using a CPU (and you could do that 20 years ago, as well). Maybe the stated slowness of fastplotlib comes from the unholy mix of rust and python?

Now, besides this rant, I think that fastplotlib is fantastic and, as an (unwilling) user of Python for data science, it's a godsend. It's just that the hype of that website sits wrong in me. All the demos show things that could be done much easier and just as fast when I was a teenager. The big feat, and a really big one at that, is that you can access this sort of performance from python. I love it, in a way, because it makes my life easier now; but it feels like a self-inflicted problem was solved in a very roundabout way.

cycomanic 11 hours ago
>> Depicted below is an example of plotting 3 million points

> Anybody who has ever used C or fortran knows that a modern CPU can easily churn through "3 million points" at more than 30 frames per second, using just one thread. It's not a particularly impressive feat, three million points is the size of a mid-resolution picture, and you can zoom-in and out those trivially in real-time using a CPU (and you could do that 20 years ago, as well). Maybe the stated slowness of fastplotlib comes from the unholy mix of rust and python?

That's a misrepresentation though, it's 3 million points in sine waves, e.g. something like 1000 sine waves with e.g. 3000 points in each. If you look at the zoomed-in image, the sine waves are spaced significantly, so if you were to represent this as an image it would be at least a factor of 10 larger. In fact that is likely a significant underestimate, since you also need to connect the points inside the sine waves.

The comparison case would be to take a vector graphics (e.g. svg) with 1000 sine wave lines and open it in a viewer (written in C or Fortran if you want) and try zooming in and out quickly.

kushalkolar 10 hours ago
Thanks, and the purpose was to show what's possible on modest hardware that most people have. We have created gigabytes of graphics that live on the gpu for more complex use cases and they remain performant, but you need a gaming gpu.
enriquto 4 hours ago
But why do you want to fit the whole dataset in memory? If the dataset is stored in a tiled and multi-scaled representation you need to only grab the part of it that is needed to fit your screen (which is a constant, small amount of data, even if the dataset is arbitrarily large).

If you insist on fitting the entire thing in memory, it may seem better to do so in plain RAM, which nowadays is of humongous size even in "modest" systems.

rossant 1 hour ago
Maybe it's an instance of Parkinson's law [1]: if it all fits in GPU memory, just put it all in and plot it. This is much simpler to implement than any out-of-core technique. It's also easier for the user—`scatter(x, y)` would work effortlessly with, say, 10 million points.

But with 10 billion points, you need to consider more sophisticated approaches.

[1] https://en.wikipedia.org/wiki/Parkinson%27s_law

vibranium 12 hours ago
I’m often working with a windows desktop and a remote Linux box on which I have my data & code. I’d like to plot “locally” on my desktop workstation from the remote host. This usually either means using X11 (slow) or some sort of web-based library like plotly. Does fastplotlib offer any easy solution here?
kushalkolar 12 hours ago
This is exactly why we use jupyter-rfb; I often have large datasets on a remote cluster computer and we perform remote rendering.

see: https://fastplotlib.org/ver/dev/user_guide/faq.html#what-fra...

aplzr 12 hours ago
I'm in the same boat as the person you replied to, but have zero experience with remote plotting other than doing static plots in a remote session in the interactive window provided by VS Code's Python extension. Would this also work there, or would I have to start using Jupyter notebooks?
kushalkolar 10 hours ago
Non-Jupyter notebook implementations have their quirks; eventually we hope to make a more universal jupyter-rfb kind of library, perhaps using anywidget. Anywidget is awesome: https://github.com/manzt/anywidget

People have used fastplotlib and jupyter-rfb in vscode, but it can be troublesome and we don't currently have the resources to figure out exactly why.

aplzr 9 hours ago
Alright, thanks. I don't particularly like notebooks, but this might be a reason to give them another go.
asterix_pano 4 hours ago
Looks very interesting. Does it allow plotting lines of varying thickness?
klaussilveira 16 hours ago
Very cool to see imgui empowering so many different things.
fpl-dev 15 hours ago
We love imgui! Big thanks to the imgui devs, and Pascal Thomet who maintains the python bindings for imgui-bundle, and https://github.com/panxinmiao who made an Imgui Renderer for wgpu-py!
rossant 14 hours ago
Imgui is awesome! Thanks for mentioning imgui-bundle—I hadn’t heard of it before, but it looks great! [1]

[1] https://github.com/pthom/imgui_bundle

trostaft 13 hours ago
Looks very interesting for interactive visualization. I like the animation interface. Also love imgui, glad to see it here. I wish I had better plotting tools for publication quality images (though, honestly I'm pretty happy with matplotlib).
kushalkolar 13 hours ago
Thanks! Yup our focus is not publication figures, matplotlib and seaborn cover that space pretty well.
roter 15 hours ago
Very interesting and promising package.

I especially like that there is a PyQt interface which might provide an alternative to another great package: pyqtgraph[0].

[0] https://github.com/pyqtgraph/pyqtgraph

kushalkolar 15 hours ago
Thanks! I used pyqtgraph for many years and love what can be done by it, we started off wanting to build something like it but based on WGPU and not bound to Qt.
clewis7 15 hours ago
Thank you for your interest! We have taken a lot of inspiration from pyqtgraph and really like their library.
abdullahkhalids 16 hours ago
Is it possible to put the interactive plots on your website? Or is this a Jupyter-notebook-only tool?
clewis7 15 hours ago
See here: https://www.fastplotlib.org/ver/dev/user_guide/faq.html#what...

We are hoping for pyodide integration soon, which would allow fastplotlib to be run strictly in the browser!

abdullahkhalids 15 hours ago
Thanks. That will be very cool.
fpl-dev 15 hours ago
In the browser only jupyter for now, you can use voila to make a server based application using jupyter: https://github.com/voila-dashboards/voila

As Caitlin pointed out below pyodide is a future goal.

abdullahkhalids 15 hours ago
This is very nice. But I was thinking more along the lines of: can I embed a single interactive widget in a blog post?
ivoflipse 14 hours ago
Not today, it requires wgpu-py to support running on WASM / pyodide, which it doesn't yet (unfortunately)
meisel 16 hours ago
One of the big bottlenecks of plotting libraries is simply the time it takes to import the library. I’ve seen matplotlib being slow to import, and in Julia they even have a “time to first plot” metric. I’d be curious to see how this library compares.
clewis7 15 hours ago
I think one nice thing that we have tried to do is limit super heavy dependencies and also separate optional dependencies to streamline things.

The quickest install would be `pip install fastplotlib`. This would be if you were interested in just having the barebones (no imgui or notebook) for desktop viz using something like glfw.

We can think about adding in our docs some kind of import time metrics.

almarklein 3 hours ago
I have a feeling there's room for improvement for importing Pygfx as well. I think we should indeed strive to make simple plots load super quick.
kushalkolar 15 hours ago
Almar did some work on speeding up imports a year ago: https://github.com/fastplotlib/fastplotlib/pull/431

but we haven't benchmarked it yet

gooboo 15 hours ago
Yeah, many browsers have WebGPU turned off by default, so you're stuck with WASM (WASM SIMD if you're lucky).

Hopefully both are implemented.

almarklein 2 hours ago
That's because WebGPU is still experimental. This will change, as it's set to replace WebGL.

Fastplotlib / pygfx are primarily meant to run on desktop. When using it via the notebook the server does the rendering.

As Ivo said, we have plans to support running in the browser via Pyodide, which opens some interesting things, but is not the primary purpose.

ivoflipse 15 hours ago
This library builds upon pygfx and wgpu-py. Unfortunately, the latter doesn't support running on WASM, pyscript or pyodide yet, but there's an issue about it:

https://github.com/pygfx/wgpu-py/issues/407

PRs welcome though :-)

carabiner 17 hours ago
GPU all the things! GPU-accelerated Tableau would be incredible.
MortyWaves 12 hours ago
Another tool that requires precise control over memory layout, bandwidth, performance… using Python.
rossant 12 hours ago
... using Python... itself leveraging NumPy, C, the GPU...
7speter 13 hours ago
I’m not making neuroscience visualizations. I’m working with rather line graphs and would like to animate based on ~10000 points. I’m looking to convert these visuals to video for youtube, in hd and at 60fps using the HEVC/h.265 codec. I took a quick look at the documentation to see if this is possible and I didn’t see anything. Are or will this sort of rendering be supported?

I previously tried this on matplotlib and it took 20-30 minutes to make a single rendering because matplotlib only uses a single core on a CPU and doesn't support GPU acceleration. I also tried Manim, but I couldn't get an actual video file, and OpenGL seems to be a bit complicated to work with (I went and worked on other things, though I should ask around about the video file output). Anyway, I'm excited about the prospect of a GPU-accelerated dataviz tool that utilizes Vulkan, and I hope this library can cover my use case.

kushalkolar 12 hours ago
Rendering frames and saving them to disk can be done with rendercanvas but we haven't exposed this in fastplotlib yet: https://github.com/pygfx/rendercanvas/issues/49
asangha 17 hours ago
>sine_wave.colors[::3] = "red"

I never knew I needed this until now
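
For anyone else who hasn't seen it, the broader pattern (as I understand it from the docs; rough sketch) is that graphic properties are array-like and assignable in place:

```python
import numpy as np
import fastplotlib as fpl

figure = fpl.Figure()
xs = np.linspace(0, 4 * np.pi, 1_000)
sine_wave = figure[0, 0].add_line(np.column_stack([xs, np.sin(xs)]))

sine_wave.colors[::3] = "red"     # every third vertex turns red
sine_wave.colors[-100:] = "blue"  # the last 100 vertices turn blue
figure.show()
```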

sfpotter 17 hours ago
Very cool effort. That said, it's probably because of the kind of work that I do, but I have almost never found the four challenges to be any kind of a problem for me. Although I do think there is some kind of contradiction there. Plotting (exploratory data analysis ("EDA"), really) is all about distilling key insights and finding features hidden in data. But you have to have some kind of intuition about where the needle in the haystack is. IME, throwing up a ton of plots and being able to scrub around in them never seems to provide much insight. It's also very fast---usually the feedback loop is like "make a plot, go away and think about it for an hour, decide what plot I need to make next, repeat". If there is too much data on the screen it defeats the point of EDA a little bit.

For me, matplotlib still reigns supreme. Rather than a fancy new visualization framework, I'd love for matplotlib to just be improved (admittedly, fastplotlib covers a different set of needs than what matplotlib does... but the author named it what they named it, so they have invited comparison. ;-) ).

Two things for me at least that would go a long way:

1) Better 3D plotting. It sucks, it's slow, it's basically unusable, although I do like how it looks most of the time. I mainly use PyVista now but it sure would be nice to have the power of a PyVista in a matplotlib subplot with a style consistent with the rest of matplotlib.

2) Some kind of WYSIWYG editor that will let you propagate changes back into your plot easily. It's faster and easier to adjust your plot layout visually rather than in code. I'd love to be able to make a plot, open up a WYSIWYG editor, lay things out a bit, and have those changes propagate back to code so that I can save it for all time.

(If these features already exist I'll be ecstatic ;-) )

kkoncevicius 17 hours ago
I have to agree with your point about EDA. The library is neat, but even the example of covariance matrix animation is a bit contrived.

Every pixel has a covariance with every other pixel, so sliding through the rows of the covariance matrix generates as many faces on the right as there are pixels in a photograph of a face. However the pixels that strongly co-vary will produce very similar right side "face" pictures. To get a sense of how many different behaviours there are one would look for eigenvectors of this covariance matrix. And then 10 or so static eigenvectors of the covariance matrix (eigenfaces [1]) would be much more informative than thousands of animated faces displayed in the example.

Sometimes a big interactive visualisation can be a sign of not having a concrete goal or not knowing how to properly summarise. After all that's the purpose of a figure - to highlight insights, not to look for ways to display the entire dataset. And pictures that try to display the whole dataset end up shifting the job of exploratory analysis to a visual space and leave it for somebody else.

Though of course there are exceptions.

[1]: https://en.wikipedia.org/wiki/Eigenface

fpl-dev 15 hours ago
Hi, one of the other devs here. As the poster below pointed out, what you're missing is that in this case we know that an eigendecomposition or PCA will be useful. However if you're working on matrix decomposition algorithms like us, or if you're trying to design new forms of summary matrices because a covariance matrix isn't informative for your type of data, then these types of visualizations are useful. We broadly work on designing new forms of matrix decomposition algorithms so it's very useful to look at the matrices and then try to determine what types of decompositions we want to do.
sfpotter 15 hours ago
I've also worked on designing new matrix decompositions, and I've never found the need for anything but `imshow`...
fpl-dev 14 hours ago
OK, different libraries have different use cases; the type of data we work with absolutely necessitates dynamic visualization. You wouldn't view a video with imshow, would you?
sfpotter 14 hours ago
Every time I've needed to scrub through something in time like that, dumping a ton of frames to disk using imshow has been good enough. Usually, the limiting factor is how quickly I can generate a single frame.

It's hard for me to imagine what you're doing that necessitates such fancy tools, but I'm definitely interested to learn! My failure of imagination is just that.

fpl-dev 14 hours ago
The example from the article with the subtitle "Large-scale calcium imaging dataset with corresponding behavior and down-stream analysis" is a good example. We have brain imaging video that is acquired simultaneously with behavioral video data. It is absolutely essential to view the raw video at 30-60Hz.
wtallis 16 hours ago
Aren't you missing the entire point of exploratory data analysis? Eigenfaces are an example of what you can come up with as the end product of your data exploration, after you've tried many ways of looking at the data and determined that eigenfaces are useful.

Your whole third paragraph seems to be criticizing the core purpose of exploratory data analysis as though one should always be able to skip directly to the next phase of having a standardized representation. When entering a new problem domain, somebody needs to actually look at the data in a somewhat raw form. Using the strengths of the human vision system to get a rough idea of what the typical data looks like and the frequency and character of outliers isn't dumping the job of exploratory data analysis onto the reader, it's how the job actually gets done in the first place.

fpl-dev 15 hours ago
> Using the strengths of the human vision system to get a rough idea of what the typical data looks like and the frequency and character of outliers isn't dumping the job of exploratory data analysis onto the reader, it's how the job actually gets done in the first place.

Yup this is a good summary of the intent, we also have to remember that the eigenfaces dataset is a very clean/toy data example. Real datasets never look this good, and just going straight to an eigendecomp or PCA isn't informative without first taking a look at things. Often you may want to do something other than an eigendecomp or PCA, get an idea of your data first and then think about what to do to it.

Edit: the point of that example was to show that visually we can judge what the covariance matrix is producing in the "image space". Sometimes a covariance matrix isn't even the right type of statistic to compute from your data and interactively looking at your data in different ways can help.

kkoncevicius 15 hours ago
As a whole, of course you have a point - big visualisations when done properly should help with data exploration. However, from my experience they rarely (but not never) do. I think it's specific to the type of data you work with and the visualisation you employ. Let me give an example.

Imagine we have some big data - like an OMIC dataset about chromatin modification differences between smokers and non-smokers. Genomes are large so one way to visualise might be to do a manhattan plot (mentioned here in another comment). Let's (hypothetically) say the pattern in the data is that chromatin in the vicinity of genes related to membrane functioning have more open chromatin marks in smokers compared to non smokers. A manhattan plot will not tell us that. And in order to be able to detect that in our visualisation we had to already know what we were looking for in the first place.

My point in this example is the following: in order to detect that we would have to know what to visualise first (i.e. visualise the genes related to membrane function separately from the rest). But then when we are looking for these kinds of associations - the visualisation becomes unnecessary. We can capture the comparison of interest with a single number (i.e. average difference between smokers vs non-smokers within this group of genes). And then we can test all kinds of associations by running a script with a for-loop in order to check all possible groups of genes we care about and return a number for each. It's much faster than visualisation. And then after this type of EDA is done, the picture would be produced as a result, displaying the effect and highlighting the insights.

I understand your point about visualisation being an indistinguishable part of EDA. But the example I provided above is much closer to my lived experience.

sfpotter 14 hours ago
Yeah, I agree with the general sentiment of what you're saying.

Re: wtallis, I think my original complaint about EDA per se is indeed off the mark.

Certainly creating a 20x20 grid of live-updating GPU plots and visualizations is a form of EDA, but it seems to suggest a complete lack of intuition about the problem you're solving. Like you're just going spelunking in a data set to see what you can find... and that's all you've got; no hypothesis, no nothing. I think if you're able to form even the meagerest of hypotheses, you should be able to eliminate most of these visualizations and focus on something much, much simpler.

I guess this tool purports to eliminate some of this, but there is also a degree of time-wasting involved in setting up all these visualizations. If you do more thinking up front, you can zero in on a smaller and more targeted subset of experiments. Simpler EDA tools may suffice. If you can prove your point with a single line or scatter plot (or number?), that's really the best case scenario.

macleginn 16 hours ago
Eigendecomposition of the covariance matrix, essentially PCA, is probably the first non-trivial step in the analysis of any dataset. The idea in the comment above seems to be that it's more useful to combine some basic knowledge of statistics with simpler visualisation techniques, rather than to quickly generate thousands of shallower plots. Being able to generate thousands of plot is useful, of course, but I would agree that promoting good data-analysis culture is more beneficial.
wtallis 16 hours ago
> Eigendecomposition of the covariance matrix, essentially PCA, is probably the first non-trivial step in the analysis of any dataset

For a sufficiently narrow definition of "dataset", perhaps. I don't think it's the obvious step one when you want to start understanding a time series dataset, for example. (Fourier transform would be a more likely step two, after step one of actually look at some of your data.)

mturmon 12 hours ago
I agree, but: the technique of “singular spectrum analysis” is pretty much PCA applied to a covariance matrix resulting from time-lagging the original time series. (https://en.wikipedia.org/wiki/Singular_spectrum_analysis)

So this is not unheard of for time series analysis.

fpl-dev 14 hours ago
Exactly that's a good example!
hatthew 14 hours ago
For me, one of the most annoying things in my workflow is when I'm waiting for the software to catch up. If I'm making a plot, there's a lot of little tweaks I want to do to visually extract the maximum amount of information from a dataset. For example, if I'm making a histogram, I may want to adjust the number of bins, change to log scale, set min/max to remove outliers, and change the plot size on page. For the sake of the argument, let's say I'm working with a set of 8 slices of the dataset, so I need to regenerate 8 plots every time I make a tweak. My workflow is: Code the initial plots with default settings, run numpy to process the data, run matplotlib to display the data, look at the results, make tweaks to the code, circle back to step 2. In that cycle, "wait for matplotlib to finish generating the plots" can often be one of the longest parts of the cycle, and critically it's the vast majority of the cumulative time that I'm waiting rather than actively doing something. Drawing plots should be near instantaneous; there's an entire industry devoted to drawing complicated graphics in 16ms or less, I shouldn't need to wait >100ms for a single 2d grid with some dots and lines on it.

Matplotlib is okay, but there's definitely room for improvement, so why not go for that improvement?

sfpotter 14 hours ago
I think this varies a lot depending on what you're doing.

I agree 100% that matplotlib is really slow and should be made to run as fast as humanly possible. I would add a (3) to my list above: optimize matplotlib!

OTOH, at least for what I'm doing, the code that runs to generate the data that gets plotted dominates the runtime 99% of the time.

For me, adjusting plots is usually the time waster. Hence point (2) above. I'd love to be able to make the tweaks using a WYSIWYG editor and have my plotting script dynamically updated. The bins, the log scale, the font, the dpi, etc, etc.

I think with your 8 slices examples above: my (2) and (3) would cover your bases. In your view, is the rest of matplotlib really so bad that it needs to be burnt to the ground for progress to be made?

hatthew 13 hours ago
Yeah, I'd love it if mpl could be optimized. I do think that it has a lot of weird design decisions that could justify burning it down and starting from scratch (e.g. weird mix of stateful and stateless api), but I've already learned most of its common quirks so I selfishly don't care anymore, and my only significant complaint is that I want it to be faster :)

edit: regarding runtime, I'm sure this varies a lot based on usecase, but for my usual usecase I store a mostly-processed dataset, so the additional processing before drawing the data is usually minimal.

paddy_m 17 hours ago
I'd be curious to hear more about your EDA workflow.

What I want for EDA is a tool that lets me quickly toggle between common views of the dataset. I run through the same analysis over and over again, I don't want to type the same commands repeatedly. I have my own heuristics for which views I want, and I want a platform that lets me write functions that express those heuristics. I want to build the intelligence into the tool instead of having to remember a bunch of commands to type on each dataframe.

For manipulating the plot, I want a low-code UI that lets me point and click the operations I want to use to transform the dataframe. The low-code UI should also emit Python code to do the same operations (so you aren't tied to a low-code system, you just use it as a faster way to generate code than typing).

I have built the start of this for my open source datatable UX called Buckaroo. But it's for tables, not for plotting. The approach could be adapted to plotting. Happy to collaborate.

jampekka 16 hours ago
At least I usually do prefer to do the EDA plotting by writing and editing code. This is a lot more flexible. It's relatively rare to need other interactivity than zooming and panning.

The differing approaches probably can be seen in some API choices, although the fastplotlib API is a lot more ergonomic than many others. Having to index the figure or prefixing plots with add_ are minor things, and probably preferable for application development, but for fast-iteration EDA they will start to irritate fast. The "mlab" API of matplotlib violates all sorts of software development principles, but it's very convenient for exploratory use.

Matplotlib's performance, especially with interaction and animation, and clunky interaction APIs are definite pain points, and a faster and better interaction supporting library for EDA would be very welcome. Something like a mlab-type wrapper would probably be easy to implement for fastplotlib.
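
Roughly what I mean, as a hypothetical sketch (not an existing fastplotlib API):

```python
# a minimal pyplot-style convenience layer over fastplotlib
import numpy as np
import fastplotlib as fpl

_current_figure = None

def _figure():
    global _current_figure
    if _current_figure is None:
        _current_figure = fpl.Figure()
    return _current_figure

def plot(y, **kwargs):
    # implicit current figure, pyplot-style
    data = np.column_stack([np.arange(len(y)), np.asarray(y)])
    return _figure()[0, 0].add_line(data, **kwargs)

def show():
    _figure().show()

# usage: plot(np.sin(np.linspace(0, 10, 500))); show()
```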

And to bikeshed a bit, I don't love the default black background. It's against usual conventions, difficult for publication and a bit harder to read when used to white.

paddy_m 14 hours ago
Writing and editing code is a lot more flexible, but it gets repetitive, and I have written the same stuff so many times. It's all ad hoc, and it fixes the problem at the time, then it gets thrown away with the notebook only to be written again soon.

As an example, I frequently want to run analytics on a dataframe. More complex summary stats. So you write a couple of functions, and have two for loops, iterating over columns and functions. This works for a bit. It's easy to add functions to the list. Then a function throws an error, and you're trying to figure out where you are in two nested for loops.

Or, especially for pandas, you want to separate functions to depend on the same expensive pre-calc. You could pass the existing dict of computed measures so you can reuse that expensive calculation... Now you have to worry about the ordering of functions.

So you could put all of your measures into one big function, but that isn't reusable. So you write your big function over and over.

I built a small DAG library that handles this, and lets you specify that your analysis requires keys and provides keys; then the DAG of functions is ordered for you.
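
The core of it is just a topological sort over provides/requires keys, something like this (simplified sketch, names hypothetical):

```python
from graphlib import TopologicalSorter

def run_analysis(funcs, df):
    """Each function declares .requires and .provides (sets of keys).
    Order them so every requirement is computed before it is needed."""
    providers = {key: f for f in funcs for key in f.provides}
    deps = {f: {providers[k] for k in f.requires if k in providers} for f in funcs}
    results = {}
    for f in TopologicalSorter(deps).static_order():
        results.update(f(df, results))  # each func returns a dict of new measures
    return results

# example measure
def column_variance(df, computed):
    return {"variance": df.var(numeric_only=True)}
column_variance.requires = set()
column_variance.provides = {"variance"}
```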

How do other people approach these issues?

kkoncevicius 14 hours ago
I work with R and not python, so some things might not apply, but this:

> [...] it fixes the problem at the time, then it gets thrown away with the notebook only to be written again soon.

Is one of the reasons I stopped using notebooks.

One solution to your problem might be to create a simple executable script that, when called on the file of your dataset in a shell, would produce the visualisation you need. If it's an interactive visualisation then I would create a library or otherwise a re-usable piece of code that can be sourced. It takes some time but ends up saving more time in the end.

If you have custom-made things you have to check on your data tables, then likely no library will solve your problem without you doing some additional work on top.

And for these:

> Or, especially for pandas, you want to separate functions to depend on the same expensive pre-calc. [...] Now you have to worry about the ordering of functions.

I save expensive outputs to intermediate files, and manage dependencies with a very simple build-system called redo [1][2].

[1]: http://www.goredo.cypherpunks.su

[2]: http://karolis.koncevicius.lt/posts/using_redo_to_manage_r_d...

paddy_m 12 hours ago
Thanks. I see how redo works.

For larger datasets, real scripts are a better idea. I expect my stuff to work with datasets up to about 1 GB; caching is easy to layer on and would speed up work for larger datasets, but my code assumes the data fits in memory. It would be easier to add caching than to make sure I don't load an entire dataset into memory. (I don't serialize the entire dataframe to the browser though.)

jampekka 12 hours ago
Usually I write scripts that use function memoization cache (to disk) for expensive operations. Recently I've also used Marimo sometimes, which has great support for modules (no reloading hacks), can memoize to disk and has deterministic state.
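
E.g. with joblib (sketch):

```python
import numpy as np
from joblib import Memory

memory = Memory("./.cache", verbose=0)

@memory.cache
def expensive_features(path):
    # the result is pickled to ./.cache, keyed on the function source and its
    # arguments; editing the function body invalidates the cache automatically
    data = np.load(path)
    return data.mean(axis=0), data.std(axis=0)
```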
simply_anyone 15 hours ago
I agree with you sfpotter, very interesting. Looks in some ways similar to PyQtGraph regarding real time plotting.

I agree with you regarding matplotlib, although I find a lot of faults/frustration in using it. Both your points on 3D plotting and a WYSIWYG editor would be extremely nice, and as far as I know nothing exists in Python ticking these boxes. For 3D I typically default to Matlab as I've found it to be the most responsive/easy to use. I've not found anything directly like a WYSIWYG editor. Stata is the closest but I deplore it; R to some extent has it but if I'm generating multiple plots it doesn't always work out.

I'm surprised by what you said about "EDA". I find the opposite, a shotgun approach, exploring a vast number of plots with various stratifications gives me better insight. I've explored plotting across multiple languages (R,python,julia,stata) and not found one that meets all my needs.

The biggest issue I often face is I have 1000 plots I want to generate that are all from separate data groups and could all be plotted in parallel, but most plotting libraries have holds/issues with distribution/parallelization. The closest I've found is I'll often build up a plot in Python using a Jupyter notebook. Once I'm done I'll create a function taking all the needed data/saving a plot out, then either manually or with the help of LLMs convert it to Julia, which I've found to be much faster in loading large amounts of data and processing it. Then I can loop it using Julia's "distributed" package. It's less than ideal; threaded access would be great, rather than having to distribute the data, but I've yet to find something that works. I'd love a simple 2D EDA plotting library that has basic plots like lines, histograms (1/2d), scatter plots, etc., has basic colorings and alpha values, and is able to handle large amounts (thousands to millions of points) of static data and plot it, saving to disk, in parallel. I've debated writing my own library but I have other priorities currently, maybe once I finish my PhD.

selimthegrim 14 hours ago
Interested to hear what your PhD is in.
benbojangles 16 hours ago
I agree on refining matplotlib; we all need it to be better at resource handling and lower memory use. It often gets boggy quickly.
tomjakubowski 15 hours ago
For point (2), have you tried the perspective-viewer library? You can make edits in the UI and then use the "debug view" to copy and paste the new configuration back into your code.

https://perspective.finos.org/

mhh__ 16 hours ago
My hot take is that 3D plotting feels bad because 3D plots are bad. You can usually find some alternative way of representing the data
sfpotter 15 hours ago
I work on solving 3D problems: numerical methods for PDEs in R^3, computational geometry, computational mechanics, graphics, etc. Being able to make nice 3D plots is super important for this. I agree it's not always necessary, and when a 2D plot suffices, that's the way to go, but that doesn't obviate my need for 3D plots.
bee_rider 15 hours ago
3D plots might be neat if there was some widespread way of displaying them. Unfortunately we can only make 2D projections of 3D plots on our computer screens and pieces of paper.

Maybe VR will change that at some point. :shrug:

jampekka 15 hours ago
This is the correct take. There are almost always better ways to plot three dimensional data than trying to project 3D geometry to 2D.
rossant 16 hours ago
Shameless plug: I'm actively working on a similar project, Datoviz [1], a C/C++ library with thin Python bindings (ctypes). It supports both 2D and 3D but is currently less mature and feature-complete than fastplotlib. It is also lower level (high-level capabilities will soon be provided by VisPy 2.0 which will be built on top of Datoviz, among other possible backends).

My focus is primarily on raw performance, visual quality, and scalability for large datasets—millions, tens of millions of points, or even more.

[1] https://datoviz.org/

cycomanic 11 hours ago
Cool to see you on here Cyrille, I've been following your work (and Nicolas's) for a long time. Thanks for all the cool stuff you've been doing!
Spoilage4218 16 hours ago
I have always admired your Datoviz library from afar and check up on the vispy2/vispy2-sandbox libraries on GitHub every few months. When do you think 'soon' is? Really looking forward to it!
rossant 14 hours ago
Thanks! The code is currently managed by Nicolas Rougier in a GitHub repository that will be made public next week. This repository hosts the "graphics server protocol" (GSP), an intermediate layer between Datoviz and the future high-level plotting API. For the latter, we’ll need community feedback to shape an API philosophy that aligns with VisPy users' needs—let's aim to publish a write-up this month.

Implementing the API on top of GSP should be relatively straightforward, as the core graphics-related mechanisms are handled by GSP/Datoviz. We've created a Slack channel for discussions—contact me privately if you'd like to join.

749402826 16 hours ago
"Fast" is a bold claim, given the complete lack of benchmarks and the fact that it's written entirely in Python...
paddy_m 16 hours ago
I'm certain the heavy lifting on the host is done by numpy, which is a Python wrapper around Fortran and C. The visualization heavy lifting is done by pygfx/wgpu-py; wgpu-py has C underneath. I think wgpu-py compiles to WASM to run in the browser. More and more packages are taking this route.

[1] https://github.com/pygfx/pygfx

[2] https://github.com/pygfx/wgpu-py

almarklein 2 hours ago
All true, except the bit that wgpu-py compiles to WASM. It's all desktop.

In the plans that we do have for running in the browser, Fastplotlib, Pygfx and wgpu-py will still be Python, running on CPython that is compiled to WASM (via Pyodide). But instead of wgpu-py cffi-ing into a C library, it would make JS calls to the WebGPU API.

kushalkolar 15 hours ago
In fastplotlib, at the end of the day, everything is wgpu under the hood, and as the other poster correctly pointed out, numpy is a wrapper around Fortran and C.
ZeroCool2u 18 hours ago
Seems like a nice library, but I have a hard time seeing myself using it over plotly. The plotly express API is just so simple and easy. For example, here's the docs for the histogram plot: https://plotly.com/python/histograms/

This code gives you a fully interactive, and performant, histogram plot:

```python
import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()
```

kushalkolar 15 hours ago
Different use cases :) Plotly doesn't give the performance and interactive tools required for many neuroscience visualizations. We also focus more on primitive graphics and not (at least not yet) on the more complex "composite" graphics built from primitives, like histograms.
Starlord2048 16 hours ago
[flagged]
dang 16 hours ago
Please stop.
qoez 15 hours ago
[flagged]
dang 15 hours ago
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

qoez 14 hours ago
I appreciate the warning and if it's not by claude I apologize, but I do think we should be allowed to express scepticism if things posted are just AI slop (and if we have to fear getting banned or what-have-you as a consequence I genuinely think that's worse for HN long term than the alternative).
pvg 14 hours ago
If the skepticism is based on nothing but vibes, such commentary is functionally equivalent to something the site guidelines ask you to avoid as it is.
dang 8 hours ago
Don't worry, we wouldn't ban anyone for this. I agree with you that it's a grey area and will take time to work out.
kushalkolar 15 hours ago
I dunno why you'd say this, neither of us are fans of LLMs and most of this was written before LLMs were a thing :)
janalsncm 15 hours ago
Maybe Claude was trained on your code. You should take it as a compliment.
bdangubic 14 hours ago
asked claude, said it didn’t do it :)