https://github.com/3b1b/manim/releases
Super awesome, and you can make it into an MCP for Cursor.
[1] https://stackoverflow.com/questions/73470828/ggplot2-is-slow...
The fact that they are using WGPU (wgpu-py, Python bindings to the WebGPU API) suggests there is an interesting possible extended use case. As a few other comments suggest, if one knows that the data is available on a machine in a cluster rather than on the user's local machine, it might make sense to start up a server, expose a port, and pass the data over HTTP to be rendered in a browser. That would make it shareable across the lab. The limit would be the data bandwidth over HTTP (e.g. for the 3 million point case), but it seems like for simpler cases it would be very useful.
That would lead to an interesting exercise of defining a protocol for transferring plot points over HTTP in such a way that they could be handed over to the browser's WebGPU interface efficiently. Perhaps an even more efficient representation is possible with some pre-processing on the server side?
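A rough sketch of what such a payload could look like, assuming plain float32 packing on the Python side (everything here, including the tiny header, is made up for illustration):

```python
import numpy as np

# Pack plot points into a compact binary payload that a browser could read
# straight into a Float32Array and upload as a WebGPU vertex buffer,
# instead of shipping JSON text.
points = np.random.rand(3_000_000, 2)                       # x, y pairs

payload = points.astype(np.float32, copy=False).tobytes()   # 8 bytes per point, ~24 MB
# A minimal header so the client knows how to interpret the buffer.
header = np.array(points.shape, dtype=np.uint32).tobytes()  # (count, dims)

message = header + payload                                  # serve this over HTTP/WebSocket
```

Compared to JSON text that is roughly a 5-10x smaller payload, and the client needs no parsing step before upload.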
jupyter-rfb lets you do remote rendering for this, render to a remote frame buffer and send over a jpeg byte stream. We and a number of our scientific users use it like this. https://fastplotlib.org/ver/dev/user_guide/faq.html#what-fra...
> defining a protocol for transferring plot points
This sounds more like GSP, which Cyrille Rossant (who's made some posts here) works on, it has a slightly different kind of use case.
Forgive me for doing this, but I used an LLM to find that. They’re exceptionally useful for disambiguation tasks like this. Knowing what an acronym refers to is very useful for next token prediction, so they’re quite good at it. It’s usually trivial to figure out if they’re hallucinating with a search engine.
But if I understand correctly it's a protocol for serializing graphical objects, pretty neat idea.
https://pygraphistry.readthedocs.io/en/latest/performance.ht...
I followed one of their online workshops, and it feels really powerful, although it is a bit confusing which part of it does what (it's basically 6 or 7 projects put together under an umbrella)
I will work on adding somewhere in our docs some metrics for this kind of thing (I think it could be helpful for many).
Certainly! A comparison of performance with specialized tools for large point clouds would be very interesting (like cloudcompare and potree).
Haven't had the time to get very far yet, but will gladly contribute an example once I figure something out. Some of the ideas I want to eventually get to are rendering shadertoys (interactively?) into an fpl subplot (haven't looked at the code at all, but might be doable), eventually running those interactively in the browser, and doing the network layout on the GPU with compute shaders (out of scope for fpl).
But it doesn't seem to answer how it works in Jupyter notebooks, or if it does at all. Is the GPU acceleration done "client-side" (JavaScript?) or "server-side" (in the kernel?) or is there an option for both?
Because I've used supposedly fast visualization libraries in Google Colab before, but instead of updating at 30 fps, it takes 2 seconds to update after a click, because after the new image is rendered it has to be transmitted via the Jupyter connector and network and that can turn out to be really slow.
I believe the performance is pretty decent, especially if you run the kernel locally
Their docs also cover this as mentioned by @clewis7 below: https://www.fastplotlib.org/ver/dev/user_guide/faq.html#what...
Just to add on, colab is weird and not performant, this PR outlines our attempts to get jupyter-rfb working on colab: https://github.com/vispy/jupyter_rfb/pull/77
I think a killer feature of these gpu-plotting libraries would be if they could take torch/jax cuda arrays directly and not require a (slow) transfer over cpu.
tinygrad which I haven't used seems torch-like and has a WGPU backend: https://github.com/tinygrad/tinygrad
I'm now working on a way for users to wrap a Datoviz GPU buffer as a CuPy array that directly references the Datoviz-managed GPU memory. This should, in principle, enable efficient GPU-based array operations on GPU data without any transfers.
[2] https://registry.khronos.org/vulkan/specs/latest/man/html/VK...
[3] https://docs.cupy.dev/en/latest/reference/generated/cupy.cud...
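To make the mechanism concrete, here is a minimal sketch using CuPy's UnownedMemory; in the real case the pointer would come from an exported Vulkan/Datoviz buffer rather than a CuPy allocation ([2], [3]):

```python
import cupy as cp

# Stand-in for a device pointer exported by the rendering library: we allocate
# with CuPy here just so the sketch runs end to end.
n = 1_000_000
backing = cp.cuda.alloc(n * 4)                 # raw device allocation
device_ptr, nbytes = backing.ptr, n * 4

# Wrap the externally-owned allocation as a CuPy array without copying.
mem = cp.cuda.UnownedMemory(device_ptr, nbytes, owner=backing)
arr = cp.ndarray((n,), dtype=cp.float32, memptr=cp.cuda.MemoryPointer(mem, 0))

arr[:] = 0.0        # in-place GPU ops touch the same memory the renderer sees
arr += 1.0
```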
WGPU has security protections since it's designed for the browser so I'm guessing it's impossible.
When would you reach for a different library instead of fastplotlib?
How does this deal with really large datasets? Are you doing any type of downsampling?
How does this work with pandas? I didn't see it as a requirement in setup.py
Does this work in Jupyter notebooks? What about marimo?
> When would you reach for a different library instead of fastplotlib?
Use the best tool for your use case; we're focused on GPU accelerated interactive visualization. Our use cases broadly are developing ML algorithms, user-end ML Ops tools, and looking at live data coming off of scientific instruments.
> How does this deal with really large datasets? Are you doing any type of downsampling?
Depends on your hardware, see https://fastplotlib.org/ver/dev/user_guide/faq.html#do-i-nee...
> How does this work with pandas? I didn't see it as a requirement in setup.py
If you pass in numpy-like types that use the buffer protocol it should work; we also want to support direct dataframe input in the future: https://github.com/fastplotlib/fastplotlib/issues/395
There are more low-level priorities in the meantime.
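In the meantime, something like this should work today, assuming the current Figure/add_line API (exact call names may differ between fastplotlib versions):

```python
import fastplotlib as fpl
import numpy as np
import pandas as pd

t = np.linspace(0, 10, 1_000)
df = pd.DataFrame({"t": t, "y": np.sin(t)})

fig = fpl.Figure()
# .to_numpy() hands fastplotlib a plain buffer-protocol array;
# passing the DataFrame itself is what issue #395 tracks.
fig[0, 0].add_line(df[["t", "y"]].to_numpy())
fig.show()

fpl.loop.run()   # fpl.run() in older versions
```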
> Does this work in Jupyter notebooks? What about marimo?
Jupyter yes, via jupyter-rfb, see our repo: https://github.com/fastplotlib/fastplotlib?tab=readme-ov-fil...
I’ve been using kst-plot for live streaming data from instruments and interactive plots. It’s fast and I haven’t found any limit to the amount of data it can plot. Development has basically stopped - the product is done, feature complete, and works perfectly! It is used by the European and Canadian space agencies. It might be interesting for you to see how they solved or approached some of the same problems you have solved or will solve!
A major reason why other plotting libraries don't take off is their use of complicated APIs. But data analysis doesn't need Application Programming Interfaces, it needs User Interfaces.
I would argue that the Matplotlib syntax is horribly broken (or rather, the Matlab syntax it historically tried to emulate, and had to stick with for better or worse..)
of which matplotlib is the embodiment. A terrible API with terrible, terrible performance.
For architecture astronauts there's also the OOP API over which the pylab API is a wrapper.
Of course there are also a lot of all sorts of declarative APIs, which are popular with people copy-pasting code from cookbooks. These become very painful very fast if you do something that's not in the cookbook.
Matplotlib does struggle with performance in some/many cases, but it has little to do with the API.
Not sure if that is the right tutorial, but many years ago, in the Guile 1.x days, I wrote a local visualizer for the data from a particle physics accelerator entirely in Guile and Gnuplot. It was very MVC, using Guile as the controller and Gnuplot as the viewer.
Was it stupid? Yes. Did it work better than all the other tools I had at the time? Also yes.
https://almarklein.org/triangletricks.html
https://almarklein.org/line_rendering.html
A big shader refactor was done in this PR: https://github.com/pygfx/pygfx/pull/628
> These days, having a GPU is practically a prerequisite to doing science, and visualization is no exception.
It becomes really funny when they go on to this, as if it was a big deal:
> Depicted below is an example of plotting 3 million points
Anybody who has ever used C or fortran knows that a modern CPU can easily churn through "3 million points" at more than 30 frames per second, using just one thread. It's not a particularly impressive feat, three million points is the size of a mid-resolution picture, and you can zoom-in and out those trivially in real-time using a CPU (and you could do that 20 years ago, as well). Maybe the stated slowness of fastplotlib comes from the unholy mix of rust and python?
Now, besides this rant, I think that fastplotlib is fantastic and, as an (unwilling) user of Python for data science, it's a godsend. It's just that the hype of that website sits wrong with me. All the demos show things that could be done much more easily and just as fast when I was a teenager. The big feat, and a really big one at that, is that you can access this sort of performance from Python. I love it, in a way, because it makes my life easier now; but it feels like a self-inflicted problem was solved in a very roundabout way.
> Anybody who has ever used C or fortran knows that a modern CPU can easily churn through "3 million points" at more than 30 frames per second, using just one thread. It's not a particularly impressive feat, three million points is the size of a mid-resolution picture, and you can zoom-in and out those trivially in real-time using a CPU (and you could do that 20 years ago, as well). Maybe the stated slowness of fastplotlib comes from the unholy mix of rust and python?
That's a misrepresentation though; it's 3 million points in sine waves, e.g. something like 1000 sine waves with e.g. 3000 points in each. If you look at the zoomed-in image, the sine waves are spaced significantly, so if you were to represent this as an image it would be at least a factor of 10 larger. In fact that is likely a significant underestimate, since you also need to connect the points inside the sine waves.
The comparison case would be to take a vector graphics (e.g. svg) with 1000 sine wave lines and open it in a viewer (written in C or Fortran if you want) and try zooming in and out quickly.
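For scale, here is roughly what the dataset being debated looks like (the 1000 x 3000 split is the guess made above, not the actual demo's numbers):

```python
import numpy as np

# ~1000 sine waves x ~3000 vertices each = ~3 million points that must be
# transformed and drawn as connected line segments on every pan/zoom.
n_lines, n_points = 1_000, 3_000
x = np.linspace(0, 2 * np.pi, n_points)
freqs = np.linspace(1, 10, n_lines)[:, None]

ys = np.sin(freqs * x[None, :]) + np.arange(n_lines)[:, None] * 3.0  # offset each line

# shape (1000, 3000, 2): one (x, y) polyline per row
data = np.stack([np.broadcast_to(x, ys.shape), ys], axis=-1)
```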
If you insist on fitting the entire thing in memory, it may seem better to do so in plain RAM, which nowadays is of humongous size even in "modest" systems.
But with 10 billion points, you need to consider more sophisticated approaches.
see: https://fastplotlib.org/ver/dev/user_guide/faq.html#what-fra...
People have used fastplotlib and jupyter-rfb in vscode, but it can be troublesome and we don't currently have the resources to figure out exactly why.
I especially like that there is a PyQt interface which might provide an alternative to another great package: pyqtgraph[0].
We are hoping for pyodide integration soon, which would allow fastplotlib to be run strictly in the browser!
As Caitlin pointed out below pyodide is a future goal.
The quickest install would be `pip install fastplotlib`. This would be if you were interested in just having the barebones (no imgui or notebook) for desktop viz using something like glfw.
We can think about adding in our docs some kind of import time metrics.
but we haven't benchmarked it yet
Hopefully both are implemented.
Fastplotlib / pygfx are primarily meant to run on desktop. When using it via the notebook the server does the rendering.
As Ivo said, we have plans to support running in the browser via Pyodide, which opens some interesting things, but is not the primary purpose.
https://github.com/pygfx/wgpu-py/issues/407
PRs welcome though :-)
I previously tried this with matplotlib and it took 20-30 minutes to make a single rendering, because matplotlib only uses a single CPU core and doesn't support GPU acceleration. I also tried Manim, but I couldn't get an actual video file, and OpenGL seems to be a bit complicated to work with (I went and worked on other things, though I should ask around about the video file output). Anyway, I'm excited about the prospect of a GPU accelerated dataviz tool that utilizes Vulkan, and I hope this library can cover my use case.
I never knew I needed this until now
https://fastplotlib.org/ver/dev/_gallery/line/line_colorslic...
https://fastplotlib.org/ver/dev/_gallery/line/line_cmap_more...
https://fastplotlib.org/ver/dev/_gallery/line/line_cmap.html...
And with collections if you want to go crazy: https://fastplotlib.org/ver/dev/_gallery/line_collection/lin...
For me, matplotlib still reigns supreme. Rather than a fancy new visualization framework, I'd love for matplotlib to just be improved (admittedly, fastplotlib covers a different set of needs than what matplotlib does... but the author named it what they named it, so they have invited comparison. ;-) ).
Two things for me at least that would go a long way:
1) Better 3D plotting. It sucks, it's slow, it's basically unusable, although I do like how it looks most of the time. I mainly use PyVista now but it sure would be nice to have the power of a PyVista in a matplotlib subplot with a style consistent with the rest of matplotlib.
2) Some kind of WYSIWYG editor that will let you propagate changes back into your plot easily. It's faster and easier to adjust your plot layout visually rather than in code. I'd love to be able to make a plot, open up a WYSIWYG editor, lay things out a bit, and have those changes propagate back to code so that I can save it for all time.
(If these features already exist I'll be ecstatic ;-) )
Every pixel has a covariance with every other pixel, so sliding through the rows of the covariance matrix generates as many faces on the right as there are pixels in a photograph of a face. However the pixels that strongly co-vary will produce very similar right-side "face" pictures. To get a sense of how many different behaviours there are, one would look for eigenvectors of this covariance matrix. And then 10 or so static eigenvectors of the covariance matrix (eigenfaces [1]) would be much more informative than thousands of animated faces displayed in the example.
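Roughly this, as a sketch (random data standing in for e.g. the Olivetti faces):

```python
import numpy as np

# X: n face images flattened to p pixels each.
rng = np.random.default_rng(0)
n, h, w = 400, 32, 32
X = rng.random((n, h * w))

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)   # p x p pixel covariance;
                                         # each row is one pixel's "animated face"

# A handful of leading eigenvectors summarises those thousands of rows.
eigvals, eigvecs = np.linalg.eigh(cov)             # ascending order
eigenfaces = eigvecs[:, ::-1][:, :10].T.reshape(10, h, w)
```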
Sometimes a big interactive visualisation can be a sign of not having a concrete goal or not knowing how to properly summarise. After all, that's the purpose of a figure - to highlight insights, not to look for ways to display the entire dataset. And pictures that try to display the whole dataset end up shifting the job of exploratory analysis to a visual space and leaving it for somebody else.
Though of course there are exceptions.
It's hard for me to imagine what you're doing that necessitates such fancy tools, but I'm definitely interested to learn! My failure of imagination is just that.
Your whole third paragraph seems to be criticizing the core purpose of exploratory data analysis as though one should always be able to skip directly to the next phase of having a standardized representation. When entering a new problem domain, somebody needs to actually look at the data in a somewhat raw form. Using the strengths of the human vision system to get a rough idea of what the typical data looks like and the frequency and character of outliers isn't dumping the job of exploratory data analysis onto the reader, it's how the job actually gets done in the first place.
Yup this is a good summary of the intent, we also have to remember that the eigenfaces dataset is a very clean/toy data example. Real datasets never look this good, and just going straight to an eigendecomp or PCA isn't informative without first taking a look at things. Often you may want to do something other than an eigendecomp or PCA, get an idea of your data first and then think about what to do to it.
Edit: the point of that example was to show that visually we can judge what the covariance matrix is producing in the "image space". Sometimes a covariance matrix isn't even the right type of statistic to compute from your data and interactively looking at your data in different ways can help.
Imagine we have some big data - like an OMIC dataset about chromatin modification differences between smokers and non-smokers. Genomes are large, so one way to visualise might be to do a Manhattan plot (mentioned here in another comment). Let's (hypothetically) say the pattern in the data is that chromatin in the vicinity of genes related to membrane functioning has more open chromatin marks in smokers compared to non-smokers. A Manhattan plot will not tell us that. And in order to be able to detect that in our visualisation we had to already know what we were looking for in the first place.
My point in this example is the following: in order to detect that we would have to know what to visualise first (i.e. visualise the genes related to membrane function separately from the rest). But then when we are looking for these kinds of associations - the visualisation becomes unnecessary. We can capture the comparison of interest with a single number (i.e. average difference between smokers vs non-smokers within this group of genes). And then we can test all kinds of associations by running a script with a for-loop in order to check all possible groups of genes we care about and return a number for each. It's much faster than visualisation. And then after this type of EDA is done, the picture would be produced as a result, displaying the effect and highlighting the insights.
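Something like this, as a sketch (the data and gene sets are made up, and the "openness" score is a stand-in for whatever chromatin mark statistic you actually have):

```python
import numpy as np

rng = np.random.default_rng(1)
openness = rng.random((20_000, 60))                  # genes x samples
is_smoker = np.array([True] * 30 + [False] * 30)
gene_sets = {
    "membrane":   rng.choice(20_000, 300, replace=False),
    "metabolism": rng.choice(20_000, 500, replace=False),
}

# One number per gene set: mean smoker vs non-smoker difference within the set.
scores = {}
for name, idx in gene_sets.items():
    grp = openness[idx]
    scores[name] = grp[:, is_smoker].mean() - grp[:, ~is_smoker].mean()

print(scores)   # rank these, then make the figure for whatever stands out
```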
I understand your point about visualisation being an indistinguishable part of EDA. But the example I provided above is much closer to my lived experience.
Re: wtallis, I think my original complaint about EDA per se is indeed off the mark.
Certainly creating a 20x20 grid of live-updating GPU plots and visualizations is a form of EDA, but it seems to suggest a complete lack of intuition about the problem you're solving. Like you're just going spelunking in a data set to see what you can find... and that's all you've got; no hypothesis, no nothing. I think if you're able to form even the meagerest of hypotheses, you should be able to eliminate most of these visualizations and focus on something much, much simpler.
I guess this tool purports to eliminate some of this, but there is also a degree of time-wasting involved in setting up all these visualizations. If you do more thinking up front, you can zero in on a smaller and more targeted subset of experiments. Simpler EDA tools may suffice. If you can prove your point with a single line or scatter plot (or number?), that's really the best case scenario.
For a sufficiently narrow definition of "dataset", perhaps. I don't think it's the obvious step one when you want to start understanding a time series dataset, for example. (Fourier transform would be a more likely step two, after step one of actually look at some of your data.)
So this is not unheard of for time series analysis.
Matplotlib is okay, but there's definitely room for improvement, so why not go for that improvement?
I agree 100% that matplotlib is really slow and should be made to run as fast as humanly possible. I would add a (3) to my list above: optimize matplotlib!
OTOH, at least for what I'm doing, the code that runs to generate the data that gets plotted dominates the runtime 99% of the time.
For me, adjusting plots is usually the time waster. Hence point (2) above. I'd love to be able to make the tweaks using a WYSIWYG editor and have my plotting script dynamically updated. The bins, the log scale, the font, the dpi, etc, etc.
I think with your 8 slices examples above: my (2) and (3) would cover your bases. In your view, is the rest of matplotlib really so bad that it needs to be burnt to the ground for progress to be made?
edit: regarding runtime, I'm sure this varies a lot based on usecase, but for my usual usecase I store a mostly-processed dataset, so the additional processing before drawing the data is usually minimal.
What I want for EDA is a tool that lets me quickly toggle between common views of the dataset. I run through the same analysis over and over again; I don't want to type the same commands repeatedly. I have my own heuristics for which views I want, and I want a platform that lets me write functions that express those heuristics. I want to build the intelligence into the tool instead of having to remember a bunch of commands to type on each dataframe.
For manipulating the plot, I want a low-code UI that lets me point and click the operations I want to use to transform the dataframe. The low-code UI should also emit Python code to do the same operations (so you aren't tied to a low-code system, you just use it as a faster way to generate code than typing).
I have built the start of this for my open source datatable UX called Buckaroo. But it's for tables, not for plotting. The approach could be adapted to plotting. Happy to collaborate.
The differing approaches can probably be seen in some API choices, although the fastplotlib API is a lot more ergonomic than many others. Having to index the figure or prefixing plots with add_ are minor things, and probably preferable for application development, but for fast-iteration EDA they will start to irritate fast. The "pylab" API of matplotlib violates all sorts of software development principles, but it's very convenient for exploratory use.
Matplotlib's performance, especially with interaction and animation, and its clunky interaction APIs are definite pain points, and a faster library with better interaction support for EDA would be very welcome. Something like a pylab-type wrapper would probably be easy to implement for fastplotlib.
And to bikeshed a bit, I don't love the default black background. It's against usual conventions, difficult for publication, and a bit harder to read when you're used to white.
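For the pylab-type wrapper idea above, something as small as this might already go a long way (a sketch only; the fastplotlib calls are assumed from the current docs and may differ between versions):

```python
import numpy as np
import fastplotlib as fpl

_figure = None

def _gca():
    """Implicit current subplot, created lazily - the pyplot-style state machine."""
    global _figure
    if _figure is None:
        _figure = fpl.Figure()
    return _figure[0, 0]

def plot(y, **kwargs):
    return _gca().add_line(np.asarray(y), **kwargs)

def show():
    _figure.show()
    fpl.loop.run()   # fpl.run() in older versions

# usage:
# plot(np.sin(np.linspace(0, 10, 500)))
# show()
```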
As an example, I frequently want to run analytics on a dataframe. More complex summary stats. So you write a couple of functions, and have two for loops, iterating over columns and functions. This works for a bit. It's easy to add functions to the list. Then a function throws an error, and you're trying to figure out where you are in two nested for loops.
Or, especially for pandas, you want to separate functions to depend on the same expensive pre-calc. You could pass the existing dict of computed measures so you can reuse that expensive calculation... Now you have to worry about the ordering of functions.
So you could put all of your measures into one big function, but that isn't reusable. So you write your big function over and over.
I built a small dag library that handles this, and lets you specify that your analysis requires keys and provides keys, then the DAG of functions is ordered for you.
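Illustrating the requires/provides idea with just the standard library (a sketch, not the library mentioned above):

```python
from graphlib import TopologicalSorter

def expensive_precalc(d):
    d["grouped"] = "groupby result"                 # stand-in for a costly groupby

def summary_stats(d):
    d["stats"] = f"stats from {d['grouped']}"

steps = {
    expensive_precalc: {"requires": set(),        "provides": {"grouped"}},
    summary_stats:     {"requires": {"grouped"},  "provides": {"stats"}},
}

# A step depends on whichever step provides each key it requires.
providers = {key: fn for fn, meta in steps.items() for key in meta["provides"]}
graph = {fn: {providers[key] for key in meta["requires"]} for fn, meta in steps.items()}

results = {}
for step in TopologicalSorter(graph).static_order():
    step(results)

print(results["stats"])
```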
How do other people approach these issues?
> [...] it fixes the problem at the time, then it gets thrown away with the notebook only to be written again soon.
Is one of the reasons I stopped using notebooks.
One solution to your problem might be to create a simple executable script that, when called on the file of your dataset in a shell, would produce the visualisation you need. If it's an interactive visualisation then I would create a library or otherwise a re-usable piece of code that can be sourced. It takes some time but ends up saving more time in the end.
If you have custom-made things you have to check on your data tables, then likely no library will solve your problem without you doing some additional work on top.
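For the "simple executable script" approach, even something this small pays for itself quickly (the file and column handling here is purely illustrative):

```python
#!/usr/bin/env python
# usage: ./quick_view.py measurements.csv  ->  measurements.csv.png
import sys
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(sys.argv[1])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
df.plot(ax=ax1)                            # raw traces, all numeric columns
df.plot(kind="hist", ax=ax2, alpha=0.5)    # overlaid value distributions
fig.savefig(sys.argv[1] + ".png", dpi=150)
```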
And for these:
> Or, especially for pandas, you want to separate functions to depend on the same expensive pre-calc. [...] Now you have to worry about the ordering of functions.
I save expensive outputs to intermediate files, and manage dependencies with a very simple build-system called redo [1][2].
[1]: http://www.goredo.cypherpunks.su
[2]: http://karolis.koncevicius.lt/posts/using_redo_to_manage_r_d...
For larger datasets, real scripts are a better idea. I expect my stuff to work with datasets up to about 1 GB; caching is easy to layer on and would speed up work for larger datasets, but my code assumes the data fits in memory. It would be easier to add caching than to make sure I don't load an entire dataset into memory. (I don't serialize the entire dataframe to the browser though.)
I agree with you regarding matplotlib, although I find a lot of faults/frustration in using it. Both your points on 3D plotting and a WYSIWYG editor would be extremely nice, and as far as I know nothing in Python ticks those boxes. For 3D I typically default to Matlab, as I've found it to be the most responsive/easy to use. I've not found anything directly like a WYSIWYG editor. Stata is the closest but I deplore it; R to some extent has it, but if I'm generating multiple plots it doesn't always work out.
I'm surprised by what you said about "EDA". I find the opposite: a shotgun approach, exploring a vast number of plots with various stratifications, gives me better insight. I've explored plotting across multiple languages (R, Python, Julia, Stata) and not found one that meets all my needs.
The biggest issue I often face is that I have 1000 plots I want to generate, all from separate data groups, which could all be plotted in parallel, but most plotting libraries have holds/issues with distribution/parallelization. The closest I've found is to build up a plot in Python using a Jupyter notebook. Once I'm done I'll create a function taking all the needed data and saving a plot out, then either manually or with the help of LLMs convert it to Julia, which I've found to be much faster at loading and processing large amounts of data. Then I can loop it using Julia's "Distributed" package. It's less than ideal - threaded access would be great, rather than having to distribute the data - but I've yet to find something that works. I'd love a simple 2D EDA plotting library that has basic plots like lines, histograms (1D/2D), scatter plots, etc., supports basic colorings and alpha values, and can handle large amounts (thousands to millions of points) of static data and save plots to disk in parallel. I've debated writing my own library but I have other priorities currently, maybe once I finish my PhD.
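One common workaround for the parallel-saving part (not from the comment above, just a pattern that tends to work): use matplotlib's non-interactive Agg backend and one process per plot, e.g.:

```python
import matplotlib
matplotlib.use("Agg")                      # headless backend: no GUI event loop to fight over
import matplotlib.pyplot as plt
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def save_plot(args):
    idx, ys = args
    fig, ax = plt.subplots()
    ax.plot(ys, alpha=0.7)
    fig.savefig(f"plot_{idx:04d}.png", dpi=150)
    plt.close(fig)                         # free figure memory inside the worker

if __name__ == "__main__":
    # stand-in for the separate data groups
    groups = [(i, np.random.rand(10_000)) for i in range(100)]
    with ProcessPoolExecutor() as pool:
        list(pool.map(save_plot, groups))
```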
Maybe VR will change that at some point. :shrug:
My focus is primarily on raw performance, visual quality, and scalability for large datasets—millions, tens of millions of points, or even more.
Implementing the API on top of GSP should be relatively straightforward, as the core graphics-related mechanisms are handled by GSP/Datoviz. We've created a Slack channel for discussions—contact me privately if you'd like to join.
[1] https://github.com/pygfx/pygfx [2] https://github.com/pygfx/wgpu-py
In the plans that we do have for running in the browser, Fastplotlib, Pygfx and wgpu-py will still be Python, running on CPython compiled to WASM (via Pyodide). But instead of wgpu-py cffi-ing into a C library, it would make JS calls to the WebGPU API.
This code gives you a fully interactive, and performant, histogram plot:
```python
import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()
```