"realtime" has a long history in the computing and technology realms, and this ain't it.
People that know enough to care will know within 1 second what the article is about.
"Soft real-time" is probably the correct term here, but that is actually more confusing for 99% of people.
"Interactive" is not descriptive. "Duplex" is certainly not obvious.
TFA is just about the design and deployment of a two-way ("duplex") communication system that makes distributed applications "feel modern, collaborative, and up-to-date"
These sorts of systems have existed for decades; TFA provides a brief overview of 3 design patterns associated with them.
In particular
> Soft real-time systems are typically used to solve issues of concurrent access and the need to keep a number of connected systems up-to-date through changing situations
That's exactly what this post is about. "Real-time" on the web often just means an app that allows multiple users to concurrently make updates to some piece of state, and for the changes to that state to be broadcast to all users so that all clients are kept consistent within a reasonable time frame, preferably as quickly as possible.
While the deadlines aren't as hard as in e.g. audio programming, "real-time multiplayer" apps can be said to be broken if there is a very large noticeable delay between a user editing something and the other users seeing that edit reflected in their local client.
Hard disagree. There is no deadline to meet time-wise, only that a specific task must be completed in a specific order as quickly as possible. This implies managing state while keeping latency low, which is a QoS problem.
Agreed. The best term to use for this scenario is low-latency, as real-time implies a deadline. Low-latency deals with QoS, where you ascribe a quality parameter to the service.
There are patterns for "real time". Things such as:
- What matters is the slowest case, not the average case.
- What has to be real time, and what can run in background?
- Avoiding priority inversions.
- Is there a stall timer that trips if you miss the control loop timing? What happens when the stall timer trips? (Rough sketch after this list.)
- What influence do caches have on time repeatability? What's the worst case? Can you keep the worst case from being the nothing-in-cache case?
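To make the stall-timer item concrete, here's a rough sketch of my own (in TypeScript for readability; a real control loop would live in C on an RTOS) of a watchdog that fires when a tick overruns its period:

```typescript
// Illustration only: arm a watchdog each tick; if the loop misses its period badly,
// the stall handler runs instead of the system silently falling behind.
const PERIOD_MS = 10;
let watchdog: ReturnType<typeof setTimeout> | undefined;

function armWatchdog() {
  if (watchdog) clearTimeout(watchdog);
  watchdog = setTimeout(onStall, PERIOD_MS * 2); // trips if a tick badly overruns its period
}

function onStall() {
  // This is the real design question: log and continue? drop to a safe state? restart the loop?
  console.error("stall timer tripped: control loop missed its deadline");
}

setInterval(() => {
  armWatchdog();
  controlStep(); // hypothetical per-tick work; must finish well inside PERIOD_MS
}, PERIOD_MS);

function controlStep() { /* ... */ }
```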
“Realtime” in this sense is akin to real-time strategy games, which have been a named genre since at least 1992 (and are also not real-time in the classical computing sense).
It basically just refers to a system that reacts in a predictable time. Traditionally that time is short, but that isn't part of the definition.
As an audio DSP programmer, my deadline is determined by the sample rate and the size of my buffer, and that is a hard deadline, as any dropouts might be audible. For game people the deadline is more flexible, since the framerate can drop without it being completely unacceptable. How low a framerate is too low isn't a clear deadline, but a clear deadline isn't part of the definition. You can run any realtime code in an environment where it can't perform.
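For concreteness (illustrative numbers of my own, not anything from the parent):

```typescript
// The audio deadline is just buffer length divided by sample rate.
const sampleRate = 48_000;                            // Hz
const bufferSize = 256;                               // samples per callback
const deadlineMs = (bufferSize / sampleRate) * 1000;  // ≈ 5.3 ms to produce every buffer, every time
console.log(`audio callback deadline: ${deadlineMs.toFixed(2)} ms`);
```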
So what about a webservice that guarantees that every internal state update is communicated to clients via a websocket within an acceptable timeframe? Like the game people they don't have a clearly formalized deadline, but when a websocket update takes a minute that would surely be unacceptable. So the original definition of "a system that reacts in a predictable time" can still hold — just not for the reason that people colloquially think ("I get updates as they happen or, in realtime").
And to be frank, I think when some dev says "realtime updates" we all understand what they mean, and that is the point of language, right?
We understand despite the muddling of terms, but let's not pretend that precise language isn't desirable to the best extent possible.
I think there's a certain class of small to medium sized application (I'm building several) where liveview can be a good fit, but after writing it professionally and in hobby projects for several years, I'm less convinced that it's a great solution everywhere.
I've been considering LiveView for a real time application I need to make, but haven't been sure whether it's worth the effort of learning Elixir.
In theory, it sounds like this could be the sweet spot: using the JavaScript ecosystem where it's required, where LiveView falls short [0][1].
[0]: https://github.com/woutdp/live_svelte
[1]: https://dockyard.com/blog/2024/03/14/harnessing-liveview-and...
I hear you on the typing part, but Elixir is taking a step in that direction with 1.8 and I only expect that to get better. I'm going to the Elixir Conf in Poland in May and hoping to learn more on what's next. So excited.
Some context: The use case is a digital whiteboard like Miro and the heaviest realtime functionality will be tracking all of the pointers of all the users updating 5x per second. I'm not expecting thousands/millions of users as I'm planning on running each instance of the software on-prem.
Our postgres connector also works on LISTEN/NOTIFY and has horizontally scalable consumers that will share the load.
There are two ways you can use LISTEN/NOTIFY. Either have the notify event carry the payload, or have the notify event tell you a change happened and then query for the data. If you choose the second option you'll get much better resilience under load, as the back pressure is contained in the table rather than dropped on NOTIFY.
If you do go with a poll-on-change design, you'll likely benefit from some performance tuning around debouncing the polls and how big a batch of records to poll for.
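A rough sketch of that notify-then-query-with-debounce shape, using node-postgres; the table and channel names are made up for illustration:

```typescript
import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
let pollTimer: ReturnType<typeof setTimeout> | null = null;
let lastSeenId = 0;

async function poll(): Promise<void> {
  pollTimer = null;
  // Back pressure lives in the table: pull a bounded batch instead of trusting NOTIFY payloads.
  const { rows } = await client.query(
    "SELECT id, payload FROM outbox_events WHERE id > $1 ORDER BY id LIMIT 500",
    [lastSeenId]
  );
  for (const row of rows) {
    lastSeenId = row.id;
    broadcast(row.payload); // hypothetical fan-out to connected websockets
  }
  if (rows.length === 500) void poll(); // still behind: keep draining without waiting for another NOTIFY
}

async function main(): Promise<void> {
  await client.connect();
  await client.query("LISTEN outbox_changed");
  client.on("notification", () => {
    // Debounce: a burst of NOTIFYs within 50 ms collapses into one query.
    if (!pollTimer) pollTimer = setTimeout(() => void poll(), 50);
  });
}

function broadcast(payload: unknown): void { /* ... */ }
void main();
```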
As for the exact collaboration features, when clients are online and interacting, it's fairly easy. Lots of the hard stuff comes from knowing when a connection is dropped or when a client goes away. Detecting this and updating the other clients can be hard.
Another team at Ably worked on that problem, and called it Spaces.
What I would instead try is creating an event table in the db, then creating a publication on that table and having clients subscribe to that publication via logical replication. You would have to increase the number of replication slots, though.
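Roughly, the setup side could look like this (via node-postgres; the consumer side needs a logical-decoding client and is omitted, and all names are illustrative):

```typescript
import { Client } from "pg";

async function setUpEventStream(connectionString: string): Promise<void> {
  const c = new Client({ connectionString });
  await c.connect();
  await c.query(`CREATE TABLE IF NOT EXISTS events (
                   id bigserial PRIMARY KEY,
                   payload jsonb NOT NULL,
                   created_at timestamptz NOT NULL DEFAULT now())`);
  // Publish changes from that one table...
  await c.query("CREATE PUBLICATION events_pub FOR TABLE events");
  // ...and give each subscriber its own slot. max_replication_slots has to be raised
  // in postgresql.conf to cover all subscribers, as noted above.
  await c.query("SELECT pg_create_logical_replication_slot('events_slot_1', 'pgoutput')");
  await c.end();
}
```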
What I’ve done for this in the past is to do all subscriptions over a single connection per server, and then have code in the server that receives all of the events and dispatches them internally based on a lookup table.
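Something like this, roughly; the routing-key scheme and names are just for illustration:

```typescript
type Handler = (event: unknown) => void;
const routes = new Map<string, Set<Handler>>(); // routing key -> local subscribers

// Internal subscription: nothing here touches the upstream connection.
export function subscribe(key: string, fn: Handler): () => void {
  if (!routes.has(key)) routes.set(key, new Set());
  routes.get(key)!.add(fn);
  return () => { routes.get(key)?.delete(fn); };
}

// Called from the single upstream connection's message handler, once per server.
export function dispatch(key: string, event: unknown): void {
  for (const fn of routes.get(key) ?? []) fn(event);
}
```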
From the client's perspective, there's not a lot of difference between the server dropping the connection (on redeploy) or the connection being dropped for some other transient reason.
That is to say, with a decent client side handling of connection state you just incrementally rollout your new servers and each server terminates its connections triggering reconnects from the clients.
The hardest part is often maintaining continuity on some stream of events. That is; picking up exactly where you were before the connection dropped. You need some mechanism for the client to report the event it last received, and some way to "rewind" back to that point on the stream.
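A minimal sketch of that resume-on-reconnect idea from the client side; the `from` query parameter and the event shape are invented for illustration:

```typescript
// Browser WebSocket API; the server is assumed to replay events after `from`.
let lastEventId = 0;

function connect(url: string): void {
  const ws = new WebSocket(`${url}?from=${lastEventId}`);
  ws.onmessage = (msg) => {
    const event = JSON.parse(msg.data);
    applyEvent(event);        // hypothetical local state update
    lastEventId = event.id;   // server-assigned, monotonically increasing
  };
  // A redeploy looks like any other drop: back off a little, reconnect, resume from lastEventId.
  ws.onclose = () => setTimeout(() => connect(url), 1000 + Math.random() * 2000);
}

function applyEvent(event: { id: number }): void { /* ... */ }
connect("wss://example.invalid/stream");
```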
In most other contexts you'd externalise state to a data store like Redis or RDBMS, and spawn one, kill one or do blue-green in the nebula behind your load balancer constellation.
It follows the model used by Figma where each active document gets its own process on a cluster of machines, and requests are routed to the document server. The advantage of this over something like pubsub is that you have a single authoritative server that can persist the document, instead of having to coordinate multiple sources of truth.
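A toy sketch of that routing idea (not Plane's or Figma's actual implementation; all names are hypothetical):

```typescript
// Pin each active document to one backend process so there is a single
// authoritative copy to persist from.
const docToBackend = new Map<string, string>();        // docId -> backend address
const backends = ["10.0.0.1:9000", "10.0.0.2:9000"];   // hypothetical pool

async function routeRequest(docId: string): Promise<string> {
  let backend = docToBackend.get(docId);
  if (!backend) {
    // First request for this doc: pick a backend, spawn the document process, pin the doc to it.
    backend = backends[Math.floor(Math.random() * backends.length)];
    docToBackend.set(docId, backend);
    await startDocumentProcess(backend, docId); // hypothetical spawn/boot call
  }
  return backend; // every later request for docId lands on the same authoritative process
}

async function startDocumentProcess(backend: string, docId: string): Promise<void> { /* ... */ }
```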
(We also provide a managed version of Plane at https://jamsocket.com)
If you can start new instances quickly and clients can handle short delays you can do it by just stopping the old deployment and starting the new one, booting off of the snapshotted state from the prior deployment.
If you need “instant” you do it by implementing some form of catchup and then fail over.
It is a lot easier to do this if you have a dedicated component that "does" the failover, rather than having the old and new deployments try to solve it bilaterally. It could just be a script run by a human, or something like a k8s operator if you do this a lot.
Well, don't do that. He himself advised against it previously: https://zknill.io/posts/how-to-adopt-realtime/
And it's really annoying that those web nobs inflate terms. He also talks about "hard realtime", which has nothing to do with actual hard realtime. Will his webapps reboot when they miss a deadline? Ha, of course not. What he's really saying is that reactive webapps are becoming simple.
Having a local copy of a database slice, and hiding the sync from the developer entirely.
What I found to work is:
Keep the data you wish multiplayer to operate on atomic. Don't split it out into multiple parallel data blobs that you sometimes want to keep in sync (e.g. if you are doing a multiplayer drawing app that has commenting support, keep comments inline with the drawings, don't add a separate data store). This does increase the size of the blob you have to send to users, but it dramatically decreases complexity. Especially once you inevitably want versioning support.
Start with a simple protocol for updates. This won't be possible for every type of product, but surprisingly often you can do just fine with a JSON patching protocol where each operation patches properties on a giant object which is the atomic data you operate on. There are exceptions to this such as text, where something like CRDTs will help you, but I'd try to avoid the temptation to make your entire data structure a CRDT even though it's theoretically great because this comes with additional complexity and performance cost in practice.
You will inevitably need to deal with getting all clients to agree on the order in which operations are applied. CRDTs solve this perfectly, but again have a high cost. You might actually have an easier time letting a central server increment a number and making sure all clients re-apply all their updates that didn't get assigned the number they expected from the server. Your mileage may vary here.
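A rough sketch of both ideas together, the property-patch protocol and the central sequence number; the op shape and names are my own invention (a real app might use RFC 6902 JSON Patch instead):

```typescript
type PatchOp =
  | { op: "set"; path: string[]; value: unknown }
  | { op: "del"; path: string[] };

// Every client (and the server) applies the same ops to the same atomic blob.
function applyOp(doc: Record<string, any>, patch: PatchOp): void {
  const parent = patch.path.slice(0, -1).reduce((node, key) => (node[key] ??= {}), doc);
  const last = patch.path[patch.path.length - 1];
  if (patch.op === "set") parent[last] = patch.value;
  else delete parent[last];
}

// Central authority: the server stamps every accepted op with the next sequence number.
let serverSeq = 0;
function serverAccept(patch: PatchOp): { seq: number; patch: PatchOp } {
  return { seq: ++serverSeq, patch };
}

// Client: apply confirmed ops strictly in order; on a gap, ask for a replay first,
// and re-send any local ops that lost the race so they get fresh numbers.
const doc: Record<string, any> = { shapes: {}, comments: {} };
let appliedSeq = 0;

function onServerEvent(ev: { seq: number; patch: PatchOp }): void {
  if (ev.seq !== appliedSeq + 1) {
    requestReplay(appliedSeq); // missed something; don't apply out of order
    return;
  }
  applyOp(doc, ev.patch);
  appliedSeq = ev.seq;
}

function requestReplay(fromSeq: number): void { /* ... */ }
```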
On that note, just going for a central server instead of trying to go fully distributed is probably the most maintainable way for you to work. This makes it easier to add on things like permissions and honestly most products will end up with a central authority. If you're doing something that is actually local-first, then ignore me.
I found it very useful to deal with large JSON blobs next to a "transaction log", i.e. a list of all operations in the order the server received them (again, I'm assuming a central authority here). Save lines to this log immediately so that if the server crashes you can recover most of the data. This also lets you avoid rebuilding the large JSON blob on the server too often (but clients will need to be able to handle JSON blob + pending updates list on connect, though this follows naturally since other clients may be sending updates while they connect).
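A sketch of that blob-plus-log arrangement, under made-up file names and a fixed snapshot interval:

```typescript
import { appendFileSync, writeFileSync } from "node:fs";

const LOG = "project-123.log.jsonl";          // one JSON line per op, append-only
const SNAPSHOT = "project-123.snapshot.json"; // the big blob, rebuilt occasionally
let opsSinceSnapshot = 0;

function recordOp(doc: Record<string, any>, op: unknown): void {
  // Durable first: append the op in arrival order, so a crash loses at most the in-flight op.
  appendFileSync(LOG, JSON.stringify(op) + "\n");
  applyOp(doc, op); // hypothetical in-memory apply (the same routine clients run)
  if (++opsSinceSnapshot >= 1000) {
    // Rebuild the blob only occasionally; truncate the log so snapshot + log = full state.
    writeFileSync(SNAPSHOT, JSON.stringify(doc));
    writeFileSync(LOG, "");
    opsSinceSnapshot = 0;
  }
}

// On connect, a client gets SNAPSHOT plus whatever is currently in LOG and must be
// able to apply that pending list on top, as noted above.
function applyOp(doc: Record<string, any>, op: unknown): void { /* ... */ }
```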
The trickiest part is choosing a simple server-side infrastructure. Honestly, if you're not a big company, a single fat server is going to get you very far for a long time. I've asked a lot of people about this, and I've heard many alternatives that are cloud scale, but they have downsides I personally don't like from a product experience perspective (harder to implement features, latency/throughput issues, possibility of data loss, etc.). Durable Objects from Cloudflare do give you the best of both worlds: you get perfect sharding on a per-object (project, or whatever unit your users work on) basis.
Anyway, that's my braindump on the subject. The TLDR is: keep it as simple as you can. There are a lot of ways to overcomplicate this. And of course some may claim I am the one overcomplicating things, but I'd love to hear more alternatives that work well at a startup scale.
Always always always follow parent's advice. Pick one canonical owner for the data, and have everyone query it. Build an estimator at each node that can predict what the robot is doing when you don't have timely data (usually just running a shadow copy of the robot's software), but try to never ever do distributed state.
Even something as simple as a map gets arbitrarily complicated when you're sensing multiple locations. Just push everyone's guesses to a central location and periodically batch update and disseminate updates. You'll be much happier.
Sometimes I feel we (fellow HN readers) get caught into overly complex rabbit holes, so it's good to balance it out with some down-to-earth, practical perspectives.
So we built out a library that lets you run Durable Object-like backends on any cloud: https://github.com/rivet-gg/actor-core