"realtime" has a long history in the computing and technology realms, and this ain't it.
People that know enough to care will know within 1 second what the article is about.
"Soft real-time" is probably the correct term here, but that is actually more confusing for 99% of people.
"Interactive" is not descriptive. "Duplex" is certainly not obvious.
TFA is just about the design and deployment of a two-way ("duplex") communication system that makes distributed applications "feel modern, collaborative, and up-to-date"
These sorts of systems have existed for decades; TFA provides a brief overview of 3 design patterns associated with them.
In particular
> Soft real-time systems are typically used to solve issues of concurrent access and the need to keep a number of connected systems up-to-date through changing situations
That's exactly what this post is about. "Real-time" on the web often just means an app that allows multiple users to concurrently make updates to some piece of state, and for the changes to that state to be broadcast to all users so that all clients are kept consistent within a reasonable time frame, preferably as quickly as possible.
While the deadlines aren't as hard as in e.g. audio programming, "real-time multiplayer" apps can be said to be broken if there is a very large noticeable delay between a user editing something and the other users seeing that edit reflected in their local client.
Hard disagree. There is no deadline to meet time-wise, only that a specific task must be completed in a specific order as quickly as possible. This implies managing state while keeping latency low, which is a QoS problem.
Agreed. The best term to use for this scenario is low-latency, as real-time implies a deadline. Low-latency deals with QoS, where you ascribe a quality parameter to the service.
There are patterns for "real time". Things such as:
- What matters is the slowest case, not the average case.
- What has to be real time, and what can run in background?
- Avoiding priority inversions.
- Is there a stall timer that trips if you miss the control loop timing? What happens when the stall timer trips? (Rough sketch after this list.)
- What influence do caches have on time repeatability? What's the worst case? Can you keep the worst case from being the nothing-in-cache case?
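To make the stall-timer item concrete, here's a rough sketch of my own (in TypeScript for readability; a real control loop would live in C on an RTOS) of a watchdog that fires when a tick overruns its period:

```typescript
// Illustration only: arm a watchdog each tick; if the loop misses its period badly,
// the stall handler runs instead of the system silently falling behind.
const PERIOD_MS = 10;
let watchdog: ReturnType<typeof setTimeout> | undefined;

function armWatchdog() {
  if (watchdog) clearTimeout(watchdog);
  watchdog = setTimeout(onStall, PERIOD_MS * 2); // trips if a tick badly overruns its period
}

function onStall() {
  // This is the real design question: log and continue? drop to a safe state? restart the loop?
  console.error("stall timer tripped: control loop missed its deadline");
}

setInterval(() => {
  armWatchdog();
  controlStep(); // hypothetical per-tick work; must finish well inside PERIOD_MS
}, PERIOD_MS);

function controlStep() { /* ... */ }
```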
“Realtime” in this sense is akin to real-time strategy games, which have been a named genre since at least 1992 (and are also not real-time in the classical computing sense).
It basically just refers to a system that reacts in a predictable time. Traditionally that time is short, but that isn't part of the definition.
As an audio DSP programmer, my deadline is determined by the sample rate and the size of my buffer, and that is a hard deadline, as any dropouts might be audible. For game people the deadline is more flexible, since the framerate can drop without it being completely unacceptable. How low a framerate is too low isn't a clear deadline, but a clear deadline isn't part of the definition. You can run any realtime code in an environment where it can't perform.
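For concreteness (illustrative numbers of my own, not anything from the parent):

```typescript
// The audio deadline is just buffer length divided by sample rate.
const sampleRate = 48_000;                            // Hz
const bufferSize = 256;                               // samples per callback
const deadlineMs = (bufferSize / sampleRate) * 1000;  // ≈ 5.3 ms to produce every buffer, every time
console.log(`audio callback deadline: ${deadlineMs.toFixed(2)} ms`);
```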
So what about a webservice that guarantees that every internal state update is communicated to clients via a websocket within an acceptable timeframe? Like the game people they don't have a clearly formalized deadline, but when a websocket update takes a minute that would surely be unacceptable. So the original definition of "a system that reacts in a predictable time" can still hold — just not for the reason that people colloquially think ("I get updates as they happen or, in realtime").
And to be frank, I think when some dev says "realtime updates" we all understand what they mean, and that is the point of language, right?
We understand despite the muddling of terms, but let's not pretend that precise language isn't desirable to the best extent possible.
I think there's a certain class of small to medium sized application (I'm building several) where liveview can be a good fit, but after writing it professionally and in hobby projects for several years, I'm less convinced that it's a great solution everywhere.
I've been considering LiveView for a real time application I need to make, but haven't been sure whether it's worth the effort of learning Elixir.
In theory, it sounds like this could be the sweet spot: using the JavaScript ecosystem where it's required, where LiveView falls short [0][1].
[0]: https://github.com/woutdp/live_svelte
[1]: https://dockyard.com/blog/2024/03/14/harnessing-liveview-and...
I hear you on the typing part, but Elixir is taking a step in that direction with 1.8 and I only expect that to get better. I'm going to the Elixir Conf in Poland in May and hoping to learn more on what's next. So excited.
Some context: The use case is a digital whiteboard like Miro and the heaviest realtime functionality will be tracking all of the pointers of all the users updating 5x per second. I'm not expecting thousands/millions of users as I'm planning on running each instance of the software on-prem.
Our postgres connector also works on LISTEN/NOTIFY and has horizontally scalable consumers that will share the load.
There are two ways you can use LISTEN/NOTIFY. Either have the notify event carry the payload, or have the notify event tell you a change happened and then query for the data. If you choose the second option you'll get much better resilience under load, as the back pressure is contained in the table rather than dropped on NOTIFY.
If you do go with a poll-on-change design, you'll likely benefit from some performance tuning around debouncing the polls and how big a batch of records to poll for.
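A rough sketch of that notify-then-query-with-debounce shape, using node-postgres; the table and channel names are made up for illustration:

```typescript
import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
let pollTimer: ReturnType<typeof setTimeout> | null = null;
let lastSeenId = 0;

async function poll(): Promise<void> {
  pollTimer = null;
  // Back pressure lives in the table: pull a bounded batch instead of trusting NOTIFY payloads.
  const { rows } = await client.query(
    "SELECT id, payload FROM outbox_events WHERE id > $1 ORDER BY id LIMIT 500",
    [lastSeenId]
  );
  for (const row of rows) {
    lastSeenId = row.id;
    broadcast(row.payload); // hypothetical fan-out to connected websockets
  }
  if (rows.length === 500) void poll(); // still behind: keep draining without waiting for another NOTIFY
}

async function main(): Promise<void> {
  await client.connect();
  await client.query("LISTEN outbox_changed");
  client.on("notification", () => {
    // Debounce: a burst of NOTIFYs within 50 ms collapses into one query.
    if (!pollTimer) pollTimer = setTimeout(() => void poll(), 50);
  });
}

function broadcast(payload: unknown): void { /* ... */ }
void main();
```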
As for the exact collaboration features, when clients are online and interacting, it's fairly easy. Lots of the hard stuff comes from knowing when a connection is dropped or when a client goes away. Detecting this and updating the other clients can be hard.
Another team at Ably worked on that problem, and called it Spaces.
What I would instead try is creating an event table in the db, then creating a publication on that table and having clients subscribe to that publication via logical replication. You would have to increase the number of replication slots, though.
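Roughly, the setup side could look like this (via node-postgres; the consumer side needs a logical-decoding client and is omitted, and all names are illustrative):

```typescript
import { Client } from "pg";

async function setUpEventStream(connectionString: string): Promise<void> {
  const c = new Client({ connectionString });
  await c.connect();
  await c.query(`CREATE TABLE IF NOT EXISTS events (
                   id bigserial PRIMARY KEY,
                   payload jsonb NOT NULL,
                   created_at timestamptz NOT NULL DEFAULT now())`);
  // Publish changes from that one table...
  await c.query("CREATE PUBLICATION events_pub FOR TABLE events");
  // ...and give each subscriber its own slot. max_replication_slots has to be raised
  // in postgresql.conf to cover all subscribers, as noted above.
  await c.query("SELECT pg_create_logical_replication_slot('events_slot_1', 'pgoutput')");
  await c.end();
}
```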
What I’ve done for this in the past is to do all subscriptions over a single connection per server, and then have code in the server that receives all of the events and dispatches them internally based on a lookup table.
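Something like this, roughly; the routing-key scheme and names are just for illustration:

```typescript
type Handler = (event: unknown) => void;
const routes = new Map<string, Set<Handler>>(); // routing key -> local subscribers

// Internal subscription: nothing here touches the upstream connection.
export function subscribe(key: string, fn: Handler): () => void {
  if (!routes.has(key)) routes.set(key, new Set());
  routes.get(key)!.add(fn);
  return () => { routes.get(key)?.delete(fn); };
}

// Called from the single upstream connection's message handler, once per server.
export function dispatch(key: string, event: unknown): void {
  for (const fn of routes.get(key) ?? []) fn(event);
}
```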
From the client's perspective, there's not a lot of difference between the server dropping the connection (on redeploy) or the connection being dropped for some other transient reason.
That is to say, with a decent client side handling of connection state you just incrementally rollout your new servers and each server terminates its connections triggering reconnects from the clients.
The hardest part is often maintaining continuity on some stream of events. That is; picking up exactly where you were before the connection dropped. You need some mechanism for the client to report the event it last received, and some way to "rewind" back to that point on the stream.
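A minimal sketch of that resume-on-reconnect idea from the client side; the `from` query parameter and the event shape are invented for illustration:

```typescript
// Browser WebSocket API; the server is assumed to replay events after `from`.
let lastEventId = 0;

function connect(url: string): void {
  const ws = new WebSocket(`${url}?from=${lastEventId}`);
  ws.onmessage = (msg) => {
    const event = JSON.parse(msg.data);
    applyEvent(event);        // hypothetical local state update
    lastEventId = event.id;   // server-assigned, monotonically increasing
  };
  // A redeploy looks like any other drop: back off a little, reconnect, resume from lastEventId.
  ws.onclose = () => setTimeout(() => connect(url), 1000 + Math.random() * 2000);
}

function applyEvent(event: { id: number }): void { /* ... */ }
connect("wss://example.invalid/stream");
```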
In most other contexts you'd externalise state to a data store like Redis or RDBMS, and spawn one, kill one or do blue-green in the nebula behind your load balancer constellation.
It follows the model used by Figma where each active document gets its own process on a cluster of machines, and requests are routed to the document server. The advantage of this over something like pubsub is that you have a single authoritative server that can persist the document, instead of having to coordinate multiple sources of truth.
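A toy sketch of that routing idea (not Plane's or Figma's actual implementation; all names are hypothetical):

```typescript
// Pin each active document to one backend process so there is a single
// authoritative copy to persist from.
const docToBackend = new Map<string, string>();        // docId -> backend address
const backends = ["10.0.0.1:9000", "10.0.0.2:9000"];   // hypothetical pool

async function routeRequest(docId: string): Promise<string> {
  let backend = docToBackend.get(docId);
  if (!backend) {
    // First request for this doc: pick a backend, spawn the document process, pin the doc to it.
    backend = backends[Math.floor(Math.random() * backends.length)];
    docToBackend.set(docId, backend);
    await startDocumentProcess(backend, docId); // hypothetical spawn/boot call
  }
  return backend; // every later request for docId lands on the same authoritative process
}

async function startDocumentProcess(backend: string, docId: string): Promise<void> { /* ... */ }
```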
(We also provide a managed version of Plane at https://jamsocket.com)
If you can start new instances quickly and clients can handle short delays you can do it by just stopping the old deployment and starting the new one, booting off of the snapshotted state from the prior deployment.
If you need “instant” you do it by implementing some form of catchup and then fail over.
It is a lot easier to do this if you have a dedicated component that "does" the failover, rather than having the old and new deployments try to solve it bilaterally. It could just be a script run by a human, or something like a k8s operator if you do this a lot.
Well, don't do that. He himself advised against it previously: https://zknill.io/posts/how-to-adopt-realtime/
And it's really annoying that those web nobs inflate terms. He also talks about "hard realtime", which has nothing to do with actual hard realtime. Will his webapps reboot when they miss a deadline? Ha, of course not. What he's really saying is that reactive webapps are becoming simple.
Having a local copy of a database slice, and hiding the sync from the developer entirely.
What I found to work is:
Keep the data you wish multiplayer to operate on atomic. Don't split it out into multiple parallel data blobs that you sometimes want to keep in sync (e.g. if you are doing a multiplayer drawing app that has commenting support, keep comments inline with the drawings, don't add a separate data store). This does increase the size of the blob you have to send to users, but it dramatically decreases complexity. Especially once you inevitably want versioning support.
Start with a simple protocol for updates. This won't be possible for every type of product, but surprisingly often you can do just fine with a JSON patching protocol where each operation patches properties on a giant object which is the atomic data you operate on. There are exceptions to this such as text, where something like CRDTs will help you, but I'd try to avoid the temptation to make your entire data structure a CRDT even though it's theoretically great because this comes with additional complexity and performance cost in practice.
You will inevitably need to deal with getting all clients to agree on the order in which operations are applied. CRDTs solve this perfectly, but again have a high cost. You might actually have an easier time letting a central server increment a number and making sure all clients re-apply all their updates that didn't get assigned the number they expected from the server. Your mileage may vary here.
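A rough sketch of both ideas together, the property-patch protocol and the central sequence number; the op shape and names are my own invention (a real app might use RFC 6902 JSON Patch instead):

```typescript
type PatchOp =
  | { op: "set"; path: string[]; value: unknown }
  | { op: "del"; path: string[] };

// Every client (and the server) applies the same ops to the same atomic blob.
function applyOp(doc: Record<string, any>, patch: PatchOp): void {
  const parent = patch.path.slice(0, -1).reduce((node, key) => (node[key] ??= {}), doc);
  const last = patch.path[patch.path.length - 1];
  if (patch.op === "set") parent[last] = patch.value;
  else delete parent[last];
}

// Central authority: the server stamps every accepted op with the next sequence number.
let serverSeq = 0;
function serverAccept(patch: PatchOp): { seq: number; patch: PatchOp } {
  return { seq: ++serverSeq, patch };
}

// Client: apply confirmed ops strictly in order; on a gap, ask for a replay first,
// and re-send any local ops that lost the race so they get fresh numbers.
const doc: Record<string, any> = { shapes: {}, comments: {} };
let appliedSeq = 0;

function onServerEvent(ev: { seq: number; patch: PatchOp }): void {
  if (ev.seq !== appliedSeq + 1) {
    requestReplay(appliedSeq); // missed something; don't apply out of order
    return;
  }
  applyOp(doc, ev.patch);
  appliedSeq = ev.seq;
}

function requestReplay(fromSeq: number): void { /* ... */ }
```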
On that note, just going for a central server instead of trying to go fully distributed is probably the most maintainable way for you to work. This makes it easier to add on things like permissions and honestly most products will end up with a central authority. If you're doing something that is actually local-first, then ignore me.
I found it very useful to deal with large JSON blobs next to a "transaction log", i.e. a list of all operations in the order the server received them (again, I'm assuming a central authority here). Save lines to this log immediately so that if the server crashes you can recover most of the data. This also lets you avoid rebuilding the large JSON blob on the server too often (but clients will need to be able to handle JSON blob + pending updates list on connect, though this follows naturally since other clients may be sending updates while they connect).
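A sketch of that blob-plus-log arrangement, under made-up file names and a fixed snapshot interval:

```typescript
import { appendFileSync, writeFileSync } from "node:fs";

const LOG = "project-123.log.jsonl";          // one JSON line per op, append-only
const SNAPSHOT = "project-123.snapshot.json"; // the big blob, rebuilt occasionally
let opsSinceSnapshot = 0;

function recordOp(doc: Record<string, any>, op: unknown): void {
  // Durable first: append the op in arrival order, so a crash loses at most the in-flight op.
  appendFileSync(LOG, JSON.stringify(op) + "\n");
  applyOp(doc, op); // hypothetical in-memory apply (the same routine clients run)
  if (++opsSinceSnapshot >= 1000) {
    // Rebuild the blob only occasionally; truncate the log so snapshot + log = full state.
    writeFileSync(SNAPSHOT, JSON.stringify(doc));
    writeFileSync(LOG, "");
    opsSinceSnapshot = 0;
  }
}

// On connect, a client gets SNAPSHOT plus whatever is currently in LOG and must be
// able to apply that pending list on top, as noted above.
function applyOp(doc: Record<string, any>, op: unknown): void { /* ... */ }
```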
The trickiest part is choosing a simple server-side infrastructure. Honestly, if you're not a big company, a single fat server is going to get you very far for a long time. I've asked a lot of people about this, and I've heard many alternatives that are cloud scale, but they have downsides I personally don't like from a product experience perspective (harder to implement features, latency/throughput issues, possibility of data loss, etc.). Durable Objects from Cloudflare do give you the best of both worlds: you get perfect sharding on a per-object (project, or whatever unit your users work on) basis.
Anyway, that's my braindump on the subject. The TLDR is: keep it as simple as you can. There are a lot of ways to overcomplicate this. And of course some may claim I am the one overcomplicating things, but I'd love to hear more alternatives that work well at a startup scale.
Always always always follow parent's advice. Pick one canonical owner for the data, and have everyone query it. Build an estimator at each node that can predict what the robot is doing when you don't have timely data (usually just running a shadow copy of the robot's software), but try to never ever do distributed state.
Even something as simple as a map gets arbitrarily complicated when you're sensing multiple locations. Just push everyone's guesses to a central location and periodically batch update and disseminate updates. You'll be much happier.
Sometimes I feel we (fellow HN readers) get caught into overly complex rabbit holes, so it's good to balance it out with some down-to-earth, practical perspectives.
So we built out a library that lets you run Durable Object-like backends on any cloud: https://github.com/rivet-gg/actor-core