A major advantage of the proposed approach is automated FFI and serialization/deserialization between languages in the same language set. RustScript would be able to accept a struct or enum from Rust or RustGC, and vice-versa. You could have a channel with different languages on either end.
You can also see that we _want_ something like this, e.g. we bolt TypeScript on top of JavaScript, and types onto Python. If JavaScript (or Python) had been designed to be more easily compiled (notably, no monkey patching), they would support level 2 as well.
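For instance, here is what "types onto Python" looks like in practice (the function and values are invented, and the mypy message is approximate): the annotations are optional and CPython ignores them at runtime, but a static checker flags the mismatch before the program ever runs.

```python
def total(prices: list[float], tax_rate: float) -> float:
    return sum(prices) * (1 + tax_rate)

print(total([9.99, 4.50], 0.08))  # fine at runtime and under the checker

# mypy would reject this call before the program runs, roughly:
#   total([9.99, 4.50], "8%")
#   error: Argument 2 to "total" has incompatible type "str"; expected "float"
```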
I have been thinking of level 2 or 1 languages that support the higher levels. This is a really good framing. (The problem with going the other way is that implementation decisions in the interpreter often constrain how the compiler can work; e.g. CPython is dominant because of all the libraries that make use of the CPython FFI, and similarly for NodeJS. It is easier to interpret a constrained language than to compile a dynamic language designed with an interpreter in mind.)
It is not unlike defining your data model for SQL so that you can have sane data access.
The term isn't familiar to me, and when I try to look it up I get almost exclusively Rust-related results. I guess you mean https://en.wikipedia.org/wiki/Region-based_memory_management , which I grew up calling "pool allocation".
But pool allocation works as well. The idea is to go from a chonky representation to a portable compact representation of a C-struct but still have the Python accessors to this data look idiomatic.
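A hedged Python sketch of that idea (names and layout invented): the data lives in one compact, portable C-struct-style buffer pool, while the Python-side accessors still look like ordinary attributes.

```python
import struct

# Two little-endian int32 fields: the compact "C-struct" layout.
POINT = struct.Struct("<ii")

class Point:
    """Idiomatic accessors over a slot in a shared byte pool."""

    def __init__(self, pool: bytearray, offset: int):
        self._pool, self._offset = pool, offset

    @property
    def x(self) -> int:
        return POINT.unpack_from(self._pool, self._offset)[0]

    @x.setter
    def x(self, value: int) -> None:
        POINT.pack_into(self._pool, self._offset, value, self.y)

    @property
    def y(self) -> int:
        return POINT.unpack_from(self._pool, self._offset)[1]

pool = bytearray(POINT.size * 100)  # one allocation for 100 points
p = Point(pool, 0)
p.x = 3
print(p.x, p.y)  # reads look like plain attribute access: 3 0
```

The point is that callers never see the byte-level representation; they just read `p.x`.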
> Each level is associated with a certain set of operations and with a programming language that allows us to write or otherwise express programs that call these operations into action. In any particular use of the computer, programs from all levels are executed simultaneously. In fact, the levels support each other. In order to execute one operation of a given level, several operations at the next lower level will normally have to execute. Each of these operations will in their turn call several operations at the still lower level into execution.
The old term "problem-oriented languages" seems to still be quite useful. Programming languages are always focused on allowing the programmer to solve a set of problems and their features hide irrelevant details.
These language sets seem like a helpful grouping of features that suit particular problem domains but I don't think it works as a taxonomy of levels of abstraction.
Another reason to do that is that the different levels are amenable to different affordances, and have different trade-offs in their design. For example, at level 4 you may want to go for a more BASIC-like syntax, without semicolons, and commands without argument-list parentheses.
I'm not sure if Dart counts as "popular", but it otherwise fits this bill. It has a JIT, can start up pretty quickly, and can interpret on the fly. You can also hot reload code changes while a program is running. And it can ahead-of-time compile to efficient machine code when you're ready to ship.
Yes, but Python, Ruby, Lua, etc. are also all dynamically typed, which places them in level 4.
> And Java as a level-2 language only compiles to bytecode (class files) which by default is then interpreted, and typically only JIT-compiled for “hot” code.
Yes, but Java is generally only run in the JVM and is not often compiled ahead-of-time to a static executable. There are AOT compilers for Java, but the performance isn't great. Java was designed to run in a VM. Classloaders, static initializers, reflection, every-method-is-virtual all make it quite difficult to compile Java to a static executable and get decent performance.
Dart was designed to be a decent AOT target.
Browser / JavaScript environments -> ClojureScript
General Purpose (JVM) -> Clojure
Fast Scripting -> Babashka (although I've used ClojureScript for this in the past)
C/C++ Interop (LLVM-based) -> Jank (new, but progressing rapidly and already useful)
I can largely write the same expressive code in each environment, playing to the platform strengths as needed. I can combine these languages inside the same project, and have libraries that present unified APIs across implementations. I can generally print and read EDN across implementations, provided I register the right tag handlers for custom types (this is one area where jank still has to catch up). Reader conditionals allow implementation-specific code as needed.
I'm really excited about Jank giving me a good alternative to JNI/JNA/Panama when I need my Clojure to touch OS parts the JVM hasn't wrapped.
This is totally made-up nonsense. I've worked in Python for over a decade, and at multiple successful companies that have been running quarter-million-plus-line Python codebases for 8+ years.
Proponents of static typing like to sound alarms that it's impossible to scale dynamic codebases, when they lack the experience in those languages to know how people solve scaling problems in them.
I'm not hating on static languages, but I think they involve more tradeoffs than proponents of static typing admit. Time spent compiling is pretty costly, and a lot of codebases go to great lengths to somewhat bypass the type system with dependency injection, which results in much more confusing codebases than dynamic types ever did.
Meanwhile, many of the world's largest and longest-maintained codebases are written in C, which has only half-assed type checking at any point, and is much harder to maintain than dynamic languages. The idea that every project reaches some point of unwieldiness where it gets rewritten is just not correct.
I might have gone a bit easier on this if the author hadn't said "Every successful business..."--the word "every" just goes way too far.
EDIT: I'll also note that just because a language isn't statically typed, doesn't mean it gains no benefit from type checking. JavaScript and Python are not created equal here: JavaScript will happily let you add NaN and undefined, only to cause an error in a completely unrelated-seeming area of the codebase, whereas Python generally will type check you and catch errors pretty close to where the bug is.
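A small invented example of that difference: Python raises at the site of the bad operation, instead of letting a bogus value flow onward the way `undefined + 1` quietly becomes `NaN` in JavaScript.

```python
def average(values):
    return sum(values) / len(values)

try:
    average([10, 20, None])   # a None snuck into the data
except TypeError as err:
    # the traceback points right at the bad addition inside sum()
    print(err)
```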
IMO rewriting happens mostly because the team can go lower-level once it has the needed proficiency/understanding - once the product's why-what-how is more-or-less discovered, its code can be commoditized and hardened, kind of. Dynamic stuff is very powerful, which means it becomes too powerful (and "magic").
There is quite some wishful thinking around so-called static typing (which is actually static type-hinting) in dynamic languages - hoping and believing that declaring something Float guarantees it being Float at runtime, which is nonsense. In Ada, and a few other languages with runtime type-and-range-etc. checking - yes. But in plain ones - nope. C++, Java, whatever: no one checks things at runtime. Most of those have no way to know what some n bytes represent; they just hope it matches the expected layout (i.e. type). While, say, Python knows very well what each and every object is. If asked.
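To make the point concrete (function and values invented): the hint below claims float, but CPython never looks at it.

```python
def scale(value: float, factor: float) -> float:
    return value * factor

# No checker, no error: Python happily multiplies a str by an int.
print(scale("abc", 3))   # -> abcabcabc
```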
Of course, one can build such real runtime checkers, and apply them where/when it is needed and makes sense - instead of a blanket policy everywhere - but no one bothers. (The funny thing is, when I made one such library 15 years ago, I was spat at and told to go code in Java. And even I haven't since stumbled on a pressing need to use it myself. Having 10 asserts (or constraint checks) in some 10kLoc does not justify a whole library.)
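A rough sketch of that kind of opt-in runtime checker - a decorator invented here for illustration, not the commenter's actual library - which validates arguments against simple class annotations only where it is applied:

```python
import functools
import inspect

def checked(func):
    """Check call arguments against plain class annotations at runtime."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            ann = sig.parameters[name].annotation
            if (ann is not inspect.Parameter.empty
                    and isinstance(ann, type)
                    and not isinstance(value, ann)):
                raise TypeError(
                    f"{name} must be {ann.__name__}, "
                    f"got {type(value).__name__}")
        return func(*args, **kwargs)
    return wrapper

@checked
def ratio(a: float, b: float) -> float:
    return a / b
```

Note that this naive `isinstance` check is strict - it would reject an int where float is hinted - which is exactly the kind of design decision a real library has to make.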
That said, I think something like language verticals might be useful. And/or gradual hardening, on a piecemeal basis. At least a standardized way of going up/down, or stricter/relaxed, from wherever one is.
But then my question is, what are you doing that you need to manually check for types? I mean, I get it at some point - usually at the time of user input you need to run checks to actually convert some input into a valid type. But once it is inside your program you don't need to check anymore, because, well... static typing and all that; you should know what you have at every step just at a glance.
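The boundary-checking pattern described here can be sketched like this (all names invented): parse untrusted input once, at the edge; after that, the rest of the program can assume it holds a real int.

```python
def parse_age(raw: str) -> int:
    age = int(raw)          # raises ValueError on junk like "abc"
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age

age = parse_age("42")       # checked once here...
print(age + 1)              # ...then used without re-checking everywhere else
```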
Well, if it is in my program - or a codebase of (design+code, but mostly) linting-and-discipline by me - I don't need those type checks. They give a somewhat fake sense of correctness, a kind of over-confidence. Maybe useful here or there, but like 1% (given any inputs from outside are gated with checks/validations/conversions). One rarely sends a single float by mistake where a Point3D is needed. But move(x,y) and move(y,x) are wildly different things, and no typing would catch that (except excessive & brittle stuff like Xcoord/Ycoord as types). It needs explicit naming (not typing), like move(x=a, y=b) or smth.moveX(a).moveY(b) or whatever.
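A minimal Python sketch of that naming suggestion (the `move` function is invented): keyword-only parameters force the caller to spell out x and y, so an argument-order mix-up fails immediately instead of silently transposing the point.

```python
def move(*, x: int, y: int) -> tuple[int, int]:
    # the bare `*` makes x and y keyword-only
    return (x, y)

print(move(x=3, y=4))    # order can't be confused: names are explicit
try:
    move(3, 4)           # positional call is rejected outright
except TypeError as err:
    print(err)
```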
Erlang and Elixir?
We should embrace the domain-specific niceties; there is room for lots of languages - they can iterate more quickly, try new things, and specialize their syntax to the domain.
1. C/C++/Rust
2. Java/Go/OCaml
3. MyPy, TypeScript
4. Python, JavaScript
But I'd also add 2 or 3 more tiers:
5. String-ish languages without GC - Shell, Awk, Make, CMake [1]
6. Configuration Languages - YAML / TOML - declaring data structures [2]
7. Data Notations - JSON, HTML, CSV - Objects, Documents, Tables [3]
The goal of YSH is actually to *unify tiers 4, 5, 6, and 7* under one language. The slogan I've been using is "minimal YSH is shell+Python+JSON+YAML".

Instead of having Unix sludge (autotools - m4 generating make) and Cloud sludge (Helm - Go templates generating YAML), you have one language:
- YSH is the code dialect -- it is a shell with real data types like Python
- and with reflection like Ruby/Python, not generating text
- Hay (Hay Ain't YAML) is the data dialect
- and we have built-in JSON, etc.
This is a hard design challenge, but I just made a release with an overhaul of Hay - https://oils.pub/release/0.28.0/

Hay version 1 was hard-coded in the interpreter - https://oils.pub/release/0.28.0/doc/hay.html
But we realized it's actually better to self-host it in YSH, using YSH reflection. We will be testing this by rewriting Hay in YSH.
---
So that's our language design response to https://news.ycombinator.com/item?id=43386115
> It's madness that languages are effectively siloed from each other.
Instead of tiers 4, 5, and 6 being siloed, we have them all under YSH and the Oils runtime (which is tiny - 2.3 MB of pure native code).
(As a bonus, OSH also runs on the Oils runtime, and it's the most bash-compatible shell!)
----
[1] Garbage Collection Makes YSH Different - https://www.oilshell.org/blog/2024/09/gc.html
Shell, Awk, and Make Should Be Combined - https://www.oilshell.org/blog/2016/11/13.html - all these languages lack GC!
[2] Survey of Config Languages - https://github.com/oils-for-unix/oils/wiki/Survey-of-Config-... - divides this category into 5 tiers:
1. Languages for String Data
2. Languages for Typed Data
3. Programmable String-ish Languages
4. Programmable Typed Data
5. Internal DSLs in General Purpose Languages
[3] Zest: Notation and Representation addresses this - https://www.scattered-thoughts.net/writing/notation-and-repr...

YSH also has a common subset with J8 Notation (which is a superset of JSON)
Ugh.
> 4: Interpreted, dynamically typed: JavaScript, Python, PHP
> 3: Interpreted, statically typed: Hack, Flow, TypeScript, mypy
> 2: Compiled with automatic memory management (statically typed): Go, Java (Kotlin), C#, Haskell, Objective-C, Swift
> 1: Compiled with manual memory management (statically typed): Rust, C, C++
> There is a 0th level, assembly, but it’s not a practical choice for most programmers today.
And it's also missing the general hypothesis that a language is needed between levels 2 and 3: interpreted for fast turnaround times, but also compilable for fast run times.
I think the person you're replying to is just showing dissatisfaction that, once again, someone is trying to force Rust where it doesn't belong. It's exhausting.