That is to say, do focus on systems problems. Key ones I identified are efficient data representation, avoiding needless memory churn/bloat, and talking directly to lower-level software/hardware, like the kernel.
Focus on systems programming and not on syntactic niceties or oddities.
Parse once for syntax and symbol definitions, then make a second pass over the parsed structure to link symbol references to their definitions. Two uncomplicated passes.
That handles a general code graph - so the language can go anywhere, and never be fundamentally held up by limitations of early syntax/parsing decisions.
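A minimal sketch of those two passes in C (the flattened "code graph" and its node type are made up for illustration):

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical flattened code graph: each node either defines
       a symbol or references one by name. */
    typedef struct Node {
        const char *name;
        int is_def;
        struct Node *target;  /* resolved by the second pass */
    } Node;

    static Node *find_def(Node *nodes, int n, const char *name) {
        for (int i = 0; i < n; i++)
            if (nodes[i].is_def && strcmp(nodes[i].name, name) == 0)
                return &nodes[i];
        return NULL;
    }

    /* Second pass: link every reference to its definition. Forward
       references resolve for free because pass 1 saw all the defs. */
    static void link_refs(Node *nodes, int n) {
        for (int i = 0; i < n; i++)
            if (!nodes[i].is_def)
                nodes[i].target = find_def(nodes, n, nodes[i].name);
    }

    int main(void) {
        Node prog[] = {
            { "main",   1, NULL },  /* def main, which calls helper... */
            { "helper", 0, NULL },  /* ...defined only later in the file */
            { "helper", 1, NULL },
        };
        link_refs(prog, 3);
        printf("helper ref resolved: %s\n", prog[1].target ? "yes" : "no");
    }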
For the longest time the syntax was just glorified s-exprs. This made it much easier to focus on the semantic choices and improved iteration times and willingness to experiment since the parser changes were always trivial.
I highly recommend this approach for new PLs.
For Virgil, I started with mostly Java/C syntax, but with "variable: type" instead of "type variable", because it was both easier to parse and more like Standard ML and what you encounter in programming language theory. That syntax was already catching on, so I felt like I was swimming with the stream. I initially made silly changes like array indexing being "array(index)" instead of "array[index]", which turned out to be annoying when taking random code, since you had to change all the "[" to "(" and "]" to ")". Also, I had keywords "method" and "field", but eventually decided things looked better as "def" and "var", because they were easier to eyeball and readily understandable to people who write JavaScript (and Scala, as it turns out).
Overall, Virgil's syntax is kind of an average of all the curly-braced languages, and where it differs at all, it's been to make things more composable and avoid cryptic, line-noise-looking things. For example, to allocate an object of class C, one writes "C.new(args)", because that can be understood as "C.new" as a function applied to "(args)"--so one can easily write "C.new" and yes, indeed, that's a first-class function. That works with delegates and so on. So I don't regret not exactly matching the "new C()" you'd find in Java or C++.
1) Write a pretty printer early on (I had one for the s-expr-based syntax and one for the concrete syntax I introduced later). This will allow you to automatically apply syntax changes to the example code you have written in your PL with very little programming.
2) Instead of parser generators, use recursive descent + Pratt parsing. Pratt parsing is a little bit magical at first, but it is easy to develop a working intuition without understanding all the details of the algorithm.
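For a feel of how little code this takes, here is a minimal Pratt-parsing sketch in C (a toy evaluator over single-digit operands; a real parser would build AST nodes instead of folding values):

    #include <stdio.h>

    static const char *src;  /* cursor into the input */

    static int binding_power(char op) {
        switch (op) {
        case '+': case '-': return 10;
        case '*': case '/': return 20;
        default:            return 0;   /* not an operator: stop */
        }
    }

    static int parse_expr(int min_bp) {
        int lhs = *src++ - '0';                 /* "nud": a digit literal */
        while (binding_power(*src) > min_bp) {  /* "led" loop */
            char op = *src++;
            /* recurse with the operator's power: left-associative */
            int rhs = parse_expr(binding_power(op));
            switch (op) {
            case '+': lhs += rhs; break;
            case '-': lhs -= rhs; break;
            case '*': lhs *= rhs; break;
            case '/': lhs /= rhs; break;
            }
        }
        return lhs;
    }

    int main(void) {
        src = "1+2*3-4";
        printf("%d\n", parse_expr(0));  /* prints 3 */
    }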
I added a few improvements to help with legibility, which made it bearable to program with s-exprs.
But arithmetic expressions and (chaining of) unary operators were just too painful, e.g.
(= (. (^ ptr) field) ...)
instead of
ptr^.field = ...
Some possible answers:
> Unfortunately I also designed a language with top-level execution and nested functions - neither of which I could come up with a good compilation model for if I wanted to preserve the single-pass, no-AST, no-IR design of the compiler.
- This is the major PITA of C: it's an outright terrible target for transpilers that have aspirations and sophisticated features like strings, a decent type system, or whatever. So your option is to target something that is as close as possible to the semantics of your language (basically, any other language that is not C), possibly to the point where your target is MORE sophisticated than your own lang.
- I think Zig (for speed/apparent simplicity) and Rust could be good targets instead of C (I wish Rust were way faster to compile!). Assuming Zig, it will simplify other aspects like cross-compiling.
- I don't think it's totally possible to avoid having a semi-interpreter for the transpiler; at minimum you need a prelude with some hand-crafted functions that do the lifting. By this I mean things like `fn int(I:Int):Int'`, so your code output is like `plus(int(1), int(2))`. Langs like APL/J use this to great effect and basically side-step the need for a VM/opcodes. (See also: Compiling with closures.)
I... hadn't thought of this. I mean, I wouldn't transpile to a slow lang like Python, but choosing Zig or C++ is tempting. Zig maybe not, as it's unfinished. But C++ instead of C would make my life easier (e.g. for implementing classes).
> prelude with some hand-crafted functions that do the lifting. With this I mean things like `fn int(I:Int):Int'` so your code output is like `plus(int(1), int(2))`
Curious what you mean here. Is it `1+2` -> AST `Binop{'+', left, right}` -> gen `plus(1,2)`? Sorry, it's late here and I should sleep...
https://hookrace.net/blog/introduction-to-metaprogramming-in...
Yes, `plus(1,2)`. The problem will become more apparent when you find that things like `plus` need some overloading support/macros/generics/etc., and so do things like `print`.
So, eventually you need to think in macros, multiple versions of the same thing, or, if you craft things very carefully, only support things your target supports so you can avoid it (but I wonder how feasible that is).
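For concreteness, a tiny C sketch of what such a hand-crafted prelude might look like (the `Int` box, `Int_lit`, and the overflow-trapping `plus` are made-up illustrations; `__builtin_add_overflow` is a GCC/Clang builtin):

    #include <stdio.h>
    #include <stdlib.h>

    /* Boxed value type of the source language, with room for tags later. */
    typedef struct { long v; } Int;

    static Int Int_lit(long v) { Int i = { v }; return i; }

    /* The source language's "+" semantics, here: trap on overflow. */
    static Int plus(Int a, Int b) {
        if (__builtin_add_overflow(a.v, b.v, &a.v)) {
            fputs("integer overflow\n", stderr);
            exit(1);
        }
        return a;
    }

    int main(void) {
        /* what the transpiler emits for the source expression 1 + 2 */
        Int r = plus(Int_lit(1), Int_lit(2));
        printf("%ld\n", r.v);  /* prints 3 */
    }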
But, but, but…
They share the same two start letters! They were clearly meant to cohabitate!
Zinc on Zig, what a Zing!
(Just a casual at-a-distance zig fan)
It's really unfortunate... reddit used to make me laugh - now it just makes me angry.
The good (or at least barely tolerable) sane politics/econ subs hide in plain sight. (Also, enn cee dee?)
For example, the term "side effects" has half a dozen different meanings in common use. A Haskell programmer wouldn't consider memory allocation to be a side effect. A realtime programmer might consider taking too long to be a side effect, hence tools like realtime sanitizer [0]. Cryptography developers often consider input-dependent timing variance a critical side effect [1]. Embedded developers often consider things like high stack usage to be a meaningful side effect.
This isn't to say that a systems language needs to support all of these different definitions, just a suggestion that any systems language should be extremely clear about the use cases it's intending to enable and the definitions it uses.
It's true that these are all somewhat related concepts, but I'm pretty sure the term "side effect" is consistently used in the functional sense.
GCC has two attributes for marking functions, "pure" and "const" (not the language's const qualifier). C23 introduced the [[reproducible]] and [[unsequenced]] attributes, which are mostly modeled on the GCC extensions, but with some subtle yet important differences in their descriptions.
Turns out it's pretty hard to define these concepts if the language is not built around immutability and pure functions from the ground up.
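As a rough sketch of the GCC extensions (not the C23 spellings):

    /* "const": reads nothing but its arguments; "pure": may also read
       global memory. Neither may write memory or do I/O, which lets
       the compiler merge or hoist repeated calls. */
    __attribute__((const)) static int square(int x) { return x * x; }

    static int table[256];
    __attribute__((pure)) static int lookup(int i) { return table[i & 0xff]; }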
I think he used side effect with a functional programming meaning. A pure function will just take immutable data and produce other immutable data without affecting state. A function which adds 1 to a number has no side effects, while a function that adds 1 to a number and prints to the console has side effects.
But state is open for interpretation. If I write (making up syntax, attempting to be language-agnostic)
fnc foo uses scalar i produces scalar
does
  return make scalar(i + 1)
end fnc
One could argue that is not pure, and one would have to write

fnc foo takes heap h, uses scalar i produces heap, integer
does
  (newHeap, result) := heap.makeScalar(i + 1)
  return (newHeap, result)
end fnc
That expresses the notion that this function destroys a heap and returns a new heap that stores an additional scalar (an implementation likely would optimize that to modify the heap that got passed in, but, to some, that’s an implementation detail).

> while a function that adds 1 to a number and prints to the console has side effects.
Again, that’s open for interpretation. If the program cannot read what’s on the console, why would that be considered a side effect? That function also heats my apartment and contributes to my electricity bill.
Basic thing is: different programmers care about different things. Embedded programmers may care about minute details such as the number of cycles a function takes.
To give a degenerate-but-simple example of that, a low-level systems language striving for "purity" but that also allowed arbitrary memory reading for whatever reason ("deep low-level custom stack trace functionality") would technically be able to witness the effects of the stack changing due to function calls. You can just define that away as an effect (and honestly would probably have to), but I'd suggest being clear about it.
A degenerate-but-often-relevant example is that a "pure" function in a language that considers memory allocation "pure" can crash the entire OS process by running it out of memory. That's so impure that not only can the execution context (thread, async context, whatever) that is running the program out of memory witness it, so can every other execution context in the process; indeed, whether they want to or not, they have to! We generally consider memory allocation "pure" for pragmatic reasons, because we really have no choice; the alternative is to create such a restrictive definition of "pure" as to be effectively useless. But that is almost the largest possible "effect" we're glossing over!
[1]: https://jerf.org/iri/post/2025/fp_lessons_purity/#purity-is-...
> Reasonable C interop, and probably, initial compilation to C.
How do you achieve "reasonable C interop" without pointers, I wonder?
So if you store a memory address in the integer variable X, you just need a way to access that memory.
In assembly languages, usually, you have no pointers.
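In C terms that's roughly the uintptr_t round-trip. A minimal sketch; whether such an access is well-defined is exactly the provenance question raised below:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int value = 42;
        uintptr_t x = (uintptr_t)&value;  /* the address, held in an integer */
        int *p = (int *)x;                /* converted back before the access */
        printf("%d\n", *p);               /* prints 42 */
    }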
They could be, but it's much worse from a performance perspective if you just have these raw machine addresses rather than the pointers in the C language, so actual C compilers haven't done that for many years. ISO/IEC TS 6010 describes the best current attempt to come up with coherent semantics for these pointers, or here's a Rustier perspective: https://www.ralfj.de/blog/2020/12/14/provenance.html [today Rust specifically says its pointers have provenance and what that means, like that TS does for the C language]
Now, if you read Ralf's post and want to argue about that I'm afraid there are already lots of HN discussions and your point has probably already been made so: https://news.ycombinator.com/item?id=25419740 or https://news.ycombinator.com/item?id=42878450
Note that float and double are a bit particular because they can use different registers! But yeah, when stored in memory they are the same 32/64-bit integers.
int ptr
I like that everything starts with a keyword; it makes the language feel consistent, and I assume the parser is simple to understand because of it.
I like that you distinguish a procedure from a function in regards to side-effects and that you support both mutable and immutable types.
I like that you don't have to put a semicolon after each line but instead just use a newline.
I like that you don't need parentheses for ifs and whiles; however, I am not sure I like the while syntax. Maybe I need to try it a bit before I can make up my mind.
On the other hand, I think the type system could be expanded to support more types of different sizes. Especially if you are going for a systems programming language, you want to be able to control that.
I think you could have a nil type because it is handy, but it would be good if the language somehow made sure that any variable that could potentially be nil has to be explicitly checked before use.
You mean, "worse". There is a reason why e.g. Pascal only had this misfeature in its very first version and gave up on it in the Revised Report, at which point having both functions and procedures arguably became an unnecessary distinction without difference.
> And there's always the issue of what to do about side-effects on module load.
You execute them. Just be sure to only run them once, and maintain proper traversal order (that being post-order): e.g. if your main program has "import A; import B", and A has "import B; import C", and B has "import D", you first run D's init, then B's, then C's, then A's, and then main's.
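A minimal sketch of that traversal in C (the Module table and names are made up for illustration):

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct Module {
        const char *name;
        struct Module **imports;   /* NULL-terminated dependency list */
        bool initialized;
    } Module;

    /* Post-order, run-once: initialize dependencies before the module. */
    static void run_init(Module *m) {
        if (m->initialized) return;
        m->initialized = true;          /* mark first so a cycle can't loop forever */
        for (Module **d = m->imports; *d; d++)
            run_init(*d);
        printf("init %s\n", m->name);   /* the module's own top-level code */
    }

    int main(void) {
        Module D = { "D", (Module *[]){ NULL },         false };
        Module B = { "B", (Module *[]){ &D, NULL },     false };
        Module C = { "C", (Module *[]){ NULL },         false };
        Module A = { "A", (Module *[]){ &B, &C, NULL }, false };
        Module M = { "main", (Module *[]){ &A, &B, NULL }, false };
        run_init(&M);                   /* prints D, B, C, A, main */
    }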
As a language user, don't rely on order for your side effects. Actually, just don't have side effects on module load. You almost never need it. Lazily initialize your globals.
Textual order. At least it's visible.
> Lazily initialize your globals.
What does that even mean? Something like this:
    size_t _fwrite__impl(const void* buffer, size_t size, size_t count, FILE* stream) {
        if (!__crt0_initialized) {
            _crt0();
        }
        if (!__crt1_initialized) {
            _crt1();
        }
        if (!__xfloat_initialized) {
            _xfloat();
        }
        if (!__stdio_initialized) {  // each _init routine sets its flag
            _init_stdio();
        }
        // Actual implementation that touches internal FILE-tables and
        // maybe does float/double formatting.
    }
? But why?

> Actually, just don't have side effects on module load. You almost never need it.
See above. There is a surprising amount of invisibly initialized global state in e.g. C runtime library.
Is "textual order" breadth-first, depth-first, reverse breadth-first, or reverse depth-first? Whichever you pick there will be a case where some module can't initialize because of assumptions it makes about how other modules are initialized. And like I said, it totally breaks down for cyclical dependencies - which are so common in practice, you must consider it.
> ? But why?
To paraphrase what Rust does, "no life before main." The point is to force expressions to be evaluated as they're used instead of as they're declared. There's an additional benefit that global resources that are not used are not initialized, which in the cases above has global side effects.
The glibc runtime is not something to be held up as a model of particularly good design. You can get all the benefits of hidden global initialization via laziness, without all the problems placed on the programmer to care about their import declaration order, or undefined cases like cyclical imports.
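A minimal sketch of that laziness in C (using a POSIX once-guard; the table is a made-up example of hidden global state):

    #include <pthread.h>

    static pthread_once_t table_once = PTHREAD_ONCE_INIT;
    static int table[256];

    static void init_table(void) {
        for (int i = 0; i < 256; i++)
            table[i] = i * i;
    }

    /* First caller pays for initialization; nobody pays attention
       to import order, and unused state is never initialized. */
    int lookup_square(int i) {
        pthread_once(&table_once, init_table);
        return table[i & 0xff];
    }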
One place where this stuff really sucks is when using dynamic linking and shared libraries have constructor functions that modify global state. GCC had to roll back changes to -ffast-math a couple of years ago because loading two libraries compiled with different flags could result in undefined behavior, with the MXCSR register contents depending on the order of library initialization.
As for the cyclical dependencies I'd argue they should be disallowed. Either their initialization order doesn't actually matter — in which case it doesn't matter :) — or there is a way to break things into smaller pieces and reorder them to function properly — in which case it's what should be done — or there is no valid ordering at all, in which case it's a genuine bug which has been made possible only because cyclical dependencies were allowed.
> There's an additional benefit that global resources that are not used are not initialized
This, arguably, can be considered a downside. Consider the security implications (and introduced mitigations) of e.g. writeable GOT/PLT. But it's a design decision with both of the choices valid, just with different trade-offs.
> You can get all the benefits of hidden global initialization via laziness without all the problems placed on the programmer to care about their import declaration order
I'd be interested to read about that. To me, this sounds mostly the problem of not accurately specifying your actual dependencies.
If you haven't looked into Zig's 'comptime' system, you might find some relevant inspiration there.
What You See is What Is Executed
easy back and forth to assembly + inline assembly.
I used Ruby in the Web 2.0 days in a few dotcom projects.
What decade am I in? This is not optional any more. Hard pass.
I don't have to worry that a 3rd-party library without dependencies suddenly acquires 30 transitive dependencies, which can now conflict with other diamond dependencies.
I need my dependency tree to be small to avoid every single factor of friction.
Language-specific package managers are exactly what encourages the exponential explosion of packages leading to dependency hell (and to major security concerns).
Sounds like you're biased.
https://archlinux.org/packages/extra/x86_64/gnome-shell/
gnome-shell > accountsservice > shadow > pam > systemd-libs > xz > bash > readline > ncurses > gcc-libs > glibc
and I didn't even try finding the longest chain...
You'll see cases like NPM, and to a lesser degree Cargo, where projects have hefty dependency graphs because it is so easy to just pull in one more dependency; but on the other side you have C++, which has Conan and vcpkg, but opinions on them are so mixed that people rely on other methods like CMake's FetchContent instead.
I appreciate having tools that let me pull in what I need when I need it, but the dependency explosion is real and I dunno how to have one without the other.
Another factor is that updating C++ compilers/stdlib tends to break older libraries; I'm not sure if this is any less the case in Rust (unclear? I mostly get trouble with C dependencies) or Python (old Numpy does not supply wheels for newer Python, and ruamel.yaml has some errors on newer Python: https://sourceforge.net/p/ruamel-yaml/tickets/476/).
It's incredible how much stuff some C projects vendor
The existence of a package manager causes a social problem within the language community of excessive transitive dependencies. It makes it difficult to trust libraries and encourages bad habits.
Much like Rust has memory-safety benefits as a result of choices that make it difficult to work with in some contexts, the lack of a package manager can have benefits while making the language difficult to work with in certain contexts.
These are all just tradeoffs and I'm glad "no package manager" languages are still being created because I personally enjoy using them more.
Sure, the language maintainers will need to provide some kind of API which can be called by the more general-purpose tool, but why not have it be a first-class citizen instead of some kind of foo2nix adapter maintained by a totally separate group of people?
There's no need to have a cozy CLI and a bespoke lockfile format and a dedicated package server when I'll be using other tools to handle those things in a non-language-specific way anyhow.
I like this language, it shares my aesthetics.
(a) nobody uses the language, so a package manager doesn't matter OR
(b) people use the language, they will want to share packages, then a package manager will be bolted on (or many will, see python)
Seems like first-class package manager support (a la Rust) makes the most sense to me.
This is also why systems people will typically push back if you ask for non-official repos added to apt sources, etc.