The costs of the i386 to x86-64 upgrade (blogsystem5.substack.com)
77 points by thunderbong 10 hours ago | 13 comments
armada651 1 hour ago
> We now know that LP64 was the preferred choice and that it became the default programming model for 64-bit operating systems

That is incorrect: Windows never adopted the LP64 model. Only pointers were widened to 64 bits, while long remained 32-bit. The long datatype should be avoided in cross-platform code.
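
For illustration, a minimal C sketch of the difference (the sizes in the comments assume a typical LP64 Linux compiler versus an LLP64 Win64 one):

  #include <stdio.h>

  int main(void) {
      /* LP64 (Linux/macOS x86-64): long = 8 bytes, pointer = 8 bytes */
      /* LLP64 (64-bit Windows):    long = 4 bytes, pointer = 8 bytes */
      printf("sizeof(long)      = %zu\n", sizeof(long));
      printf("sizeof(long long) = %zu\n", sizeof(long long));
      printf("sizeof(void *)    = %zu\n", sizeof(void *));
      return 0;
  }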

jagrsw 29 minutes ago
All of C's native datatypes should be avoided for cross-platform data structures (networking, databases, file storage) because the standard only guarantees minimum sizes. Endianness is an additional problem.

uint64_t is a bit verbose; many redefine it to u64.
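
For illustration, a minimal sketch of what that means in practice: serialize with fixed-width types and an explicit byte order, rather than writing native types to the wire (the helper name is mine):

  #include <stdint.h>
  #include <stddef.h>

  /* Write a 64-bit value as big-endian bytes, independent of the host's
     native integer sizes and endianness. */
  static void put_u64_be(uint8_t *out, uint64_t v) {
      for (size_t i = 0; i < 8; i++)
          out[i] = (uint8_t)(v >> (56 - 8 * i));
  }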

coldpie 21 minutes ago
I think I agree, but I'd be interested in more discussion about this.

I always understood the native types to be the "probably most efficient" choice, for when you don't actually care about the width. For example, you'd choose int for a loop index variable which is unlikely to hit width constraints because it's the "probably most efficient" choice. If you're forced to choose a width, you might choose a width that is less efficient for the architecture.

Is that understanding correct? Historically or currently?

Either way, I think I now agree that unspecified widths are an anti-feature. There's value in having explicitly specified limits on loop index variables. When you write "for (int32_t i = 0; ...)", it causes you to think a bit: "hey, can this overflow?" And now your overflow analysis holds for all arches, because you thought about the width that is actually in use (32 bits, in this case). It keeps program behavior consistent and easier to reason about, for all arches.

That's my thinking, but I'd be interested to hear other perspectives.
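
For example, a minimal sketch of the reasoning this forces (the function and names are mine):

  #include <stdint.h>
  #include <stddef.h>

  /* The explicit width makes the overflow question concrete on every
     architecture: can n ever exceed INT32_MAX?  With a plain int, the
     answer depends on what the platform happens to define int as. */
  void scale(float *data, size_t n) {
      for (int32_t i = 0; (size_t)i < n; i++)  /* overflows if n > INT32_MAX */
          data[i] *= 2.0f;
  }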

kibwen 3 minutes ago
> I always understood the native types to be the "probably most efficient" choice, for when you don't actually care about the width.

This itself is a platform-specific property, and is thus non-portable (not in the sense that your code won't run, but in the sense that it might be worse for performance than just using a known small integer when you can).

2ndbigbang 10 minutes ago
There are int_fast32_t and int_least32_t, but it is probably less confusing to just use exact-sized types (and it would make porting to other architectures simpler).
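
A small sketch of the three families for reference (the sizes noted in the comments are typical for x86-64 glibc, not guaranteed):

  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
      /* Exactly 32 bits (present on all mainstream targets). */
      printf("int32_t:       %zu\n", sizeof(int32_t));
      /* Fastest type with at least 32 bits -- 8 bytes on x86-64 glibc. */
      printf("int_fast32_t:  %zu\n", sizeof(int_fast32_t));
      /* Smallest type with at least 32 bits -- usually 4 bytes. */
      printf("int_least32_t: %zu\n", sizeof(int_least32_t));
      return 0;
  }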
blueflow 5 hours ago
> To support those needs, there were crutches like Intel’s PAE, which allowed manipulating up to 64GB of RAM on 32-bit machines without changing the programming model, but they were just that: hacks.

You can go look up how 32-bit protected mode got hacked on top of the 16-bit segmented virtual memory that the 286 introduced. The Global Descriptor Table is still with us in 64-bit long mode.

So it's not PAE that is particularly hacky; it's a broader thing with x86.

lmm 1 minute ago
That other hacks exist does not make a given hack any less hacky.
gregw2 1 hour ago
If I recall correctly, UNIX vendors in the late 90s were debating a fair bit, internally and amongst each other, whether to use LP64, ILP64, or LLP64 (where long longs and pointers were 64-bit).

Ooh, found a link to a UNIX Open Group white paper on that discussion and the reasoning why LP64 should be/was chosen:

https://unix.org/version2/whatsnew/lp64_wp.html

And per Raymond Chen, why Windows picked LLP64: https://devblogs.microsoft.com/oldnewthing/20050131-00/?p=36... and https://web.archive.org/web/20060618233104/http://msdn.micro...

For some history of why ILP32 was picked for the 1970s 16-to-32-bit transition of C + Unix System V (Windows 3.1 and Mac OS were LP32), see John Mashey's 2006 ACM piece, particularly the section "Early Days": https://queue.acm.org/detail.cfm?id=1165766

No peanut gallery comments from OS/400 guys about 128-bit pointers/object handles/single store address space in the mid-1990s please! That's not the same thing and you know it! (j/k. i'll stop now)

tzot 5 hours ago
x32 ABI support exists at least in the kernel of Debian (and Debian-based) distributions, and I know because I've built Python versions (along with everything else needed for specific workloads) as x32 executables. The speed difference was minor but real, and memory usage decreased quite a lot. I've worked with a similar ABI known as n32 (there was o32 for old 32-bit, n32 for new 32-bit, and n64 for fully 64-bit programs) on SGI systems with 64-bit-capable MIPS CPUs; it made a difference there too.

Unfortunately I've read articles where quite-more-respected-than-me people said, in a nutshell, "no, x32 does not make a difference", which is contrary to my experience; but I could only provide numbers, and the reply was "those are your numbers in your case, not mine".

The Amazon Linux kernel did not support x32 system calls the last time I tried, so you can't provide images for more compact Lambdas.
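
If you want to reproduce this, a minimal check (assuming a multilib GCC with x32 libraries and a kernel built with CONFIG_X86_X32; build with gcc -mx32):

  #include <stdio.h>

  int main(void) {
      /* Under the x32 ABI you get 32-bit pointers (the memory savings)
         while the compiler still uses the full x86-64 register set. */
      printf("sizeof(void *) = %zu\n", sizeof(void *)); /* 4 under -mx32 */
      printf("sizeof(long)   = %zu\n", sizeof(long));   /* 4: ILP32 */
      return 0;
  }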

gregw2 32 minutes ago
For the curious, "x32" Linux is an ILP32 programming model (32-bit int, long, and pointers) on top of the x86-64 instruction set. There is some 2012 lwn.net commentary on the performance implications at: https://lwn.net/Articles/503541/
zh3 2 hours ago
Indeed, for many years we've been running multiple systems with x86_64 kernels and a 32-bit userspace, running many standard 32-bit applications (including browsers); the only thing we've ever needed to do is run 'linux32' before starting X so that 'uname' reports i686 rather than x86_64.
martijnvds 1 hour ago
The X32 ABI is not the same as the 32-bit mode used to run "i686" binaries on x86_64 (that would be the i386 ABI).
chasil 32 minutes ago
Solaris famously compiles everything in (/usr)/bin as 32-bit.

Alas, my SmartOS test system is gone, or I would show you.

quesera 4 minutes ago
It looks like there's some variability:

  smartos$ uname -a
  SunOS smartos 5.11 joyent_20240701T205528Z i86pc i386 i86pc Solaris
Core system stuff:

  smartos$ file /usr/bin/ls
  /usr/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (Solaris), dynamically linked, interpreter /usr/lib/ld.so.1, not stripped

  smartos$ file /bin/sh
  /bin/sh: symbolic link to ksh93

  smartos$ file /bin/ksh93
  /bin/ksh93: ELF 64-bit LSB executable, x86-64, version 1 (Solaris), dynamically linked, interpreter /usr/lib/amd64/ld.so.1, not stripped
And then the pkgsrc stuff:

  smartos$ which ls
  /opt/local/bin/ls

  smartos$ file /opt/local/bin/ls
  /opt/local/bin/ls: symbolic link to /opt/local/bin/gls

  smartos$ file /opt/local/bin/gls
  /opt/local/bin/gls: ELF 64-bit LSB executable, x86-64, version 1 (Solaris), dynamically linked, interpreter /usr/lib/amd64/ld.so.1, not stripped
zokier 5 hours ago
I find it weird that the convention to use char/short/int/long/long long has persisted so widely to this day. I would have thought that back in the 16 -> 32 bit transition period people would already have standardized on stdint.h-style types (i.e. int32_t etc.).

Sure, that doesn't change pointer sizes, but it would have reduced the impact of the different 64-bit data models, like Unix LP64 vs Windows LLP64.

jraph 3 hours ago
I see two good reasons:

(1) DX: typing "int" feels more natural and less clunky than choosing some arbitrary size.

(2) Perf: if you don't care about the size, you might as well use the native size, which is supposed to be faster.

In Java, people do use the hardware-independent 4-byte ints and 8-byte longs. I guess (1) matters more, or people think the JVM will figure out the perf issue and that it'll be possible to micro-optimize if a profile points out a problem.

chipdart 5 hours ago
> I find it weird that the convention to use char/short/int/long/long long has persisted so widely to this day.

I don't think this is a reasonable take. Beyond ABI requirements and the way developers reach for int over short, there are indeed requirements where the size of an integer value matters a lot, especially as it has a direct impact on data size and vectorization. To frame your analysis, I would recommend you take a peek at the not-so-recent push for hardware support for IEEE 754 half-precision float/float16 types.

zokier 4 hours ago
The cases where you want a platform-specific integer width (beyond something like size_t/uintptr_t) are extremely niche compared to the cases where you want an integer to have a specific width.

I don't see the relation to fp16; I don't think anyone is pushing for `float` to refer to fp16 (or fp64 for that matter) anywhere. `long double` is already bad enough.
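
The two non-niche platform-width cases look like this (a quick sketch; the function names are mine):

  #include <stddef.h>
  #include <stdint.h>

  /* size_t scales with the address space, so it can index any object. */
  size_t count_nonzero(const unsigned char *buf, size_t len) {
      size_t n = 0;
      for (size_t i = 0; i < len; i++)
          n += (buf[i] != 0);
      return n;
  }

  /* uintptr_t is wide enough to round-trip a pointer through an integer. */
  uintptr_t as_int(const void *p) { return (uintptr_t)p; }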

chipdart 1 hour ago
> The cases where you want platform-specific integer width (that is not something like size_t/uintptr_t) is extremely niche (...)

I think you got it backwards. There are platform-specific ints because different processors have different word sizes. Programming languages then adjust their definitions to those word sizes because they are handled natively by the processor.

So differences in word sizes exist between processors; either programming languages support them, or they don't. There are also specific needs for specific int sizes regardless of CPU architecture; either programming languages support those, or they don't.

And you end up with "platform-specific integer widths" because programming languages do the right thing and support them.

manwe150 2 hours ago
My recollection of the history is that the standardization of stdint.h happened long after the transition to 32-bit, and it only just finished becoming available on some major compilers well after the transition to 64-bit was behind us.
jcranmer 57 minutes ago
stdint.h was introduced in C99, and MSVC didn't introduce it until 2010.
gnabgib 10 hours ago
Small discussion already (16 points, 10 hours ago, 7 comments) https://news.ycombinator.com/item?id=41768144
crest 2 hours ago
What the FreeBSD ISO size comparison overlooks is that, to provide 32-bit application compatibility, FreeBSD/amd64 includes an i386 copy of all libraries (unless deselected).
sjsdaiuasgdia 3 hours ago
I'm overly annoyed by the AI-generated image of CPUs, with one showing a horrible mottled mess where the nice clean grid of contact pads should be.

There are endless actual pictures of processors from both eras. Using actual images here would have been as fast, possibly faster, than writing a prompt and generating this image.

chefandy 8 minutes ago
And that's why laymen with prompts will never replace trained artists/designers for anything but trivial use cases: the ability to decide a) how best to visually communicate the right thing to the right audience, b) whether generative AI is the best choice to achieve that, with the capability to use more precise tools if not, and c) whether what you have is cruddy enough, or the message superfluous enough, that not using an image is more effective. This image fails, and this is a trivial use case. While it's depressing to see so many lower-end commercial artists lose their livelihoods and their depressingly low wages to the market upstream, I can't help but feel a little schadenfreude seeing things like this, after so many tech people with Dunning-Kruger-informed confidence about visual communication have gleefully called me a buggy-whip manufacturer.
Retr0id 54 minutes ago
The garbled pads were straight-up trypophobia-inducing; they genuinely stopped me in my tracks.
chrsw 2 hours ago
I was going to comment on this but then decided not to because I thought I was being too petty. But I'm glad to see other people agree. It's disturbing.

I see someone else commented that it's probably due to copyright/licensing. I agree there too. That's a shame. So, because of usage policies, we end up with AI-generated pictures that aren't real, aren't accurate, and are usually off-putting in some way. Great.

sph 2 hours ago
Not only that, they had to manually edit the generated image to add the i386 and x86-64 labels on them.

When all you have is a hammer...

Filligree 2 hours ago
First, though, you would have needed to find a picture you can be sure isn't in copyright, or which is licensed appropriately.
Retr0id 9 minutes ago
I'm sure the same diligence was also performed when constructing the AI's training data set.
sjsdaiuasgdia 2 hours ago
images.google.com -> search for "386 cpu"

Click "tools" then "usage rights", pick "creative commons", pick an image.

Now search for "core cpu" and pick a second image.

Yeah that sure was hard and time consuming!

Kye 1 hour ago
Even easier:

Step 1: go to Wikimedia Commons

https://commons.wikimedia.org/wiki/Category:Intel_i386

renox 4 hours ago
The RISC example in the article is a bit weird: on one hand, it may take even more instructions to load an address on a RISC; on the other hand, all the RISCs I know have an 'addi' instruction, so there's no need to do 'li r2, 1; add r3, r1, r2' to add 1 to a register!
faragon 1 hour ago
Using indexes instead of pointers in data structures works well, and the cost of base address + offset is negligible, since the compiler already generates similar address calculations when accessing an element of a data structure. In addition, indexes can be used as byte offsets, or as actual indexes scaled by the size of an individual element; in the latter case, non-trivial data structures with e.g. >= 32-byte elements could address hundreds of gigabytes of RAM.

A practical use: bit fields can be convenient, e.g. 32-bit indexes with the high bit holding the color in a red-black tree. And if the tree nodes need dynamically-sized items, those could live in separate 32-bit-addressable memory pools.
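
For illustration, a minimal sketch of such a node layout (field names are mine): three 32-bit links instead of three 64-bit pointers, with the color folded into one bit:

  #include <stdint.h>

  /* Nodes live in a contiguous pool; links are pool indexes, not pointers.
     Three links cost 12 bytes here instead of 24 bytes of pointers. */
  struct rb_node {
      uint32_t left;
      uint32_t right;
      uint32_t parent : 31;  /* 2^31 nodes; at 32 bytes each, 64 GiB */
      uint32_t color  : 1;   /* 0 = black, 1 = red */
  };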

jauntywundrkind 3 hours ago
Talking about the ISA needing to spend so much time addressing memory, I'm reminded of the really interesting Streaming Semantic Registers (SSRs) in Occamy, the PULP group's excellent 4096-core RISC-V research multi-chip design. https://arxiv.org/abs/1911.08356

Just like the instruction pointer, which implicitly increments as code executes, there are some dedicated data-pointer registers, with a dedicated ALU for advancing/incrementing them, so you can have interesting access patterns for your data.

Rather than loops needing to load data, compute, store data, and loop, you can just compute and loop. The SSRs give the cores a DSP-like level of performance. So so so neat. Ship it!

(Also, what was the name of the x86 architecture some Linux distros were shipping with 32-bit instructions & address space, but using the new x86-64 registers?)