Uninitialized garbage on ia64 can be deadly (2004)

andikleen2 • 2 months ago

Early x86-64 Linux had a similar problem. The x86-64 ABI uses registers for the first 6 arguments. Supporting a variable number of arguments (like printf) requires passing the number of arguments in an extra register (RAX), so that the callee can save the registers to memory for va_arg() and friends. Doing this for every call is too expensive, so it's only done when the prototype is marked as stdarg.

Now the initial gcc implemented this saving to memory with a kind of Duff's device, with a computed jump into a block of register-saving instructions so that only the needed registers were saved. There was no bounds check, so if the argument-count register (RAX) was not initialized correctly it would jump randomly based on the junk and cause very confusing bug reports.

This bit quite a lot of software that didn't use correct prototypes, calling stdarg functions without indicating that in the prototype. On 32-bit code, which didn't use register arguments, this wasn't a problem.

Later compiler versions switched to saving all registers unconditionally.
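
For illustration, a minimal C sketch of the safe pattern: the variadic function is declared with a full prototype (the `...`) before any call, so the compiler knows it has to follow the variadic calling convention. `sum_ints` is a hypothetical helper made up for this example, not code from the original bug reports.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper. The "..." in the prototype is what tells the
 * compiler that callers must use the variadic calling convention. */
int sum_ints(int count, ...);

int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);   /* each extra argument is read as an int */
    }
    va_end(ap);
    return total;
}
```

With this prototype in scope, `sum_ints(3, 1, 2, 3)` returns 6. Calling the same function through an old-style declaration such as `int sum_ints();` is undefined behavior in C, and that is the kind of call that broke.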

veltas • 2 months ago

In the SysV ABI for AMD64 the AL register is used to pass an upper bound on the number of vector registers used. Is this related to what you're talking about?

Joker_vD • 2 months ago

Raymond Chen has a whole "Introduction to IA-64" series of posts on his blog, by the way. It's such an unconventional ISA that I am baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86: it's very poorly suited for general-purpose computations. Number crunching, sure, but anything more freeform, and you stare at the specs and wonder how the hell the designers supposed this thing to be programmed and used.

pjmlp • 2 months ago

Itanium only failed because AMD for various reasons was able to come up with AMD64 and rug pull Intel's efforts.

In an alternative universe without AMD64, Intel would have kept pushing Itanium while sorting out its issues, HP-UX was on it, and Windows XP as well.

pjc50 • 2 months ago

Other way round: the only way any company other than Intel was able to get a new instruction set launched into the PC space was because Intel face-planted so hard with Itanium, and AMD64 was the architecture developers actually wanted to use - just make the registers wider and have more of them, and make it slightly more orthogonal.

pjmlp • 2 months ago

Developers get to use the architectures OEM vendors make available to them.

bell-cot • 2 months ago

With how long it took Intel to ship expensive, incompatible, so-so performance ia64 chips - your theory needs an alternate universe where Intel has no competitors, ever, to take advantage of the obvious market opportunity.

bombcar • 2 months ago

It was also an era where people were happily staying on PAE 32-bit x86 rather than pay the price and performance premium for Itanium.

4 GB of RAM existed, but many, many systems weren't even close to it yet.

pjmlp • 2 months ago

I don't need suggestions for a time I lived through; I have been in computers since the 1980s.

Without AMD, there was no alternative in the PC world. Itanium already had the first 64-bit version of Windows XP.

Since we're providing suggestions in computing history, I assume you can follow the dates:

https://en.wikipedia.org/wiki/Windows_XP_editions#Windows_XP...

pajko • 2 months ago

The first generation was complete garbage. Itanium 2 came too late and did not become widespread because of wrong business decisions and marketing. By the time it could have been successful, AMD64 was out. And even then Intel targeted only the same high-end enterprise market segment when they implemented 64-bit on Xeon: https://www.cnet.com/tech/tech-industry/intel-expanding-64-b...

pjmlp • 2 months ago

That is the whole point, assume there was no AMD64 to start with.

kelnos • 2 months ago

We can't know for sure, but my guess is that Itanium still could have failed. I could imagine an alternative universe where, even with HP-UX and WinXP running on it, no one wanted to deal with porting their application software. And its emulation of 32-bit code (both in hardware and in software) was atrocious, so running existing, unported code wouldn't really take off either.

Eventually Intel gives up after motherboard/desktop/laptop makers can't build a proper market for it. Maybe Intel then decides to go back and do something similar to what AMD did with x86_64. Maybe Intel just gives up on 64-bit and tries to convince people it's not necessary, but then starts losing market share to other companies with viable 64-bit ISAs, like IBM's POWER64 or Sun's SPARC64 or whatever.

Obviously we can't know, but I think my scenario is at least as likely as yours.

pjmlp • 2 months ago

If there was no alternative way to run Windows with more than 4 GB, eventually they would no matter what.

Considering that PAE was a gimmick anyway.

jcranmer • 2 months ago

Some guesses here:

First off, Itanium was definitely meant to be the 64-bit successor to x86 (that's why it's called IA-64 after all), and moving from 32-bit to 64-bit would absolutely have been a killer feature. It's basically only after the underwhelming launch of Itanium that AMD comes out with AMD64, which becomes the actual 64-bit version of x86; once that comes out, the 64-bitness of Itanium is no longer a differentiation.

Second... given that Itanium basically implements every weird architecture feature you've ever heard of, my guess is that they decided they had the resources to make all of this stuff work. And they got into a bubble where they just simply ignored any countervailing viewpoints anytime someone brought up a problem. (This does seem to be a particular specialty of Intel.)

Third, there's definitely a baseline assumption of a sufficiently-smart compiler. And my understanding is that the Intel compiler was actually halfway decent at Itanium, whereas gcc was absolute shit at it. So while some aspects of the design are necessarily inferior (a sufficiently-smart compiler will never be as good as hardware at scavenging ILP, hardware architects, so please stop trying to foist that job on us compiler writers), it actually did do reasonably well on performance in the HPC sector.

happosai • 2 months ago

It appeared to me (from far outside) that Intel was trying to segment the market into "affordable home and office PCs with x86" and "expensive serious computing with Itanium". Having everything so different was a feature, to justify the eye-wateringly expensive Itanium price tag.

kuschku • 2 months ago

The same trick they pulled again with AVX512 and ECC support later on.

clausecker • 2 months ago

And the same reason NVRAM was dead on arrival. No affordable dev systems meant that only enterprise software supported it.

windward • 2 months ago

Seems shortsighted (I'm not saying you're wrong, I can imagine Intel being shortsighted). Surely the advantage of artificial segmentation is that it's artificial: you don't double up the R&D costs.

trashface • 2 months ago

Maybe they thought they would just freeze x86 architecturally going forward and Itanium would be nearly all future R&D. Not a bet I would have taken but Intel probably felt pretty unstoppable back then.

Earw0rm • 2 months ago

The IBM PS/2 play. And we all know how well that one worked out.

happosai • 2 months ago

I'm sure it worked out for many bosses. They got their bonuses and promotions and someone else got to clean up the mess.

kragen • 2 months ago

They took technical risks that didn't pan out. They thought they'd be able to solve whatever problems they ran into, but they couldn't. They didn't know ahead of time that the result was going to suck. If you try to run an actual tech company, like Intel, without taking any technical risks, competitors who do take technical risks will leave you in the dust.

This doesn't apply to fake tech companies like AirBnB, Dropbox, and Stripe, and if you've spent your career at fake tech companies, your intuition is going to be "off" on this point.

twoodfin • 2 months ago

They also aimed at what turned out to be the wrong target: When Itanium was conceived, high-performance CPUs were for technical applications like CAD and physics simulation. Raw floating point throughput was what mattered. And Itanium ended up pretty darn good at that.

But between conception and delivery, the web took over the world. Branchy integer code was now the dominant server workload & workstations were getting crowded out of their niche by the commodity economics of x86.

kuschku • 2 months ago

Thanks for this comment - that's a beautiful perspective I hadn't considered before. A clean and simple definition of technology as everything that increases human productivity.

Now I can finally explain why some "tech" jobs feel like they're just not moving the needle.

eru • 2 months ago

Computer hardware isn't the only 'tech' that exists, you know?

Problems in operations research (like logistics) or fraud detection can be just as technical.

kragen • 2 months ago

Fraud detection is a Red Queen's race. If the amount of resources that goes into fraud detection and fraud commission grows by 10×, 100×, 1000×, the resulting increase in human capacities and improvement in human welfare will be nil. It may be technically challenging but it isn't technology.

Operations research is technology, but Uber isn't Gurobi, which is a real tech company like Intel, however questionable their ethics may be.

eru • 2 months ago

> It's such an unconventional ISA that I am baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86 [...]

I don't know, most people don't care about the ISA being weird as long as the compiler produces reasonably fast code?

fulafel • 2 months ago

> baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86

They did persuade SGI, DEC and HP to switch from their RISCs to it though. Which turned out to be rather good for business.

fredoralive • 2 months ago

I suspect SGI and DEC / Compaq could look at a chart and see that with P6 Intel was getting very close to their RISC chips, through the power of MONEY (simplification). They weren't hitting a CISC wall, and the main moat custom RISC had left was 64 bit. Intel's 64 bit chip would inevitably become the standard chip for PCs, and therefore Intel would be able to turn its money cannon onto overpowering all 64 bit RISCs in short order. May as well get aboard the 64 bit Intel train early.

Which is nearly true: 64-bit Intel chips did (mostly) kill RISC. But not their (and HP's) fun science project IA64; they had to copy AMD's "what if x86, but 64 bit?" idea instead.

zinekeller • 2 months ago

SGI and DEC, yes, but HP? Itanium was HP's idea all along! [1]

[1] https://en.wikipedia.org/wiki/Itanium#History

fulafel • 2 months ago

You're right of course.

msla • 2 months ago

"We don't care, we don't have to, we're Intel."

Plus, DEC managed to move all of its VAX users to Alpha through the simple expedient of no longer making VAXen, so I wonder if HP (which by that point had swallowed what used to be DEC) thought it could repeat that trick and sunset x86, which Intel has wanted to do for very nearly as long as the x86 has existed. See also: Intel i860

https://en.wikipedia.org/wiki/Intel_i860

kruador • 2 months ago

The 8086 was a stop-gap solution until iAPX432 was ready.

The 80286 was a stop-gap solution until iAPX432 was ready.

The 80386 started as a stop-gap solution until iAPX432 was ready, until someone higher up finally decided to kill that one.

pjc50 • 2 months ago

https://en.wikipedia.org/wiki/Intel_iAPX_432

I'd never heard of it myself, and reading that Wikipedia page it seems to have been a collection of every possible technology that didn't pan out in IC-language-OS codesign.

Meanwhile, in Britain a few years later in 1985, a small company and a dedicated engineer, Sophie Wilson, decided that what they needed was a RISC processor that was as plain and straightforward as possible ...

yongjik • 2 months ago

Well, they did persuade HP to ditch their own homegrown PA-RISC architecture and jump on board with Itanium, so there's that. I wonder how much that decision contributed to the eventual demise of HP's high performance server division ...

classichasclass • 2 months ago

A lot, I think. PA-RISC had a lot going for it: high performance, a solid ISA, even some low-end consumer-grade parts (not to the same degree as PowerPC but certainly more so than, say, SPARC). It could have gone much farther than it did.

Not that HP was the only one to lose their minds over Itanic (SGI in particular), but I thought they were the ones who walked away from the most.

pjc50 • 2 months ago

Am I right in thinking that the old PA-Semi team was bought by Apple, and are substantially responsible for the success of the M-series parts?

sgerenser • 2 months ago

PA Semi (Palo Alto Semiconductor) had no relation to HP’s PA-RISC (Precision Architecture RISC).

classichasclass • 2 months ago

P.A. Semi contributed greatly to Apple silicon, but the company has nothing to do with PA-RISC. In fact, their most notable chip before Apple bought them was Power ISA.

AndrewStephens • 2 months ago

I remember when IA-64 was going to be the next big thing and being utterly baffled when the instruction set was made public. Even if you could somehow ship code that efficiently used the weird instruction bundles, there was no indication that future IA-64 CPUs would have the same limits for instruction grouping.

It did make a tiny bit of sense at the time. Java was ascendant and I think Intel assumed that JIT compiled languages were going to dominate the new century and that a really good compiler could unlock performance. It was not to be.

kragen • 2 months ago

That is not what happened.

EPIC development at HP started in 01989, and the Intel collaboration was publicly announced in 01994. The planned ship date for Merced, the first Itanic, was 01998, and it was first floorplanned in 01996, the year Java was announced. Merced finally taped out in July 01999, three months after the first JIT option for the JVM shipped. Nobody was assuming that JIT compiled languages were going to dominate the new century at that time, although there were some promising signs from Self and Strongtalk that maybe they could be half as fast as C.

AndrewStephens • 2 months ago

By the time IA-64 actually got close to shipping, Intel was certainly talking about JIT being a factor in its success. At least that was mentioned in the marketing guff they were putting out.

nayuki • 2 months ago

> The ia64 is a very demanding architecture. In tomorrow’s entry, I’ll talk about some other ways the ia64 will make you pay the penalty when you take shortcuts in your code and manage to skate by on the comparatively error-forgiving i386.

https://devblogs.microsoft.com/oldnewthing/20040120-00/?p=40... "ia64 – misdeclaring near and far data"

https://devblogs.microsoft.com/oldnewthing/2004/01

vardump • 2 months ago

Pretty surprising. So IA64 registers were 65 bit, with the extra bit describing whether the register contains garbage or not. If NaT (Not a Thing) is set, the register contents are invalid and that can cause "fun" things to happen...
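
To make that concrete, here is a toy C model of the idea, only a sketch and not real IA-64 semantics: a register is 64 data bits plus a NaT flag, a speculative load sets the flag instead of faulting, arithmetic propagates it, and an ordinary store of a NaT register is where things blow up. All the `toy_*` names are made up for this example, and NULL stands in for "an address that would have faulted".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a general register: 64 data bits plus a NaT bit. */
typedef struct {
    uint64_t value;
    bool nat;
} toy_reg;

/* Models a speculative load: instead of faulting on a bad address,
 * it hands back a register with NaT set. NULL plays the bad address. */
toy_reg toy_spec_load(const uint64_t *addr)
{
    toy_reg r = { 0, false };
    if (addr == NULL) {
        r.nat = true;
    } else {
        r.value = *addr;
    }
    return r;
}

/* Arithmetic propagates NaT from either operand to the result. */
toy_reg toy_add(toy_reg a, toy_reg b)
{
    toy_reg r = { a.value + b.value, a.nat || b.nat };
    return r;
}

/* Models an ordinary store: consuming a NaT register is an error.
 * Returns true on success, false where the hardware would fault. */
bool toy_store(uint64_t *dest, toy_reg src)
{
    assert(dest != NULL);
    if (src.nat) {
        return false;
    }
    *dest = src.value;
    return true;
}
```

The "deadly garbage" case in the article corresponds to `toy_store` being handed a register nobody initialized, which happens to still have its NaT flag set from an earlier failed speculative load.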

Not that this matters to anyone anymore. IA64 utterly failed long ago.

msla • 2 months ago

In case someone hasn't heard:

https://en.wikipedia.org/wiki/Itanium

> In 2019, Intel announced that new orders for Itanium would be accepted until January 30, 2020, and shipments would cease by July 29, 2021.[1] This took place on schedule.[9]

kragen • 2 months ago

It matters to people designing new hardware and maybe new virtual machine instruction sets.

nottorp • 2 months ago

Or to people caring about their software working on more than just Chrome.

... oh wait, on more than x86(64).

ashleyn • 2 months ago

There are modern VLIW architectures. I think Groq uses one. The lessons on what works and what doesn't are worth learning from history.

bri3d • 2 months ago

VLIW works for workloads where the compiler can somewhat accurately predict what will be resident in cache. It's used everywhere in DSP, was common in GPU for a while, and is present in lots of niche accelerators. It's a dead end for situations where cache residency is not predictable, like any kind of multitenant general purpose workload.

msla • 2 months ago

IA64 was EPIC, which, itself, was a "lessons learned" VLIW design, in that it had things like stop bits to explicitly demarcate dependency boundaries so instructions from multiple words could be combined on future hardware with more parallelism, and speculative execution and loads, which, well, see the article on how the speculative loads were a mixed blessing.

https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...

addaon • 2 months ago

A more everyday example is the Hexagon DSP ISA in Qualcomm chips. Four-wide VLIW + SMT.

0dyl • 2 months ago

The new TI C2000 F29 series of microcontrollers are VLIW

vardump • 2 months ago

I meant that narrowly, only about IA64. There is surely some lessons-learned value.

jcalvinowens • 2 months ago

At least they made the stack grow in the right direction! Well, half of it, anyway...

ronsor • 2 months ago

Yet another reason IA64 was a design disaster.

VLIW architectures still live on in GPUs and special purpose (parallel) processors, where these sorts of constraints are more reasonable.

MindSpunk • 2 months ago

Are any relevant GPUs VLIW anymore? As far as I'm aware they all dropped it too, moving to scalar ISAs on SIMT hardware. The last VLIW GPU I remember was AMD TeraScale, replaced by GCN where one of the most important architecture changes was dropping VLIW.

nneonneo • 2 months ago

I mean, there is a reason why these sorts of constructs are UB, even if they work on popular architectures. The problems aren’t unique to IA64, either; the better solution is to be aware that UB means UB and to avoid it studiously. (Unfortunately, that’s also hard to do in C).

loeg • 2 months ago

It's a very weird architecture to have these NaT states representable in registers but not main memory. Register spilling is a common requirement!

amluto • 2 months ago

Hah, this is IA-64. It has special hardware support for register spills, and you can search for “NaT bits” here:

https://portal.cs.umbc.edu/help/architecture/aig.pdf

to discover at least two magical registers to hold up to 127 spilled registers worth of NaT bits. So they tried.

The NaT bits are truly bizarre and I’m really not convinced they worked well. I’m not sure what happens to bits that don’t fit in those magic registers. And it’s definitely a mistake to have registers where the register’s value cannot be reliably represented in the common in-memory form of the register. x87 FPU’s 80-bit registers that are usually stored in 64-bit words in memory are another example.
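
A rough sketch of the idea in C, heavily simplified and not the actual IA-64 semantics: the 64-bit data goes to a memory slot, and the NaT flag goes into a separate 64-bit mask that the spill code has to carry around. The names (`spilled_reg`, `spill_area`, `toy_spill`, `toy_fill`) are hypothetical, and the slot index is passed explicitly here rather than derived by the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A register value together with its NaT flag. */
typedef struct {
    uint64_t value;
    bool nat;
} spilled_reg;

/* Toy spill area: 64 data slots plus one 64-bit mask of NaT bits. */
typedef struct {
    uint64_t slots[64];
    uint64_t nat_mask;
} spill_area;

/* Spill: data goes to memory, the NaT flag goes into the side mask. */
void toy_spill(spill_area *area, unsigned slot, spilled_reg r)
{
    assert(area != NULL && slot < 64);
    area->slots[slot] = r.value;
    if (r.nat) {
        area->nat_mask |= (UINT64_C(1) << slot);
    } else {
        area->nat_mask &= ~(UINT64_C(1) << slot);
    }
}

/* Fill: rebuild the register from the memory slot and the mask bit. */
spilled_reg toy_fill(const spill_area *area, unsigned slot)
{
    spilled_reg r;
    assert(area != NULL && slot < 64);
    r.value = area->slots[slot];
    r.nat = ((area->nat_mask >> slot) & 1u) != 0;
    return r;
}
```

In this model the mask holds exactly 64 flags, so spilling more registers than that means saving the mask itself somewhere first, which is one way to see why the scheme feels awkward.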

Someone • 2 months ago

Old-time x86 sort-of has “states representable in registers but not main memory”, too.

Compilers used to use its 80-bit floating point registers for 64-bit float computations, but also might spill them to memory as 64-bit float numbers.

https://hal.science/hal-00128124v3/file/floating-point.pdf section 3 has some examples, including one where the assert can fail in:

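  #include <assert.h>

  void do_nothing(double *x);  /* defined in a different compilation unit */

  int main (void) {
    double x = 0x1p-1022, y = 0x1p100, z;
    do_nothing(&y);
    z = x / y;  /* may live in an 80-bit x87 register, where it is nonzero */
    if (z != 0) {
      do_nothing(&z);  /* z can be spilled as a 64-bit double, rounding to 0 */
      assert(z != 0);
    }
    return 0;
  }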

with

  void do_nothing (double *x) { }

in a different compilation unit.
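
If you want to poke at the two views of that quotient without depending on what the optimizer does, a small sketch along these lines may help. It assumes nothing beyond standard C; the function names are made up here. The `volatile` forces the result through a 64-bit double in memory, while the `long double` version keeps whatever extra range the platform's `long double` has (80-bit extended on x87, possibly no wider than `double` elsewhere).

```c
#include <assert.h>

/* Divides, then forces the result through a 64-bit double in memory.
 * The volatile store discards any extra precision or exponent range. */
double quotient_after_store(double x, double y)
{
    assert(y != 0.0);
    volatile double stored = x / y;
    return stored;
}

/* Same division carried out in long double. Whether this is wider than
 * double depends on the platform, so the result is not portable. */
long double quotient_extended(double x, double y)
{
    assert(y != 0.0);
    return (long double)x / (long double)y;
}
```

For `x = 0x1p-1022` and `y = 0x1p100` the exact quotient is 2^-1122, far below the smallest subnormal double, so `quotient_after_store` gives 0, while `quotient_extended` can be nonzero where `long double` has a wider exponent range.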

mwkaufma • 2 months ago

I assume they were stored in an out-of-band mask word

awesome_dude • 2 months ago

The bigger problem is that a user cannot avoid an application whose author wrote code with UB, unless they have both the source code and the expertise to understand it.

eru • 2 months ago

Isn't that a general problem?