Back in 1980 the Intel 8087 math coprocessor was introduced. It extended the 16-bit 8086/8088 with an 80-bit floating-point unit (80-bit registers internally, 64-bit externally, as far as I understand). This made mathematical calculations both more accurate and faster.
I have seen benchmarks that claim a factor of 100 speedup for 32-bit division compared to software emulation on the 8086 (probably even more for 64-bit division).
This architecture was optimized (fewer clock cycles per operation) with the 287, 387, 487, ..., but the general architecture was never changed. The integer part of the 8086 was extended from 16 bits to 64 bits (8086 vs. Athlon 64 / x64), yet floating point is still 64/80-bit.
I think that at least the quadruple-precision format defined in IEEE 754 (that would be 128 bits externally, 160 bits internally) should be implemented for scientific use.
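To put numbers on the difference, here is a small sketch that converts significand width into approximate decimal digits. The significand widths (53, 64 and 113 bits, counting the leading bit) are the standard values for IEEE 754 binary64, the x87 extended format and IEEE 754 binary128; the names `FORMATS` and `decimal_digits` are just illustrative.

```python
import math

# (name, total width in bits, significand precision in bits incl. leading bit)
FORMATS = [
    ("binary64 (double)", 64, 53),
    ("x87 extended", 80, 64),
    ("binary128 (quadruple)", 128, 113),
]

def decimal_digits(precision_bits):
    """Approximate decimal digits carried by a p-bit significand: p * log10(2)."""
    return precision_bits * math.log10(2)

for name, total_bits, p in FORMATS:
    print(f"{name}: {total_bits} bits total, {p}-bit significand, "
          f"about {decimal_digits(p):.2f} decimal digits")
```

So going from 64-bit to 80-bit gains roughly three decimal digits, while quadruple precision would roughly double what a double offers.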
The only architecture with 128-bit FP seems to be OS/390 (IBM mainframe).
Itanium has 128 floating-point registers of 82 bits each (a little bit better than 80 bits), but Intel tells us that Itanium will die.
Is there no market for these extensions (or do banks still use mainframes because of the greater FP precision)?
The AVX2 extension is 256 bits wide, but for floating point it can only be used for four parallel 64-bit operations (or eight 32-bit ones).
Wouldn't it be useful in some special cases to have a single 256-bit FP unit? Would it take much effort to redesign the AVX extensions to offer 256-bit FP as well?
As far as I understand, processing data wider than 64 bits on the current architecture takes many "loops" (several instructions in software per operation), which slows down processing.
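To illustrate what I mean by that overhead, here is a minimal sketch of "double-double" arithmetic, one common software technique that represents a value as the unevaluated sum of two 64-bit doubles. It is not how any particular library does it; `two_sum` follows Knuth's well-known TwoSum algorithm, and `dd_add` is a simplified addition that I assume is good enough for demonstration.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) where s is the rounded sum a + b
    and e is the rounding error, so that s + e equals a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def dd_add(x, y):
    """Add two double-double values given as (hi, lo) pairs.
    Simplified variant for illustration, not a fully accurate one."""
    s, e = two_sum(x[0], y[0])   # add the high parts, keep the error
    e += x[1] + y[1]             # fold in the low parts
    return two_sum(s, e)         # renormalize into (hi, lo)

# A plain 64-bit double silently drops the small term ...
print(1.0 + 1e-20 == 1.0)

# ... while the double-double pair keeps it in the low word.
print(dd_add((1.0, 0.0), (1e-20, 0.0)))
```

A single extended-precision addition costs 14 ordinary double operations in this variant, which is the kind of slowdown a native wide FP unit would avoid.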