[gentoo-user] Invalid opcode after kernel update
Fernando Rodriguez
2023-09-17 21:50:01 UTC
A few months ago, after updating my kernel, I started getting an invalid
opcode error during boot in the init process of my initramfs, which I did
rebuild. Switching to the old kernel and initramfs fixed the problem, so
I kept that kernel for a few months for lack of time.

Today I rebuilt the whole system using `emerge -e @world` and after that
I'm able to boot the new kernel but now some pre-compiled packages (and
some that emerge -e missed because the ebuild was masked) crash with
illegal opcode. Chrome doesn't crash, but it renders only garbage for
web pages.

Does anyone have a clue what is happening? It's like the instruction set
changed after the kernel update (or was it the microcode?)

Thanks,

--
Alan Mackenzie
2023-09-17 22:10:01 UTC
Hello, Fernando.
Could it be that you've got a sporadic RAM failure? Running the
standard RAM test (the one you boot into, I've forgotten its name) for
many hours might pin down the problem.
--
Alan Mackenzie (Nuremberg, Germany).
Fernando Rodriguez
2023-09-18 15:10:02 UTC
I ran the test to be sure, but the problem is not sporadic. It happens
every time with the same pre-built binaries. My last working kernel was
5.15.122; if I boot from that kernel everything works. Before the update
everything was built with -march=native, and before the 'emerge -e' I
switched to -mtune=generic. I don't think the flags made the difference,
but rather the act of rebuilding, because after rebuilding the whole
system I still have issues with pre-compiled binaries, and those should
be generic builds. Strangely, the same binaries that crash on the host
system run fine in a VM using hardware virtualization.

I will try to run it under gdb to find out which instruction is
triggering the fault.
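
A quick way to confirm that a given binary really dies from an illegal
instruction, before stepping through it in a debugger, is to look at how
the process terminated. The following is only a minimal sketch; the
function name killed_by_sigill is made up for illustration and is not
part of any tool mentioned in this thread.

```python
import signal
import subprocess


def killed_by_sigill(argv):
    """Run argv and report whether the process was terminated by SIGILL.

    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code -N means the child died from signal N.
    return proc.returncode == -signal.SIGILL
```

Calling killed_by_sigill(["/path/to/binary"]) on each suspect program
would separate illegal-instruction crashes from other failures; finding
the exact faulting instruction still needs a debugger.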

Thanks,
Fernando
Fernando Rodriguez
2023-09-18 19:00:01 UTC
The crash is happening on AVX2 instructions. My CPU is an Intel(R) Core(TM)
i7-8809G CPU @ 3.10GHz, which is supposed to have AVX2, but I don't see it
listed in /proc/cpuinfo. I can't reboot into the old kernel right now,
but I suspect that when I do it will be there, because I vaguely remember
seeing it there. Any clues?
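
The /proc/cpuinfo check can be scripted so that the same test is easy to
repeat under the old and the new kernel. This is a minimal sketch that
assumes the usual x86 layout with a "flags" line per CPU; the function
names cpu_flags and missing_features are illustrative only.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, wanted=("avx", "avx2")):
    """List the wanted flags that the kernel does not report."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("missing:", missing_features(f.read()))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```

An empty list means both flags are reported; a result such as
['avx', 'avx2'] would match the symptom described here.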
--
Fernando Rodriguez
Fernando Rodriguez
2023-09-18 19:10:01 UTC
Found this in my journal: "GDS: Microcode update needed! Disabling AVX
as mitigation." So I guess it's a microcode issue. I'm using dracut with
--early-microcode, I have CONFIG_MICROCODE_INTEL set, and I have the
latest (as of Friday) intel-microcode. I don't have the initramfs USE
flag enabled for intel-microcode, but I never did and it was working. I
will try it when I get back; gotta run now. Any more ideas?
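
To tell whether early microcode loading actually took effect, it may
help to compare the revision the kernel reports before and after
changing the initramfs setup. The sketch below assumes the x86
/proc/cpuinfo format, where each CPU has a "microcode" line with a
hexadecimal revision; the function name microcode_revisions is
illustrative only.

```python
def microcode_revisions(cpuinfo_text):
    """Collect the distinct microcode revisions reported for all CPUs."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # Values look like "0xf4"; int() with base 16 accepts the prefix.
            revisions.add(int(value.strip(), 16))
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sorted(hex(r) for r in microcode_revisions(f.read())))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```

If the printed revision is the same with and without the early-microcode
image, the update is probably not being applied at boot.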
--
Fernando Rodriguez
Peter Böhm
2023-09-18 19:30:01 UTC
It is Intel Downfall, also called GDS (Gather Data Sampling).

You may want to read: https://www.phoronix.com/review/downfall
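
Kernels that carry the GDS mitigation should also report its state in
sysfs. The sketch below is hedged: the sysfs path and the status strings
are assumptions based on the kernel's hardware-vulnerability
documentation as I recall it, not something confirmed in this thread,
and the function name describe_gds is illustrative only.

```python
# Assumed sysfs location of the GDS status on kernels with the mitigation.
GDS_SYSFS = "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling"


def describe_gds(status):
    """Map a GDS status string to a short explanation (assumed strings)."""
    status = status.strip()
    if status.startswith("Not affected"):
        return "CPU not affected"
    # Must be tested before the generic "Mitigation" prefix below.
    if status.startswith("Mitigation: AVX disabled"):
        return "AVX disabled by the kernel; microcode lacks the mitigation"
    if status.startswith("Mitigation"):
        return "mitigated by microcode"
    if status.startswith("Vulnerable"):
        return "vulnerable; AVX left enabled"
    return "unrecognized status: " + status


if __name__ == "__main__":
    try:
        with open(GDS_SYSFS) as f:
            print(describe_gds(f.read()))
    except OSError as exc:
        print("GDS status not available:", exc)
```

A result mentioning disabled AVX would be consistent with the journal
message quoted earlier in the thread.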

Regards,
Peter
