[gentoo-user] AMD microcode error?
Peter Humphrey
2024-01-28 16:50:01 UTC
Hello list,

For the first time ever, I received an mce error today:

[11473.528812] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 14: 9090909090909090
[11473.529657] mce: [Hardware Error]: TSC 0
[11473.530146] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1706457141 SOCKET 0 APIC 2 microcode a201009

This is an AMD Ryzen 9 5900X.

Hits on the web suggest downgrading linux-firmware, which I've now done and
will await results. The latest upgrade was to version 20240115-r1, four days
ago.

Has anyone else experienced this?
--
Regards,
Peter.
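
The PROCESSOR field in the third mce line can be unpacked by hand. As far as I know, the number before the colon is the kernel's internal vendor code (2 for AMD) and the hex value after it is the raw CPUID signature. The sketch below, assuming bash, splits that signature into family, model and stepping using the standard CPUID bit layout. The function name decode_cpuid is made up for illustration.

```shell
#!/usr/bin/env bash
# Decode a raw x86 CPUID signature (the value after "PROCESSOR 2:" in the
# mce line) into display family, model and stepping.
decode_cpuid() {
    local hex=${1#0x}                      # accept input with or without 0x
    local sig=$(( 16#$hex ))
    local stepping=$((     sig         & 0xf  ))
    local base_model=$((  (sig >> 4)   & 0xf  ))
    local base_family=$(( (sig >> 8)   & 0xf  ))
    local ext_model=$((   (sig >> 16)  & 0xf  ))
    local ext_family=$((  (sig >> 20)  & 0xff ))
    local family=$base_family model=$base_model
    # On AMD the extended fields are only added when the base family is 0xf,
    # which is the case for all recent parts.
    if (( base_family == 0xf )); then
        family=$(( base_family + ext_family ))
        model=$(( (ext_model << 4) | base_model ))
    fi
    printf 'family=0x%x model=0x%x stepping=0x%x\n' "$family" "$model" "$stepping"
}

# Signature taken from the mce line in the post above.
decode_cpuid a20f10
```

For a20f10 this prints family=0x19 model=0x21 stepping=0x0, which is the family and model to look for in any microcode listing.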
Peter Humphrey
2024-01-28 16:50:01 UTC
Post by Peter Humphrey
Hits on the web suggest downgrading linux-firmware, which I've now done and
will await results. The latest upgrade was to version 20240115-r1, four days
ago.
s/Hits/Hints/
--
Regards,
Peter.
Mark Knecht
2024-01-28 17:10:01 UTC
Post by Peter Humphrey
Post by Peter Humphrey
Hits on the web suggest downgrading linux-firmware, which I've now done and
will await results. The latest upgrade was to version 20240115-r1, four days
ago.
s/Hits/Hints/
--
Regards,
Peter.
If it is a memory error then there are three possibilities:

1) The new linux-firmware has a problem and the reported error is false

2) The DRAM was already bad but had not been tested before, and the error is real

3) The DRAM has since gone bad, and the error is real

A reasonable next step is to run some sort of longer-term
memory test: memtest86, memtest64, or something else of your choice.

Good luck,
Mark
Michael
2024-01-28 17:50:02 UTC
Post by Mark Knecht
Post by Peter Humphrey
Post by Peter Humphrey
Hits on the web suggest downgrading linux-firmware, which I've now done and
will await results. The latest upgrade was to version 20240115-r1, four days
ago.
s/Hits/Hints/
--
Regards,
Peter.
1) The new linux-firmware has a problem and the error is untrue
2) The DRAM was bad but not tested earlier and is true
3) The DRAM has gone bad and the error is true
A reasonable next step is to run some sort of longer term
memory test, memtest 86, memtest64 or something else of your choice.
Good luck,
Mark
I'm not sure a microcode update has been released yet by AMD as a blob,
outside what they make available to MoBo OEMs within 'BIOS firmware' updates.
To find what's in the box use:

dmesg | grep -i 'family:'

Then check what CPU family and model microcodes the linux-firmware contains:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/README

If you can't find your family and model in the above, then you could check
what firmware updates are available from the MoBo's OEM. These would include
microcode made directly available by AMD to the OEM.
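
The two steps above (read the family and model from dmesg, then look for them in the amd-ucode README) can be sketched in bash as follows. This assumes the kernel boot line has the form "(family: 0x19, model: 0x21, stepping: 0x0)" and that the README lists patches as "Family=0x.. Model=0x.. Stepping=0x..: ..." lines. Both formats are assumptions to check against your own output, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Read dmesg-style text on stdin and print the first "family model" pair,
# e.g. "0x19 0x21".
cpu_family_model() {
    sed -n 's/.*family: \(0x[0-9a-fA-F]*\), model: \(0x[0-9a-fA-F]*\).*/\1 \2/p' |
        head -n 1
}

# Print every README line whose Family= and Model= values match.
# Values are compared numerically, so 0x1 and 0x01 are treated as equal.
# Returns non-zero when nothing matches.
ucode_lookup() {
    local readme=$1
    local want_family=$(( $2 )) want_model=$(( $3 ))
    local line status=1
    local re='Family=(0x[0-9a-fA-F]+)[[:space:]]+Model=(0x[0-9a-fA-F]+)'
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]] &&
           (( ${BASH_REMATCH[1]} == want_family && ${BASH_REMATCH[2]} == want_model )); then
            printf '%s\n' "$line"
            status=0
        fi
    done < "$readme"
    return "$status"
}
```

With a saved copy of the README, something like `read -r fam mod < <(dmesg | cpu_family_model); ucode_lookup README "$fam" "$mod"` would print the matching entries, or print nothing and return non-zero if the CPU is not listed.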
Peter Humphrey
2024-01-29 16:20:01 UTC
Post by Michael
I'm not sure a microcode update has been released yet by AMD as a blob,
outside what they make available to MoBo OEMs within 'BIOS firmware'
dmesg | grep -i 'family:'
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/README
No luck with those.
Post by Michael
If you can't find your family and model in the above, then you could check
what firmware updates are available by the MoBo's OEM. These would include
microcode made directly available by AMD to the OEM.
That's ASRock X570 Taichi. Their pages suggest that they only acknowledge
Windows 10 & 11.

I'll keep my eyes open for another glitch. Maybe the microcode isn't to blame
at all, in which case I'd better not sleep on the job.

Thanks for the pointers.
--
Regards,
Peter.
Michael
2024-01-29 17:20:01 UTC
Post by Peter Humphrey
Post by Michael
I'm not sure a microcode update has been released yet by AMD as a blob,
outside what they make available to MoBo OEMs within 'BIOS firmware'
dmesg | grep -i 'family:'
Then check what CPU family and model microcodes the linux-firmware
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/README
No luck with those.
OK, this means the linux-firmware releases contain no microcode for your CPU
(yet).
Post by Peter Humphrey
Post by Michael
If you can't find your family and model in the above, then you could check
what firmware updates are available by the MoBo's OEM. These would include
microcode made directly available by AMD to the OEM.
That's ASRock X570 Taichi. Their pages suggest that they only acknowledge
Windows 10 & 11.
Check the BIOS version in dmesg and compare it with ASRock's AMD chipset
image on the asrock.com website. If the versions/dates are the same
you have nothing more to do. If the version on the website is more recent
then you may want to flash the MoBo with it.
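
Besides dmesg, the kernel also exposes the same DMI strings under /sys/class/dmi/id. The bash sketch below prints them and compares two dates. It assumes bios_date is in MM/DD/YYYY form and that you retype the date from the vendor's download page in the same form. The function names are invented for illustration.

```shell
#!/usr/bin/env bash
# Print the board and firmware identification exposed via DMI.
# The directory is a parameter so the function can be pointed at a copy.
bios_info() {
    local dir=${1:-/sys/class/dmi/id} f
    for f in board_name bios_vendor bios_version bios_date; do
        [[ -r $dir/$f ]] && printf '%s: %s\n' "$f" "$(<"$dir/$f")"
    done
    return 0
}

# Convert MM/DD/YYYY into YYYYMMDD so dates compare as plain integers.
date_key() {
    local m d y
    IFS=/ read -r m d y <<< "$1"
    # 10# forces base 10 so that "08" and "09" are not read as octal.
    printf '%04d%02d%02d\n' "$((10#$y))" "$((10#$m))" "$((10#$d))"
}

# Succeeds when the offered date is strictly newer than the installed one.
# Usage: bios_update_available INSTALLED_DATE OFFERED_DATE
bios_update_available() {
    (( $(date_key "$2") > $(date_key "$1") ))
}
```

Running `bios_info` with no argument shows what is currently flashed. `bios_update_available "$(cat /sys/class/dmi/id/bios_date)" MM/DD/YYYY`, with the website's date filled in, tells you whether the image on offer is newer.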

Download the zip archive on offer and unzip it, then store the new image on a
USB stick which has been formatted with FAT32. Some OEMs require you to rename
the firmware image file; if so, it will say so on the website or in a README
within the zip archive. Reboot and press [F2] during POST to get into the BIOS
setup menu, then go to the Tools tab to flash it from the USB.

You may have to re-apply in the BIOS menu any changes you had previously made
after the PC reboots, because restoring the settings from a backup file
doesn't always work.
Post by Peter Humphrey
I'll keep my eyes open for another glitch. Maybe the microcode isn't to
blame at all, in which case I'd better not sleep on the job.
Well, the latest BIOS firmware version often contains bug fixes and
microcode patches for CPU vulnerabilities. However, this does not mean
it will address the MCE errors you were experiencing.
ralfconn
2024-01-28 17:50:02 UTC
Post by Peter Humphrey
Hello list,
[11473.528812] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 14: 9090909090909090
[11473.529657] mce: [Hardware Error]: TSC 0
[11473.530146] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1706457141 SOCKET 0 APIC 2 microcode a201009
This is an AMD Ryzen 9 5900X.
Hits on the web suggest downgrading linux-firmware, which I've now done and
will await results. The latest upgrade was to version 20240115-r1, four days
ago.
Has anyone else experienced this?
No:

$ eix -I linux-firmware
[I] sys-kernel/linux-firmware
     Available versions:  (~)20231111-r1^bstd 20231211^bstd 20240115^bstd (~)20240115-r1^bstd **99999999*l^bstd {compress-xz compress-zstd deduplicate initramfs +redistributable savedconfig unknown-license}
     Installed versions:  20240115-r1^bst(10:52:28 01/27/24)(redistributable savedconfig -compress-xz -compress-zstd -deduplicate -initramfs -unknown-license)

$ grep -e "microcode\|model name" /proc/cpuinfo
model name    : AMD Ryzen 9 5900X 12-Core Processor
microcode    : 0xa20120e

raf
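
The revision reported here (0xa20120e) differs from the one in the original mce line (a201009). A small bash sketch for pulling the revision out of cpuinfo-style text and comparing two values is shown below. The function names are hypothetical. A numeric comparison is only meaningful between CPUs of the same family, model and stepping, and the thread does not show the stepping of this second machine.

```shell
#!/usr/bin/env bash
# Read /proc/cpuinfo-style text on stdin and print the distinct microcode
# revisions it reports (normally a single value for all cores).
ucode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' | sort -u
}

# Compare two hex revisions, with or without a 0x prefix.
# Prints whether the first is older than, newer than, or the same as the second.
ucode_compare() {
    local a=$(( 16#${1#0x} )) b=$(( 16#${2#0x} ))
    if   (( a < b )); then echo older
    elif (( a > b )); then echo newer
    else                   echo same
    fi
}

# The two revisions that appear in this thread.
ucode_compare a201009 0xa20120e
```

On a live system, `ucode_revisions < /proc/cpuinfo` gives the value to feed into ucode_compare.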