Discussion:
[gentoo-user] Getting output of a program running in background after a crash
Dale
2023-10-08 18:00:02 UTC
Howdy,

I use Konsole a lot, that thing within KDE that acts like a console.
Anyway, I'm running an offline file system check on a rather large file
system.  For some reason, Konsole decided to crash.  I can see the file
system check is still running with top, ps etc. but I can't see any output
to know what it is doing.  Is there a way to get that back?  Should I kill
it and restart now that Konsole is running again?  I'd think a regular
term signal would give it a safe stopping place but still kinda chicken
to do it.  Then again, what if it stops and needs my input or worse yet,
it displays an error that I can't see but I need to know and see?

Any thoughts?  Is there a way to get it back?  Kill it and restart?  Do
nothing and hope for the best? 

Thanks.

Dale

:-)  :-) 
Jude DaShiell
2023-10-08 18:10:01 UTC
If I understand your question, this may help. Understand prog is the
program that errors out in this example:
prog 2>&1 | tee prog.err
Look for all output including errors in the file prog.err which tee will
have created for you. Before opening prog.err, try wc -l prog.err and
grep -i error prog.err to do an initial inspection. If the wc command
returns 0 then there is no need to do the grep search since the file is empty.
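
A minimal sketch of that workflow, assuming a POSIX shell. The prog function below is a hypothetical stand-in for the real program, and a temporary file takes the place of prog.err so the example can be run anywhere:

```shell
#!/bin/sh
# Hypothetical stand-in for the real program: writes to stdout and stderr.
prog() {
    echo "pass 1: checking inodes"
    echo "error: bad block 42" >&2
}

logfile=$(mktemp)              # plays the role of prog.err

# 2>&1 merges stderr into stdout, so tee sees (and saves) both streams
# while still showing them on the terminal.
prog 2>&1 | tee "$logfile"

# Initial inspection: count lines first, grep only if the file is non-empty.
lines=$(wc -l <"$logfile")
if [ "$lines" -gt 0 ]; then
    grep -i error "$logfile"
fi
```

Note that this only helps if the program is started this way; it cannot capture output from a job that is already running.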


-- Jude <jdashiel at panix dot com> "There are four boxes to be used in
defense of liberty: soap, ballot, jury, and ammo. Please use in that
order." Ed Howdershelt 1940.
Mark Knecht
2023-10-08 18:40:01 UTC
I would suggest you learn screen - a very simple app that allows you to
start an app and then disconnect from it. You can then log out, close
your terminal, or in your case if konsole really crashed, you just open
a new konsole and reconnect.

The screen process keeps all the terminal output so you can review
it while the process is running or after it has finished.
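
As a rough sketch of how this might look for a long-running job: the session name "fsck" and the sh -c placeholder command below are assumptions for illustration, not anything taken from this thread. The options used (-S, -d -m, -L, -ls, -r) are standard screen options.

```shell
#!/bin/sh
# Session name and the placeholder job are illustrative assumptions.
session=fsck

if command -v screen >/dev/null 2>&1; then
    # -d -m : start the session detached
    # -S    : give the session a name so it is easy to find again
    # -L    : log the session's output (screenlog.0 in the current directory)
    screen -L -dmS "$session" sh -c 'echo "long job started"; sleep 30'

    sleep 1
    screen -ls || true       # the "fsck" session should be listed here

    # From any new terminal (interactive, so left as comments):
    #   screen -r fsck       reattach to the session
    #   Ctrl-a d             detach again without stopping the job
fi
```

For interactive use one would more likely run plain "screen -S fsck", start the job inside it, and detach with Ctrl-a d.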

I do not know how to reliably get access to your process if it's
really still running. Someone else here can probably give you
better instructions on that.
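
One partial answer on Linux: /proc shows where a running process is sending its output. The sketch below uses a throwaway sleep process and a temporary file as stand-ins; for a job whose terminal has died, the link would usually point at a /dev/pts entry instead, and anything already printed there is most likely gone.

```shell
#!/bin/sh
# Stand-in job: a background process whose stdout goes to a temporary file.
out=$(mktemp)
sleep 30 >"$out" 2>&1 &
pid=$!
sleep 1                      # give the child time to set up its redirection

# File descriptor 1 is stdout; the symlink shows what it is connected to.
stdout_target=$(readlink "/proc/$pid/fd/1")
echo "PID $pid writes stdout to: $stdout_target"

kill "$pid"

# To watch what a process writes from now on (needs suitable privileges):
#   strace -p PID -e trace=write
```

Tools such as reptyr are meant to move a running process to a new terminal, but whether that works once the original terminal is gone is not something this sketch checks.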

HTH,
Mark
Dale
2023-10-08 19:00:01 UTC
I was hoping I would catch a real quick response, even tho that wasn't
very likely.  After about 45 minutes or so, I did a pkill on it.  I seem
to recall it is about the same as a ctrl c which is a polite 'stop what
you're doing' when safely possible, in most cases anyway.  I then started a
new screen process and restarted the file system check.  It's still
working on it on the other desktop.  So, even tho I hadn't read your
reply yet, I still did what you advised.  It's running in a screen
process now.  I can reattach if Konsole dies again.  Good advice tho.
Should have done that before.  ;-)
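
For reference, pkill (like kill) sends SIGTERM by default, while Ctrl-C sends SIGINT. Both are requests that a program can catch in order to stop cleanly. The sketch below shows the idea with a hypothetical worker loop; it says nothing about how any particular file system checker reacts to the signal.

```shell
#!/bin/sh
log=$(mktemp)

# Hypothetical worker: loops forever, but cleans up when asked to terminate.
(
    trap 'echo "caught SIGTERM, cleaning up" >"$log"; exit 0' TERM
    while :; do sleep 1; done
) &
worker=$!

sleep 1                      # let the worker install its trap
kill -TERM "$worker"         # same signal that pkill sends by default
wait "$worker"
status=$?

cat "$log"
echo "worker exit status: $status"
```

A program that does not handle SIGTERM is simply terminated, which is why the signal is only "polite" when the program cooperates.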

I don't know what happened to Konsole tho.  It's crashed once before a
month or so ago and then again a bit ago.  Before that, I can't recall
it ever crashing on me.  It appears someone is adding a feature
that includes the occasional crash as an added bonus.  ROFL

I'm glad I made new backups.  Before Konsole crashed, it was spitting
out a LOT of stuff that I'm not sure is good.  It even mentioned
possible lost data.  I got a new 18TB hard drive and was in the process
of moving data to it and resizing the file system when this all
started.  I can't mount right now so no idea if it is still there or
not.  Now let us pray. 

Dale

:-)  :-) 
Dale
2023-10-09 00:50:01 UTC
Just as an update.  The file system I was trying to do a file system
check on was my large one, about 40TBs worth.  While running the file
system check, it started using HUGE amounts of memory.  It used almost
all my 32GBs and most of swap as well.  It couldn't finish due to not
enough memory, it literally crashed itself.  So, I don't know if this is
because of some huge problem or what but if this is expected behavior,
don't try to do a file system check on devices that large unless you
have a LOT of memory. 

I ended up recreating the LVM devices from scratch and redoing the
encryption as well.  I have backups tho.  This all started when using
pvmove to replace a hard drive with a larger drive.  I guess pvmove
isn't always safe.  It took going on two days to move. 

Oh, Mark had good advice too.  Do important stuff in screen just in case
something crashes, like Konsole.  :/ 

Thanks.  Hope someone learns from my boo boo. 

Dale

:-) :-) 

P. S.  I currently have my backup system on my old Gigabyte 770T mobo
and friends.  It is still a bit slower than copying when no encryption
is used so I guess encryption does slow things down a bit.  That said,
the CPU does hang around 50% most of the time.  htop doesn't show what
is using that so it must be IO or encryption.  Or something kernel
related that htop doesn't show.  No idea. 
Frank Steinmetzger
2023-10-09 11:10:01 UTC
Post by Dale
Just as a update.  The file system I was trying to do a file system
check on was my large one, about 40TBs worth.  While running the file
system check, it started using HUGE amounts of memory.  It used almost
all my 32GBs and most of swap as well.  It couldn't finish due to not
enough memory, it literally crashed itself.  So, I don't know if this is
because of some huge problem or what but if this is expected behavior,
don't try to do a file system check on devices that large unless you
have a LOT of memory. 
Or use a different filesystem. O:-)
Post by Dale
I ended up recreating the LVM devices from scratch and redoing the
encryption as well.  I have backups tho.  This all started when using
pvmove to replace a hard drive with a larger drive.  I guess pvmove
isn't always safe.
I think that may be a far-fetched conclusion. If it weren’t safe, it
wouldn’t be in the software – or at least not advertised as safe.
Post by Dale
P. S.  I currently have my backup system on my old Gigabyte 770T mobo
and friends.  It is still a bit slower than copying when no encryption
is used so I guess encryption does slow things down a bit.  That said,
the CPU does hang around 50% most of the time.  htop doesn't show what
is using that so it must be IO or encryption.
You can add more widgets (“meters”) to htop, one of them shows disk
throughput. But there is none for I/O wait. One tool that does show that is
glances. And also dstat which I mentioned a few days ago. Not only can dstat
tell you the total percentage, but also which process is the most expensive
one.

I set up bash aliases for different use cases of dstat:
alias ,d='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem --swap'
alias ,dd='dstat --time --cpu --disk --disk-util -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --mem-adv'
alias ,dm='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem-adv --swap'
alias ,dt='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem --swap --top-cpu --top-bio --top-io --top-mem'

Because I attach external storage once in a while, I use a dynamic list of
devices to watch that is passed to the -D argument. If I don’t use -D, dstat
will only show a total for all drives.
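
Because the aliases are defined in single quotes, the $(ls ...) part is evaluated each time the alias is used, which is what keeps the device list dynamic. A small sketch of the same idea as a function, assuming a POSIX shell; the name disk_list and the optional directory argument are made up here so the example can be tried against a scratch directory instead of /dev:

```shell
#!/bin/sh
# Hypothetical helper: print whole-disk device nodes as a comma-separated list.
disk_list() {
    dir=${1:-/dev}
    # sd?, nvme?n? and mmcblk? match whole disks only, not partitions.
    # paste -sd, joins the lines with commas and leaves no trailing comma.
    ls "$dir"/sd? "$dir"/nvme?n? "$dir"/mmcblk? 2>/dev/null | paste -sd, -
}

# Demo against a scratch directory with made-up device names.
fake_dev=$(mktemp -d)
touch "$fake_dev/sda" "$fake_dev/sdb" "$fake_dev/sda1" "$fake_dev/nvme0n1"
list=$(disk_list "$fake_dev")
echo "$list"

# Real use would be along the lines of:
#   dstat --time --cpu --disk -D "$(disk_list)" --net --mem --swap
```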

The first is a simple overview (d = dstat).

The second is the same but only for disk statistics (dd = dstat disks). I
use it mostly on my NAS (five SATA drives in total, which creates a very
wide table).

The third shows more memory details like dirty cache (dm = dstat memory),
which is interesting when copying large files.

And the last one shows the top “pigs”, i.e. expensive processes in terms of
CPU, IO and memory (dt = dstat top).
Post by Dale
Or something kernel
related that htop doesn't show.  No idea. 
Perhaps my tool tips give you ideas. :)
--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

What is the difference between two flutes? – A semitone.
Dale
2023-10-11 18:00:01 UTC
Post by Frank Steinmetzger
Post by Dale
Just as a update.  The file system I was trying to do a file system
check on was my large one, about 40TBs worth.  While running the file
system check, it started using HUGE amounts of memory.  It used almost
all my 32GBs and most of swap as well.  It couldn't finish due to not
enough memory, it literally crashed itself.  So, I don't know if this is
because of some huge problem or what but if this is expected behavior,
don't try to do a file system check on devices that large unless you
have a LOT of memory. 
Or use a different filesystem. O:-)
I'm using ext4 which is said to be one of the most reliable and widely
used file systems.  I do wonder tho, am I creating file systems that may
be too large or that it just has trouble with?  I doubt that but I'm up
to about 40TBs now.  I just can't figure out a way to split that data
up, yet.
Post by Frank Steinmetzger
Post by Dale
I ended up recreating the LVM devices from scratch and redoing the
encryption as well.  I have backups tho.  This all started when using
pvmove to replace a hard drive with a larger drive.  I guess pvmove
isn't always safe.
I think that may be a far-fetched conclusion. If it weren’t safe, it
wouldn’t be in the software – or at least not advertised as safe.
Well, something went sideways.  Honestly, I think it might not be pvmove
but something happened with the file system itself. After all, LVM
wasn't complaining at all and everything showed the move completed with
no errors.  I guess it is possible pvmove had a problem but given it was
the file system that complained so loudly, I'm leaning to it having an
issue.
Post by Frank Steinmetzger
Post by Dale
P. S.  I currently have my backup system on my old Gigabyte 770T mobo
and friends.  It is still a bit slower than copying when no encryption
is used so I guess encryption does slow things down a bit.  That said,
the CPU does hang around 50% most of the time.  htop doesn't show what
is using that so it must be IO or encryption.
You can add more widgets (“meters”) to htop, one of them shows disk
throughput. But there is none for I/O wait. One tool that does show that is
glances. And also dstat which I mentioned a few days ago. Not only can dstat
tell you the total percentage, but also which process is the most expensive
one.
alias ,d='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem --swap'
alias ,dd='dstat --time --cpu --disk --disk-util -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --mem-adv'
alias ,dm='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem-adv --swap'
alias ,dt='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem --swap --top-cpu --top-bio --top-io --top-mem'
Because I attach external storage once in a while, I use a dynamic list of
devices to watch that is passed to the -D argument. If I don’t use -D, dstat
will only show a total for all drives.
The first is a simple overview (d = dstat).
The second is the same but only for disk statistics (dd = dstat disks). I
use it mostly on my NAS (five SATA drives in total, which creates a very
wide table).
The third shows more memory details like dirty cache (dm = dstat memory),
which is interesting when copying large files.
And the last one shows the top “pigs”, i.e. expensive processes in terms of
CPU, IO and memory (dt = dstat top).
Post by Dale
Or something kernel
related that htop doesn't show.  No idea. 
Perhaps my tool tips give you ideas. :)
Dang, I have a lot of drives here to add to all that.  Bad thing is,
every time I reboot, all but two I think tend to move around, even tho I
haven't moved anything.  This is why I use either labels or UUIDs by the
way.  Once ages ago, I saw a way to make commands/scripts see all drives
on a system with some sort of inclusive trick.  I think it used brackets
but not sure.  I can't find that trick anymore.  I should have saved
that thing. 
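
The trick with brackets may have been shell globbing, where [a-z] matches exactly one character from that range. A small sketch, assuming a POSIX shell and using a scratch directory with made-up device names rather than the real /dev:

```shell
#!/bin/sh
# Scratch directory standing in for /dev; the names are made up.
fake_dev=$(mktemp -d)
touch "$fake_dev/sda" "$fake_dev/sdb" "$fake_dev/sda1" "$fake_dev/nvme0n1"

# sd[a-z] matches "sd" plus exactly one lowercase letter: whole disks only.
whole_disks=$(cd "$fake_dev" && echo sd[a-z])
echo "$whole_disks"

# sd[a-z][0-9] would match single-digit partitions such as sda1 instead.
partitions=$(cd "$fake_dev" && echo sd[a-z][0-9])
echo "$partitions"
```

On a real system the same pattern would be written as /dev/sd[a-z]. Since the letters can change between boots, the stable names under /dev/disk/by-id may be a better fit for scripts.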

I used some command, can't recall which it was, and I think it is the
kernel itself using so much CPU time.  Given when it does it, I think it
is either processing the encryption or working to send the data to the
disks, or both.  I'd suspect both but I dunno. 

Anyway, I'm restoring from a fresh LVM rebuild now.  No way to test
anything to see what the problem was now. 

Dale

:-)  :-) 
Jude DaShiell
2023-10-11 20:00:01 UTC
Linux is being targeted by ransomware and other forms of malware, so it may
be worthwhile to run forensics on your backup and see what they have
to tell you. Afterwards, check whether any of the findings were false
positives.


-- Jude <jdashiel at panix dot com> "There are four boxes to be used in
defense of liberty: soap, ballot, jury, and ammo. Please use in that
order." Ed Howdershelt 1940.
Neil Bothwick
2023-10-12 07:10:01 UTC
It only does this when I'm copying files over.  Right now I'm copying
about 26TBs of data over ethernet and it is taking a while.  Once I stop
it or it finishes the copy, the CPU goes to about nothing, unless I'm
doing something else.  So it has something to do with the copy process.
Or the network. What are you using to copy? If you use rsync, you can
make use of the --bwlimit option to reduce the speed and network load.
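
A small sketch of what that could look like. The directories and the 1 MiB sample file are placeholders created just for the demo; --bwlimit takes a rate in KiB per second by default.

```shell
#!/bin/sh
# Placeholder source and destination; real paths would go here instead.
src=$(mktemp -d)
dst=$(mktemp -d)
head -c 1048576 /dev/zero >"$src/sample.bin"     # 1 MiB of test data

if command -v rsync >/dev/null 2>&1; then
    # -a            : archive mode (recursive, keeps permissions and times)
    # --bwlimit=1024: cap the transfer at roughly 1024 KiB/s
    rsync -a --bwlimit=1024 "$src"/ "$dst"/
    ls -l "$dst"
fi
```

For a multi-terabyte copy the limit would be set much higher; the point is only to leave some headroom for everything else on the machine and the network.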
--
Neil Bothwick

Is that "woof" feed me; "woof" walk me; "woof" there's a burglar? What??
Neil Bothwick
2023-10-12 16:00:01 UTC
Post by Neil Bothwick
It only does this when I'm copying files over.  Right now I'm copying
about 26TBs of data over ethernet and it is taking a while.  Once I
stop it or it finishes the copy, the CPU goes to about nothing,
unless I'm doing something else.  So it has something to do with the
copy process.
Or the network. What are you using to copy? If you use rsync, you can
make use of the --bwlimit option to reduce the speed and network load.
Reduce?  I wouldn't complain if it went faster.  I think it is about as
fast as it is going to get tho.
And that may be contributing to the CPU usage. Slowing down the flow may
make the computer more usable, and you're never going to copy 26TB
quickly, especially over ethernet.
While I'm not sure what is keeping me from copying as fast as the drives
themselves can go, I suspect it is the encryption.
If you're copying over the network, that will be the limiting factor.
--
Neil Bothwick

Nixon's Principal: If 2 wrongs don't make a right, try 3.
Michael
2023-10-12 21:50:02 UTC
Post by Neil Bothwick
Post by Neil Bothwick
It only does this when I'm copying files over. Right now I'm copying
about 26TBs of data over ethernet and it is taking a while. Once I
stop it or it finishes the copy, the CPU goes to about nothing,
unless I'm doing something else. So it has something to do with the
copy process.
Or the network. What are you using to copy? If you use rsync, you can
make use of the --bwlimit option to reduce the speed and network load.
Reduce? I wouldn't complain if it went faster. I think it is about as
fast as it is going to get tho.
And that may be contributing to the CPU usage. Slowing down the flow may
make the computer more usable, and you're never going to copy 26TB
quickly, especially over ethernet.
While I'm not sure what is keeping me from copying as fast as the drives
themselves can go, I suspect it is the encryption.
Why don't you test throughput without encryption to confirm your assumption?
Post by Neil Bothwick
If you're copying over the network, that will be the limiting factor.
Someone posted some extra options to mount with and add to exports
file. Those added options almost doubled the speed. I watch gkrellm
and I think it is going about as fast as it can. My problem is, some
software uses one unit to measure things while another uses something
else. It makes it hard to figure out what is doing what. Still, using
gkrellm which is something I'm used to watching when it comes to drive
read/write data, I think it is as good as it is going to get. Not that
I'm not open to trying other options that might speed things up. I
still think encryption is slowing it down some. As you say tho,
ethernet isn't helping which is why I may look into other options later,
faster ethernet or fiber if I can find something cheap enough.
There are a lot of hypotheses in your statements, but not much testing to
prove or disprove any of them.

Why don't you try to isolate the cause by testing one system element at a time
and see what results you get.

Copy a large file from tmpfs to tmpfs to see how fast it can transfer across
your LAN - or use iperf3 as already recommended. Use a file size large enough
to saturate your network and give you a real life max throughput.

Repeat, but this time copy the large file over to disk.

Repeat, but this time try different filesystems, disks, volumes, strides/
stripes, add encryption, compression, whatnot.

You may spend an hour or two, but you'd soon isolate the major contributing
factor(s) causing the observed slowdown.

Unless you're running Pentium 4 or some other old CPU, it is almost certain
your CPU is capable of using AES-NI to offload to hardware some/all of the
encryption/decryption load - as long as you have the crypto module built in
your kernel.

PS. Keep notes and flush caches between tests to avoid drawing conclusions on
spurious results.
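
The one-element-at-a-time approach above could be sketched roughly as follows. The function name measure_copy is made up, the 16 MiB file is deliberately tiny so the demo finishes quickly, and GNU date (for %N) is assumed. For real measurements the file should be several GB, and the source and destination would be the tmpfs, disk, or network mount under test.

```shell
#!/bin/sh
# Hypothetical helper: copy a file, flush it to storage, report throughput.
measure_copy() {
    src_file=$1
    dst_file=$2
    bytes=$(wc -c <"$src_file")
    start=$(date +%s%N)                   # nanoseconds, GNU date assumed
    cp "$src_file" "$dst_file" && sync    # sync so cached writes are counted
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    [ "$elapsed_ms" -gt 0 ] || elapsed_ms=1
    echo "$(( bytes / 1048576 * 1000 / elapsed_ms )) MiB/s"
}

workdir=$(mktemp -d)
src="$workdir/src.bin"
dst="$workdir/dst.bin"
head -c 16777216 /dev/zero >"$src"        # 16 MiB test file
result=$(measure_copy "$src" "$dst")
echo "$result"

# Between tests, as root, the page cache can be flushed with:
#   sync; echo 3 > /proc/sys/vm/drop_caches
```

Running the same function against each layer in turn, and writing down the numbers, should show where the throughput drops.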
Frank Steinmetzger
2023-10-12 23:10:01 UTC
Post by Michael
Post by Neil Bothwick
Post by Neil Bothwick
It only does this when I'm copying files over. Right now I'm copying
about 26TBs of data over ethernet and it is taking a while. Once I
stop it or it finishes the copy, the CPU goes to about nothing,
unless I'm doing something else. So it has something to do with the
copy process.
Or the network. What are you using to copy? If you use rsync, you can
make use of the --bwlimit option to reduce the speed and network load.
Reduce? I wouldn't complain if it went faster. I think it is about as
fast as it is going to get tho.
And that may be contributing to the CPU usage. Slowing down the flow may
make the computer more usable, and you're never going to copy 26TB
quickly, especially over ethernet.
While I'm not sure what is keeping me from copying as fast as the drives
themselves can go, I suspect it is the encryption.
Why don't you test throughput without encryption to confirm your assumption?
What does `cryptsetup benchmark` say? I used to use a Celeron G1840 in my
NAS, which is Intel Haswell without AES_NI. It was able to do ~ 150 MB/s raw
encryption throughput when transferring to or from a LUKS’ed image in a
ramdisk, so almost 150 % of gigabit ethernet speed.
Post by Michael
Post by Neil Bothwick
If you're copying over the network, that will be the limiting factor.
Someone posted some extra options to mount with and add to exports
file.
Ah right, you use NFS. If not, I’d have suggested not to use rsync over ssh,
because that would indeed introduce a lot of encryption overhead.
Post by Michael
I still think encryption is slowing it down some. As you say tho,
ethernet isn't helping which is why I may look into other options later,
faster ethernet or fiber if I can find something cheap enough.
There are a lot of hypotheses in your statements, but not much testing to
prove or disprove any of them.
Why don't you try to isolate the cause by testing one system element at a time
and see what results you get.
[…]
Unless you're running Pentium 4 or some other old CPU, it is almost certain
your CPU is capable of using AES-NI to offload to hardware some/all of the
encryption/decryption load - as long as you have the crypto module built in
your kernel.
The FX-8350 may be old, but it actually does have AES instructions.

Here is my Haswell i5 (only two years younger than the FX) with AES_NI:

~ LC_ALL=C cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 1323959 iterations per second for 256-bit key
PBKDF2-sha256 1724631 iterations per second for 256-bit key
PBKDF2-sha512 1137284 iterations per second for 256-bit key
PBKDF2-ripemd160 706587 iterations per second for 256-bit key
PBKDF2-whirlpool 510007 iterations per second for 256-bit key
argon2i 7 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 7 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 679.8 MiB/s 2787.0 MiB/s
serpent-cbc 128b 91.4 MiB/s 582.1 MiB/s
twofish-cbc 128b 194.9 MiB/s 368.3 MiB/s
aes-cbc 256b 502.3 MiB/s 2155.4 MiB/s
serpent-cbc 256b 90.3 MiB/s 582.5 MiB/s
twofish-cbc 256b 194.0 MiB/s 368.6 MiB/s
aes-xts 256b 2470.8 MiB/s 2478.7 MiB/s
serpent-xts 256b 537.4 MiB/s 526.1 MiB/s
twofish-xts 256b 347.3 MiB/s 347.3 MiB/s
aes-xts 512b 1932.6 MiB/s 1958.0 MiB/s
serpent-xts 512b 532.9 MiB/s 522.9 MiB/s
twofish-xts 512b 348.4 MiB/s 348.9 MiB/s

The 6 Watts processor in my Surface Go yields:
aes-xts 512b 1122,2 MiB/s 1123,7 MiB/s
--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

The severity of the itch is inversely proportional to the reach.
Dale
2023-10-13 01:40:01 UTC
Post by Frank Steinmetzger
Post by Michael
Why don't you test throughput without encryption to confirm your assumption?
What does `cryptsetup benchmark` say? I used to use a Celeron G1840 in my
NAS, which is Intel Haswell without AES_NI. It was able to do ~ 150 MB/s raw
encryption throughput when transferring to or from a LUKS’ed image in a
ramdisk, so almost 150 % of gigabit ethernet speed.
When I first set up the old 770T system, I did that.  It was faster with
no encryption on the 770T end but I did have encryption on my main rig's
end.  The difference was a pretty good bit.  Pretty much all my stuff is
encrypted.  Anyway, I was still using the old mount options and it was
still faster. 

I've never used that benchmark.  Didn't know it existed.  These are the
results.  Keep in mind, fireball is my main rig.  The FX-8350 thingy.
The NAS is currently the old 770T system.  Sometimes it is an old Dell
Inspiron but not this time.  ;-)



***@fireball / # cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       878204 iterations per second for 256-bit key
PBKDF2-sha256     911805 iterations per second for 256-bit key
PBKDF2-sha512     698119 iterations per second for 256-bit key
PBKDF2-ripemd160  548418 iterations per second for 256-bit key
PBKDF2-whirlpool  299251 iterations per second for 256-bit key
argon2i       4 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        63.8 MiB/s        51.4 MiB/s
    serpent-cbc        128b        90.9 MiB/s       307.6 MiB/s
    twofish-cbc        128b       200.4 MiB/s       218.4 MiB/s
        aes-cbc        256b        54.6 MiB/s        37.5 MiB/s
    serpent-cbc        256b        90.4 MiB/s       302.6 MiB/s
    twofish-cbc        256b       198.2 MiB/s       216.7 MiB/s
        aes-xts        256b        68.0 MiB/s        45.0 MiB/s
    serpent-xts        256b       231.9 MiB/s       227.6 MiB/s
    twofish-xts        256b       191.8 MiB/s       163.1 MiB/s
        aes-xts        512b        42.4 MiB/s        18.9 MiB/s
    serpent-xts        512b       100.9 MiB/s       124.6 MiB/s
    twofish-xts        512b       154.8 MiB/s       173.3 MiB/s
***@fireball / #



***@nas:~# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       741567 iterations per second for 256-bit key
PBKDF2-sha256     910222 iterations per second for 256-bit key
PBKDF2-sha512     781353 iterations per second for 256-bit key
PBKDF2-ripemd160  547845 iterations per second for 256-bit key
PBKDF2-whirlpool  350929 iterations per second for 256-bit key
argon2i       4 iterations, 571787 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 524288 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       130.6 MiB/s       128.0 MiB/s
    serpent-cbc        128b        64.7 MiB/s       161.8 MiB/s
    twofish-cbc        128b       175.4 MiB/s       218.8 MiB/s
        aes-cbc        256b       120.1 MiB/s       122.2 MiB/s
    serpent-cbc        256b        84.5 MiB/s       210.8 MiB/s
    twofish-cbc        256b       189.5 MiB/s       218.6 MiB/s
        aes-xts        256b       167.0 MiB/s       162.1 MiB/s
    serpent-xts        256b       173.9 MiB/s       204.5 MiB/s
    twofish-xts        256b       204.4 MiB/s       213.2 MiB/s
        aes-xts        512b       127.9 MiB/s       122.9 MiB/s
    serpent-xts        512b       201.5 MiB/s       204.7 MiB/s
    twofish-xts        512b       215.0 MiB/s       213.0 MiB/s
***@nas:~#



Is that about what you would expect?  Fireball is on a 970 mobo.  It's
slightly newer.  I think the 770T is about 2 years older, maybe 3. 
Post by Frank Steinmetzger
Post by Michael
Post by Neil Bothwick
If you're copying over the network, that will be the limiting factor.
Someone posted some extra options to mount with and add to exports
file.
Ah right, you use NFS. If not, I’d have suggested not to use rsync over ssh,
because that would indeed introduce a lot of encryption overhead.
I thought nfs was the proper way.  I use ssh and I use rsync,
separately.  Didn't know they can be used together tho. 
Post by Frank Steinmetzger
Post by Michael
I still think encryption is slowing it down some. As you say tho,
ethernet isn't helping which is why I may look into other options later,
faster ethernet or fiber if I can find something cheap enough.
There are a lot of hypotheses in your statements, but not much testing to
prove or disprove any of them.
Why don't you try to isolate the cause by testing one system element at a time
and see what results you get.
[...]
Unless you're running Pentium 4 or some other old CPU, it is almost certain
your CPU is capable of using AES-NI to offload to hardware some/all of the
encryption/decryption load - as long as you have the crypto module built in
your kernel.
The FX-8350 may be old, but it actually does have AES instructions.
That may explain why I don't see as much load on my main rig then.  It
has the extra instructions.  I'm not sure if the 770T does or not.  It
has Ubuntu so I can't run the Gentoo CPU flag thingy.  So, I checked
/proc/cpuinfo and it doesn't show it on the 770T but my main rig
Fireball does.  So, it seems Fireball has it, older 770T NAS box does
not.  That could be a bottleneck.  Maybe. 

Eventually, I'll get this all sorted.  Fireball may become the NAS box
thingy.  New rig would be my main system.  Maybe.  Hard to say right
now.  There will be a new rig for my main system but not sure on rest.  o_O

One thing I did learn about LVM.  I hooked the drives I had on the old
Dell to the 770T and it saw the LVM drives set up right away.  I just
used cryptsetup as usual and off it went.  I've never done that before.
Works just like a regular drive.  :-D  Nifty.

Dale

:-)  :-)
Michael
2023-10-13 07:50:01 UTC
Post by Frank Steinmetzger
Is that about what you would expect? Fireball is on a 970 mobo. It's
slightly newer. I think the 770T is about 2 years older, maybe 3.
grep AES /usr/src/linux/.config

or,

zgrep AES /proc/config.gz

Or, grep your *current* kernel config wherever it is stored.
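
A minimal sketch of that lookup, assuming the three usual places a config ends up. The function names `find_kernel_config` and `show_aes_options` and the optional root-prefix argument are made up for illustration; they are not part of any tool. /proc/config.gz is tried first because it always belongs to the running kernel, while /usr/src/linux/.config may belong to a different one.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first kernel config found.
# $1 = kernel release (defaults to the running kernel, via uname -r)
# $2 = optional root prefix, only useful for trying this on a scratch tree
find_kernel_config() {
    release="${1:-$(uname -r)}"
    root="${2:-}"
    for candidate in \
        "$root/proc/config.gz" \
        "$root/boot/config-$release" \
        "$root/usr/src/linux/.config"
    do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: list the AES-related crypto options of that config.
# Compressed configs are unpacked with gzip, plain ones are read as-is.
show_aes_options() {
    config="$(find_kernel_config "$@")" || {
        echo "no kernel config found" >&2
        return 1
    }
    case "$config" in
        *.gz) gzip -dc "$config" ;;
        *)    cat "$config" ;;
    esac | grep -i 'CRYPTO.*AES'
}
```

Calling `show_aes_options` with no arguments checks the running kernel. An option set to `=m` only helps once the module is actually loaded.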
Dale
2023-10-13 17:10:01 UTC
Post by Michael
Post by Frank Steinmetzger
Is that about what you would expect? Fireball is on a 970 mobo. It's
slightly newer. I think the 770T is about 2 years older, maybe 3.
grep AES /usr/src/linux/.config
or,
zgrep AES /proc/config.gz
Or, grep your *current* kernel config wherever it is stored.
I got the idea but assuming you wanted that info from the NAS box, I had
to dig a little.  It's Ubuntu.  It doesn't have kernel sources, no
config.gz in /proc either.  I found this.  I assume it is accurate. 
Hopefully. 


***@nas:~# cat /boot/config-5.15.0-86-generic | grep -i aes
CONFIG_SND_MAESTRO3=m
CONFIG_SND_MAESTRO3_INPUT=y
CONFIG_CRYPTO_AEGIS128_AESNI_SSE2=m
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_AES_NI_INTEL=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_SM4_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_LIB_AES=y
***@nas:~#


I don't usually use modules.  So, this is not something I run into
much.  I'm adding this info since I think it will help as well. 


***@nas:~# lsmod | grep -i aes
***@nas:~#


I see the main aes option is built in so it shouldn't be listed above if
I recall correctly.  The other two options are modules but not loaded. 
That said, I don't know if they are needed either.  On my main rig, I
have AES_TI built in.  Anyway, I thought I would include that in case it
helps. 

I was thinking about later on upgrading the CPU to a 6 core version.  I
may research and see if it includes the aes instruction set.  It may
help.  It may not.  Right now, I don't know if the 770T is even going to
be a NAS box and need encryption. 

It could be that given that mobo's and CPU's age, it's doing the best it
can.  After all, the Dell box was also fairly slow.

Dale

:-)  :-) 
Mark Knecht
2023-10-13 17:20:01 UTC
Post by Frank Steinmetzger
Is that about what you would expect? Fireball is on a 970 mobo. It's
slightly newer. I think the 770T is about 2 years older, maybe 3.
This was just for kicks because I think somewhere, in this thread or some
other, you mentioned a Ryzen 5900 and mine is a 5950, now about
18 months old:

***@science2:~$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 2212185 iterations per second for 256-bit key
PBKDF2-sha256 4161015 iterations per second for 256-bit key
PBKDF2-sha512 1798586 iterations per second for 256-bit key
PBKDF2-ripemd160 841553 iterations per second for 256-bit key
PBKDF2-whirlpool 675628 iterations per second for 256-bit key
argon2i 11 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 11 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 1181.2 MiB/s 5132.1 MiB/s
serpent-cbc 128b 107.8 MiB/s 426.1 MiB/s
twofish-cbc 128b 221.1 MiB/s 418.1 MiB/s
aes-cbc 256b 890.1 MiB/s 4167.7 MiB/s
serpent-cbc 256b 116.0 MiB/s 428.3 MiB/s
twofish-cbc 256b 224.2 MiB/s 417.7 MiB/s
aes-xts 256b 4121.7 MiB/s 4115.7 MiB/s
serpent-xts 256b 385.9 MiB/s 401.6 MiB/s
twofish-xts 256b 394.5 MiB/s 405.0 MiB/s
aes-xts 512b 3480.2 MiB/s 3486.3 MiB/s
serpent-xts 512b 408.9 MiB/s 401.4 MiB/s
twofish-xts 512b 395.9 MiB/s 404.8 MiB/s
***@science2:~$
Dale
2023-10-13 17:40:01 UTC
Post by Mark Knecht
Post by Frank Steinmetzger
Is that about what you would expect?  Fireball is on a 970 mobo.  It's
slightly newer.  I think the 770T is about 2 years older, maybe 3.
This was just for kicks because I think somewhere, in this thread or some
other, you mentioned a Ryzen 5900 and mine is a 5950, now about
18 months old:
[...]
I'm planning on my new rig having the Ryzen 5900X.  Is the 5950 better? 
While I've kinda picked that one, I'm open to ideas if it is faster and
I can afford it.  As it is, I'm looking at between $300 and $350 for the
5900.  My last CPU cost a little over $100. 

While at it.  In the past, I always bought the mobo, CPU and memory from
the same place.  Generally if one of those is bad, it's sometimes hard
to know which one is bad.  Sometimes even the BIOS beep codes are no
help because there may be none.  If the mobo doesn't boot up, worst
case, send all three back to the same place.  Given how far things have
come, do I need to worry about a bad one out of the box anymore?  I can
save some money if I buy from different places. 

Dale

:-)  :-) 
Mark Knecht
2023-10-13 18:20:01 UTC
I'm planning on my new rig having the Ryzen 5900X. Is the 5950 better?
While I've kinda picked that one, I'm open to ideas if it is faster and I
can afford it. As it is, I'm looking at between $300 and $350 for the
5900. My last CPU cost a little over $100.
I'm not going to say one is better than the other. The 5950X has more
cores, the 5900X runs at a higher speed. It depends on your workload which
will be better for you. I do a lot of things based around machine learning
where I felt I was better off having more cores - give 12 to the ML job,
keep 4 for my personal use. It's worked out well. However you don't ever
talk much about what you actually use your computers for other than having
250 disk drives and moving data around your network. Depending on how you
are moving data you might be better off with 5900X going faster.

You can use this site to get some comparative data:

https://cpu.userbenchmark.com/Compare/AMD-Ryzen-9-5950X-vs-AMD-Ryzen-9-5900X/4086vs4087

BTW - you probably know both of these CPUs have been superseded by the
7900X and 7950X. There are also the 3D versions which have faster and larger
cache.
While at it. In the past, I always bought the mobo, CPU and memory from
the same place. Generally if one of those is bad, it's sometimes hard to
know which one is bad. Sometimes even the BIOS beep codes are no help
because there may be none. If the mobo doesn't boot up, worst case, send
all three back to the same place. Given how far things have come, do I
need to worry about a bad one out of the box anymore? I can save some
money if I buy from different places.

Cannot answer but you need a return policy from every vendor. If it doesn't
boot and you cannot figure it out you send everything back to multiple
vendors I guess.

Until recently I built all my machines myself. My 5900X machine has water
cooling and I had cash so I paid a local storefront here to build it. I
bought right in the middle of the pandemic and the chip shortage cost me
huge dollars. Most expensive machine I've ever owned. Probably could build
it today for less than 50% of what I paid.
Michael
2023-10-14 09:50:01 UTC
Post by Dale
Post by Michael
Post by Frank Steinmetzger
Is that about what you would expect? Fireball is on a 970 mobo. It's
slightly newer. I think the 770T is about 2 years older, maybe 3.
grep AES /usr/src/linux/.config
or,
zgrep AES /proc/config.gz
Or, grep your *current* kernel config wherever it is stored.
I got the idea but assuming you wanted that info from the NAS box, I had
to dig a little. It's Ubuntu. It doesn't have kernel sources, no
config.gz in /proc either. I found this. I assume it is accurate.
Hopefully.
CONFIG_SND_MAESTRO3=m
CONFIG_SND_MAESTRO3_INPUT=y
CONFIG_CRYPTO_AEGIS128_AESNI_SSE2=m
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_AES_NI_INTEL=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_SM4_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_LIB_AES=y
I don't usually use modules. So, this is not something I run into
much. I'm adding this info since I think it will help as well.
The module is not loaded, hence the pedestrian performance.

Check if the CPU has the AES instruction set - I expect it doesn't, or the
module(s) would have been loaded:

grep -m1 -o aes /proc/cpuinfo

Ah! Just read your other message, the Phenom II X4 955 does not have AES-NI:

https://www.cpu-monkey.com/en/cpu-amd_phenom_ii_x4_955

So cryptographic performance won't get any better in this box.
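
For anyone wanting to run both checks in one go, here is a rough sketch, assuming an x86 box where /proc/cpuinfo has a "flags" line and where the accelerated driver is the `aesni_intel` module (the one built from CONFIG_CRYPTO_AES_NI_INTEL, used on AMD CPUs too). The function name `aes_status` and its two file arguments are hypothetical; the arguments exist only so it can be pointed at saved copies of those files.

```shell
#!/bin/sh
# Hypothetical helper: report whether AES can be hardware-accelerated.
# $1 = cpuinfo file (default /proc/cpuinfo)
# $2 = loaded-module list (default /proc/modules, which is what lsmod reads)
aes_status() {
    cpuinfo="${1:-/proc/cpuinfo}"
    modules="${2:-/proc/modules}"

    # The flags line lists CPU features as whitespace-separated words;
    # -w makes sure "aes" is matched as a whole word only.
    if grep -m1 '^flags' "$cpuinfo" | grep -qw aes; then
        echo "cpu: aes flag present"
    else
        echo "cpu: no aes flag"
        return 1
    fi

    # A driver compiled into the kernel (=y) never shows up in the module
    # list, so "not loaded" alone does not prove it is missing.
    if grep -q '^aesni_intel ' "$modules"; then
        echo "module: aesni_intel loaded"
    else
        echo "module: aesni_intel not loaded (or built in)"
    fi
}
```

Without the CPU flag the function stops early, since no kernel option can add the missing instructions.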
Frank Steinmetzger
2023-10-13 22:50:01 UTC
Post by Dale
Post by Frank Steinmetzger
Post by Michael
Why don't you test throughput without encryption to confirm your assumption?
What does `cryptsetup benchmark` say? I used to use a Celeron G1840 in my
NAS, which is Intel Haswell without AES_NI. It was able to do ~ 150 MB/s raw
encryption throughput when transferring to or from a LUKS’ed image in a
ramdisk, so almost 150 % of gigabit ethernet speed.
[...]
I've never used that benchmark.  Didn't know it exists.  This is the
results.  Keep in mind, fireball is my main rig.  The FX-8350 thingy. 
The NAS is currently the old 770T system.  Sometimes it is a old Dell
Inspiron but not this time.  ;-)
[...]
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        63.8 MiB/s        51.4 MiB/s
    serpent-cbc        128b        90.9 MiB/s       307.6 MiB/s
    twofish-cbc        128b       200.4 MiB/s       218.4 MiB/s
        aes-cbc        256b        54.6 MiB/s        37.5 MiB/s
    serpent-cbc        256b        90.4 MiB/s       302.6 MiB/s
    twofish-cbc        256b       198.2 MiB/s       216.7 MiB/s
        aes-xts        256b        68.0 MiB/s        45.0 MiB/s
    serpent-xts        256b       231.9 MiB/s       227.6 MiB/s
    twofish-xts        256b       191.8 MiB/s       163.1 MiB/s
        aes-xts        512b        42.4 MiB/s        18.9 MiB/s
    serpent-xts        512b       100.9 MiB/s       124.6 MiB/s
    twofish-xts        512b       154.8 MiB/s       173.3 MiB/s
Phew, this looks veeeery slow. As you can clearly see, this is not enough to
even saturate Gbit ethernet. Unfortunately, I don’t have any benchmark data
left over from the mentioned Celeron.
(Perhaps that’s why the industry chose to implement AES in hardware, because
it was the slowest of the bunch.)
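
To make that comparison less of an eyeball job, a small sketch: it reads `cryptsetup benchmark` output on stdin and prints the cipher rows that fall below a threshold. The function name `below_line_rate` is made up, and the default threshold of 119.2 MiB/s is simply 1 Gbit/s converted to MiB/s (10^9 / 8 / 2^20), ignoring all protocol overhead, so real-world NFS throughput will sit somewhat lower.

```shell
#!/bin/sh
# Hypothetical helper: list cipher rows from `cryptsetup benchmark` output
# whose encryption or decryption rate is below a threshold.
# $1 = threshold in MiB/s (default: raw gigabit line rate, about 119.2)
below_line_rate() {
    LC_ALL=C awk -v limit="${1:-119.2}" '
        # Cipher rows look like: aes-xts 512b 42.4 MiB/s 18.9 MiB/s
        $2 ~ /^[0-9]+b$/ && $4 == "MiB/s" && $6 == "MiB/s" {
            sub(",", ".", $3); sub(",", ".", $5)   # tolerate decimal commas
            if ($3 + 0 < limit || $5 + 0 < limit)
                printf "%s %s: enc %.1f, dec %.1f MiB/s\n", $1, $2, $3, $5
        }
    '
}
```

Typical use would be `cryptsetup benchmark | below_line_rate`, or with a different limit, e.g. `below_line_rate 300` for a faster link.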

It looks like there is no hardware acceleration involved. But according to
https://en.wikipedia.org/wiki/List_of_AMD_FX_processors#Piledriver-based and
https://www.cpu-world.com/CPUs/Bulldozer/AMD-FX-Series%20FX-8350.html it has
the extension. I’d say something is amiss in your kernel.
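One way to narrow this down, as a rough sketch that assumes a Linux box with /proc mounted: check whether the CPU advertises the `aes` flag, and whether the kernel has registered any driver with "aesni" in its name. The two function names are made up for illustration; the file arguments default to /proc/cpuinfo and /proc/crypto.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from any package.

# Succeeds if FLAG appears as a whole word on a "flags" line of the cpuinfo file.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    grep -E '^flags[[:space:]]*:' "$file" 2>/dev/null | grep -qw -- "$flag"
}

# Succeeds if the kernel crypto API lists a driver whose name contains "aesni".
has_aesni_driver() {
    file=${1:-/proc/crypto}
    grep -E '^driver[[:space:]]*:' "$file" 2>/dev/null | grep -q 'aesni'
}

if has_cpu_flag aes; then
    echo "CPU advertises aes"
else
    echo "CPU does not advertise aes"
fi

if has_aesni_driver; then
    echo "kernel has an aesni driver registered"
else
    echo "no aesni driver registered (module not loaded or not built)"
fi
```

If the first check says yes and the second says no, the hardware is there but the kernel is not using it, which would match the benchmark numbers above.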

Heck, even my ultra-low-end eeepc with its no-AES Atom processor N450 from
2009 is only about 50 % slower on AES, and for aes-xts 512b decryption it is
actually faster! And that was a snail even in its day. It is so low-end that
its in-order architecture is not vulnerable to Spectre and Meltdown. :D It just
scrunched several minutes on updating the GPG keyring of its Arch Linux
installation.

eeePC # LC_ALL=C cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 228348 iterations per second for 256-bit key
PBKDF2-sha256 335222 iterations per second for 256-bit key
PBKDF2-sha512 253034 iterations per second for 256-bit key
PBKDF2-ripemd160 172690 iterations per second for 256-bit key
PBKDF2-whirlpool 94705 iterations per second for 256-bit key
argon2i 4 iterations, 71003 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 71506 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        31.0 MiB/s        33.6 MiB/s
    serpent-cbc        128b        28.1 MiB/s        62.9 MiB/s
    twofish-cbc        128b        28.6 MiB/s        31.0 MiB/s
        aes-cbc        256b        24.0 MiB/s        25.6 MiB/s
    serpent-cbc        256b        28.3 MiB/s        62.7 MiB/s
    twofish-cbc        256b        28.6 MiB/s        31.0 MiB/s
        aes-xts        256b        32.5 MiB/s        33.4 MiB/s
    serpent-xts        256b        50.5 MiB/s        60.5 MiB/s
    twofish-xts        256b        25.6 MiB/s        30.7 MiB/s
        aes-xts        512b        25.0 MiB/s        25.6 MiB/s
    serpent-xts        512b        60.2 MiB/s        60.4 MiB/s
    twofish-xts        512b        30.2 MiB/s        30.7 MiB/s
Post by Dale
[
]
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       130.6 MiB/s       128.0 MiB/s
    serpent-cbc        128b        64.7 MiB/s       161.8 MiB/s
    twofish-cbc        128b       175.4 MiB/s       218.8 MiB/s
        aes-cbc        256b       120.1 MiB/s       122.2 MiB/s
    serpent-cbc        256b        84.5 MiB/s       210.8 MiB/s
    twofish-cbc        256b       189.5 MiB/s       218.6 MiB/s
        aes-xts        256b       167.0 MiB/s       162.1 MiB/s
    serpent-xts        256b       173.9 MiB/s       204.5 MiB/s
    twofish-xts        256b       204.4 MiB/s       213.2 MiB/s
        aes-xts        512b       127.9 MiB/s       122.9 MiB/s
    serpent-xts        512b       201.5 MiB/s       204.7 MiB/s
    twofish-xts        512b       215.0 MiB/s       213.0 MiB/s
Interesting; AES is much better than the FX-8350, but others are worse.
There are many intricate factors, such as cache size, assembler
optimisations and such.
Post by Dale
Post by Frank Steinmetzger
Ah right, you use NFS. If not, I’d have suggested not to use rsync over ssh,
because that would indeed introduce a lot of encryption overhead.
I thought nfs was the proper way.  I use ssh and I use rsync,
separately.  Didn't know they can be used together tho. 
When you do `rsync -ai source host:/path/to/destination/`, you use ssh for
transport.
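To make the distinction concrete, here is a small sketch of how an rsync argument maps to a transport, going by the syntax in the rsync man page: a colon before the first slash means a remote shell (ssh by default on most builds), a double colon or an rsync:// URL means the rsync daemon, and anything else is a plain local path, which is what an NFS mount looks like to rsync. The function name is invented for this example.

```shell
#!/bin/sh
# Classify an rsync source/destination argument by the transport rsync picks.
# rsync_transport is a hypothetical helper, not part of rsync itself.
rsync_transport() {
    arg=$1
    case $arg in
        rsync://*) echo daemon; return ;;     # rsync://host/module/path
    esac
    host=${arg%%:*}                           # text before the first colon
    case $host in
        "$arg"|*/*) echo local; return ;;     # no colon, or a slash before it
    esac
    case ${arg#*:} in
        :*) echo daemon ;;                    # host::module/path
        *)  echo remote-shell ;;              # host:path, ssh by default
    esac
}

rsync_transport host:/path/to/destination/    # prints "remote-shell"
rsync_transport /mnt/2/                       # prints "local"
```

So copying between two mount points never involves ssh, even when one of them is an NFS share.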
Post by Dale
Post by Frank Steinmetzger
Post by Michael
I still think encryption is slowing it down some. As you say tho,
ethernet isn't helping which is why I may look into other options later,
faster ethernet or fiber if I can find something cheap enough.
What do you mean by “ethernet is not helping”? As we could see above, your
AES throughput cannot keep up with Gbit.
Post by Dale
That may explain why I don't see as much load on my main rig then.  It
has the extra instructions.  I'm not sure if the 770T does or not.
The mobo should have no influence on crypto performance.
Post by Dale
  It
has Ubuntu so I can't run the Gentoo CPU flag thingy.  So, I checked
/proc/cpuinfo and it doesn't show it on the 770T but my main rig
Fireball does.  So, it seems Fireball has it, older 770T NAS box does
not.  That could be a bottleneck.  Maybe. 
But interestingly, the NAS box shows higher AES throughput than fireball,
probably through raw performance. (What processor does it have?)
--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

You call this cappucino? It’s not even sprinkled with Parmesan!
Dale
2023-10-14 03:30:01 UTC
Permalink
Post by Frank Steinmetzger
Post by Dale
Post by Frank Steinmetzger
Post by Michael
Why don't you test throughput without encryption to confirm your assumption?
What does `cryptsetup benchmark` say? I used to use a Celeron G1840 in my
NAS, which is Intel Haswell without AES_NI. It was able to do ~ 150 MB/s raw
encryption throughput when transferring to or from a LUKS’ed image in a
ramdisk, so almost 150 % of gigabit ethernet speed.
[
]
I've never used that benchmark.  Didn't know it exists.  This is the
results.  Keep in mind, fireball is my main rig.  The FX-8350 thingy. 
The NAS is currently the old 770T system.  Sometimes it is a old Dell
Inspiron but not this time.  ;-)
[
]
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b        63.8 MiB/s        51.4 MiB/s
    serpent-cbc        128b        90.9 MiB/s       307.6 MiB/s
    twofish-cbc        128b       200.4 MiB/s       218.4 MiB/s
        aes-cbc        256b        54.6 MiB/s        37.5 MiB/s
    serpent-cbc        256b        90.4 MiB/s       302.6 MiB/s
    twofish-cbc        256b       198.2 MiB/s       216.7 MiB/s
        aes-xts        256b        68.0 MiB/s        45.0 MiB/s
    serpent-xts        256b       231.9 MiB/s       227.6 MiB/s
    twofish-xts        256b       191.8 MiB/s       163.1 MiB/s
        aes-xts        512b        42.4 MiB/s        18.9 MiB/s
    serpent-xts        512b       100.9 MiB/s       124.6 MiB/s
    twofish-xts        512b       154.8 MiB/s       173.3 MiB/s
Phew, this looks veeeery slow. As you can clearly see, this is not enough to
even saturate Gbit ethernet. Unfortunately, I don’t have any benchmark data
left over from the mentioned celeron.
(Perhaps that’s why the industry chose to implement AES in hardware, because
it was the slowest of the bunch.)
It looks like there is no hardware acceleration involved. But according to
https://en.wikipedia.org/wiki/List_of_AMD_FX_processors#Piledriver-based and
https://www.cpu-world.com/CPUs/Bulldozer/AMD-FX-Series%20FX-8350.html it has
the extension. I’d say something is amiss in your kernel.
Heck, even my ultra-low-end eeepc with its no-AES Atom processor N450 from
2009 is less than 50 % slower, and for aes-xts 512b it is actually faster!
And that was a snail even in its day. It is so low-end that its in-order
architecture is not vulnerable to spectre and meltdown. :D It just scrunched
several minutes on updating the GPG keyring of its arch linux installation.
eeePC # LC_ALL=C cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 228348 iterations per second for 256-bit key
PBKDF2-sha256 335222 iterations per second for 256-bit key
PBKDF2-sha512 253034 iterations per second for 256-bit key
PBKDF2-ripemd160 172690 iterations per second for 256-bit key
PBKDF2-whirlpool 94705 iterations per second for 256-bit key
argon2i 4 iterations, 71003 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 4 iterations, 71506 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 31.0 MiB/s 33.6 MiB/s
serpent-cbc 128b 28.1 MiB/s 62.9 MiB/s
twofish-cbc 128b 28.6 MiB/s 31.0 MiB/s
aes-cbc 256b 24.0 MiB/s 25.6 MiB/s
serpent-cbc 256b 28.3 MiB/s 62.7 MiB/s
twofish-cbc 256b 28.6 MiB/s 31.0 MiB/s
aes-xts 256b 32.5 MiB/s 33.4 MiB/s
serpent-xts 256b 50.5 MiB/s 60.5 MiB/s
twofish-xts 256b 25.6 MiB/s 30.7 MiB/s
aes-xts 512b 25.0 MiB/s 25.6 MiB/s
serpent-xts 512b 60.2 MiB/s 60.4 MiB/s
twofish-xts 512b 30.2 MiB/s 30.7 MiB/s
Post by Dale
[
]
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       130.6 MiB/s       128.0 MiB/s
    serpent-cbc        128b        64.7 MiB/s       161.8 MiB/s
    twofish-cbc        128b       175.4 MiB/s       218.8 MiB/s
        aes-cbc        256b       120.1 MiB/s       122.2 MiB/s
    serpent-cbc        256b        84.5 MiB/s       210.8 MiB/s
    twofish-cbc        256b       189.5 MiB/s       218.6 MiB/s
        aes-xts        256b       167.0 MiB/s       162.1 MiB/s
    serpent-xts        256b       173.9 MiB/s       204.5 MiB/s
    twofish-xts        256b       204.4 MiB/s       213.2 MiB/s
        aes-xts        512b       127.9 MiB/s       122.9 MiB/s
    serpent-xts        512b       201.5 MiB/s       204.7 MiB/s
    twofish-xts        512b       215.0 MiB/s       213.0 MiB/s
Interesting; AES is much better than the FX-8350, but others are worse.
There are many intricate factors, such as cache size, assembler
optimisations and such.
Post by Dale
Post by Frank Steinmetzger
Ah right, you use NFS. If not, I’d have suggested not to use rsync over ssh,
because that would indeed introduce a lot of encryption overhead.
I thought nfs was the proper way.  I use ssh and I use rsync,
separately.  Didn't know they can be used together tho. 
When you do `rsync -ai source host:/path/to/destination/`, you use ssh for
transport.
Well, I may be doing this all wrong.  First, I ssh into the NAS box.  I
do the decrypt stuff and mount the LV in its proper place.  Then I
switch back to a Konsole tab for Fireball.  I mount the NAS box to a
mount point on Fireball with nfs thingy.  From there I use rsync to copy
from one point to the other.  I mostly use this command and options for
restore. 


rsync -uivr --progress  /mnt/1/ /mnt/2/


Sometimes that varies a bit depending on exactly what I am copying and
from where to where.  Example, when updating my backups, I include the
--delete option because if I delete a file, I almost always want it gone
on the backup too.  I also shortened the source and target.  That should
give you a good idea how wrong I'm doing this tho.  ROFL  :/ 

 
Post by Frank Steinmetzger
Post by Dale
Post by Frank Steinmetzger
Post by Michael
I still think encryption is slowing it down some. As you say tho,
ethernet isn't helping which is why I may look into other options later,
faster ethernet or fiber if I can find something cheap enough.
What do you mean with “ethernet is not helping”? As we could see above, your
AES throughput cannot keep up with Gbit.
Well, I was thinking the ethernet might be slowing things at times.  I'm
not sure on that tho.  I do know the CPU fan ramps up to a good speed
for a while then goes back to basically what it is when idle.  After a
short time, it speeds up again and repeats.  It has done this throughout
the whole restore process. 
Post by Frank Steinmetzger
Post by Dale
That may explain why I don't see as much load on my main rig then.  It
has the extra instructions.  I'm not sure if the 770T does or not.
The mobo should have no influence on crypto performance.
Post by Dale
  It
has Ubuntu so I can't run the Gentoo CPU flag thingy.  So, I checked
/proc/cpuinfo and it doesn't show it on the 770T but my main rig
Fireball does.  So, it seems Fireball has it, older 770T NAS box does
not.  That could be a bottleneck.  Maybe. 
But interestingly, the NAS box shows higher AES throughput than fireball,
probably through raw performance. (What processor does it have?)
That's interesting.  I thought that too but thought maybe I was reading
the results wrong.  It has a Phenom II X4 955.  Keep in mind, the NAS
box has Ubuntu on it.  It's not a kernel I built or configured.  If you
think it is missing something, it just may be. Building a new kernel
could get interesting tho.  May need a hammer.  o_O 

So to make sure I get this, you're saying the old 770T NAS box is
performing better on encryption than my slightly newer rig with aes
support on the CPU?  That would be interesting.  If so, that 770T may be
a dedicated NAS box thingy.  Once I get done building a new rig and all.

Just a FYI.  My restore from backup has finished.  To test anything, I
may have to get a bag of tricks.  I guess I could find a large file,
several GBs in size, and copy, delete, copy, delete etc to get some
results.  I'm about to connect some external hard drives to restore some
smaller directories, my smaller directories are still quite large tho. 
Those drives attach directly to my system, no ethernet.  I'm curious to
see if the data throughput behaves the same way.  I seem to recall in
another thread that it does.
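A rough sketch of that copy test, for what it's worth. It times a single copy with whole-second resolution, so it only means something for files of several GB, and a repeat run may read the source from the page cache. The function name is made up here.

```shell
#!/bin/sh
# Copy SRC to DST once and print the average throughput in MiB/s.
# time_copy is a hypothetical helper; timing is in whole seconds only.
time_copy() {
    src=$1
    dst=$2
    [ -r "$src" ] || return 1
    bytes=$(wc -c < "$src")
    start=$(date +%s)
    cp -- "$src" "$dst" || return 1
    sync                              # flush cached writes so they are counted
    end=$(date +%s)
    awk -v b="$bytes" -v s="$((end - start))" \
        'BEGIN { if (s < 1) s = 1; printf "%.1f\n", b / 1048576 / s }'
}

# Example with placeholder paths:
#   time_copy /path/to/large.file /mnt/2/large.file
```

Running it once against the NAS mount and once against a directly attached drive should show whether the throughput pattern is the same in both cases.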

Dale

:-)  :-) 

P. S.  I proofed this thing several times.  I think I got it all right. 
O_O
Michael
2023-10-14 12:30:01 UTC
Permalink
Post by Frank Steinmetzger
Post by Frank Steinmetzger
Post by Michael
Why don't you test throughput without encryption to confirm your
assumption?
What does `cryptsetup benchmark` say? I used to use a Celeron G1840 in my
NAS, which is Intel Haswell without AES_NI. It was able to do ~ 150 MB/s
raw encryption throughput when transferring to or from a LUKS’ed image
in a ramdisk, so almost 150 % of gigabit ethernet speed.
[
]
I've never used that benchmark. Didn't know it exists. This is the
results. Keep in mind, fireball is my main rig. The FX-8350 thingy.
The NAS is currently the old 770T system. Sometimes it is a old Dell
Inspiron but not this time. ;-)
[
]
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 63.8 MiB/s 51.4 MiB/s
serpent-cbc 128b 90.9 MiB/s 307.6 MiB/s
twofish-cbc 128b 200.4 MiB/s 218.4 MiB/s
aes-cbc 256b 54.6 MiB/s 37.5 MiB/s
serpent-cbc 256b 90.4 MiB/s 302.6 MiB/s
twofish-cbc 256b 198.2 MiB/s 216.7 MiB/s
aes-xts 256b 68.0 MiB/s 45.0 MiB/s
serpent-xts 256b 231.9 MiB/s 227.6 MiB/s
twofish-xts 256b 191.8 MiB/s 163.1 MiB/s
aes-xts 512b 42.4 MiB/s 18.9 MiB/s
serpent-xts 512b 100.9 MiB/s 124.6 MiB/s
twofish-xts 512b 154.8 MiB/s 173.3 MiB/s
Phew, this looks veeeery slow. As you can clearly see, this is not enough
to even saturate Gbit ethernet. Unfortunately, I don’t have any benchmark
data left over from the mentioned celeron.
(Perhaps that’s why the industry chose to implement AES in hardware,
because it was the slowest of the bunch.)
It looks like there is no hardware acceleration involved. But according to
https://en.wikipedia.org/wiki/List_of_AMD_FX_processors#Piledriver-based
and https://www.cpu-world.com/CPUs/Bulldozer/AMD-FX-Series%20FX-8350.html
it has the extension. I’d say something is amiss in your kernel.
Yes, I also think AES_NI has not been enabled in Dale's kernel config.
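A quick way to test that suspicion, assuming the running kernel exposes its configuration at /proc/config.gz (that needs CONFIG_IKCONFIG_PROC; otherwise point the function at a plain-text config such as /usr/src/linux/.config). On x86 the relevant symbol is CONFIG_CRYPTO_AES_NI_INTEL, which covers AMD CPUs too despite the name. The helper name is invented for this sketch.

```shell
#!/bin/sh
# Print the state of a kernel config symbol: "y", "m", or "unset".
# kconfig_state is a hypothetical helper name. "unset" is also printed
# when the config file cannot be read at all.
kconfig_state() {
    symbol=$1
    file=${2:-/proc/config.gz}
    # gzip -cdf passes plain-text files through unchanged, so both
    # /proc/config.gz and an uncompressed .config work.
    value=$(gzip -cdf "$file" 2>/dev/null |
        sed -n "s/^${symbol}=\([ym]\)\$/\1/p" | head -n 1)
    echo "${value:-unset}"
}

echo "CONFIG_CRYPTO_AES_NI_INTEL: $(kconfig_state CONFIG_CRYPTO_AES_NI_INTEL)"
```

If this prints "m", the driver is built as a module and may simply not be loaded; "unset" would mean a kernel rebuild is needed.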

I just ran 'cryptsetup benchmark' on an A10-7850K APU (Kaveri Steamroller core,
as opposed to the two-years-older FX-8350 Vishera Piledriver core) and aes-xts
fares much better:

# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 1028015 iterations per second for 256-bit key
PBKDF2-sha256 1464491 iterations per second for 256-bit key
PBKDF2-sha512 1123875 iterations per second for 256-bit key
PBKDF2-ripemd160 708497 iterations per second for 256-bit key
PBKDF2-whirlpool 389515 iterations per second for 256-bit key
argon2i 5 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 5 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b       586.6 MiB/s      2169.8 MiB/s
    serpent-cbc        128b        88.6 MiB/s       330.6 MiB/s
    twofish-cbc        128b       203.1 MiB/s       277.6 MiB/s
        aes-cbc        256b       443.0 MiB/s      1712.5 MiB/s
    serpent-cbc        256b        89.7 MiB/s       329.8 MiB/s
    twofish-cbc        256b       204.0 MiB/s       277.3 MiB/s
        aes-xts        256b      1840.9 MiB/s      1857.6 MiB/s <==
    serpent-xts        256b       288.4 MiB/s       299.6 MiB/s
    twofish-xts        256b       240.6 MiB/s       252.9 MiB/s
        aes-xts        512b      1459.3 MiB/s      1474.2 MiB/s <==
    serpent-xts        512b       291.6 MiB/s       299.0 MiB/s
    twofish-xts        512b       242.8 MiB/s       252.7 MiB/s


Either 256 or 512 bit aes-xts performance would fill up a 1Gbps pipe.
Without AES_NI the performance on this CPU is ~10 times slower. I expect the
FX-8350 would produce comparable results once the kernel crypto options are
sorted.
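For the arithmetic behind that: 1 Gbit/s is 10^9 / 8 bytes per second, which works out to roughly 119 MiB/s before any protocol overhead. A small sketch that does the conversion and compares it with benchmark figures (the function names are made up; the two sample rates are the aes-xts 512b encryption figures quoted in this thread):

```shell
#!/bin/sh
# Convert a link speed in Gbit/s to MiB/s, ignoring protocol overhead.
link_mibs() {
    awk -v gbit="$1" 'BEGIN { printf "%.1f\n", gbit * 1e9 / 8 / 1048576 }'
}

# Succeeds if a cipher throughput (MiB/s) is at least the raw link rate.
keeps_up() {
    awk -v rate="$1" -v gbit="$2" \
        'BEGIN { exit !(rate >= gbit * 1e9 / 8 / 1048576) }'
}

echo "1 Gbit/s is about $(link_mibs 1) MiB/s before protocol overhead"
for rate in 1459.3 42.4; do
    if keeps_up "$rate" 1; then
        echo "$rate MiB/s keeps up with 1 Gbit/s"
    else
        echo "$rate MiB/s does not keep up with 1 Gbit/s"
    fi
done
```

Real transfers land somewhat below the raw figure because of Ethernet, IP and NFS overhead, so the comparison is a ceiling rather than a prediction.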
Post by Frank Steinmetzger
Post by Frank Steinmetzger
Ah right, you use NFS. If not, I’d have suggested not to use rsync over
ssh, because that would indeed introduce a lot of encryption overhead.
I thought nfs was the proper way. I use ssh and I use rsync,
separately. Didn't know they can be used together tho.
When you do `rsync -ai source host:/path/to/destination/`, you use ssh for
transport.
Well, I may be doing this all wrong. First, I ssh into the NAS box. I
do the decrypt stuff and mount the LV in it's proper place. Then I
switch back to a Konsole tab for Fireball. I mount the NAS box to a
mount point on Fireball with nfs thingy. From there I use rsync to copy
from one point to the other. I mostly use this command and options for
restore.
rsync -uivr --progress /mnt/1/ /mnt/2/
Sometimes that varies a bit depending on exactly what I am copying and
from where to where. Example, when updating my backups, I include the
--delete option because if I delete a file, I almost always want it gone
on the backup too. I also shortened the source and target. That should
give you a good idea how wrong I'm doing this tho. ROFL :/
Perhaps you have configured the rsync options to suit your backup needs, but
why have you chosen '-u'? Do you expect to have files in your NAS which are
*newer* than your Fireball fs?

Wouldn't '-a' be more appropriate? You could add '-A' and '-X' to include any
ACLs and extended attributes, '-H' to copy hard links rather than making
separate copies.
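Put together, the suggestion might look like the sketch below. The wrapper name and the RSYNC override are made up for illustration; the rsync options themselves are standard. Whether ACLs and extended attributes survive depends on the filesystems and on the NFS export, so treat '-A' and '-X' as optional.

```shell
#!/bin/sh
# Mirror SRC into DST, preserving metadata. sync_tree is a hypothetical wrapper.
# RSYNC can be overridden, e.g. RSYNC=echo prints the arguments without copying.
sync_tree() {
    src=$1
    dst=$2
    shift 2
    # -a  archive mode (recursive; keeps permissions, times, owners, symlinks)
    # -A  ACLs, -X extended attributes, -H hard links
    # -i  itemize changes, -v verbose
    "${RSYNC:-rsync}" -aAXHiv --progress "$@" "$src" "$dst"
}

# Preview a backup update, including deletions, without touching anything:
#   sync_tree /mnt/1/ /mnt/2/ --delete --dry-run
```

Dropping '--dry-run' once the itemized list looks right performs the actual copy.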
Post by Frank Steinmetzger
Post by Frank Steinmetzger
Post by Michael
I still think encryption is slowing it down some. As you say tho,
ethernet isn't helping which is why I may look into other options later,
faster ethernet or fiber if I can find something cheap enough.
What do you mean with “ethernet is not helping”? As we could see above,
your AES throughput cannot keep up with Gbit.
Well, I was thinking the ethernet might be slowing things at times. I'm
not sure on that tho. I do know the CPU fan ramps up to a good speed
for a while then goes back to basically what it is when idle. After a
short time, it speeds up again and repeats. It has done this throughout
the whole restore process.
Your original hunch was correct - you need to enable hardware acceleration for
crypto in your kernel. The 1Gbps network link is not saturated as things
stand. The CPU fan speeding up and slowing down could be caused by thermal
hysteresis in the fan control circuit, or by the CPU waiting for the
receiving end to process data that has already been sent and buffered but not
yet written to disk.
Post by Frank Steinmetzger
That may explain why I don't see as much load on my main rig then. It
has the extra instructions. I'm not sure if the 770T does or not.
With hardware acceleration the A10-7850K APU shows between 10-25% CPU load in
GkrellM during the cryptsetup benchmark.
Post by Frank Steinmetzger
The mobo should have no influence on crypto performance.
It
has Ubuntu so I can't run the Gentoo CPU flag thingy. So, I checked
/proc/cpuinfo and it doesn't show it on the 770T but my main rig
Fireball does. So, it seems Fireball has it, older 770T NAS box does
not. That could be a bottleneck. Maybe.
But interestingly, the NAS box shows higher AES throughput than fireball,
probably through raw performance. (What processor does it have?)
That's interesting. I thought that to but thought maybe I was reading
the results wrong. It has a Phenom II X4 955. Keep in mind, the NAS
box has Ubuntu on it. It's not a kernel I built or configured. If you
think it is missing something, it just may be. Building a new kernel
could get interesting tho. May need a hammer. o_O
So to make sure I get this, you're saying the old 770T NAS box is
performing better on encryption than my slightly newer rig with aes
support on the CPU? That would be interesting. If so, that 770T may be
a dedicated NAS box thingy. Once I get done building a new rig and all.
If you build a new box, you can retire the Phenom and use the FX-8350 box as a
NAS server for your backups, *after* you have configured encryption in its
kernel.
Just a FYI. My restore from backup has finished. To test anything, I
may have to get a bag of tricks. I guess I could find a large file,
several GBs in size, and copy, delete, copy, delete etc to get some
results. I'm about to connect some external hard drives to restore some
smaller directories, my smaller directories are still quite large tho.
Those drives attach directly to my system, no ethernet. I'm curious to
see if the data throughput behaves the same way. I seem to recall in
another thread that it does.
In the first instance fix your kernel and reboot before you test anything
else. You should see a considerable improvement, as far as the receiving end
allows.
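To compare before and after the reboot without eyeballing the whole table, something like the following sketch could pull out a single figure. The helper name is invented; it only assumes the column layout shown in the benchmark output earlier in this thread.

```shell
#!/bin/sh
# Pull the encryption figure (MiB/s) for one cipher/key size out of
# `cryptsetup benchmark` output read from stdin.
# bench_enc_rate is a hypothetical helper.
bench_enc_rate() {
    awk -v c="$1" -v k="$2" '$1 == c && $2 == k { print $3; exit }'
}

# After the reboot, this would print the new aes-xts 512b encryption rate:
#   LC_ALL=C cryptsetup benchmark | bench_enc_rate aes-xts 512b
```

A jump from tens of MiB/s to something in the high hundreds or more would suggest the hardware path is now in use.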
