On Mon, Sep 18, 2023 at 3:44 PM Peter Humphrey wrote:
It may be less complex than you think, Jack. I envisage a package being
marked as solitary, and when portage reaches that package, it waits until all
current jobs have finished, then it starts the solitary package with the
environment specified for it, and it doesn't start the next one until that one
has finished.
The dependency calculation shouldn't need to be changed.
It seems simple the way I see it.
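
For illustration only, the "solitary" idea described above can be sketched as a toy planner. This is not portage code: the Job class, the solitary flag and plan_batches() are hypothetical names, and the model is simplified to fixed batches (a real scheduler would start a new job whenever a slot frees up). It assumes the job list is already in dependency order, so that calculation is left untouched.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical stand-in for a package merge; not a portage type.
    name: str
    solitary: bool = False


def plan_batches(jobs, max_parallel):
    """Split an ordered job list into batches that may run concurrently.

    A solitary job closes the current batch (wait for running jobs),
    runs in a batch of its own, and only then is the next job considered.
    """
    batches = []
    current = []
    for job in jobs:
        if job.solitary:
            if current:
                batches.append(current)  # drain everything already started
                current = []
            batches.append([job])        # the solitary job runs alone
        else:
            current.append(job)
            if len(current) == max_parallel:
                batches.append(current)
                current = []
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    queue = [Job("zlib"), Job("bash"), Job("webkit-gtk", solitary=True), Job("nano")]
    for batch in plan_batches(queue, max_parallel=4):
        print([j.name for j in batch])
```

The ordering of the queue is never changed here; only the amount of overlap is.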
How does that improve emerge performance overall?
By allocating all the system resources to huge packages while not flooding the
system with lesser ones. For example, I can set -j20 for webkit-gtk today
without overflowing the 64GB RAM, and still have 4 CPU threads available to
other tasks. The change I've proposed should make the whole operation more
efficient overall and take less time.
As things stand today, I have to make do with -j12 or so, wasting time and
resources. I have load-average set at 32, so if I were to set -j20 generally
I'd run out of RAM in no time. I've had many instances of packages failing to
compile in a large update, but going just fine on their own; and I've had
mysterious operational errors resulting, I suspect, from otherwise undetected
miscompilation.
Previous threads have more detail of what I've tried already.
I did read all those but no matter how you move things around you still
have only X resources available all the time.
Whether you just let emerge do its thing or try to get it to do big
packages on their own, everything is still going to use the same number
of CPU cycles overall and you will save nothing.
Except a big chunk off your power bill ... a system under stress uses
more energy for the same amount of work.
What you have is not a portage problem. It is an orthodox parallelism
problem, and I think you are thinking your constraint is unique in the
world - it isn't.
With parallelism, trying to fiddle single nodes to improve things
overall never really works out.
A big problem you are missing is that portage does not have control of
the system. It can control its usage of the system, but if I want emerge
to use as much SPARE resource IN THE BACKGROUND as it can without
impacting on on-line responsiveness, that is HARD.
I would like to be able to tell portage "these programs are resource
hogs, don't parallelise them". If portage has loads of little jobs, it
can fire them off one after the other as resource becomes available. If
it fires a hog (or worse, two) off at the same time, the system can
rapidly collapse under load.
Even better, if portage knew roughly how much resource each job
required, it could (within constraints) start with the jobs that
required least resource and run loads of them, and by firing jobs off in
order of increasing demandingness, the number of jobs running in
parallel would naturally tail off.
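
A rough sketch of that last idea, as a toy simulation rather than anything portage does today: jobs are started in order of increasing estimated RAM, and a job only starts once it fits in a RAM budget. The function simulate(), the per-job RAM estimates and the durations are all made-up assumptions (portage has no such data as far as this thread establishes), and dependencies are ignored.

```python
import heapq


def simulate(jobs, ram_budget_gb):
    """Toy model of resource-ordered scheduling.

    jobs: list of (name, estimated_ram_gb, duration) tuples; the
    estimates are hypothetical inputs, not something portage provides.
    Returns a log of (start_time, name, jobs_running_after_start).
    """
    pending = sorted(jobs, key=lambda j: j[1])  # least demanding first
    running = []                                # min-heap of (finish_time, ram)
    now = 0
    used = 0
    log = []
    for name, ram, duration in pending:
        if ram > ram_budget_gb:
            raise ValueError(f"{name} can never fit in the budget")
        # Release finished jobs, and wait for more to finish until this one fits.
        while running and (running[0][0] <= now or used + ram > ram_budget_gb):
            finish, freed = heapq.heappop(running)
            now = max(now, finish)
            used -= freed
        heapq.heappush(running, (now + duration, ram))
        used += ram
        log.append((now, name, len(running)))
    return log


if __name__ == "__main__":
    demo = [("hog", 8, 100), ("small-a", 1, 10), ("small-b", 1, 10), ("small-c", 2, 10)]
    for start, name, count in simulate(demo, ram_budget_gb=8):
        print(f"t={start:>3}  start {name:<8} ({count} running)")
```

In this made-up example the three small jobs run together and the hog starts alone once they have finished, so the degree of parallelism tails off without any special case for the hog.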
At the end of the day, if the computer takes an extra 20% time, I'm not
bothered. If I'm sat at the computer 20% time extra because the system
isn't responding because emerge has bogged it down, then I do care. And
when I'm building things like webkit-gtk, llvm, LO, FF and TB, they do
hammer my system. If they're running in parallel, my system would be
near unusable.
Cheers,
Wol