Discussion:
Spamd child states?
John Horne
2006-09-05 23:19:37 UTC
Permalink
Hello,

Running SA 3.1.4, I see messages such as these in the log file:

Sep 6 00:05:21 tracy spamd[1710]: prefork: child states: KKKKKKI
From the code the various letters seem to indicate killed, initialised,
busy etc. My question though is are these just informational type
messages? Are they something I need to take note of or monitor?


Thanks,

John.
--
---------------------------------------------------------------
John Horne, University of Plymouth, UK Tel: +44 (0)1752 233914
E-mail: ***@plymouth.ac.uk Fax: +44 (0)1752 233839
Bowie Bailey
2006-09-06 13:35:15 UTC
Permalink
Post by John Horne
Hello,
Sep 6 00:05:21 tracy spamd[1710]: prefork: child states: KKKKKKI
From the code the various letters seem to indicate killed,
initialised,
busy etc. My question though is are these just informational type
messages? Are they something I need to take note of or monitor?
Not unless they are causing a problem. These are just status messages
so you can see how the child processes are being used.

One thing I note is that you have set your max-children to at least 7.
With this many children, keep a close eye on your memory usage. You
should generally allow about 50M of ram for each child. With 7
children, you will need 350M of memory just for SA. With other stuff
running, this can easily be too much for a 1GB server. If you start
running into performance problems, you might want to take a closer
look at this.
--
Bowie
John Horne
2006-09-06 15:03:00 UTC
Permalink
Post by Bowie Bailey
Post by John Horne
Hello,
Sep 6 00:05:21 tracy spamd[1710]: prefork: child states: KKKKKKI
From the code the various letters seem to indicate killed,
initialised,
busy etc. My question though is are these just informational type
messages? Are they something I need to take note of or monitor?
Not unless they are causing a problem. These are just status messages
so you can see how the child processes are being used.
One thing I note is that you have set your max-children to at least 7.
Yes, it is set to 8. The above message was taken soon after an SA
restart, so perhaps only 7 had started at that time.
Post by Bowie Bailey
With this many children, keep a close eye on your memory usage. You
should generally allow about 50M of ram for each child. With 7
children, you will need 350M of memory just for SA. With other stuff
running, this can easily be too much for a 1GB server. If you start
running into performance problems, you might want to take a closer
look at this.
The server has 2GB of ram. It runs an MTA and SA, but does not do virus
checking. I only installed 3.1.4 yesterday, so it is a little early to
say if there are problems. However, I am seeing in the logs messages
like these:

Sep 6 15:58:11 tracy spamd[13052]: prefork: server reached
--max-children setting, consider raising it

SA seems to indicate that the value should be raised rather than
lowered.



John.
--
---------------------------------------------------------------
John Horne, University of Plymouth, UK Tel: +44 (0)1752 233914
E-mail: ***@plymouth.ac.uk Fax: +44 (0)1752 233839
John Horne
2006-09-06 15:09:42 UTC
Permalink
Post by John Horne
The server has 2GB of ram. It runs an MTA and SA, but does not do virus
checking. I only installed 3.1.4 yesterday, so it is a little early to
say if there are problems. However, I am seeing in the logs messages
Sep 6 15:58:11 tracy spamd[13052]: prefork: server reached
--max-children setting, consider raising it
SA seems to indicate that the value should be raised rather than
lowered.
Actually these messages seem to be a bit confusing. The log shows:

Sep 6 16:05:39 tracy spamd[13052]: prefork: child states: KKKKKKBB
Sep 6 16:05:39 tracy spamd[13052]: prefork: server reached
--max-children setting, consider raising it

Which seems to indicate that 2 children are busy but 6 have been killed.
Why should I then raise the value? Surely it should just restart one of
the killed children? If it said 'BBBBBBBB' then I would understand it
suggesting to raise the value.



John.
--
---------------------------------------------------------------
John Horne, University of Plymouth, UK Tel: +44 (0)1752 233914
E-mail: ***@plymouth.ac.uk Fax: +44 (0)1752 233839
Bowie Bailey
2006-09-06 15:35:13 UTC
Permalink
Post by John Horne
Post by John Horne
The server has 2GB of ram. It runs an MTA and SA, but does not do
virus checking. I only installed 3.1.4 yesterday, so it is a little
early to say if there are problems. However, I am seeing in the
Sep 6 15:58:11 tracy spamd[13052]: prefork: server reached
--max-children setting, consider raising it
SA seems to indicate that the value should be raised rather than
lowered.
SA will always suggest that you increase the setting whenever you
reach the limit. It is up to you to take into account your memory
usage and determine if it is a good idea. SA will slow things down a
bit if it doesn't have enough children, but it will bring things to a
screeching halt if the children get swapped out of memory.
Post by John Horne
Sep 6 16:05:39 tracy spamd[13052]: prefork: child states: KKKKKKBB
Sep 6 16:05:39 tracy spamd[13052]: prefork: server reached
--max-children setting, consider raising it
Which seems to indicate that 2 children are busy but 6 have been
killed. Why should I then raise the value? Surely it should just
restart one of the killed children? If it said 'BBBBBBBB' then I
would understand it suggesting to raise the value.
If that many children exist, they must have been busy at one time.
Try doing 'grep prefork /var/log/maillog' to get an idea of how the
children have been operating.
--
Bowie
Theo Van Dinter
2006-09-06 15:38:00 UTC
Permalink
Post by John Horne
Sep 6 16:05:39 tracy spamd[13052]: prefork: child states: KKKKKKBB
Sep 6 16:05:39 tracy spamd[13052]: prefork: server reached
--max-children setting, consider raising it
My understanding (I haven't really looked at that code) is that "K" means the
child has been killed but it hasn't exited yet. If a child is in that state
for more than, say, 5 seconds, there's likely an issue where it doesn't
actually die off, imo.

You should generally see states of I or B.
--
Randomly Generated Tagline:
"First they ignore you, then they laugh at you, then they fight you,
then you win." - Gandhi
John Horne
2006-09-06 16:17:22 UTC
Permalink
Post by Theo Van Dinter
My understanding (I haven't really looked at that code) is that "K" means the
child has been killed but it hasn't exited yet. If a child is in that state
for more than, say, 5 seconds, there's likely an issue where it doesn't
actually die off, imo.
You should generally see states of I or B.
I get the feeling that something is wrong here. I have restarted SA, and
grepped the log file. It shows:

=======================================================================
prefork: child states: BI
prefork: child states: BB
prefork: child states: BBB
prefork: child states: BBBB
prefork: child states: BBBBS
prefork: child states: BBBBII
prefork: child states: IBBBII
prefork: child states: IIBBIK
prefork: child states: IIIBKK
prefork: child states: IIKIKK
prefork: child states: IBKKKK
prefork: child states: IIKKKK
prefork: child states: BBKKKK
prefork: child states: BBKKKKB
prefork: child states: BBKKKKBB
prefork: server reached --max-children setting, consider raising it
prefork: child states: BIKKKKBB
prefork: child states: IBKKKKBB
prefork: child states: IBKKKKIB
prefork: child states: IIKKKKIB
prefork: child states: BIKKKKKI
prefork: child states: IBKKKKKB
prefork: child states: BBKKKKKI
prefork: child states: BIKKKKKI
prefork: child states: IIKKKKKI
prefork: child states: IBKKKKKK
prefork: child states: IIKKKKKK
prefork: child states: BBKKKKKK
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBKKKKKK
prefork: server reached --max-children setting, consider raising it
prefork: child states: IBKKKKKK
prefork: child states: BIKKKKKK
prefork: child states: IIKKKKKK
=======================================================================

Some of the processes seem to almost immediately go in to the 'killed'
state and stay there. 'ps auxww' shows that all 8 child processes are
started. Running an strace (this is a Fedora Core 4 server) on some of
the processes seems to show that they are waiting on select, and then
get a 'resources unavailable' error. What resource I have no idea. E.g:

=======================================================================
strace -Ff -p 12805
Process 12805 attached - interrupt to quit
select(16, [10], NULL, NULL, {290, 888000}) = 1 (in [10], left {147,
820000})
read(10, "P....\n", 6) = 6
read(10, 0xb4515f0, 6) = -1 EAGAIN (Resource
temporarily unavailable)
time(NULL) = 1157559274
select(16, [10], NULL, NULL, {300, 0}
=======================================================================

The process just sits there in this loop of some sort, and never seems
to do any actual spam processing.

Any ideas about this?


Thanks,

John.
--
---------------------------------------------------------------
John Horne, University of Plymouth, UK Tel: +44 (0)1752 233914
E-mail: ***@plymouth.ac.uk Fax: +44 (0)1752 233839
Justin Mason
2006-09-06 16:35:17 UTC
Permalink
Post by John Horne
Post by Theo Van Dinter
My understanding (I haven't really looked at that code) is that "K" means the
child has been killed but it hasn't exited yet. If a child is in that state
for more than, say, 5 seconds, there's likely an issue where it doesn't
actually die off, imo.
You should generally see states of I or B.
I get the feeling that something is wrong here. I have restarted SA, and
=======================================================================
prefork: child states: BI
prefork: child states: BB
prefork: child states: BBB
prefork: child states: BBBB
prefork: child states: BBBBS
prefork: child states: BBBBII
prefork: child states: IBBBII
prefork: child states: IIBBIK
prefork: child states: IIIBKK
prefork: child states: IIKIKK
prefork: child states: IBKKKK
prefork: child states: IIKKKK
prefork: child states: BBKKKK
prefork: child states: BBKKKKB
prefork: child states: BBKKKKBB
prefork: server reached --max-children setting, consider raising it
prefork: child states: BIKKKKBB
prefork: child states: IBKKKKBB
prefork: child states: IBKKKKIB
prefork: child states: IIKKKKIB
prefork: child states: BIKKKKKI
prefork: child states: IBKKKKKB
prefork: child states: BBKKKKKI
prefork: child states: BIKKKKKI
prefork: child states: IIKKKKKI
prefork: child states: IBKKKKKK
prefork: child states: IIKKKKKK
prefork: child states: BBKKKKKK
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBKKKKKK
prefork: server reached --max-children setting, consider raising it
prefork: child states: IBKKKKKK
prefork: child states: BIKKKKKK
prefork: child states: IIKKKKKK
=======================================================================
Some of the processes seem to almost immediately go in to the 'killed'
state and stay there. 'ps auxww' shows that all 8 child processes are
started. Running an strace (this is a Fedora Core 4 server) on some of
the processes seems to show that they are waiting on select, and then
=======================================================================
strace -Ff -p 12805
Process 12805 attached - interrupt to quit
select(16, [10], NULL, NULL, {290, 888000}) = 1 (in [10], left {147,
820000})
read(10, "P....\n", 6) = 6
read(10, 0xb4515f0, 6) = -1 EAGAIN (Resource
temporarily unavailable)
time(NULL) = 1157559274
select(16, [10], NULL, NULL, {300, 0}
=======================================================================
The process just sits there in this loop of some sort, and never seems
to do any actual spam processing.
Any ideas about this?
That looks bad :( The strace snippet, however, is pretty normal-looking.

First off, are you using an up-to-date 3.1.x release?

Secondly, you need to strace both the child *and* the parent spamd process
-- the easiest way to do this is to "strace -f" the parent spamd, then
kill -15 the kids so it starts new (traced) ones.

--j.
John Horne
2006-09-06 16:54:08 UTC
Permalink
Post by Justin Mason
That looks bad :( The strace snippet, however, is pretty normal-looking.
First off, are you using an up-to-date 3.1.x release?
Yes, version 3.1.4.
Post by Justin Mason
Secondly, you need to strace both the child *and* the parent spamd process
-- the easiest way to do this is to "strace -f" the parent spamd, then
kill -15 the kids so it starts new (traced) ones.
Okay, I did that. It ran for a few minutes and produced a 10MB file.
What is odd is that while strace was running the log file shows for the
child states:

===============================================================
prefork: child states: BB
prefork: child states: BS
prefork: child states: BBS
prefork: child states: BBBS
prefork: child states: BBBBS
prefork: child states: BBBBBS
prefork: child states: BBBBBBS
prefork: child states: BBBBBBBS
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBBBBBBB
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBBBBBBB
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBBBBBBB
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBBBBBBB
===============================================================

This then goes on for quite a bit, but the child state remains at
'BBBBBBBB'.

Now that I have stopped strace, the state is:

===============================================================
prefork: child states: BIKKKKKK
prefork: child states: IIKKKKKK
prefork: child states: IIKKKKKK
prefork: child states: IIKKKKKK
prefork: child states: IBKKKKKK
===============================================================


John.
--
---------------------------------------------------------------
John Horne, University of Plymouth, UK Tel: +44 (0)1752 233914
E-mail: ***@plymouth.ac.uk Fax: +44 (0)1752 233839
Justin Mason
2006-09-06 17:06:14 UTC
Permalink
Post by John Horne
Post by Justin Mason
That looks bad :( The strace snippet, however, is pretty normal-looking.
First off, are you using an up-to-date 3.1.x release?
Yes, version 3.1.4.
Post by Justin Mason
Secondly, you need to strace both the child *and* the parent spamd process
-- the easiest way to do this is to "strace -f" the parent spamd, then
kill -15 the kids so it starts new (traced) ones.
Okay, I did that. It ran for a few minutes and produced a 10MB file.
What is odd is that while strace was running the log file shows for the
===============================================================
prefork: child states: BB
prefork: child states: BS
prefork: child states: BBS
prefork: child states: BBBS
prefork: child states: BBBBS
prefork: child states: BBBBBS
prefork: child states: BBBBBBS
prefork: child states: BBBBBBBS
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBBBBBBB
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBBBBBBB
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBBBBBBB
prefork: server reached --max-children setting, consider raising it
prefork: child states: BBBBBBBB
===============================================================
This then goes on for quite a bit, but the child state remains at
'BBBBBBBB'.
===============================================================
prefork: child states: BIKKKKKK
prefork: child states: IIKKKKKK
prefork: child states: IIKKKKKK
prefork: child states: IIKKKKKK
prefork: child states: IBKKKKKK
===============================================================
Definitely very screwy. Can I see the log?

--j.
John Horne
2006-09-07 14:09:22 UTC
Permalink
Post by John Horne
I get the feeling that something is wrong here. I have restarted SA, and
=======================================================================
prefork: child states: BI
prefork: child states: BB
prefork: child states: BBB
prefork: child states: BBBB
prefork: child states: BBBBS
prefork: child states: BBBBII
prefork: child states: IBBBII
prefork: child states: IIBBIK
prefork: child states: IIIBKK
prefork: child states: IIKIKK
prefork: child states: IBKKKK
prefork: child states: IIKKKK
prefork: child states: BBKKKK
prefork: child states: BBKKKKB
[snipped]

I investigated this further last night when our server was less busy.
Below is the message I sent to Justin Mason explaining what I think is
happening. The problem lies with SElinux. Under FC4 I cannot see
anything I can turn on/off in selinux to resolve this, so we will need
to run the server with selinux disabled. I suspect selinux needs a
little tweak to allow both SA and selinux to run.
Post by John Horne
Hello,
I noticed that always the first 2 child processes started remained
working okay. I assume that these 2 were related to the --min-children
and --min-spare options. All the children options, except
--max-children, are default in our configuration. However, any
subsequent child process started falls in to the 'K' state and seems
to remain there.
Our servers are quieter at this time of night (midnight!), so I
straced the master process after killing all the children again. The
============================================================
Sep 7 00:20:42 tracy spamd[1666]: spamd: server successfully spawned
child process, pid 16267
Sep 7 00:20:42 tracy spamd[1666]: spamd: server successfully spawned
child process, pid 16268
Sep 7 00:21:36 tracy spamd[1666]: spamd: server successfully spawned
child process, pid 16341
============================================================
The attached log shows, for pid 16341, that the kill call gives an
error - Operation not permitted. This explains why the child is not
killed, but not as to why the op is not permitted.
The server is running Fedora Core 4 Linux, and has SElinux enabled. I
temporarily disabled selinux, and that seems to have resolved the
[pid 1666] kill(19990, SIGINT) = 0
===============================================================
Sep 7 00:46:07 tracy spamd[1666]: prefork: child states: BB
Sep 7 00:46:07 tracy spamd[1666]: prefork: child states: BBI
Sep 7 00:46:09 tracy spamd[1666]: prefork: child states: IBI
Sep 7 00:46:09 tracy spamd[1666]: prefork: child states: III
Sep 7 00:46:09 tracy spamd[1666]: prefork: child states: II
===============================================================
As can be seen the new children process is successfully killed off.
So I guess now I need to see what it actually is in selinux that is
stopping the master process from killing of its child processes. That
can wait till tomorrow.
John.
--
---------------------------------------------------------------
John Horne, University of Plymouth, UK Tel: +44 (0)1752 233914
E-mail: ***@plymouth.ac.uk Fax: +44 (0)1752 233839
Loading...