Subject: Re: hz parameter
To: HARAWAT.IN.ORACLE.COM <HARAWAT.IN.ORACLE.COM.ofcmail@in.oracle.com>
From: Jim Reid <jim.reid@eurocontrol.be>
List: tech-kern
Date: 10/04/1996 10:48:34
>>>>> "Harish" == "HARAWAT IN ORACLE COM" <HARAWAT.IN.ORACLE.COM.ofcmail@in.oracle.com> writes:

    Harish> Hi, I have a problem with the hz parameter. I want to know
    Harish> what exactly this parameter is used for; as far as I
    Harish> understand, it specifies the frequency of the clock. I am
    Harish> just playing around with NetBSD. One of the machines I have
    Harish> runs at 66 MHz and another at 133 MHz. On the 133 MHz
    Harish> machine some of the NFS requests are timing out, whereas
    Harish> the 66 MHz machine does not have that problem. If I reduce
    Harish> the value of the hz parameter on the 66 MHz machine, the
    Harish> NFS requests start timing out, whereas if I increase the hz
    Harish> value on my 133 MHz machine the problem still continues.
    Harish> Can any of you suggest a possible reason for this, or a
    Harish> solution?

The HZ parameter has nothing to do with the clock speed of the CPU. It
defines the rate at which the system's real time clock interrupts:
usually 100 times a second. This real time clock is used by the kernel
to manage time: maintaining the time of day, process scheduling,
network protocol timeouts and retransmissions, time-related activities
such as profiling, SIGALRM delivery and so on.

NFS timeouts have a number of possible causes. It's highly
unlikely that this has anything to do with CPU speeds or the HZ
parameter. The problems occur because the server fails to respond to
an NFS request quickly enough. This could be because the server is
overloaded or doesn't have enough resources - network buffers,
available nfsd processes, etc - to deal with the request. It could be
because the network is saturated with traffic. There could be a broken
interface or loose/bad connection. NFS stresses the network much more
than ftp or telnet traffic, so large numbers of errors or dropped
packets could be going undetected until NFS is used.

You might want to consider using NFS over TCP. This is usually better
over lossy networks or if there are routers between the client(s) and
server(s). An 8K NFS write usually means sending 6 or 7 ethernet
packets back to back on the wire. Some interfaces can't cope with
traffic that bursty - the hardware is still resetting itself after
handling the previous packet while the next packet in the request is
already on the wire. The server misses a packet. It can't re-assemble
the request - usually sent as a single UDP datagram - so it throws the
whole thing away. The
client notices that its request has gone unanswered so it repeats it.
Another 6 or 7 ethernet frames go out back to back to the server. The
server drops one. The message is discarded and the loop is closed.
[Same goes with crap hardware in clients: they just can't receive
replies to their 8K read requests.]

Now you can experiment with the read and write block sizes, timeout
values and retransmission strategies to alleviate the problem.
However, the best approach is to find out what is going wrong and fix
that rather than fiddle about at the margins with NFS parameters and
mount options. Having said that, NFS over TCP is probably a simpler
option in these circumstances since TCP already has good code for
adapting traffic to the network round trip times, flow control,
dealing with packet loss and retransmission and so on.