Current-Users archive


Re: new rust (was: gdb issues?)



> Program terminated with signal SIGSEGV, Segmentation fault.
...
> #0  0x60d0fe74 in _cpuset_isset () from /usr/lib/libc.so.12
> #1  0x03d2bf8c in std::sys::unix::thread::available_parallelism ()

...

> At least it gives a bit of clue about where to go looking for the
> null pointer de-reference, so that's at least something...

This gets me to

work/rustc-1.73.0-src/library/std/src/sys/unix/thread.rs

which says:

            #[cfg(target_os = "netbsd")]
            {
                unsafe {
                    let set = libc::_cpuset_create();
                    if !set.is_null() {
                        let mut count: usize = 0;
                        if libc::pthread_getaffinity_np(libc::pthread_self(), libc::_cpuset_size(set), set) == 0 {
                            for i in 0..u64::MAX {
                                match libc::_cpuset_isset(i, set) {
                                    -1 => break,
                                    0 => continue,
                                    _ => count = count + 1,
                                }
                            }
                        }
                        libc::_cpuset_destroy(set);
                        if let Some(count) = NonZeroUsize::new(count) {
                            return Ok(count);
                        }
                    }
                }
            }

which on the surface looks innocent enough. As near as I can
tell this is the same code as in rust 1.72.1, while the code in
1.71.1 is different: it uses sysconf() first and falls back to
sysctl, with this code (the bootstrap program may be linked with
the "old" standard library, so the problem may have been in
1.72.1 too):

            let mut cpus: libc::c_uint = 0;
            let mut cpus_size = crate::mem::size_of_val(&cpus);

            unsafe {
                cpus = libc::sysconf(libc::_SC_NPROCESSORS_ONLN) as libc::c_uint;
            }

            // Fallback approach in case of errors or no hardware threads.
            if cpus < 1 {
                let mut mib = [libc::CTL_HW, libc::HW_NCPU, 0, 0];
                let res = unsafe {
                    libc::sysctl(
                        mib.as_mut_ptr(),
                        2,
                        &mut cpus as *mut _ as *mut _,
                        &mut cpus_size as *mut _ as *mut _,
                        ptr::null_mut(),
                        0,
                    )
                };

                // Handle errors if any.
                if res == -1 {
                    return Err(io::Error::last_os_error());
                } else if cpus == 0 {
                    return Err(io::const_io_error!(io::ErrorKind::NotFound, "The number of hardware threads is not known for the target platform"));
                }
            }
            Ok(unsafe { NonZeroUsize::new_unchecked(cpus as usize) })

(Actually, the fallback code is there in 1.73.0 and 1.72.1 too,
it's just not used due to the addition of the netbsd-specific
section above...)
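
For comparison, the sysconf-first step of the 1.71.1 path quoted
above can be sketched in C. This is only an illustration, not the
rust code itself: the name online_cpus() is made up for the example,
and the sysctl fallback is left out for brevity.

```c
#include <unistd.h>

/*
 * Hypothetical helper: number of online CPUs as reported by
 * sysconf(), or 0 if the value is unavailable.  The 1.71.1 code
 * quoted above would go on to try sysctl (CTL_HW, HW_NCPU) in
 * that case; that step is omitted from this sketch.
 */
static unsigned int
online_cpus(void)
{
        long n = sysconf(_SC_NPROCESSORS_ONLN);

        return (n < 1) ? 0 : (unsigned int)n;
}
```

Printing the result of online_cpus() from a small main() should be
enough to compare it against the affinity-based count.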

The cpuset(3) man page says

     cpuset_isset(cpu, set)
              Checks if CPU specified by cpu is set in the CPU-set set.
              Returns the positive number if set, zero if not set, and -1 if
              cpu is invalid.

but ... under which conditions would it seg-fault inside that function?
Looking at the C code in common doesn't reveal anything frightening...

However, an attempt at a trivial re-implementation "to count
CPUs" in this manner in C does not trigger this issue on any of
my "problematic" platforms (or on amd64 for that matter):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
        int count = 0;
        cpuset_t *cset;
        int i;
        int ret;

        cset = cpuset_create();
        if (cset != NULL) {
                cpuset_zero(cset);
                if (pthread_getaffinity_np(pthread_self(),
                    cpuset_size(cset),
                    cset) == 0)
                {
                        for (i = 0; i<256; i++) {
                                ret = cpuset_isset(i, cset);
                                if (ret == -1)
                                        break;
                                if (ret == 0)
                                        continue;
                                count++;
                        }
                }
        }
        printf("cpus: %d\n", count);
        return 0;
}

but also fails to count the number of CPUs (prints 0). So what
am I (and/or rust) doing wrong?  Or ... is this code simply wrong
anyway, and we need to re-instate the 1.71.1 code path by ripping
out the NetBSD-specific section quoted above?

Meanwhile, the warning in the pthread_getaffinity_np man page is
ignored:

     Portable applications should not use the pthread_setaffinity_np() and
     pthread_getaffinity_np() functions.

Although it could perhaps be argued that rust isn't all that
portable..., and perhaps in particular this piece of code?

Debugging the C program reveals that pthread_getaffinity_np() has
done exactly nothing to the "cset" contents, as near as I can
tell; the "bits" entry doesn't change.
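
One way to narrow this down is to tell a failed call apart from a
successful call that leaves the set empty. The sketch below does
that, assuming the cpuset(3) and pthread_getaffinity_np() interfaces
already used in the test program above. The name
affinity_cpu_count() is made up for the example, the ENOMEM value
reported when cpuset_create() fails is an assumption, and whether an
empty set means "no explicit affinity configured" is likewise an
assumption that nothing in this thread confirms.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: count the CPUs in the calling thread's
 * affinity set.  Returns the count (0 for an empty set), or -1
 * with *errp set if the set cannot be allocated or
 * pthread_getaffinity_np() fails.  cpuset(3) is NetBSD-specific,
 * so other systems just get ENOTSUP here.
 */
static int
affinity_cpu_count(int *errp)
{
#if defined(__NetBSD__)
        cpuset_t *cset;
        cpuid_t i;
        int count = 0;
        int ret;

        *errp = 0;
        cset = cpuset_create();
        if (cset == NULL) {
                *errp = ENOMEM; /* assumed; not taken from the man page */
                return -1;
        }
        /* pthread functions return the error number directly. */
        ret = pthread_getaffinity_np(pthread_self(),
            cpuset_size(cset), cset);
        if (ret != 0) {
                cpuset_destroy(cset);
                *errp = ret;
                return -1;
        }
        /* cpuset_isset() returns -1 once i is past the end of the set. */
        for (i = 0; (ret = cpuset_isset(i, cset)) != -1; i++) {
                if (ret > 0)
                        count++;
        }
        cpuset_destroy(cset);
        return count;
#else
        *errp = ENOTSUP;
        return -1;
#endif
}
```

A return of 0 with *errp left at 0 would mean the call itself
succeeded and the set really is empty, in which case a caller
probably wants to fall back to the sysconf/sysctl count rather than
report zero CPUs.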

I'm running with the theory that the 1.72.1 bootstrap is faulty,
and I'm generating it anew with a patch to rip out the new code,
and will re-try the 1.73.0 build with that on my armv7 system,
but it'll take a while...

Regards,

- Havard

