NetBSD-Users archive


Re: Fun with SSD and GPT wedges



    Date:        Mon, 11 Feb 2019 15:42:56 -0600
    From:        Robert Nestor <rnestor%mac.com@localhost>
    Message-ID:  <3CDC9CF7-291F-481F-88B2-965F4DBA0FA2%mac.com@localhost>

  | *) GPT and DKCTL merrily allow me to create wedges that can't be
  | mapped because the /dev nodes don't exist.

There's nothing very interesting about that - you can plug in usb
thumb drives that can't be referenced because the /dev entries don't
exist as well.   If/when we ever get a devfs that issue will go
away, and /dev will magically simply contain the entries that match
whatever the hardware makes available (plus extra stuff like vnds).
Until then, you need to create the entries yourself (MAKEDEV does that).
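
A minimal sketch of checking for the missing nodes before reaching for
MAKEDEV.  The function name and its optional directory argument are
made up for illustration (they are not part of any NetBSD tool), and
dk7 is only an example wedge:

```sh
# missing_dk_nodes DKNAME [DEVDIR]
# Print the block (dkN) and raw (rdkN) node names that are absent
# from DEVDIR (default /dev).  Prints nothing if both exist.
missing_dk_nodes() {
	dk=$1
	dir=${2:-/dev}
	for node in "${dk}" "r${dk}"
	do
		test -e "${dir}/${node}" || echo "${node}"
	done
}

# Typical use, as root:
#	test -z "$(missing_dk_nodes dk7)" || ( cd /dev && ./MAKEDEV dk7 )
```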

  | There aren't any warnings or errors in the process,

Because nothing is actually wrong.   Nothing says that you have to
have a wedge for every GPT partition, or that you have to use them
if you do.

  | Wedges appear to be created in ascending numerical order with
  | lower numbered slots used first,

ascending order implies the lower numbers are used first... but yes,
that's how they appear.

  | but wiping the GPT header doesn't seem to always immediately
  | free the corresponding wedges.

It doesn't.   You need to be aware of the logical separation here.
GPT is a disc partitioning scheme (as are MBR and disklabel) which
divides drives into multiple pieces.   Wedges are an OS software
reference device which map a range of blocks on a particular device
to a /dev name (ie: give a handle by which a piece of disc can be
referenced).

Normally, when a drive (which already contains a label) is scanned,
any GPT partitions (of suitable types) are connected to wedges so
they can be referenced - but you can always just create a wedge for
any random piece of a drive without any GPT back end if you like.
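
As a sketch of that last point: a hand-made wedge needs only a name, a
start block, a size in blocks and a type, in the same order the script
at the end of this message passes them to "dkctl addwedge".  The helper
below is hypothetical and only prints the command so it can be checked
first; wd1 and "scratch" are example values.

```sh
# addwedge_cmd DISK NAME START SIZE TYPE
# Print (do not run) the dkctl command that would map SIZE blocks,
# starting at block START of DISK, to a new wedge called NAME.
addwedge_cmd() {
	for n in "$3" "$4"
	do
		case "${n}" in
		""|*[!0-9]*)
			echo >&2 "start and size must be plain block numbers"
			return 1
			;;
		esac
	done
	echo "dkctl $1 addwedge $2 $3 $4 $5"
}

# Example (no GPT entry is involved at all):
#	addwedge_cmd wd1 scratch 2048 2097152 ffs
```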

With a new drive, as it was originally coded, you would add a new
GPT partition, and the gpt command would tell you what you needed
to type (or cut&paste) as a dkctl command to create a wedge for the
new partition (if you wanted one).

But since just about everyone wants one (that's usually why they
created the partition), and since having one command spit out
instructions for running another command is kind of weird when it
could just run the command itself, the gpt command was changed to
create a wedge when it creates a new partition (that is, the in-kernel
data struct, whether or not it has a /dev/dkN entry available).

Apart from that one concession, the gpt command and wedges are two
separate things (and so are MBR and wedges, and disklabel and wedges,
if you use them that way - most people don't as it isn't traditional.)

That means that when you no longer want to keep a wedge, you need
to dkctl delwedge it to cause the kernel to lose the mapping.  What
you do (if anything) with the GPT map on the drive is unrelated.
But certainly keeping wedges when you're about to remap the drive
space in some different way would cause you all kinds of weirdness.
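
A rough sketch of clearing out the old wedges before remapping.  It
assumes "dkctl DISK listwedges" prints one line per wedge beginning
with "dkN:" (check the output on your own system); the function name
is invented here, and it only prints the delwedge commands rather than
running them.

```sh
# delwedge_cmds DISK
# Read "dkctl DISK listwedges" output on stdin and print one
# "dkctl DISK delwedge dkN" line per wedge found.  Nothing is executed.
delwedge_cmds() {
	disk=$1
	sed -n 's/^\(dk[0-9][0-9]*\):.*/\1/p' |
	while read -r dk
	do
		echo "dkctl ${disk} delwedge ${dk}"
	done
}

# Review the list first, then feed it to sh if it looks right:
#	dkctl wd1 listwedges | delwedge_cmds wd1
#	dkctl wd1 listwedges | delwedge_cmds wd1 | sh
```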

  | So after destroying the GPT header I used DKCTL to (re)make the wedges.

You should delwedge the old wedges first, then make new ones.

None of the above is any different for SSDs compared with spinning iron.


  | *) The SSD is so fast that after creating wedges and even listing them,
  | I find I can't newfs them until the buffers get flushed which dkctl
  | does with the synccache command.

I have never found a reason to use synccache in my life - though I
guess if I was about to remove power from a drive, it might be useful.

Whatever issue you're having, that cannot be the real solution.

Further, the faster the drive, the less issue you should have with
needing to wait - an SSD should always be done with whatever you have
asked of it with fewer delays than rotating tin, so "so fast that..."
really makes no sense.   Something else is happening.

Are you perhaps using the block devices (/dev/dkN) for any of this?

If you just say "dkN" newfs (etc) will convert that to the raw
device (/dev/rdkN) but that doesn't happen if you use the full
device name.

If you end up using the block devices, you may have data still
in the system's buffer cache (which the synccache dkctl command is
unrelated to), which may cause problems.
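
If a full path is wanted anyway, one way to be sure it is the raw one
is to normalise the name first.  A minimal sketch (the function name is
hypothetical, and it only handles dkN-style wedge names):

```sh
# raw_wedge NAME
# Map "dkN", "rdkN", "/dev/dkN" or "/dev/rdkN" to the raw device
# path /dev/rdkN.
raw_wedge() {
	name=${1#/dev/}		# drop a leading /dev/ if present
	name=${name#r}		# drop a leading r if it is already raw
	echo "/dev/r${name}"
}

# Example:
#	newfs -O 2 "$(raw_wedge /dev/dk3)"
```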

  | (At least that's what I assume is happening based on what I'm seeing.)
  | Is this an error?

It is hard to tell without seeing what you are doing - but I suspect
that you're not doing something in quite the way that it is expected
to be done.

  | *) GPT and DKCTL happily allow me to destroy, recreate (or mis-create)
  | wedges on disk(s) with wedges that are currently mounted in the running
  | system.  (Well it was running up to the point I did this.)

Yes, they do.

  | I would think this is maybe something the system should at least
  | issue a warning about, so is this a bug?

If you try to delete a wedge that the system is still using, you
should get an error, and if you don't, then I'd call that a bug.
But it sounds as if you're just mangling things without really
telling the system what your intentions are.  That I'd call operator
error.

  | *) I had been using the "name=" parameter in both newfs and mount
  | with the wedges.  This worked fine for spinning disks, but I see all
  | sorts of random failures when the wedges are on SSD.

This makes no real sense - there's no difference between SSDs and
whirling rust, aside from the lack of noise, heat, and ...   To the
system they look just the same (the SSD is faster, but some spinning
devices are faster than others too.)

  | My workaround is to figure out which DKn wedge corresponds to
  | the name using dkctl, and specify the wedge by /dev/dk? for newfs and mount.

That should not be needed, but the kernel wedge needs to know
the name you want to use; just having it in the GPT label is not
enough unless it was there when the system scanned the drive (usually
when the drive is first detected, but you can also use "dkctl makewedges"
to rescan - but beware, after that, the dkN's that you had before
may not refer to the same wedges).
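
Because the numbering can change across a rescan, it is safer to look
a wedge up by its name each time than to remember its dkN.  A sketch,
assuming listwedges lines look like "dkN: name, N blocks at B, type: T"
(verify against your own output; the function name is invented):

```sh
# wedge_by_name NAME
# Read "dkctl DISK listwedges" output on stdin and print the dkN
# whose wedge name is exactly NAME.  Exit status 1 if none matches.
wedge_by_name() {
	want=$1
	while IFS= read -r line
	do
		case "${line}" in
		dk[0-9]*:*)	;;
		*)		continue;;	# header or other noise
		esac
		rest=${line#*: }		# text after the "dkN: " prefix
		case "${rest}" in
		"${want}, "*)
			echo "${line%%:*}"
			return 0
			;;
		esac
	done
	return 1
}

# Example, after a rescan:
#	dkctl wd1 makewedges
#	DK=$(dkctl wd1 listwedges | wedge_by_name scratch) || echo "no such wedge"
```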

  | So, is any of what I'm seeing a real issue that requires a PR?

I will attach a script I use (just for the add a new partition
part of what you're doing, not the rest of the install, ...) so
you can see how it works.

This was written long ago (with an early version of the gpt command)
and could be optimised now, some of what it does is no longer needed
(eg: back then, there was no way to create a new GPT partition, and
supply a label for it, all in one step.)   I mostly only use this on
quite old systems...   Also beware, while it adds entries to fstab,
it never removes them again (in fact, it doesn't remove anything
except a temporary wedge it makes because it was needed that way
once) - if you want to undo what it has done, you need to do that
manually.   In the fstab entry, it will only add the "NAME="
style reference if it sees one like that already there (in upper case).
Some of the systems I use this on are so old that they do not support
that!   This hack allows the script to work out which form to use,
so you need to have manually added, or changed to that form, at
least one fstab entry.   Using /dev/dkN names in fstab is
dangerous - they are not stable.
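
The choice described above boils down to something like the following
sketch.  The function name and the optional fstab path argument are
only for illustration; the attached script does the equivalent inline
against /etc/fstab.

```sh
# fstab_device LABEL DK [FSTAB]
# Print "NAME=LABEL" if a label is given and FSTAB already has at
# least one line starting with NAME= (taken as a sign that this
# system supports that form); otherwise fall back to /dev/DK.
fstab_device() {
	label=$1
	dk=$2
	fstab=${3:-/etc/fstab}
	if test -n "${label}" && grep "^NAME=" "${fstab}" >/dev/null 2>&1
	then
		echo "NAME=${label}"
	else
		echo "/dev/${dk}"
	fi
}

# Example:
#	fstab_device scratch dk3
```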

kre

ps: the script uses MAKEDEV to create new entries in /dev if it
needs them...   And no guarantees this still works with modern gpt/dkctl.
#! /bin/sh

case "$1" in
-[h?])	echo "Usage: $0 base-drive options ..."
	echo ""
	echo "  Options:"
	echo "     -s size   (mandatory option)"
	echo "     -b base   (starting blockno)"
	echo "     -l label  (GPT part label / wedge name)"
	echo "     -t filesys-type  (ffs, msdos, ...)  (default ffs)"
	echo ""
	echo "  for -t ffs (which is the default):"
	echo "     -m mount-point   (created if required)"
	echo "     -f fstab-options   (ro,noauto,...) (requires -m)"
	echo "           (-f can be repeated, or value can contain commas)"
	echo "     -B blocksize   (-b opt to newfs)"
	echo "     -F fragsize    (-f opt to newfs)"
	echo "     -N num-inodes  (-n opt to newfs)"
	echo "     -O ffs_version (-O opt to newfs) (default 2)"
	echo ""
	echo "  filesystem will be created (newfs) for -t ffs"
	echo "  if either (or both) -m or -N is given, not otherwise"
	echo "  filesystem must be created to be mounted (if -m && ! -f noauto)"
	echo "  filesystem must be mounted (or -f noauto) to be added to fstab"
	echo "  fstab options (-f) must be given to add entry to fstab"
	exit 0
	;;

-*)	echo >&2 "Usage: $0 base-drive options ..."; exit 1;;
esac

case "$#" in
0|1)	echo >&2 "Usage: $0 base-drive options ..."; exit 1;;
esac

# Option default values... anything not set here has no default
TYPE=ffs
FFS=2

# These options can be issued more than once, and values accumulate
# (value of APP_xxx variable is inserted between values as separator)
APP_FSTAB=,
APP_LABEL=_
# For other options, if used more than once, last value set wins.
# This allows funcs/aliases then values to be overridden easily

D=$1; shift

VAR=
for arg
do
	test -n "${VAR}" && {
		V=APP_${VAR}
		eval VV=\$"${V}"
		if test -n "${VV}"
		then
			eval ${VAR}='$'"${VAR}"'${'"${VAR}"':+'"'${VV}'"'}'"'${arg}'"
		else
			eval ${VAR}="'${arg}'"
		fi
		VAR=
		continue
	}

	F=${arg}
	case "${arg}" in

	-b)	VAR=BASE;;
	-f)	VAR=FSTAB;;
	-l)	VAR=LABEL;;
	-m)	VAR=MOUNT;;
	-s)	VAR=SIZE;;
	-t)	VAR=TYPE;;

	-B)	VAR=BLKSIZE;;
	-F)	VAR=FRAGSIZE;;
	-N)	VAR=INODES;;
	-O)	VAR=FFS;;

	-?)	echo >&2 "Unrecognised option '${arg}'"; exit 1;;
	*)	echo >&2 "Don't know what to do with '${arg}'"; exit 1;;

	esac
done

test -n "${VAR}" && {
	echo >&2 "No value given for ${F}"
	exit 1
}

test -z "${SIZE}" && {
	echo >&2 "Need a size (-s)"
	exit 1
}

# More recent gpt commands would do this internally, but not all, so
# convert from human readable size in GB or MB into a sector count...
case "${SIZE}" in
[0-9]*[Gg]|[0-9]*[Gg][Bb])	SIZE=$(( ${SIZE%[Gg]*} * 2097152 ));;
[0-9]*[Mm]|[0-9]*[Mm][Bb])	SIZE=$(( ${SIZE%[Mm]*} * 2048 ));;
*[!0-9]*)	echo >&2 "Unrecognised size (-s) value: ${SIZE}"; exit 1;;
esac

if [ -n "${FSTAB}" ] && [ -z "${MOUNT}" ]
then
	echo >&2 "Cannot add to fstab (-f option) without mount point (-m)"
	exit 1
fi

if [ -n "${BLKSIZE}${FRAGSIZE}" ] && [ -z "${MOUNT}${INODES}" ]
then
	echo >&2 "Warning: -B or -F with neither -m nor -N is useless"
fi

if [ -n "${MOUNT}" ] && ! [ -d "${MOUNT}" ]
then
	echo "Attempting to create mount point: ${MOUNT}"
	mkdir -p "${MOUNT}" || exit 1
fi

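# "set -- $( cmd ) || ..." tests the status of set, not of cmd,
# so capture the output first, then split it into $1, $2, ...
OUT=$( gpt add -s "${SIZE}" ${TYPE:+-t} "${TYPE}" "${D}" ) ||
{
	echo >&2 "gpt add failed!"
	exit 1
}
set -- ${OUT}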
if [ "$9" -ne "${SIZE}" ]
then
	echo >&2 "Partition created with incorrect size"
	exit 1
fi
B=$8

if [ -n "${LABEL}" ]
then
	gpt label -l "${LABEL}" -b "${B}" "${D}"
	# Don't panic if label fails...
else
	# Capture the output first; the status of "set" would hide a failure
	OUT=$( dkctl "${D}" addwedge NO-NAME "${B}" "${SIZE}" "${TYPE}" ) ||
	{
		echo >&2 "Unable to make temporary wedge"
		gpt remove -b "${B}" "${D}"
		exit 1
	}
	set -- ${OUT}
	DK="$1"
	test -n "${DK}" ||
	{
		echo >&2 "Unable to get wedge id of temporary wedge"
		gpt show "${D}"
		dkctl "${D}" listwedges
		exit 1
	}
	dkctl "${D}" delwedge "${DK}" ||
	{
		echo >&2 "Unable to remove temporary wedge"
		gpt remove -b "${B}" "${D}"
		exit 1
	}
	LABEL="${DK}"
fi

# Capture the output first; the status of "set" would hide a failure
OUT=$( dkctl "${D}" addwedge "${LABEL}" "${B}" "${SIZE}" "${TYPE}" ) ||
{
	echo >&2 "Unable to make wedge for ${LABEL}"
	gpt remove -b "${B}" "${D}"
	exit 1
}
set -- ${OUT}
DK="$1"
test -n "${DK}" ||
{
	echo >&2 "Unable to extract name of created wedge"

	gpt show "${D}"
	dkctl "${D}" listwedges
	exit 1
}

test -e "/dev/${DK}" && test -e "/dev/r${DK}" ||
{
	cd /dev &&
	./MAKEDEV "${DK}"
} || {
	echo >&2 "${DK} created does not exist in /dev and could not be made"
	exit 1

	# Or perhaps ...
	dkctl "${D}" delwedge "${DK}"
	gpt remove -b "${B}" "${D}"
	exit 1
}

# This is purely because there is no code below to create any
# other type of filesystem, except a ffs filesystem.  We do
# not want to attempt to mount uninitialised trash, nor do
# we want that to happen at next boot (via fstab), hence if the
# filesystem type is not ffs, we just stop here, and allow
# everything else to be setup manually.
# Add the code for other newfs types, then exempt that type from the exit 0
case "${TYPE}" in
ffs)	;;
*)	exit 0;;
esac

if [ -n "${MOUNT}" ] || [ -n "${INODES}" ]
then
	newfs \
		${FFS:+-O} ${FFS}		\
		${BLKSIZE:+-b} ${BLKSIZE}	\
		${FRAGSIZE:+-f} ${FRAGSIZE}	\
		${INODES:+-n} ${INODES}		\
			"/dev/r${DK}" 			|| {
				echo >&2 "Newfs failed on /dev/r${DK}"
				dkctl "${D}" delwedge "${DK}"
				gpt remove -b "${B}" "${D}"
				exit 1
			}
fi

if [ -n "${MOUNT}" ]
then
	M=mount
	MOPTS=log
	case "${FSTAB}" in
	*ro*,log*|*log*,ro*)
		# The log option makes no sense (and is not permitted) with ro
		MOPTS=ro
		FSTAB=$(echo "${FSTAB}" | sed -e "s/log//" -e "s/,,/,/g")
				;;
	*ro*)	MOPTS=ro	;;
	*rw,*log|*log*,rw)	;;
	*rw*)	MOPTS="rw"	;;
	*log*)	FSTAB="rw,${FSTAB}" ;;
	?*)	FSTAB="rw,${FSTAB}" MOPTS="rw" ;;
	esac
	case "${FSTAB}" in
	*noauto*)	M=:	;;
	esac

	${M} -o ${MOPTS} /dev/"${DK}" "${MOUNT}" || exit 1

	if [ -n "${FSTAB}" ]
	then
		if test -n "${LABEL}" && grep -s >/dev/null "^NAME=" /etc/fstab
		then
			F1=NAME="${LABEL}"
		else
			F1=/dev/"${DK}"
		fi
		case "${FSTAB}" in
		*noauto*)	F6=0;;
		*)		F6=2;;
		esac
		echo "${F1}	${MOUNT}	${TYPE}	${FSTAB}	 1 ${F6}" >> /etc/fstab
	fi
fi
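
One thing worth knowing about the size conversion in the script above:
the constants 2097152 and 2048 are the number of 512-byte sectors in
1GB and 1MB, so they only hold for drives with 512-byte sectors.  A
sketch of the same conversion with the sector size as a parameter (the
function name and the second argument are additions for illustration,
not something the script has):

```sh
# to_sectors SIZE [SECTOR_BYTES]
# Convert "4G", "4GB", "100M", "100MB" or a plain sector count into a
# sector count.  SECTOR_BYTES defaults to 512, matching the script.
to_sectors() {
	size=$1
	ssz=${2:-512}
	case "${size}" in
	[0-9]*[Gg]|[0-9]*[Gg][Bb])
		echo $(( ${size%[Gg]*} * (1073741824 / ssz) ));;
	[0-9]*[Mm]|[0-9]*[Mm][Bb])
		echo $(( ${size%[Mm]*} * (1048576 / ssz) ));;
	""|*[!0-9]*)
		echo >&2 "Unrecognised size: ${size}"; return 1;;
	*)	echo "${size}";;
	esac
}

# Examples:
#	to_sectors 4G		(512-byte sectors)
#	to_sectors 4G 4096	(4096-byte sectors)
```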

