Thanks for the comments. This is rdiff-backup, not rsync. It has the notion of considering the modified mirror dirty until the run finishes, and it rolls back on restart. I'm not clear on how well it verifies contents (or timestamps older than the last full-backup timestamp?), nor on whether it fsyncs each file before recording it in the log. That's interesting about working around ext4 issues. The code also has (bizarre) calls to fsync the directory a file is in, after fsyncing the file itself.

I think what's really killing my performance is that a cache flush on these disks is expensive, and that's part of fsync. So we probably need a way to call sync(2) and guarantee, as fsync does, that everything that was dirty at call time is written before it returns, and to do that after writing the data and before writing the commit file. The real issue is ordering: making sure all the data and per-file metadata are on disk before writing the file that says the backup succeeded. I don't see that we (or POSIX) have a good way to express that, other than calling sync(2) and waiting 30s, which isn't so bad.

With a remote-over-ssh target, there are fsync calls on files that were opened but not written to, and on a non-WAPBL disk these are fast. I've brought this up on the rdiff-backup list; it appears the maintainer has gone missing.

Obligatory actual NetBSD tech-kern content: it seems like we really need a sync_synchronous(2) system call that guarantees that all file system operations that completed (syscall returned) before sync_synchronous was issued are on disk before it returns. It seems odd that sync does no waiting, fsync seems to wait, and fsync_range can, more or less, either flush caches or not.