tech-kern archive


Re: adding linux syscall fallocate



>You don't have to know where to write. Strictly speaking you only need to know that the space is reserved. We do
>have enough magic block numbers that could serve as a marker. E.g. anything up to the end of the first super block
>isn't allocatable. And there are enough spare fields in the fs, cg and csum structs.

One thing I am missing here: suppose you record a reservation only as a magic block number, say M, in the file's block pointers. If you reserve 1000 blocks for one file out of the 1200 available on the storage medium, there are enough blocks for that reservation. But when a later call asks for another 1000 blocks for a second file, it should fail because there is not enough space left. So knowledge about the history of allocations is required.
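To make the 1200/1000 example concrete, here is a minimal sketch of the accounting I have in mind. The struct and function names are hypothetical (they are not existing fields of struct fs or any real FFS code); it only shows that a running count of reserved blocks is what lets the second request fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem counters; not the real struct fs fields. */
struct fs_space {
	uint64_t free_blocks;     /* blocks not yet allocated to any file */
	uint64_t reserved_blocks; /* part of free_blocks promised to fallocate callers */
};

/* Reserve n blocks without choosing where they live. Returns 0 or ENOSPC. */
static int
space_reserve(struct fs_space *sp, uint64_t n)
{
	uint64_t unreserved = sp->free_blocks - sp->reserved_blocks;

	if (n > unreserved)
		return ENOSPC;
	sp->reserved_blocks += n;
	return 0;
}

/* Turn one reserved block into a real allocation when it is first written. */
static void
space_commit(struct fs_space *sp)
{
	assert(sp->reserved_blocks > 0 && sp->free_blocks > 0);
	sp->reserved_blocks--;
	sp->free_blocks--;
}
```

With free_blocks = 1200, the first space_reserve(&sp, 1000) succeeds and the second returns ENOSPC, but only because reserved_blocks remembers the earlier call.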

If fallocate asks the block allocator to reserve these blocks, to make sure they will be available later, you decrement the number of available blocks on the FS, but I think the information about this allocation should also be stored somewhere. If we need to keep it, will it need some additional list or pool inside the cylinder group to hold these allocated-but-unassigned blocks? I think this information has to be written permanently somewhere. It is variable length, and we can have multiple reserve/free requests that are not contiguous in blocks, so I doubt we can simply store them as a range with a start and end block and call them fallocated blocks.
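A rough sketch of why a range list worries me, assuming a fixed number of spare slots per cylinder group. RSV_SLOTS, struct rsv_range and rsv_add are made-up names for illustration only, not anything in the current on-disk format:

```c
#include <errno.h>
#include <stdint.h>

#define RSV_SLOTS 4 /* hypothetical: spare room available in a cylinder group */

struct rsv_range {
	uint32_t start; /* first reserved block */
	uint32_t len;   /* number of reserved blocks */
};

struct rsv_table {
	struct rsv_range r[RSV_SLOTS];
	int used;
};

/* Record a reserved range; merge it only when it directly follows an existing one. */
static int
rsv_add(struct rsv_table *t, uint32_t start, uint32_t len)
{
	for (int i = 0; i < t->used; i++) {
		if (t->r[i].start + t->r[i].len == start) {
			t->r[i].len += len; /* contiguous: grow range i */
			return 0;
		}
	}
	if (t->used == RSV_SLOTS)
		return ENOSPC; /* no slot left for a non-contiguous range */
	t->r[t->used].start = start;
	t->r[t->used].len = len;
	t->used++;
	return 0;
}
```

Contiguous requests collapse into one slot, but every non-contiguous request consumes a new one, so a fixed-size table fills up after a handful of scattered reservations even when plenty of blocks are free.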

I remember when ext4 came with Linux fallocate: they supported only extent-based files, because it is obvious how to do it with extents, but for some reason they did not support the old block-mapped files. I assumed that was for this very reason, namely that it would require changes to the on-disk layout. Historically, posix_fallocate was slow because it zeroed the blocks.

Thanks
Maciej

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, November 19, 2019 4:00 PM, Christoph Badura <bad%bsd.de@localhost> wrote:

> On Mon, Nov 18, 2019 at 02:26:17PM -0800, Jason Thorpe wrote:
>
> > > On Nov 18, 2019, at 1:13 PM, Mouse mouse%Rodents-Montreal.ORG@localhost wrote:
> > > All you need is a second magic block number. Block number zero is
> > > already reserved for holes. Making, say, block number 1, or -1, or
> > > some such, reserved to represent "block-of-zeros semantics for which
> > > all backing data has been accounted as allocated", so that writing such
> > > a block is guaranteed to have space available?
> >
> > If you’re using a single magic block number for “allocated,
> > but uninitialized”, how are you supposed to know where to write
>
> You don't have to know where to write. Strictly speaking you only need
> to know that the space is reserved. We do have enough magic block
> numbers that could serve as a marker. E.g. anything up to the end of
> the first super block isn't allocatable. And there are enough spare
> fields in the fs, cg and csum structs.
>
> However, we should look at how FreeBSD has implemented posix_fallocate
> and see if we can incorporate that code. Or at least come up with a
> compatible implementation so that there is a possibility of moving disks
> between these systems.
>
> --chris



