X-Git-Url: https://err.no/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=Documentation%2Fblock%2Fbiodoc.txt;h=dc3f49e3e5392891f10d700567363b3446358b67;hb=dd6d1844af33acb4edd0a40b1770d091a22c94be;hp=3646a0aaea8263e508a12e151b3def2c4ff6e24a;hpb=fff9289b219f48cb2296714fea3d71f516991f9f;p=linux-2.6 diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 3646a0aaea..dc3f49e3e5 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt @@ -135,7 +135,7 @@ Some new queue property settings: Sets two variables that limit the size of the request. - The request queue's max_sectors, which is a soft size in - in units of 512 byte sectors, and could be dynamically varied + units of 512 byte sectors, and could be dynamically varied by the core kernel. - The request queue's max_hw_sectors, which is a hard limit @@ -183,7 +183,7 @@ it, the pci dma mapping routines and associated data structures have now been modified to accomplish a direct page -> bus translation, without requiring a virtual address mapping (unlike the earlier scheme of virtual address -> bus translation). So this works uniformly for high-memory pages (which -do not have a correponding kernel virtual address space mapping) and +do not have a corresponding kernel virtual address space mapping) and low-memory pages. Note: Please refer to DMA-mapping.txt for a discussion on PCI high mem DMA @@ -391,7 +391,7 @@ forced such requests to be broken up into small chunks before being passed on to the generic block layer, only to be merged by the i/o scheduler when the underlying device was capable of handling the i/o in one shot. Also, using the buffer head as an i/o structure for i/os that didn't originate -from the buffer cache unecessarily added to the weight of the descriptors +from the buffer cache unnecessarily added to the weight of the descriptors which were generated for each such chunk. The following were some of the goals and expectations considered in the @@ -403,14 +403,14 @@ i. Should be appropriate as a descriptor for both raw and buffered i/o - for raw i/o. ii. Ability to represent high-memory buffers (which do not have a virtual address mapping in kernel address space). -iii.Ability to represent large i/os w/o unecessarily breaking them up (i.e +iii.Ability to represent large i/os w/o unnecessarily breaking them up (i.e greater than PAGE_SIZE chunks in one shot) iv. At the same time, ability to retain independent identity of i/os from different sources or i/o units requiring individual completion (e.g. for latency reasons) v. Ability to represent an i/o involving multiple physical memory segments (including non-page aligned page fragments, as specified via readv/writev) - without unecessarily breaking it up, if the underlying device is capable of + without unnecessarily breaking it up, if the underlying device is capable of handling it. vi. Preferably should be based on a memory descriptor structure that can be passed around different types of subsystems or layers, maybe even @@ -477,9 +477,9 @@ With this multipage bio design: the same bi_io_vec array, but with the index and size accordingly modified) - A linked list of bios is used as before for unrelated merges (*) - this avoids reallocs and makes independent completions easier to handle. -- Code that traverses the req list needs to make a distinction between - segments of a request (bio_for_each_segment) and the distinct completion - units/bios (rq_for_each_bio). 
+- Code that traverses the req list can find all the segments of a bio + by using rq_for_each_segment. This handles the fact that a request + has multiple bios, each of which can have multiple segments. - Drivers which can't process a large bio in one shot can use the bi_idx field to keep track of the next bio_vec entry to process. (e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE) @@ -664,14 +664,14 @@ in lvm or md. 3.2.1 Traversing segments and completion units in a request -The macros bio_for_each_segment() and rq_for_each_bio() should be used for -traversing the bios in the request list (drivers should avoid directly -trying to do it themselves). Using these helpers should also make it easier -to cope with block changes in the future. +The macro rq_for_each_segment() should be used for traversing the bios +in the request list (drivers should avoid directly trying to do it +themselves). Using these helpers should also make it easier to cope +with block changes in the future. - rq_for_each_bio(bio, rq) - bio_for_each_segment(bio_vec, bio, i) - /* bio_vec is now current segment */ + struct req_iterator iter; + rq_for_each_segment(bio_vec, rq, iter) + /* bio_vec is now current segment */ I/O completion callbacks are per-bio rather than per-segment, so drivers that traverse bio chains on completion need to keep that in mind. Drivers @@ -740,12 +740,12 @@ Block now offers some simple generic functionality to help support command queueing (typically known as tagged command queueing), ie manage more than one outstanding command on a queue at any given time. - blk_queue_init_tags(request_queue_t *q, int depth) + blk_queue_init_tags(struct request_queue *q, int depth) Initialize internal command tagging structures for a maximum depth of 'depth'. - blk_queue_free_tags((request_queue_t *q) + blk_queue_free_tags((struct request_queue *q) Teardown tag info associated with the queue. This will be done automatically by block if blk_queue_cleanup() is called on a queue @@ -754,7 +754,7 @@ one outstanding command on a queue at any given time. The above are initialization and exit management, the main helpers during normal operations are: - blk_queue_start_tag(request_queue_t *q, struct request *rq) + blk_queue_start_tag(struct request_queue *q, struct request *rq) Start tagged operation for this request. A free tag number between 0 and 'depth' is assigned to the request (rq->tag holds this number), @@ -762,7 +762,7 @@ normal operations are: for this queue is already achieved (or if the tag wasn't started for some other reason), 1 is returned. Otherwise 0 is returned. - blk_queue_end_tag(request_queue_t *q, struct request *rq) + blk_queue_end_tag(struct request_queue *q, struct request *rq) End tagged operation on this request. 'rq' is removed from the internal book keeping structures. @@ -781,9 +781,9 @@ queue. For instance, on IDE any tagged request error needs to clear both the hardware and software block queue and enable the driver to sanely restart all the outstanding requests. There's a third helper to do that: - blk_queue_invalidate_tags(request_queue_t *q) + blk_queue_invalidate_tags(struct request_queue *q) - Clear the internal block tag queue and readd all the pending requests + Clear the internal block tag queue and re-add all the pending requests to the request queue. The driver will receive them again on the next request_fn run, just like it did the first time it encountered them. 
@@ -890,7 +890,7 @@ Aside: Kvec i/o: - Ben LaHaise's aio code uses a slighly different structure instead + Ben LaHaise's aio code uses a slightly different structure instead of kiobufs, called a kvec_cb. This contains an array of tuples (very much like the networking code), together with a callback function and data pointer. This is embedded into a brw_cb structure when passed @@ -946,6 +946,13 @@ elevator_merged_fn called when a request in the scheduler has been scheduler for example, to reposition the request if its sorting order has changed. +elevator_allow_merge_fn called whenever the block layer determines + that a bio can be merged into an existing + request safely. The io scheduler may still + want to stop a merge at this point if it + results in some sort of conflict internally, + this hook allows it to do that. + elevator_dispatch_fn fills the dispatch queue with ready requests. I/O schedulers are free to postpone requests by not filling the dispatch queue unless @force @@ -988,7 +995,7 @@ elevator_exit_fn Allocate and free any elevator specific storage for a queue. 4.2 Request flows seen by I/O schedulers -All requests seens by I/O schedulers strictly follow one of the following three +All requests seen by I/O schedulers strictly follow one of the following three flows. set_req_fn -> @@ -1013,7 +1020,7 @@ Characteristics: i. Binary tree AS and deadline i/o schedulers use red black binary trees for disk position sorting and searching, and a fifo linked list for time-based searching. This -gives good scalability and good availablility of information. Requests are +gives good scalability and good availability of information. Requests are almost always dispatched in disk sort order, so a cache is kept of the next request in sort order to prevent binary tree lookups.
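
For reference, below is a minimal sketch of how a driver might use the
rq_for_each_segment() helper that the patched text documents, following the
iterator form shown above (struct req_iterator plus a struct bio_vec pointer,
as in the block API of this era). The function name example_count_bytes is
hypothetical and not part of the patch; it simply walks every segment of
every bio in a request.

    #include <linux/blkdev.h>
    #include <linux/bio.h>

    /*
     * Hypothetical example (not from the patch): total up the bytes in a
     * request by visiting each bio it contains and each bio_vec segment
     * within those bios, via rq_for_each_segment().
     */
    static unsigned int example_count_bytes(struct request *rq)
    {
        struct req_iterator iter;
        struct bio_vec *bvec;
        unsigned int bytes = 0;

        /* bvec points at the current segment on each iteration;
         * bv_page/bv_offset/bv_len describe the memory it covers. */
        rq_for_each_segment(bvec, rq, iter)
            bytes += bvec->bv_len;

        return bytes;
    }

A real driver would map bvec->bv_page (with kmap() for high-memory pages, as
discussed earlier in the document) and transfer bv_len bytes per segment
instead of just counting them.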