<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-unicode">hi,
<br>
<br>
Draft of the design doc:
<br>
<br>
The main motivation for the design of this feature is to reduce
network
round trips by sending more<br>
than one fop in a single network operation,
preferably without introducing new rpcs.
<br>
<br>
There are 2 new xlators: compound-fop-sender and
compound-fop-receiver.
<br>
compound-fop-sender is going to be loaded on top of each
client-xlator on the
<br>
mount/client and compound-fop-receiver is going to be loaded below
<br>
server-xlator on the bricks. On the mount/client side from the
caller xlator
<br>
till the compound-fop-sender xlator, the xlators can choose to
implement this extra
<br>
compound fop handling. Once it reaches "compound-fop-sender" it
will try to
<br>
choose a base fop on which it encodes the other fop in the
base-fop's xdata,
<br>
and winds the base fop to the client xlator. The client xlator sends the
base fop
<br>
with encoded xdata to server xlator on the brick using rpc of the
base fop.
<br>
Once server xlator does resolve_and_resume() it will wind the base
fop to
<br>
compound-fop-receiver xlator. This xlator will decode the extra fop
from the xdata of
<br>
the base-fop. Based on the order encoded in the xdata it executes
separate fops
<br>
one after the other and stores the cbk response arguments of both
the
<br>
operations. It again encodes the response of the extra fop on to
the base fop's
<br>
response xdata and unwinds the fop to the server xlator, which sends the
response using
<br>
the base rpc's response structure. The client xlator will unwind the base
fop to
<br>
compound-fop-sender, which will decode the response into the
compound fop's
<br>
response arguments and unwind to the parent
xlators.
<br>
<br>
I will take the fxattrop+write operation that we want to
implement in
<br>
afr as an example to explain how things may look.
<br>
<br>
compound_fop_sender_fxattrop_write(call_frame_t *frame, xlator_t
*this, fd_t * fd,
<br>
       gf_xattrop_flags_t xattrop_flags,
<br>
       dict_t * fxattrop_dict,
<br>
       dict_t * fxattrop_xdata,
<br>
       struct iovec * vector,
<br>
       int32_t count,
<br>
       off_t off,
<br>
       uint32_t writev_flags,
<br>
       struct iobref * iobref,
<br>
       dict_t * writev_xdata)
<br>
{
<br>
       0) Remember the compound-fop
<br>
       take base-fop as writev()
<br>
       in writev_xdata add the following key,value pairs
<br>
       1) "xattrop-flags", xattrop_flags
<br>
       2) for-each-fxattrop_dict key ->
"fxattrop-dict-<actual-key>", value
<br>
       3) for-each-fxattrop_xdata key ->
"fxattrop-xdata-<actual-key>", value
<br>
       4) "order" -> "fxattrop, writev"
<br>
       5) "compound-fops" -> "fxattrop"
<br>
       6) Wind writev()
<br>
}
<br>
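<br>
A minimal sketch of encoding steps 1) to 5) above. It is only an
illustration: toy_dict_t and the toy_* helpers are hypothetical
stand-ins for a string dictionary, not the libglusterfs dict_t API,
and values are assumed to be short strings.
<br>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for dict_t: a fixed-size list of string pairs (hypothetical). */
#define TOY_MAX_PAIRS 16
#define TOY_MAX_LEN   64

typedef struct {
    char key[TOY_MAX_PAIRS][TOY_MAX_LEN];
    char val[TOY_MAX_PAIRS][TOY_MAX_LEN];
    int  count;
} toy_dict_t;

/* Append a key/value pair; returns 0 on success, -1 if full or too long. */
static int toy_dict_set(toy_dict_t *d, const char *key, const char *val)
{
    if (d->count >= TOY_MAX_PAIRS || strlen(key) >= TOY_MAX_LEN ||
        strlen(val) >= TOY_MAX_LEN)
        return -1;
    strcpy(d->key[d->count], key);
    strcpy(d->val[d->count], val);
    d->count++;
    return 0;
}

/* Return the value stored under key, or NULL if the key is absent. */
static const char *toy_dict_get(const toy_dict_t *d, const char *key)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->key[i], key) == 0)
            return d->val[i];
    return NULL;
}

/* Copy every pair of src into dst with prefix prepended to the key. */
static int toy_encode_prefixed(toy_dict_t *dst, const toy_dict_t *src,
                               const char *prefix)
{
    char key[TOY_MAX_LEN];

    for (int i = 0; i < src->count; i++) {
        int n = snprintf(key, sizeof(key), "%s%s", prefix, src->key[i]);
        if (n < 0 || (size_t)n >= sizeof(key))
            return -1; /* prefixed key does not fit */
        if (toy_dict_set(dst, key, src->val[i]) != 0)
            return -1;
    }
    return 0;
}

/* Steps 1-5 of the sender: fold the fxattrop arguments into writev's xdata. */
static int toy_encode_fxattrop_write(toy_dict_t *writev_xdata,
                                     int xattrop_flags,
                                     const toy_dict_t *fxattrop_dict,
                                     const toy_dict_t *fxattrop_xdata)
{
    char flags[16];

    snprintf(flags, sizeof(flags), "%d", xattrop_flags);
    if (toy_dict_set(writev_xdata, "xattrop-flags", flags) != 0 ||
        toy_encode_prefixed(writev_xdata, fxattrop_dict,
                            "fxattrop-dict-") != 0 ||
        toy_encode_prefixed(writev_xdata, fxattrop_xdata,
                            "fxattrop-xdata-") != 0 ||
        toy_dict_set(writev_xdata, "order", "fxattrop, writev") != 0 ||
        toy_dict_set(writev_xdata, "compound-fops", "fxattrop") != 0)
        return -1;
    return 0;
}
```

After a successful call, writev_xdata carries the flags, the prefixed
fxattrop keys, "order" and "compound-fops", and step 6) would wind
writev() with it.
<br>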
<br>
compound_fop_sender_fxattrop_write_cbk(...)
<br>
{
<br>
       /* decode the response args and call parent_fxattrop_write_cbk */
<br>
}
<br>
<br>
<compound_fop_sender_parent>_fxattrop_write_cbk
(call_frame_t *frame, void *cookie,
<br>
                                       xlator_t *this, int32_t
fxattrop_op_ret,
<br>
                                       int32_t fxattrop_op_errno,
<br>
                                       dict_t *fxattrop_dict,
<br>
                                       dict_t *fxattrop_xdata,
<br>
                                       int32_t writev_op_ret,
int32_t writev_op_errno,
<br>
                                       struct iatt
*writev_prebuf,
<br>
                                       struct iatt
*writev_postbuf,
<br>
                                       dict_t *writev_xdata)
<br>
{
<br>
/**/
<br>
}
<br>
<br>
compound_fop_receiver_writev(call_frame_t *frame, xlator_t *this,
fd_t * fd,
<br>
       struct iovec * vector,
<br>
       int32_t count,
<br>
       off_t off,
<br>
       uint32_t flags,
<br>
       struct iobref * iobref,
<br>
       dict_t * writev_xdata)
<br>
{
<br>
       0) Check if writev_xdata has "compound-fops" else
default_writev()
<br>
       1) decode writev_xdata from above encoding -> flags,
fxattrop_dict, fxattrop_xdata
<br>
       2) get "order"
<br>
       3) Store all the above in 'local'
<br>
       4) wind fxattrop() with
compound_receiver_fxattrop_cbk_writev_wind() as cbk
<br>
}
<br>
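<br>
The decoding steps of the receiver could be sketched as below. This
is a toy model under stated assumptions: toy_pair_t, toy_local_t and
the toy_* functions are hypothetical names, a dict is modelled as an
array of string pairs, and the real dict_t API is not used.
<br>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: a dict is modelled as an array of string pairs. */
typedef struct {
    const char *key;
    const char *val;
} toy_pair_t;

#define TOY_MAX 8

/* What the receiver would keep in 'local' after decoding. */
typedef struct {
    long        xattrop_flags;
    const char *order;
    toy_pair_t  fxattrop_dict[TOY_MAX];
    int         dict_count;
    toy_pair_t  fxattrop_xdata[TOY_MAX];
    int         xdata_count;
} toy_local_t;

/* Return the part of key after prefix, or NULL if key lacks the prefix. */
static const char *toy_strip_prefix(const char *key, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(key, prefix, n) == 0 ? key + n : NULL;
}

/*
 * Returns 1 if a compound request was decoded into local, 0 if this is a
 * plain writev (caller would use default_writev()), -1 if it is malformed.
 */
static int toy_decode_writev_xdata(const toy_pair_t *xdata, int count,
                                   toy_local_t *local)
{
    int compound = 0;

    memset(local, 0, sizeof(*local));
    for (int i = 0; i < count; i++)
        if (strcmp(xdata[i].key, "compound-fops") == 0)
            compound = 1;
    if (!compound)
        return 0;

    for (int i = 0; i < count; i++) {
        const char *rest;

        if ((rest = toy_strip_prefix(xdata[i].key, "fxattrop-dict-"))) {
            if (local->dict_count == TOY_MAX)
                return -1;
            local->fxattrop_dict[local->dict_count].key = rest;
            local->fxattrop_dict[local->dict_count].val = xdata[i].val;
            local->dict_count++;
        } else if ((rest = toy_strip_prefix(xdata[i].key,
                                            "fxattrop-xdata-"))) {
            if (local->xdata_count == TOY_MAX)
                return -1;
            local->fxattrop_xdata[local->xdata_count].key = rest;
            local->fxattrop_xdata[local->xdata_count].val = xdata[i].val;
            local->xdata_count++;
        } else if (strcmp(xdata[i].key, "xattrop-flags") == 0) {
            local->xattrop_flags = strtol(xdata[i].val, NULL, 10);
        } else if (strcmp(xdata[i].key, "order") == 0) {
            local->order = xdata[i].val;
        }
    }
    /* A compound request without "order" cannot be executed. */
    return local->order != NULL ? 1 : -1;
}
```

One thing this makes visible: the prefixes must not collide with keys
that writev's own xdata may legitimately carry, otherwise such keys
would be mistaken for fxattrop arguments.
<br>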
<br>
compound_receiver_fxattrop_cbk_writev_wind (call_frame_t *frame,
void *cookie,
<br>
                                           xlator_t *this,
int32_t op_ret,
<br>
                                           int32_t op_errno,
dict_t *dict,
<br>
                                           dict_t *xdata)
<br>
{
<br>
       0) store fxattrop cbk_args
<br>
       1) Perform writev() with writev_params with
compound_receiver_writev_cbk() as the 'cbk'
<br>
}
<br>
<br>
compound_receiver_writev_cbk (call_frame_t *frame, void *cookie, xlator_t
*this,
<br>
                    int32_t op_ret, int32_t op_errno, struct iatt
*prebuf,
<br>
                    struct iatt *postbuf, dict_t *xdata)
<br>
{
<br>
       0) store writev cbk_args
<br>
       1) Encode the fxattrop response into writev_xdata with an encoding
similar to the one in compound_fop_sender_fxattrop_write()
<br>
       2) unwind writev()
<br>
}
<br>
<br>
This example is just to show how things may look, but the actual
implementation
<br>
may just have all base-fops in the receiver xlator calling a common
function to perform the
<br>
operations in the given order. I am yet to think about that.
It is probably better to encode
<br>
the fop-number from glusterfs_fop_t rather than the fop-string in the
dictionary.
<br>
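<br>
A rough sketch of such a common function, driven by fop numbers
instead of strings. Everything here is assumed for illustration:
toy_fop_t is a made-up enum (real code would use glusterfs_fop_t
values), the toy_do_* functions stand in for the actual fops, the
calls are synchronous whereas the real xlator would chain them through
callbacks, and stopping at the first failed fop is my assumption, not
something decided above.
<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fop numbers; real code would use glusterfs_fop_t values. */
typedef enum {
    TOY_FOP_FXATTROP = 0,
    TOY_FOP_WRITEV,
    TOY_FOP_MAX
} toy_fop_t;

/* Per-request state: inputs for the toy fops and their stored results. */
typedef struct {
    int fail_fxattrop;          /* test knob: make fxattrop fail */
    int write_len;              /* bytes the toy writev reports as written */
    int op_ret[TOY_FOP_MAX];    /* stored result of each fop */
    int trace[TOY_FOP_MAX];     /* fops in the order they were executed */
    int trace_len;
} toy_local_t;

typedef int (*toy_fop_fn)(toy_local_t *local);

static int toy_do_fxattrop(toy_local_t *local)
{
    return local->fail_fxattrop ? -1 : 0;
}

static int toy_do_writev(toy_local_t *local)
{
    return local->write_len;
}

/* One handler per fop number, so no string comparison is needed. */
static const toy_fop_fn toy_fop_table[TOY_FOP_MAX] = {
    [TOY_FOP_FXATTROP] = toy_do_fxattrop,
    [TOY_FOP_WRITEV]   = toy_do_writev,
};

/*
 * Execute the fops in the given order and store each result in local.
 * Returns the number of fops executed, or -1 if the order is invalid.
 * Assumption: execution stops after the first fop that fails.
 */
static int toy_run_compound(const toy_fop_t *order, size_t n,
                            toy_local_t *local)
{
    if (n > TOY_FOP_MAX)
        return -1;
    local->trace_len = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned fop = (unsigned)order[i];

        if (fop >= TOY_FOP_MAX)
            return -1;
        local->op_ret[fop] = toy_fop_table[fop](local);
        local->trace[local->trace_len++] = (int)fop;
        if (local->op_ret[fop] < 0)
            break;
    }
    return local->trace_len;
}
```

With numbers in "order", every base fop in the receiver could hand the
decoded list to one function like this instead of having a dedicated
cbk chain per combination.
<br>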
<br>
This is phase-1 of the change because we don't want to change RPCs.
<br>
In phase-2 we can implement the compound fops that are commonly
used by a lot of translators throughout the stack so that
quota/bitrot/geo-rep/barrier etc. handle them.
<br>
In phase-3, maybe just in time for 4.0, we can convert them to on
the wire RPCs.
<br>
<br>
Thanks to Raghavendra G, Krutika, Ravi and Anuradha for the
discussions.
<br>
<br>
Pranith
<br>
</div>
</body>
</html>