Commit graph

251 commits

Author SHA1 Message Date
Chaitanya Huilgol
ba988f87f5 libceph: tcp_nodelay support
TCP_NODELAY socket option set on connection sockets,
disables Nagle’s algorithm and improves latency characteristics.
tcp_nodelay(default)/notcp_nodelay option flags provided to
enable/disable setting the socket option.

Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com>
[idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19 13:31:40 +03:00
Yan, Zheng
03f4fcb028 ceph: handle SESSION_FORCE_RO message
mark session as readonly and wake up all cap waiters.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:37 +03:00
Ilya Dryomov
7a6fdeb2b1 libceph: nuke pool op infrastructure
On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil <sage@newdream.net> wrote:
> On Mon, 22 Dec 2014, Ilya Dryomov wrote:
>> Actually, pool op stuff has been unused for over two years - looks like
>> it was added for rbd create_snap and that got ripped out in 2012.  It's
>> unlikely we'd ever need to manage pools or snaps from the kernel client
>> so I think it makes sense to nuke it.  Sage?
>
> Yep!

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19 13:31:37 +03:00
Ilya Dryomov
d7d5a007b1 libceph: fix sparse endianness warnings
The only real issue is the one in auth_x.c and it came with
3.19-rc1 merge.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-08 20:36:57 +03:00
Ilya Dryomov
84a1d2d1ec libceph: fixup includes in pagelist.h
pagelist.h needs to include linux/types.h and asm/byteorder.h and not
rely on other headers pulling yet another set of headers.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:53 +03:00
Yan, Zheng
01deead041 ceph: use getattr request to fetch inline data
Add a new parameter 'locked_page' to ceph_do_getattr(). If inline data
in getattr reply will be copied to the page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
31c542a199 ceph: add inline data to pagecache
Request reply and cap message can contain inline data. add inline data
to the page cache if there is Fc cap.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
715e4cd405 libceph: specify position of extent operation
allow specifying position of extent operation in multi-operations
osd request. This is required for cephfs to convert inline data to
normal data (compare xattr, then write object).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
d74b50bed0 libceph: add SETXATTR/CMPXATTR osd operations support
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:51 +03:00
Yan, Zheng
a3fc98005c libceph: require cephx message signature by default
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:51 +03:00
John Spray
d4e1a4e0db libceph: update ceph_msg_header structure
2 bytes of what was reserved space is now used by userspace for the
compat_version field.

Signed-off-by: John Spray <john.spray@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 20:09:50 +03:00
Yan, Zheng
33d0733796 libceph: message signature support
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:50 +03:00
Ilya Dryomov
4965fc38c4 libceph: nuke ceph_kvfree()
Use kvfree() from linux/mm.h instead, which is identical.  Also fix the
ceph_buffer comment: we will allocate with kmalloc() up to 32k - the
value of PAGE_ALLOC_COSTLY_ORDER, but that really is just an
implementation detail so don't mention it at all.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:50 +03:00
Yan, Zheng
9280be24dc ceph: fix file lock interruption
When a lock operation is interrupted, current code sends a unlock request to
MDS to undo the lock operation. This method does not work as expected because
the unlock request can drop locks that have already been acquired.

The fix is use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR
requests to interrupt blocked file lock request. These requests do not drop
locks that have alread been acquired, they only interrupt blocked file lock
request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:49 +03:00
Ilya Dryomov
70b5bfa360 libceph: sync osd op definitions in rados.h
Bring in missing osd ops and strings, use macros to eliminate multiple
points of maintenance.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14 12:57:02 -07:00
Fabian Frederick
eb179d3975 libceph: remove redundant declaration
ceph_release_page_vector was defined twice in libceph.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-10-14 12:57:02 -07:00
Yan, Zheng
e4339d28f6 libceph: reference counting pagelist
this allow pagelist to present data that may be sent multiple times.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14 12:56:48 -07:00
Ilya Dryomov
2d05f082cb libceph: nuke ceph_osdc_unregister_linger_request()
Remove now unused ceph_osdc_unregister_linger_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:45 +04:00
Ilya Dryomov
c9f9b93ddf libceph: introduce ceph_osdc_cancel_request()
Introduce ceph_osdc_cancel_request() intended for canceling requests
from the higher layers (rbd and cephfs).  Because higher layers are in
charge and are supposed to know what and when they are canceling, the
request is not completed, only unref'ed and removed from the libceph
data structures.

__cancel_request() is no longer called before __unregister_request(),
because __unregister_request() unconditionally revokes r_request and
there is no point in trying to do it twice.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:44 +04:00
Ilya Dryomov
9e94af202a libceph: move and add dout()s to ceph_osdc_request_{get,put}()
Add dout()s to ceph_osdc_request_{get,put}().  Also move them to .c and
turn kref release callback into a static function.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:43 +04:00
Ilya Dryomov
0215e44bb3 libceph: move and add dout()s to ceph_msg_{get,put}()
Add dout()s to ceph_msg_{get,put}().  Also move them to .c and turn
kref release callback into a static function.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:43 +04:00
Ilya Dryomov
1d0326b13b libceph: rename ceph_osd_request::r_linger_osd to r_linger_osd_item
So that:

req->r_osd_item --> osd->o_requests list
req->r_linger_osd_item --> osd->o_linger_requests list

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:42 +04:00
Linus Torvalds
6d87c225f5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "This has a mix of bug fixes and cleanups.

  Alex's patch fixes a rare race in RBD.  Ilya's patches fix an ENOENT
  check when a second rbd image is mapped and a couple memory leaks.
  Zheng fixes several issues with fragmented directories and multiple
  MDSs.  Josh fixes a spin/sleep issue, and Josh and Guangliang's
  patches fix setting and unsetting RBD images read-only.

  Naturally there are several other cleanups mixed in for good measure"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
  rbd: only set disk to read-only once
  rbd: move calls that may sleep out of spin lock range
  rbd: add ioctl for rbd
  ceph: use truncate_pagecache() instead of truncate_inode_pages()
  ceph: include time stamp in every MDS request
  rbd: fix ida/idr memory leak
  rbd: use reference counts for image requests
  rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync()
  rbd: make sure we have latest osdmap on 'rbd map'
  libceph: add ceph_monc_wait_osdmap()
  libceph: mon_get_version request infrastructure
  libceph: recognize poolop requests in debugfs
  ceph: refactor readpage_nounlock() to make the logic clearer
  mds: check cap ID when handling cap export message
  ceph: remember subtree root dirfrag's auth MDS
  ceph: introduce ceph_fill_fragtree()
  ceph: handle cap import atomically
  ceph: pre-allocate ceph_cap struct for ceph_add_cap()
  ceph: update inode fields according to issued caps
  rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  ...
2014-06-12 23:06:23 -07:00
Ilya Dryomov
6044cde6f2 libceph: add ceph_monc_wait_osdmap()
Add ceph_monc_wait_osdmap(), which will block until the osdmap with the
specified epoch is received or timeout occurs.

Export both of these as they are going to be needed by rbd.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-06-06 09:29:57 +08:00
Ilya Dryomov
513a8243d6 libceph: mon_get_version request infrastructure
Add support for mon_get_version requests to libceph.  This reuses much
of the ceph_mon_generic_request infrastructure, with one exception.
Older OSDs don't set mon_get_version reply hdr->tid even if the
original request had a non-zero tid, which makes it impossible to
lookup ceph_mon_generic_request contexts by tid in get_generic_reply()
for such replies.  As a workaround, we allocate a reply message on the
reply path.  This can probably interfere with revoke, but I don't see
a better way.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-06-06 09:29:57 +08:00
Yan, Zheng
f98a128a55 ceph: update inode fields according to issued caps
Cap message and request reply from non-auth MDS may carry stale
information (corresponding locks are in LOCK states) even they
have the newest inode version. So client should update inode fields
according to issued caps.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06 09:29:52 +08:00
Al Viro
2b777c9dd9 ceph_sync_read: stop poking into iov_iter guts
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:39:42 -04:00
Ilya Dryomov
18cb95af2d libceph: enable PRIMARY_AFFINITY feature bit
Announce our support for osdmaps with non-default primary affinity
values.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:20 -07:00
Ilya Dryomov
8008ab1080 libceph: return primary from ceph_calc_pg_acting()
In preparation for adding support for primary_temp, stop assuming
primaryness: add a primary out parameter to ceph_calc_pg_acting() and
change call sites accordingly.  Primary is now specified separately
from the order of osds in the set.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:14 -07:00
Ilya Dryomov
ac972230e2 libceph: switch ceph_calc_pg_acting() to new helpers
Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(),
raw_to_up_osds() and apply_temps().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:13 -07:00
Ilya Dryomov
2abebdbca7 libceph: ceph_can_shift_osds(pool) and pool type defines
Bring in pg_pool_t::can_shift_osds() counterpart along with pool type
defines.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:08 -07:00
Ilya Dryomov
246138fa67 libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions
Sync up with ceph.git definitions.  Bring in ceph_osd_is_down().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:07 -07:00
Ilya Dryomov
ddf3a21a03 libceph: enable OSDMAP_ENC feature bit
Announce our support for "new" (v7 - split and separately versioned
client and osd sections) osdmap enconding.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:05 -07:00
Ilya Dryomov
2cfa34f2d6 libceph: primary_affinity infrastructure
Add primary_affinity infrastructure.  primary_affinity values are
stored in an max_osd-sized array, hanging off ceph_osdmap, similar to
a osd_weight array.

Introduce {get,set}_primary_affinity() helpers, primarily to return
CEPH_OSD_DEFAULT_PRIMARY_AFFINITY when no affinity has been set and to
abstract out osd_primary_affinity array allocation and initialization.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:02 -07:00
Ilya Dryomov
9686f94c8c libceph: primary_temp infrastructure
Add primary_temp mappings infrastructure.  struct ceph_pg_mapping is
overloaded, primary_temp mappings are stored in an rb-tree, rooted at
ceph_osdmap, in a manner similar to pg_temp mappings.

Dump primary_temp mappings to /sys/kernel/debug/ceph/<client>/osdmap,
one 'primary_temp <pgid> <osd>' per line, e.g:

    primary_temp 2.6 4

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:58 -07:00
Ilya Dryomov
35a935d75d libceph: generalize ceph_pg_mapping
In preparation for adding support for primary_temp mappings, generalize
struct ceph_pg_mapping so it can hold mappings other than pg_temp.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:57 -07:00
Ilya Dryomov
a2505d63ee libceph: split osdmap allocation and decode steps
Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:37 -07:00
Ilya Dryomov
07bd7de47a crush: support chooseleaf_vary_r tunable (tunables3) by default
Add TUNABLES3 feature (chooseleaf_vary_r tunable) to a set of features
supported by default.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04 21:07:29 -07:00
Yan, Zheng
eb13e832f8 ceph: use fl->fl_file as owner identifier of flock and posix lock
flock and posix lock should use fl->fl_file instead of process ID
as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner
is usually equal to fl->fl_file, but it also can be a customized
value). The process ID of who holds the lock is just for F_GETLK
fcntl(2).

The fix is rename the 'pid' fields of struct ceph_mds_request_args
and struct ceph_filelock to 'owner', rename 'pid_namespace' fields
to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages.
We also set the most significant bit of the 'owner' field. MDS can
use that bit to distinguish between old and new clients.

The MDS counterpart of this patch modifies the flock code to not
take the 'pid_namespace' into consideration when checking conflict
locks.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04 21:07:11 -07:00
Yan, Zheng
19913b4eac ceph: add get_name() NFS export callback
Use the newly introduced LOOKUPNAME MDS request to connect child
inode to its parent directory.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 10:33:53 +08:00
Ilya Dryomov
7cc69d42e6 libceph: bump CEPH_OSD_MAX_OP to 3
Our longest osd request now contains 3 ops: copyup+hint+write.

Also, CEPH_OSD_MAX_OP value in a BUG_ON in rbd_osd_req_callback() was
hard-coded to 2.  Fix it, and switch to rbd_assert while at it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-03 10:33:52 +08:00
Ilya Dryomov
c647b8a8c6 libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op
This is primarily for rbd's benefit and is supposed to combat
fragmentation:

"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks.  We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."

SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-03 10:33:51 +08:00
Ilya Dryomov
7b25bf5f02 libceph: encode CEPH_OSD_OP_FLAG_* op flags
Encode ceph_osd_op::flags field so that it gets sent over the wire.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-03 10:33:51 +08:00
Ilya Dryomov
9d521470a4 libceph: a per-osdc crush scratch buffer
With the addition of erasure coding support in the future, scratch
variable-length array in crush_do_rule_ary() is going to grow to at
least 200 bytes on average, on top of another 128 bytes consumed by
rawosd/osd arrays in the call chain.  Replace it with a buffer inside
struct osdmap and a mutex.  This shouldn't result in any contention,
because all osd requests were already serialized by request_mutex at
that point; the only unlocked caller was ceph_ioctl_get_dataloc().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 10:33:50 +08:00
Yan, Zheng
bcdfeb2eb4 ceph: remove xattr when null value is given to setxattr()
For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
to distinguish null value case from the zero-length value case.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17 12:37:09 -08:00
Linus Torvalds
f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds
d891ea23d5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "This is a big batch.  From Ilya we have:

   - rbd support for more than ~250 mapped devices (now uses same scheme
     that SCSI does for device major/minor numbering)
   - crush updates for new mapping behaviors (will be needed for coming
     erasure coding support, among other things)
   - preliminary support for tiered storage pools

  There is also a big series fixing a pile cephfs bugs with clustered
  MDSs from Yan Zheng, ACL support for cephfs from Guangliang Zhao, ceph
  fscache improvements from Li Wang, improved behavior when we get
  ENOSPC from Josh Durgin, some readv/writev improvements from
  Majianpeng, and the usual mix of small cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (76 commits)
  ceph: cast PAGE_SIZE to size_t in ceph_sync_write()
  ceph: fix dout() compile warnings in ceph_filemap_fault()
  libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature
  libceph: follow redirect replies from osds
  libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
  libceph: follow {read,write}_tier fields on osd request submission
  libceph: add ceph_pg_pool_by_id()
  libceph: CEPH_OSD_FLAG_* enum update
  libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()
  libceph: introduce and start using oid abstraction
  libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN
  libceph: move ceph_file_layout helpers to ceph_fs.h
  libceph: start using oloc abstraction
  libceph: dout() is missing a newline
  libceph: add ceph_kv{malloc,free}() and switch to them
  libceph: support CEPH_FEATURE_EXPORT_PEER
  ceph: add imported caps when handling cap export message
  ceph: add open export target session helper
  ceph: remove exported caps when handling cap import message
  ceph: handle session flush message
  ...
2014-01-28 11:02:23 -08:00
Ilya Dryomov
80e163a58c libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature
Announce our (limited, see previous commit) support for CACHEPOOL
feature.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-27 23:57:57 +02:00
Ilya Dryomov
205ee1187a libceph: follow redirect replies from osds
Follow redirect replies from osds, for details see ceph.git commit
fbbe3ad1220799b7bb00ea30fce581c5eadaf034.

v1 (current) version of redirect reply consists of oloc and oid, which
expands to pool, key, nspace, hash and oid.  However, server-side code
that would populate anything other than pool doesn't exist yet, and
hence this commit adds support for pool redirects only.  To make sure
that future server-side updates don't break us, we decode all fields
and, if any of key, nspace, hash or oid have a non-default value, error
out with "corrupt osd_op_reply ..." message.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-27 23:57:53 +02:00
Ilya Dryomov
3c972c95c6 libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before
introducing r_target_{oloc,oid} needed for redirects.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-27 23:57:49 +02:00