How can we mitigate the impact of oversized bucket index shards? The operations below are emergency measures: quick fixes to stop the bleeding and relieve the pain, not a cure. Some of them are high-risk, so they must be used with caution.

1. Adjust the OP timeout parameters on the OSD nodes

The following values are only examples; adjust them to your production environment, but do not set them too high.

```
osd_op_thread_timeout = 90                   # default is 15
osd_op_thread_suicide_timeout = 300          # default is 150
filestore_op_thread_timeout = 180            # default is 60
filestore_op_thread_suicide_timeout = 300    # default is 180
osd_scrub_thread_suicide_timeout = 300       # add this if scrub causes op timeouts
```
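
These parameters normally live in the [osd] section of ceph.conf and take effect after an OSD restart. If a restart is not possible right away, here is a minimal sketch of injecting them at runtime; note that runtime injection is not persistent, so still update ceph.conf afterwards:

```
# Inject the relaxed timeouts into all running OSDs without a restart
ceph tell osd.* injectargs '--osd_op_thread_timeout 90 --osd_op_thread_suicide_timeout 300 --filestore_op_thread_timeout 180 --filestore_op_thread_suicide_timeout 300'
```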

2. Compact the OMAP directory on the OSD node

If the OSD can be stopped, you can run a compact operation against it. Ceph 0.94.6 or later is recommended (see github.com/ceph/ceph/p…).

```
# 1. Prevent the cluster from rebalancing while the OSD is down
ceph osd set noout
# 2. Stop the OSD
/etc/init.d/ceph stop osd.<osd-id>
# 3. Confirm the OSD process has exited
ps -ef | grep "id <osd-id>"
# 4. Add the following to the OSD's section in ceph.conf
leveldb_compact_on_mount = true
# 5. Start the OSD service
systemctl start ceph-osd@<osd-id>    # or: /etc/init.d/ceph start osd.<osd-id>
# 6. Confirm the process is running
ps -ef | grep "id <osd-id>"
# 7. Check the result with ceph -s and follow the OSD log with tailf;
#    wait until all PGs are active+clean before continuing
ceph -s
# 8. Check the size of the omap directory after compaction
du -sh /var/lib/ceph/osd/ceph-$id/current/omap
# 9. Remove the leveldb_compact_on_mount setting temporarily added in step 4
# 10. Unset noout (depending on your situation; keeping noout set is
#     recommended in production)
ceph osd unset noout
```
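
On newer releases (Luminous and later) that cannot afford to stop the OSD, compaction can also be triggered online through the admin socket. This is a hedged alternative, as command availability depends on your Ceph version:

```
# Trigger an online compaction of the OSD's key-value store (omap);
# run on the node that hosts the OSD
ceph daemon osd.<osd-id> compact
```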

3. Perform the reshard operation on the bucket

Resharding a bucket adjusts its number of index shards and redistributes the index data. It is supported only on Ceph 0.94.10 and later, it requires stopping all reads and writes to the bucket, and it carries a risk of data loss. Use it with caution; I take no responsibility for any problems.

Note that before running any of the following you must ensure that all operations against the bucket have been stopped.

Back up the bucket's index:

```
radosgw-admin bi list --bucket=<bucket_name> > <bucket_name>.list.backup
```

If needed, restore the index from the backup:

```
radosgw-admin bi put --bucket=<bucket_name> < <bucket_name>.list.backup
```

Check the bucket's index id:

```
root@demo:/home/user# radosgw-admin bucket stats --bucket=bucket-maillist
{
    "bucket": "bucket-maillist",
    "pool": "default.rgw.buckets.data",
    "index_pool": "default.rgw.buckets.index",
    "id": "0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1",   # note this id
    "marker": "0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1",
    "owner": "user",
    "ver": "0#1,1#1",
    "master_ver": "0#0,1#0",
    "mtime": "2017-08-23 13:42:59.007081",
    "max_marker": "0#,1#",
    "usage": {},
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}
```

Use the following command to change bucket-maillist to 4 shards; it prints the instance ids of the old and new buckets:

```
root@demo:/home/user# radosgw-admin bucket reshard --bucket="bucket-maillist" --num-shards=4
*** NOTICE: operation will not remove old bucket index objects ***
***         these will need to be removed manually             ***
old bucket instance id: 0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1
new bucket instance id: 0a6967a5-2c76-427a-99c6-8a788ca25034.54147.1
total entries: 3
```

Run the following command to delete the old instance id:

```
root@demo:/home/user# radosgw-admin bi purge --bucket="bucket-maillist" --bucket-id=0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1
```

Check the final result:

```
root@demo:/home/user# radosgw-admin bucket stats --bucket=bucket-maillist
{
    "bucket": "bucket-maillist",
    "pool": "default.rgw.buckets.data",
    "index_pool": "default.rgw.buckets.index",
    "id": "0a6967a5-2c76-427a-99c6-8a788ca25034.54147.1",   # the id has changed
    "marker": "0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1",
    "owner": "user",
    "ver": "0#1,1#2,2#1,3#2",
    "master_ver": "0#0,1#0,2#0,3#0",
    "mtime": "2017-08-23 14:02:19.961205",
    "max_marker": "0#,1#,2#,3#",
    "usage": {
        "rgw.main": {
            "size_kb": 50,
            "size_kb_actual": 60,
            "num_objects": 3
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}
```
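
When picking --num-shards, a common rule of thumb is to keep each shard under roughly 100,000 objects (this matches the rgw_max_objs_per_shard default in later releases). Below is a minimal sketch that derives a shard count from the bucket stats; the jq dependency and the 100k threshold are assumptions you may want to adjust:

```
#!/bin/bash
# Estimate a shard count for a bucket, assuming ~100k objects per shard
BUCKET=bucket-maillist
OBJECTS=$(radosgw-admin bucket stats --bucket="$BUCKET" | jq '[.usage[].num_objects] | add // 0')
# One shard per 100,000 objects, plus one for headroom
echo $(( OBJECTS / 100000 + 1 ))
```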

4. Disable pool scrub and deep-scrub

Available in Jewel and later versions.

Set the noscrub and nodeep-scrub flags on the pool:

```
# ceph osd pool set <pool-name> noscrub 1
# ceph osd pool set <pool-name> nodeep-scrub 1
```

Confirm the settings with the following command:

```
# ceph osd dump | grep <pool-name>
pool 11 '<pool-name>' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 800 flags hashpspool,noscrub,nodeep-scrub stripe_width 0
```

Re-enable scrubbing once the cluster has stabilized:

```
# ceph osd pool set <pool-name> noscrub 0
# ceph osd pool set <pool-name> nodeep-scrub 0
```
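
If scrub pressure is cluster-wide rather than limited to one pool, the same effect can be achieved with global flags; note that this stops scrubbing for every pool, so use it even more sparingly:

```
# Cluster-wide equivalents of the per-pool flags above
ceph osd set noscrub
ceph osd set nodeep-scrub

# Re-enable later with
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```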

5. Fix OSD startup failures caused by an oversized OMAP

To be covered in the next article.