Ceph Ansible baremetal deployment
How many times have you tried to install Ceph? How many of those attempts failed for no apparent reason?
Most Ceph operators will agree with me when I say that the available installers haven't really worked as expected so far.
Yes, I'm talking about ceph-deploy, and that is the main reason I'm posting this guide about deploying Ceph with Ansible.
In this post, I will show how to install a Ceph cluster with Ansible on baremetal servers.
My configuration is as follows:
  1. 3 x Ceph monitors, 8 GB of RAM each
  2. 3 x OSD nodes, 16 GB of RAM and 3 x 100 GB disks each
  3. 1 x RadosGateway node, 8 GB of RAM
First, download the Ceph-Ansible playbooks:
```
git clone https://github.com/ceph/ceph-ansible/
Cloning into 'ceph-ansible'...
remote: Counting objects: 5764, done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 5764 (delta 7), reused 0 (delta 0), pack-reused 5726
Receiving objects: 100% (5764/5764), 1.12 MiB | 1.06 MiB/s, done.
Resolving deltas: 100% (3465/3465), done.
Checking connectivity... done.
```
Move into the newly created folder, ceph-ansible:
```
cd ceph-ansible/
```
Copy the sample vars files; we will configure our environment in these variable files:
```
cp site.yml.sample site.yml
cp group_vars/all.sample group_vars/all
cp group_vars/mons.sample group_vars/mons
cp group_vars/osds.sample group_vars/osds
```
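Since this deployment also includes a RadosGateway node, you may want RGW-specific settings as well. The ceph-ansible tree also carries a group_vars/rgws.sample; if your checkout ships it, you can copy it the same way (an optional step, not part of the run shown here):

```
# Optional: only if your ceph-ansible checkout provides this sample file
cp group_vars/rgws.sample group_vars/rgws
```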
The next step is to configure the inventory with our servers. I don't really like using the /etc/ansible/hosts file; I prefer to create a new inventory file per environment inside the playbook's folder.
Create a file with the following content, using your own IPs to match your servers to the desired roles inside the cluster:
```
# vi inventory_hosts

[mons]
192.168.1.48
192.168.1.49
192.168.1.52

[osds]
192.168.1.50
192.168.1.53
192.168.1.54

[rgws]
192.168.1.55
```
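If Ansible should not connect as your current user, you can set connection variables directly in the inventory. A minimal sketch, assuming password-less SSH as root (adjust the user and key path to your environment; `ansible_ssh_user` and `ansible_ssh_private_key_file` are standard Ansible inventory variables):

```
[all:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file=~/.ssh/id_rsa
```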
Test connectivity to your servers by pinging them through the Ansible ping module:
```
# ansible -m ping -i inventory_hosts all

192.168.1.48 | success >> {
    "changed": false,
    "ping": "pong"
}

192.168.1.50 | success >> {
    "changed": false,
    "ping": "pong"
}

192.168.1.55 | success >> {
    "changed": false,
    "ping": "pong"
}

192.168.1.53 | success >> {
    "changed": false,
    "ping": "pong"
}

192.168.1.49 | success >> {
    "changed": false,
    "ping": "pong"
}

192.168.1.54 | success >> {
    "changed": false,
    "ping": "pong"
}

192.168.1.52 | success >> {
    "changed": false,
    "ping": "pong"
}
```
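If you'd rather not pass -i on every command, you can point Ansible at this inventory from a local ansible.cfg in the playbook folder. A small sketch (the key is `inventory` in Ansible 2.x; older releases used `hostfile`):

```
# ansible.cfg (in the ceph-ansible directory)
[defaults]
inventory = ./inventory_hosts
```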
Edit the site.yml file. I will remove/comment out the mds hosts since I'm not going to use them.
```
# vi site.yml

- hosts: mons
  become: True
  roles:
  - ceph-mon

- hosts: agents
  become: True
  roles:
  - ceph-agent

- hosts: osds
  become: True
  roles:
  - ceph-osd

#- hosts: mdss
#  become: True
#  roles:
#  - ceph-mds

- hosts: rgws
  become: True
  roles:
  - ceph-rgw

- hosts: restapis
  become: True
  roles:
  - ceph-restapi
```
Edit the main variable file; here we are going to configure our environment:
```
# vi group_vars/all
```
Here we configure where the Ceph packages will be installed from. For now we use the upstream code with the stable release Infernalis.
```
## Configure package origin
ceph_origin: upstream
ceph_stable: true
ceph_stable_release: infernalis
```
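The sample file offers other origins as well; for example, you could install whatever Ceph version your distribution ships instead of the upstream stable repositories. A hedged sketch based on the variable names in the group_vars/all sample of that era (check the comments in your own copy before using it):

```
## Alternative: use the distribution's own Ceph packages
ceph_origin: distro
```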
Configure the interface on which the monitors will be listening:
```
## Monitor options
monitor_interface: eth2
```
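The interface name has to match what the monitor hosts actually have. If you are not sure, you can check with Ansible's setup module before running the playbook (filter is a standard parameter of the setup module):

```
# Show the network facts of the monitor hosts to confirm eth2 carries the public IP
ansible -m setup -i inventory_hosts mons -a 'filter=ansible_eth2'
```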
Here we configure some OSD options, like the journal size and which networks will be used for public traffic and cluster data replication:
```
## OSD options
journal_size: 1024
public_network: 192.168.1.0/24
cluster_network: 192.168.200.0/24
```
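These variables end up in the generated ceph.conf. Roughly, the relevant part of the [global] section should look like the sketch below (the keys are the standard Ceph options; the exact rendering depends on the ceph-ansible templates):

```
[global]
public network = 192.168.1.0/24
cluster network = 192.168.200.0/24
osd journal size = 1024
```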
Edit the osds variable file:
```
# vi group_vars/osds
```
I will use the auto-discovery option, which lets ceph-ansible select empty or unused devices on my servers to create the OSDs.
```
# Declare devices
osd_auto_discovery: True
journal_collocation: True
```
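If you prefer to control exactly which disks become OSDs instead of relying on auto discovery, the osds sample also accepts an explicit devices list. A hedged sketch, assuming /dev/sdb to /dev/sdd are the 100 GB data disks on each OSD node (the device names are an assumption, adjust to your hardware):

```
# Declare devices explicitly instead of using osd_auto_discovery
devices:
  - /dev/sdb   # assumed data disk, adjust to your servers
  - /dev/sdc
  - /dev/sdd
journal_collocation: True
```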
Of course you can use other options; I highly suggest you read the variable comments, as they provide valuable information about their usage.
We're ready to deploy Ceph with Ansible using our custom inventory_hosts file:
```
# ansible-playbook site.yml -i inventory_hosts
```
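A couple of standard ansible-playbook flags are handy here, for instance to rerun only one part of the cluster or to get more verbose output:

```
# Re-run only the OSD hosts, with more verbose output
ansible-playbook site.yml -i inventory_hosts --limit osds -vv
```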
After a while, you will have a fully functional Ceph cluster.
Maybe you will find some issues or bugs when running the playbooks. There is a lot of effort going into fixing issues in the upstream repository. If you encounter a new bug, please open an issue at https://github.com/ceph/ceph-ansible/issues
You can check your cluster status with ceph -s. We can see that all OSDs are up and all pgs are active+clean:
```
# ceph -s
    cluster 5ff692ab-2150-41a4-8b6d-001a4da21c9c
     health HEALTH_OK
     monmap e1: 3 mons at {ceph-mon1=192.168.200.141:6789/0,ceph-mon2=192.168.200.180:6789/0,ceph-mon3=192.168.200.232:6789/0}
            election epoch 6, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
     osdmap e10: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v32: 64 pgs, 1 pools, 0 bytes data, 0 objects
            102256 kB used, 896 GB / 896 GB avail
                  64 active+clean
```
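Beyond ceph -s, a few other commands are useful to double-check the deployment (outputs omitted here):

```
ceph osd tree        # OSD/host layout and up/down status
ceph health detail   # detailed health information if anything is not HEALTH_OK
ceph df              # global and per-pool usage
```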
We are going to do some tests. Create a pool:
```
# ceph osd pool create test 128 128
pool 'test' created
```
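The 128 placement groups used here are fine for a small test pool. A common rule of thumb is (number of OSDs x 100) / replica count, rounded up to the next power of two; with 9 OSDs and 2 replicas that gives 450, so 512 would be a reasonable value for a production pool. You can check the pool's current values with:

```
ceph osd pool get test pg_num
ceph osd pool get test size
```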
Create a big file:
```
# dd if=/dev/zero of=/tmp/sample.txt bs=2M count=1000
1000+0 records in
1000+0 records out
2097152000 bytes (2.1 GB) copied, 16.7386 s, 125 MB/s
```
Upload the file to rados
```
# rados -p test put sample /tmp/sample.txt
```
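You can confirm the object landed in the pool before looking at its placement (both are standard rados subcommands):

```
rados -p test ls          # should list the object 'sample'
rados -p test stat sample # shows its size and modification time
```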
Check on which placement group your file was saved:
```
# ceph osd map test sample
osdmap e13 pool 'test' (1) object 'sample' -> pg 1.bddbf0b9 (1.39) -> up ([1,0], p1) acting ([1,0], p1)
```
Query the placement group where your file was uploaded; output similar to the following will be shown:
```
# ceph pg 1.39 query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "epoch": 13,
    "up": [
        1,
        0
    ],
    "acting": [
        1,
        0
    ],
    "actingbackfill": [
        "0",
        "1"
    ],
    "info": {
        "pgid": "1.39",
        "last_update": "13'500",
        "last_complete": "13'500",
        "log_tail": "0'0",
        "last_user_version": 500,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": "[]",
        "history": {
            "epoch_created": 11,
            "last_epoch_started": 12,
            "last_epoch_clean": 13,
            "last_epoch_split": 0,
            "last_epoch_marked_full": 0,
            "same_up_since": 11,
            "same_interval_since": 11,
            "same_primary_since": 11,
            "last_scrub": "0'0",
            "last_scrub_stamp": "2016-03-16 21:13:08.883121",
            "last_deep_scrub": "0'0",
            "last_deep_scrub_stamp": "2016-03-16 21:13:08.883121",
            "last_clean_scrub_stamp": "0.000000"
        },
        "stats": {
            "version": "13'500",
            "reported_seq": "505",
            "reported_epoch": "13",
            "state": "active+clean",
            "last_fresh": "2016-03-16 21:24:40.930724",
            "last_change": "2016-03-16 21:14:09.874086",
            "last_active": "2016-03-16 21:24:40.930724",
            "last_peered": "2016-03-16 21:24:40.930724",
            "last_clean": "2016-03-16 21:24:40.930724",
            "last_became_active": "0.000000",
            "last_became_peered": "0.000000",
            "last_unstale": "2016-03-16 21:24:40.930724",
            "last_undegraded": "2016-03-16 21:24:40.930724",
            "last_fullsized": "2016-03-16 21:24:40.930724",
            "mapping_epoch": 11,
            "log_start": "0'0",
            "ondisk_log_start": "0'0",
            "created": 11,
            "last_epoch_clean": 13,
            "parent": "0.0",
            "parent_split_bits": 0,
            "last_scrub": "0'0",
            "last_scrub_stamp": "2016-03-16 21:13:08.883121",
            "last_deep_scrub": "0'0",
            "last_deep_scrub_stamp": "2016-03-16 21:13:08.883121",
            "last_clean_scrub_stamp": "0.000000",
            "log_size": 500,
            "ondisk_log_size": 500,
            "stats_invalid": "0",
            "stat_sum": {
                "num_bytes": 2097152000,
                "num_objects": 1,
                "num_object_clones": 0,
                "num_object_copies": 2,
                "num_objects_missing_on_primary": 0,
                "num_objects_degraded": 0,
                "num_objects_misplaced": 0,
                "num_objects_unfound": 0,
                "num_objects_dirty": 1,
                "num_whiteouts": 0,
                "num_read": 0,
                "num_read_kb": 0,
                "num_write": 500,
                "num_write_kb": 2048000,
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                "num_objects_recovered": 0,
                "num_bytes_recovered": 0,
                "num_keys_recovered": 0,
                "num_objects_omap": 0,
                "num_objects_hit_set_archive": 0,
                "num_bytes_hit_set_archive": 0,
                "num_flush": 0,
                "num_flush_kb": 0,
                "num_evict": 0,
                "num_evict_kb": 0,
                "num_promote": 0,
                "num_flush_mode_high": 0,
                "num_flush_mode_low": 0,
                "num_evict_mode_some": 0,
                "num_evict_mode_full": 0
            },
            "up": [
                1,
                0
            ],
            "acting": [
                1,
                0
            ],
            "blocked_by": [],
            "up_primary": 1,
            "acting_primary": 1
        },
        "empty": 0,
        "dne": 0,
        "incomplete": 0,
        "last_epoch_started": 12,
        "hit_set_history": {
            "current_last_update": "0'0",
            "history": []
        }
    },
    "peer_info": [
        {
            "peer": "0",
            "pgid": "1.39",
            "last_update": "13'500",
            "last_complete": "13'500",
            "log_tail": "0'0",
            "last_user_version": 0,
            "last_backfill": "MAX",
            "last_backfill_bitwise": 0,
            "purged_snaps": "[]",
            "history": {
                "epoch_created": 11,
                "last_epoch_started": 12,
                "last_epoch_clean": 13,
                "last_epoch_split": 0,
                "last_epoch_marked_full": 0,
                "same_up_since": 0,
                "same_interval_since": 0,
                "same_primary_since": 0,
                "last_scrub": "0'0",
                "last_scrub_stamp": "2016-03-16 21:13:08.883121",
                "last_deep_scrub": "0'0",
                "last_deep_scrub_stamp": "2016-03-16 21:13:08.883121",
                "last_clean_scrub_stamp": "0.000000"
            },
            "stats": {
                "version": "0'0",
                "reported_seq": "0",
                "reported_epoch": "0",
                "state": "inactive",
                "last_fresh": "0.000000",
                "last_change": "0.000000",
                "last_active": "0.000000",
                "last_peered": "0.000000",
                "last_clean": "0.000000",
                "last_became_active": "0.000000",
                "last_became_peered": "0.000000",
                "last_unstale": "0.000000",
                "last_undegraded": "0.000000",
                "last_fullsized": "0.000000",
                "mapping_epoch": 0,
                "log_start": "0'0",
                "ondisk_log_start": "0'0",
                "created": 0,
                "last_epoch_clean": 0,
                "parent": "0.0",
                "parent_split_bits": 0,
                "last_scrub": "0'0",
                "last_scrub_stamp": "0.000000",
                "last_deep_scrub": "0'0",
                "last_deep_scrub_stamp": "0.000000",
                "last_clean_scrub_stamp": "0.000000",
                "log_size": 0,
                "ondisk_log_size": 0,
                "stats_invalid": "0",
                "stat_sum": {
                    "num_bytes": 0,
                    "num_objects": 0,
                    "num_object_clones": 0,
                    "num_object_copies": 0,
                    "num_objects_missing_on_primary": 0,
                    "num_objects_degraded": 0,
                    "num_objects_misplaced": 0,
                    "num_objects_unfound": 0,
                    "num_objects_dirty": 0,
                    "num_whiteouts": 0,
                    "num_read": 0,
                    "num_read_kb": 0,
                    "num_write": 0,
                    "num_write_kb": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_objects_recovered": 0,
                    "num_bytes_recovered": 0,
                    "num_keys_recovered": 0,
                    "num_objects_omap": 0,
                    "num_objects_hit_set_archive": 0,
                    "num_bytes_hit_set_archive": 0,
                    "num_flush": 0,
                    "num_flush_kb": 0,
                    "num_evict": 0,
                    "num_evict_kb": 0,
                    "num_promote": 0,
                    "num_flush_mode_high": 0,
                    "num_flush_mode_low": 0,
                    "num_evict_mode_some": 0,
                    "num_evict_mode_full": 0
                },
                "up": [],
                "acting": [],
                "blocked_by": [],
                "up_primary": -1,
                "acting_primary": -1
            },
            "empty": 0,
            "dne": 0,
            "incomplete": 0,
            "last_epoch_started": 12,
            "hit_set_history": {
                "current_last_update": "0'0",
                "history": []
            }
        }
    ],
    "recovery_state": [
        {
            "name": "Started\/Primary\/Active",
            "enter_time": "2016-03-16 21:13:36.769083",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "MIN",
                "backfill_info": {
                    "begin": "MIN",
                    "end": "MIN",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "0",
                "scrubber.active": 0,
                "scrubber.waiting_on": 0,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2016-03-16 21:13:09.216260"
        }
    ],
    "agent_state": {}
}
```
That's all for now.
Regards, Eduardo Gonzalez