- 1. each file can choose the replication factor
- 2. replication granularity is at the volume level
- 3. if there is not enough space, we can automatically decrease the replication factor of some volumes, especially for cold data
- 4. plan to support migrating data to cheaper storage
- 5. plan to support manual volume placement, access-based volume placement, and auction-based volume placement
- When a new volume server is started, it reports
- 1. how many volumes it can hold
- 2. current list of existing volumes and each volume's replication type
- Each volume server remembers:
- 1. current volume ids
- 2. replica locations are read from the master
- The master assigns volume ids based on
- 1. replication factor
- data center, rack
- 2. concurrent write support
- The master stores the replication configuration:
- {
-   "replication": [
-     {"type": "00", "min_volume_count": 3, "weight": 10},
-     {"type": "01", "min_volume_count": 2, "weight": 20},
-     {"type": "10", "min_volume_count": 2, "weight": 20},
-     {"type": "11", "min_volume_count": 3, "weight": 30},
-     {"type": "20", "min_volume_count": 2, "weight": 20}
-   ],
-   "port": 9333
- }
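A minimal Go sketch of loading such a configuration, assuming it is written as strict JSON (quoted keys, `replication` as an array). The type names `ReplicationRule` and `MasterConfig` and the function `parseMasterConfig` are hypothetical and only illustrate the shape of the data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicationRule mirrors one entry of the replication list.
// The Go names are illustrative, not taken from the weed source.
type ReplicationRule struct {
	Type           string `json:"type"`
	MinVolumeCount int    `json:"min_volume_count"`
	Weight         int    `json:"weight"`
}

// MasterConfig is a hypothetical container for the master's settings.
type MasterConfig struct {
	Replication []ReplicationRule `json:"replication"`
	Port        int               `json:"port"`
}

// parseMasterConfig decodes a strict-JSON rendering of the configuration.
func parseMasterConfig(data []byte) (MasterConfig, error) {
	var cfg MasterConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{
	  "replication": [
	    {"type": "00", "min_volume_count": 3, "weight": 10},
	    {"type": "01", "min_volume_count": 2, "weight": 20}
	  ],
	  "port": 9333
	}`)
	cfg, err := parseMasterConfig(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, r := range cfg.Replication {
		fmt.Printf("type=%s min=%d weight=%d\n", r.Type, r.MinVolumeCount, r.Weight)
	}
}
```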
- Or manually via command line
- 1. add volume with specified replication factor
- 2. add volume with specified volume id
- If duplicated volume ids are reported from different volume servers,
- the master determines the replication factor of the volume and compares it with the number of copies reported:
- if there are fewer copies than the replication factor requires, the volume is in read-only mode
- if there are more copies than the replication factor requires, the smallest/oldest copy is purged
- if the counts are equal, the volume functions as usual
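The three cases can be sketched as a small classification function. This is only an illustration: the type `VolumeState`, its constants, and `classify` are hypothetical names, and the choice of which copy to purge is left out.

```go
package main

import "fmt"

// VolumeState is a hypothetical label for the three outcomes.
type VolumeState int

const (
	ReadOnly       VolumeState = iota // fewer copies reported than required
	Normal                            // exactly the required number of copies
	OverReplicated                    // more copies than required; one should be purged
)

// classify compares the number of volume servers reporting a volume id
// with the number of copies required by the volume's replication factor.
func classify(reported, required int) VolumeState {
	switch {
	case reported < required:
		return ReadOnly
	case reported > required:
		return OverReplicated
	default:
		return Normal
	}
}

func main() {
	fmt.Println(classify(1, 2) == ReadOnly)
	fmt.Println(classify(2, 2) == Normal)
	fmt.Println(classify(3, 2) == OverReplicated)
}
```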
- Use cases:
- on volume server
- 1. weed volume -mserver="xx.xx.xx.xx:9333" -publicUrl="good.com:8080" -dir="/tmp" -volumes=50
- on weed master
- 1. weed master -port=9333
- generates a default JSON configuration file if one doesn't exist
-
- Bootstrap
- 1. at the very beginning, the system has no volumes at all.
- When data node starts:
- 1. each data node sends the master its existing volumes and max volume blocks
- 2. master remembers the topology/data_center/rack/data_node/volumes
- for each replication level, stores
- volume id ~ data node
- writable volume ids
- If any "assign" request comes in
- 1. find a writable volume with the right replicationLevel
- 2. if not found, grow the volumes with the right replication level
- 3. return a writable volume to the user
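The assign flow above can be sketched in Go as follows. Everything here is an assumption for illustration: the `Master` type, its fields, and the `grow` callback (a stand-in for asking data nodes to create a volume) are hypothetical, and the sketch always returns the first writable volume rather than balancing between them.

```go
package main

import (
	"errors"
	"fmt"
)

// Master keeps, per replication level, the ids of writable volumes.
// All names here are hypothetical.
type Master struct {
	writable map[string][]int // replication level -> writable volume ids
	nextID   int
	// grow stands in for asking data nodes to create a new volume;
	// it reports whether enough free slots were found.
	grow func(level string, id int) bool
}

var errNoSpace = errors.New("no free volume slots for this replication level")

// Assign looks for a writable volume, grows one if none exists,
// and returns a writable volume id.
func (m *Master) Assign(level string) (int, error) {
	if ids := m.writable[level]; len(ids) > 0 {
		return ids[0], nil
	}
	id := m.nextID + 1
	if !m.grow(level, id) {
		return 0, errNoSpace
	}
	m.nextID = id
	m.writable[level] = append(m.writable[level], id)
	return id, nil
}

func main() {
	// At bootstrap there are no volumes, so the first request grows one.
	m := &Master{
		writable: map[string][]int{},
		grow:     func(level string, id int) bool { return true },
	}
	id, err := m.Assign("01")
	fmt.Println(id, err)
}
```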
-
- for data node:
- 0. detect existing volumes DONE
- 1. onStartUp, and periodically, send existing volumes and maxVolumeCount store.Join(), DONE
- 2. accept command to grow a volume( id + replication level) DONE
- /admin/assign_volume?volume=some_id&replicationType=01
- 3. accept setting volumeLocationList DONE
- /admin/set_volume_locations_list?volumeLocationsList=[{Vid:xxx,Locations:[loc1,loc2,loc3]}]
- 4. for each write, pass the write to the next location, (Step 2)
- the POST method should accept an index, like a TTL, that gets decremented at every hop
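The hop counter in step 4 can be illustrated with an in-memory chain of nodes instead of real HTTP POSTs. The `Node` type and the rule that forwarding stops when the counter reaches zero are assumptions of this sketch.

```go
package main

import "fmt"

// Node is a hypothetical in-memory stand-in for a volume server.
type Node struct {
	Name   string
	Stored [][]byte
	Next   *Node // next replica location, nil at the end of the chain
}

// Write stores the data locally, then forwards it to the next location
// with the hop counter decremented. Forwarding stops when the counter
// reaches zero, so a loop in the location list cannot forward forever.
func (n *Node) Write(data []byte, hops int) {
	n.Stored = append(n.Stored, data)
	if hops <= 0 || n.Next == nil {
		return
	}
	n.Next.Write(data, hops-1)
}

func main() {
	c := &Node{Name: "c"}
	b := &Node{Name: "b", Next: c}
	a := &Node{Name: "a", Next: b}
	a.Write([]byte("hello"), 1) // one extra hop: reaches b but not c
	fmt.Println(len(a.Stored), len(b.Stored), len(c.Stored))
}
```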
- for master:
- 1. accept data node's report of existing volumes and maxVolumeCount ALREADY EXISTS /dir/join
- 2. periodically refresh for active data nodes, and adjust writable volumes
- 3. send command to grow a volume(id + replication level) DONE
- 5. accept lookup for volume locations ALREADY EXISTS /dir/lookup
- 6. read topology/datacenter/rack layout
- An algorithm to allocate volumes evenly, but it may be inefficient when free volumes are plentiful:
- input: replication=xyz
- algorithm:
-   ret_dcs = []
-   foreach dc that has y+z+1 volumes {
-     ret_racks = []
-     foreach rack with z+1 volumes {
-       ret = select z+1 servers with 1 volume
-       if ret.size()==z+1 {
-         ret_racks.append(ret)
-       }
-     }
-     randomly pick one rack from ret_racks
-     ret += select y racks with 1 volume each
-     if ret.size()==y+z+1 {
-       ret_dcs.append(ret)
-     }
-   }
-   randomly pick one dc from ret_dcs
-   ret += select x data centers with 1 volume each
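Reading the algorithm, the input `replication=xyz` appears to encode x other data centers, y other racks in the chosen data center, and z other servers in the chosen rack, for x+y+z+1 copies in total. A small Go sketch of that interpretation follows; the `Placement` type and `ParsePlacement` are hypothetical names, and the three-digit format is assumed from the algorithm's input line.

```go
package main

import (
	"errors"
	"fmt"
)

// Placement holds the three digits of a replication string "xyz".
type Placement struct {
	OtherDataCenters int // x: copies in other data centers
	OtherRacks       int // y: copies in other racks of the chosen data center
	SameRackServers  int // z: copies on other servers of the chosen rack
}

// ParsePlacement splits a three-digit replication string into its parts.
func ParsePlacement(s string) (Placement, error) {
	if len(s) != 3 {
		return Placement{}, errors.New("replication must have exactly three digits")
	}
	for i := 0; i < 3; i++ {
		if s[i] < '0' || s[i] > '9' {
			return Placement{}, errors.New("replication must contain only digits")
		}
	}
	return Placement{
		OtherDataCenters: int(s[0] - '0'),
		OtherRacks:       int(s[1] - '0'),
		SameRackServers:  int(s[2] - '0'),
	}, nil
}

// Copies is the total number of volumes the algorithm must select.
func (p Placement) Copies() int {
	return p.OtherDataCenters + p.OtherRacks + p.SameRackServers + 1
}

func main() {
	p, err := ParsePlacement("012")
	if err != nil {
		fmt.Println("invalid replication:", err)
		return
	}
	fmt.Println(p.Copies())
}
```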
- A simple replica placement algorithm, but it may fail when free volume slots are scarce:
- ret := []volumes
- dc = randomly pick 1 data center with y+z+1 volumes
- rack = randomly pick 1 rack with z+1 volumes
- ret = ret.append(randomly pick z+1 volumes)
- ret = ret.append(randomly pick y racks with 1 volume)
- ret = ret.append(randomly pick x data centers with 1 volume)
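The simple algorithm can be sketched in Go over a flat node list. This is an illustration under assumptions, not the actual implementation: the `Node` type, the helpers `filter` and `choose`, and `Place` are hypothetical, node names are assumed unique, and the data center and rack are chosen by picking a random qualifying node, which favors locations with more nodes. Like the pseudocode, it gives up as soon as one random pick cannot be completed.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// Node is a hypothetical volume server; Free counts its free volume slots.
type Node struct {
	Name, DC, Rack string
	Free           int
}

func dcOf(n *Node) string   { return n.DC }
func rackOf(n *Node) string { return n.DC + "/" + n.Rack }
func nameOf(n *Node) string { return n.Name }

var errNoSlots = errors.New("not enough free volume slots")

// filter keeps the nodes that have a free slot and satisfy ok.
func filter(nodes []*Node, ok func(*Node) bool) []*Node {
	var out []*Node
	for _, n := range nodes {
		if n.Free > 0 && ok(n) {
			out = append(out, n)
		}
	}
	return out
}

// choose shuffles cand and returns k nodes with pairwise distinct keys.
// The boolean is false when fewer than k distinct keys are available.
func choose(cand []*Node, k int, key func(*Node) string) ([]*Node, bool) {
	rand.Shuffle(len(cand), func(i, j int) { cand[i], cand[j] = cand[j], cand[i] })
	seen := map[string]bool{}
	var out []*Node
	for _, n := range cand {
		if len(out) < k && !seen[key(n)] {
			seen[key(n)] = true
			out = append(out, n)
		}
	}
	return out, len(out) == k
}

// Place selects x+y+z+1 nodes and fails as soon as one random pick fails.
func Place(nodes []*Node, x, y, z int) ([]*Node, error) {
	dcSlots, rackSlots := map[string]int{}, map[string]int{}
	for _, n := range nodes {
		dcSlots[n.DC] += n.Free
		rackSlots[rackOf(n)] += n.Free
	}
	// Pick one data center with at least y+z+1 free slots.
	seed, ok := choose(filter(nodes, func(n *Node) bool { return dcSlots[n.DC] >= y+z+1 }), 1, dcOf)
	if !ok {
		return nil, errNoSlots
	}
	dc := seed[0].DC
	// Pick one rack in that data center with at least z+1 free slots.
	seed, ok = choose(filter(nodes, func(n *Node) bool {
		return n.DC == dc && rackSlots[rackOf(n)] >= z+1
	}), 1, rackOf)
	if !ok {
		return nil, errNoSlots
	}
	rack := rackOf(seed[0])
	// z+1 servers in the rack, y other racks in the data center, x other data centers.
	same, ok1 := choose(filter(nodes, func(n *Node) bool { return rackOf(n) == rack }), z+1, nameOf)
	racks, ok2 := choose(filter(nodes, func(n *Node) bool { return n.DC == dc && rackOf(n) != rack }), y, rackOf)
	far, ok3 := choose(filter(nodes, func(n *Node) bool { return n.DC != dc }), x, dcOf)
	if !ok1 || !ok2 || !ok3 {
		return nil, errNoSlots
	}
	return append(append(same, racks...), far...), nil
}

func main() {
	nodes := []*Node{
		{Name: "a1", DC: "dc1", Rack: "r1", Free: 1},
		{Name: "a2", DC: "dc1", Rack: "r1", Free: 1},
		{Name: "b1", DC: "dc1", Rack: "r2", Free: 1},
		{Name: "c1", DC: "dc2", Rack: "r1", Free: 1},
	}
	picked, err := Place(nodes, 1, 1, 1)
	if err != nil {
		fmt.Println("placement failed:", err)
		return
	}
	for _, n := range picked {
		fmt.Println(n.Name)
	}
}
```

With the four-node topology in `main`, only one combination satisfies x=y=z=1, so all four nodes are selected; on larger topologies the result varies between runs.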
- TODO:
- 1. replicate content to the other server if the replication type needs replicas