replication.txt 4.2 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109
  1. 1. each file can choose the replication factor
  2. 2. replication granularity is in volume level
  3. 3. if not enough spaces, we can automatically decrease some volume's the replication factor, especially for cold data
  4. 4. plan to support migrating data to cheaper storage
  5. 5. plan to manual volume placement, access-based volume placement, auction based volume placement
  6. When a new volume server is started, it reports
  7. 1. how many volumes it can hold
  8. 2. current list of existing volumes and each volume's replication type
  9. Each volume server remembers:
  10. 1. current volume ids
  11. 2. replica locations are read from the master
  12. The master assign volume ids based on
  13. 1. replication factor
  14. data center, rack
  15. 2. concurrent write support
  16. On master, stores the replication configuration
  17. {
  18. replication:{
  19. {type:"00", min_volume_count:3, weight:10},
  20. {type:"01", min_volume_count:2, weight:20},
  21. {type:"10", min_volume_count:2, weight:20},
  22. {type:"11", min_volume_count:3, weight:30},
  23. {type:"20", min_volume_count:2, weight:20}
  24. },
  25. port:9333,
  26. }
  27. Or manually via command line
  28. 1. add volume with specified replication factor
  29. 2. add volume with specified volume id
  30. If duplicated volume ids are reported from different volume servers,
  31. the master determines the replication factor of the volume,
  32. if less than the replication factor, the volume is in readonly mode
  33. if more than the replication factor, the volume will purge the smallest/oldest volume
  34. if equal, the volume will function as usual
  35. Use cases:
  36. on volume server
  37. 1. weed volume -mserver="xx.xx.xx.xx:9333" -publicUrl="good.com:8080" -dir="/tmp" -volumes=50
  38. on weed master
  39. 1. weed master -port=9333
  40. generate a default json configuration file if doesn't exist
  41. Bootstrap
  42. 1. at the very beginning, the system has no volumes at all.
  43. When data node starts:
  44. 1. each data node send to master its existing volumes and max volume blocks
  45. 2. master remembers the topology/data_center/rack/data_node/volumes
  46. for each replication level, stores
  47. volume id ~ data node
  48. writable volume ids
  49. If any "assign" request comes in
  50. 1. find a writable volume with the right replicationLevel
  51. 2. if not found, grow the volumes with the right replication level
  52. 3. return a writable volume to the user
  53. for data node:
  54. 0. detect existing volumes DONE
  55. 1. onStartUp, and periodically, send existing volumes and maxVolumeCount store.Join(), DONE
  56. 2. accept command to grow a volume( id + replication level) DONE
  57. /admin/assign_volume?volume=some_id&replicationType=01
  58. 3. accept setting volumeLocationList DONE
  59. /admin/set_volume_locations_list?volumeLocationsList=[{Vid:xxx,Locations:[loc1,loc2,loc3]}]
  60. 4. for each write, pass the write to the next location, (Step 2)
  61. POST method should accept an index, like ttl, get decremented every hop
  62. for master:
  63. 1. accept data node's report of existing volumes and maxVolumeCount ALREADY EXISTS /dir/join
  64. 2. periodically refresh for active data nodes, and adjust writable volumes
  65. 3. send command to grow a volume(id + replication level) DONE
  66. 5. accept lookup for volume locations ALREADY EXISTS /dir/lookup
  67. 6. read topology/datacenter/rack layout
  68. An algorithm to allocate volumes evenly, but may be inefficient if free volumes are plenty:
  69. input: replication=xyz
  70. algorithm:
  71. ret_dcs = []
  72. foreach dc that has y+z+1 volumes{
  73. ret_racks = []
  74. foreach rack with z+1 volumes{
  75. ret = select z+1 servers with 1 volume
  76. if ret.size()==z+1 {
  77. ret_racks.append(ret)
  78. }
  79. }
  80. randomly pick one rack from ret_racks
  81. ret += select y racks with 1 volume each
  82. if ret.size()==y+z+1{
  83. ret_dcs.append(ret)
  84. }
  85. }
  86. randomly pick one dc from ret_dcs
  87. ret += select x data centers with 1 volume each
  88. A simple replica placement algorithm, but may fail when free volume slots are not plenty:
  89. ret := []volumes
  90. dc = randomly pick 1 data center with y+z+1 volumes
  91. rack = randomly pick 1 rack with z+1 volumes
  92. ret = ret.append(randomly pick z+1 volumes)
  93. ret = ret.append(randomly pick y racks with 1 volume)
  94. ret = ret.append(randomly pick x data centers with 1 volume)
  95. TODO:
  96. 1. replicate content to the other server if the replication type needs replicas