# Gearman server and library
# Copyright (C) 2008 Brian Aker, Eric Day
# All rights reserved.
#
# Use and distribution licensed under the BSD license. See
# the COPYING file in this directory for full text.

Gearman Protocol
----------------

The Gearman protocol operates over TCP, port 4730 by default. It
previously operated on port 7003, but this conflicted with the AFS
port range and the new port (4730) was assigned by IANA. Communication
happens between either a client and job server, or between a worker
and job server. In either case, the protocol consists of packets
containing requests and responses. All packets sent to a job server
are considered requests, and all packets sent from a job server are
considered responses. A simple configuration may look like:

    ----------     ----------     ----------     ----------
    | Client |     | Client |     | Client |     | Client |
    ----------     ----------     ----------     ----------
         \            /                \            /
          \          /                  \          /
         --------------                --------------
         | Job Server |                | Job Server |
         --------------                --------------
                |                             |
        ----------------------------------------------
        |              |              |              |
    ----------     ----------     ----------     ----------
    | Worker |     | Worker |     | Worker |     | Worker |
    ----------     ----------     ----------     ----------

Initially, the workers register functions they can perform with each
job server. Clients will then connect to a job server and issue a
request for a job to be run. The job server then notifies each worker
that can perform that job (based on the function it registered) that
a new job is ready. The first worker to wake up and retrieve the job
will then execute it.

All communication between workers or clients and the job server
is binary. There is also a line-based text protocol used by
administrative clients. This part of the protocol is text based so a
custom administrative utility is not required (instead, 'telnet' or
'nc' can be used). This is documented under "Administrative Protocol".

Binary Packet
-------------

Requests and responses are encapsulated by a binary packet. A binary
packet consists of a header which is optionally followed by data. The
header is:

4 byte magic code - This is either "\0REQ" for requests or "\0RES"
                    for responses.

4 byte type       - A big-endian (network-order) integer containing
                    an enumerated packet type. Possible values are:

                    #   Name                Magic  Type
                    1   CAN_DO              REQ    Worker
                    2   CANT_DO             REQ    Worker
                    3   RESET_ABILITIES     REQ    Worker
                    4   PRE_SLEEP           REQ    Worker
                    5   (unused)            -      -
                    6   NOOP                RES    Worker
                    7   SUBMIT_JOB          REQ    Client
                    8   JOB_CREATED         RES    Client
                    9   GRAB_JOB            REQ    Worker
                    10  NO_JOB              RES    Worker
                    11  JOB_ASSIGN          RES    Worker
                    12  WORK_STATUS         REQ    Worker
                                            RES    Client
                    13  WORK_COMPLETE       REQ    Worker
                                            RES    Client
                    14  WORK_FAIL           REQ    Worker
                                            RES    Client
                    15  GET_STATUS          REQ    Client
                    16  ECHO_REQ            REQ    Client/Worker
                    17  ECHO_RES            RES    Client/Worker
                    18  SUBMIT_JOB_BG       REQ    Client
                    19  ERROR               RES    Client/Worker
                    20  STATUS_RES          RES    Client
                    21  SUBMIT_JOB_HIGH     REQ    Client
                    22  SET_CLIENT_ID       REQ    Worker
                    23  CAN_DO_TIMEOUT      REQ    Worker
                    24  ALL_YOURS           REQ    Worker
                    25  WORK_EXCEPTION      REQ    Worker
                                            RES    Client
                    26  OPTION_REQ          REQ    Client/Worker
                    27  OPTION_RES          RES    Client/Worker
                    28  WORK_DATA           REQ    Worker
                                            RES    Client
                    29  WORK_WARNING        REQ    Worker
                                            RES    Client
                    30  GRAB_JOB_UNIQ       REQ    Worker
                    31  JOB_ASSIGN_UNIQ     RES    Worker
                    32  SUBMIT_JOB_HIGH_BG  REQ    Client
                    33  SUBMIT_JOB_LOW      REQ    Client
                    34  SUBMIT_JOB_LOW_BG   REQ    Client
                    35  SUBMIT_JOB_SCHED    REQ    Client
                    36  SUBMIT_JOB_EPOCH    REQ    Client

4 byte size       - A big-endian (network-order) integer containing
                    the size of the data being sent after the header.
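
The 12-byte header described above can be sketched with Python's struct
module. This is only an illustration, not code from any Gearman library:
the names HEADER, pack_packet and unpack_header are hypothetical, and the
format string ">4sII" assumes both integers are unsigned.

```python
import struct

# Header layout from the text: 4-byte magic, 4-byte type, 4-byte size,
# with both integers in big-endian (network) order.
HEADER = struct.Struct(">4sII")  # assumption: integers are unsigned
REQ = b"\0REQ"
RES = b"\0RES"


def pack_packet(magic: bytes, ptype: int, data: bytes = b"") -> bytes:
    """Return the header followed by the data as one byte string."""
    return HEADER.pack(magic, ptype, len(data)) + data


def unpack_header(header: bytes):
    """Split a 12-byte header into (magic, type, size)."""
    magic, ptype, size = HEADER.unpack(header)
    if magic not in (REQ, RES):
        raise ValueError("bad magic: %r" % (magic,))
    return magic, ptype, size


# CAN_DO (type 1) registering the function "reverse".
packet = pack_packet(REQ, 1, b"reverse")
```

A reader would typically receive exactly 12 bytes, call unpack_header, and
then read size more bytes to obtain the data part.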

Arguments given in the data part are separated by a NULL byte, and
the last argument is determined by the size of data after the last
NULL byte separator. Job handle arguments must not be longer than
64 bytes, including the NULL terminator.
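
Because the last argument is delimited by the packet size rather than by
a NULL byte, a parser has to know how many arguments a packet type
carries. A minimal sketch, assuming the caller supplies that count
(split_args, check_handle and MAX_HANDLE are hypothetical names):

```python
MAX_HANDLE = 64  # limit from the text, counting the NULL terminator


def split_args(data: bytes, count: int):
    """Split packet data into exactly `count` arguments.

    Only the first count - 1 NULL bytes act as separators; the last
    argument is whatever remains, so it may itself contain NULL bytes.
    """
    args = data.split(b"\0", count - 1)
    if len(args) != count:
        raise ValueError("expected %d arguments, got %d" % (count, len(args)))
    return args


def check_handle(handle: bytes) -> bytes:
    """Reject job handles longer than 64 bytes including the terminator."""
    if len(handle) + 1 > MAX_HANDLE:
        raise ValueError("job handle too long")
    return handle
```

Limiting the number of splits keeps opaque workloads intact even when
they contain NULL bytes of their own.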

Client/Worker Requests
----------------------

These request types may be sent by either a client or a worker:

ECHO_REQ

    When a job server receives this request, it simply generates an
    ECHO_RES packet with the data. This is primarily used for testing
    or debugging.

    Arguments:
    - Opaque data that is echoed back in response.

Client/Worker Responses
-----------------------

These response types may be sent to either a client or a worker:

ECHO_RES

    This is sent in response to an ECHO_REQ request. The server doesn't
    look at or modify the data argument; it just sends it back.

    Arguments:
    - Opaque data that is echoed back in response.

ERROR

    This is sent whenever the server encounters an error and needs
    to notify a client or worker.

    Arguments:
    - NULL byte terminated error code string.
    - Error text.

Client Requests
---------------

These request types may only be sent by a client:

SUBMIT_JOB, SUBMIT_JOB_BG,
SUBMIT_JOB_HIGH, SUBMIT_JOB_HIGH_BG,
SUBMIT_JOB_LOW, SUBMIT_JOB_LOW_BG

    A client issues one of these when a job needs to be run. The
    server will then assign a job handle and respond with a JOB_CREATED
    packet.

    If one of the BG versions is used, the client is not updated with
    status or notified when the job has completed (it is detached).

    The Gearman job server queue is implemented with three levels:
    normal, high, and low. Jobs submitted with one of the HIGH versions
    always take precedence, and jobs submitted with the normal versions
    take precedence over the LOW versions.

    Arguments:
    - NULL byte terminated function name.
    - NULL byte terminated unique ID.
    - Opaque data that is given to the function as an argument.
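
As an illustration of how the six SUBMIT_JOB variants differ only in
their packet type, the sketch below builds a request from a priority and
a background flag. The function submit_job and the SUBMIT_TYPES mapping
are hypothetical; the type numbers are taken from the packet type table.

```python
import struct

# (priority, background) -> packet type number from the type table.
SUBMIT_TYPES = {
    ("normal", False): 7,   # SUBMIT_JOB
    ("normal", True): 18,   # SUBMIT_JOB_BG
    ("high", False): 21,    # SUBMIT_JOB_HIGH
    ("high", True): 32,     # SUBMIT_JOB_HIGH_BG
    ("low", False): 33,     # SUBMIT_JOB_LOW
    ("low", True): 34,      # SUBMIT_JOB_LOW_BG
}


def submit_job(function: bytes, unique: bytes, workload: bytes,
               priority: str = "normal", background: bool = False) -> bytes:
    """Build a SUBMIT_JOB* request packet."""
    if b"\0" in function or b"\0" in unique:
        # These two arguments are NULL terminated, so they cannot
        # contain a NULL byte themselves.
        raise ValueError("function name and unique ID must not contain NULL")
    data = function + b"\0" + unique + b"\0" + workload
    ptype = SUBMIT_TYPES[(priority, background)]
    return struct.pack(">4sII", b"\0REQ", ptype, len(data)) + data


# Foreground, normal-priority job with an empty unique ID.
packet = submit_job(b"reverse", b"", b"test")
```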

SUBMIT_JOB_SCHED

    Just like SUBMIT_JOB_BG, but run job at given time instead of
    immediately. This is not currently used and may be removed.

    Arguments:
    - NULL byte terminated function name.
    - NULL byte terminated unique ID.
    - NULL byte terminated minute (0-59).
    - NULL byte terminated hour (0-23).
    - NULL byte terminated day of month (1-31).
    - NULL byte terminated month (1-12).
    - NULL byte terminated day of week (0-6, 0 = Monday).
    - Opaque data that is given to the function as an argument.

SUBMIT_JOB_EPOCH

    Just like SUBMIT_JOB_BG, but run job at given time instead of
    immediately. This is not currently used and may be removed.

    Arguments:
    - NULL byte terminated function name.
    - NULL byte terminated unique ID.
    - NULL byte terminated epoch time.
    - Opaque data that is given to the function as an argument.

GET_STATUS

    A client issues this to get status information for a submitted job.

    Arguments:
    - Job handle that was given in JOB_CREATED packet.

OPTION_REQ

    A client issues this to set an option for the connection in the
    job server. Returns an OPTION_RES packet on success, or an ERROR
    packet on failure.

    Arguments:
    - Name of the option to set. Possibilities are:
      * "exceptions" - Forward WORK_EXCEPTION packets to the client.

Client Responses
----------------

These response types may only be sent to a client:

JOB_CREATED

    This is sent in response to one of the SUBMIT_JOB* packets. It
    signifies to the client that the server successfully received
    the job and queued it to be run by a worker.

    Arguments:
    - Job handle assigned by server.

WORK_DATA, WORK_WARNING, WORK_STATUS, WORK_COMPLETE,
WORK_FAIL, WORK_EXCEPTION

    For non-background jobs, the server forwards these packets from
    the worker to clients. See "Worker Requests" for more information
    and arguments.

STATUS_RES

    This is sent in response to a GET_STATUS request. This is used by
    clients that have submitted a job with SUBMIT_JOB_BG to see if the
    job has been completed, and if not, to get the percentage complete.

    Arguments:
    - NULL byte terminated job handle.
    - NULL byte terminated known status, this is 0 (false) or 1 (true).
    - NULL byte terminated running status, this is 0 (false) or 1
      (true).
    - NULL byte terminated percent complete numerator.
    - Percent complete denominator.
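
The five STATUS_RES arguments can be decoded as sketched below. The text
does not say how the flags and numbers are encoded, so this sketch
assumes ASCII decimal digits ("0", "1", "25", ...); JobStatus and
parse_status_res are hypothetical names.

```python
from typing import NamedTuple, Optional


class JobStatus(NamedTuple):
    handle: bytes
    known: bool
    running: bool
    numerator: int
    denominator: int

    def percent(self) -> Optional[float]:
        """Percentage complete, or None when the denominator is zero."""
        if self.denominator == 0:
            return None
        return 100.0 * self.numerator / self.denominator


def parse_status_res(data: bytes) -> JobStatus:
    """Decode the data part of a STATUS_RES packet.

    Assumption: flags and numbers are sent as ASCII decimal strings.
    """
    handle, known, running, num, den = data.split(b"\0", 4)
    return JobStatus(handle, known == b"1", running == b"1",
                     int(num), int(den))
```

A client polling a background job could call parse_status_res on each
reply and stop once the known flag becomes false.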

OPTION_RES

    Successful response to the OPTION_REQ request.

    Arguments:
    - Name of the option that was set, see OPTION_REQ for possibilities.

Worker Requests
---------------

These request types may only be sent by a worker:

CAN_DO

    This is sent to notify the server that the worker is able to
    perform the given function. The worker is then put on a list to be
    woken up whenever the job server receives a job for that function.

    Arguments:
    - Function name.

CAN_DO_TIMEOUT

    Same as CAN_DO, but with a timeout value on how long the job
    is allowed to run. After the timeout value, the job server will
    mark the job as failed and notify any listening clients.

    Arguments:
    - NULL byte terminated Function name.
    - Timeout value.

CANT_DO

    This is sent to notify the server that the worker is no longer
    able to perform the given function.

    Arguments:
    - Function name.

RESET_ABILITIES

    This is sent to notify the server that the worker is no longer
    able to do any functions it previously registered with CAN_DO or
    CAN_DO_TIMEOUT.

    Arguments:
    - None.

PRE_SLEEP

    This is sent to notify the server that the worker is about to
    sleep, and that it should be woken up with a NOOP packet if a
    job comes in for a function the worker is able to perform.

    Arguments:
    - None.

GRAB_JOB

    This is sent to the server to request any available jobs on the
    queue. The server will respond with either NO_JOB or JOB_ASSIGN,
    depending on whether a job is available.

    Arguments:
    - None.
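
Taken together, PRE_SLEEP, NOOP and GRAB_JOB form a small polling cycle
on the worker side. The sketch below is one possible reading of that
cycle, not a required implementation: next_request is a hypothetical
helper that maps the last response type to the request a worker sends
next.

```python
# Packet type numbers from the type table.
PRE_SLEEP, GRAB_JOB = 4, 9
NOOP, NO_JOB, JOB_ASSIGN = 6, 10, 11


def next_request(response_type: int):
    """Return the request type to send next, or None to run a job."""
    if response_type == NO_JOB:
        return PRE_SLEEP   # queue is empty: ask to be woken up later
    if response_type == NOOP:
        return GRAB_JOB    # woken up: try to grab the pending job
    if response_type == JOB_ASSIGN:
        return None        # run the job, then report with WORK_* packets
    raise ValueError("unexpected response type %d" % response_type)
```

Another worker may win the race after a NOOP, in which case the server
answers NO_JOB and the cycle simply repeats.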

GRAB_JOB_UNIQ

    Just like GRAB_JOB, but return JOB_ASSIGN_UNIQ when there is a job.

    Arguments:
    - None.

WORK_DATA

    This is sent to update the client with data from a running job. A
    worker should use this when it needs to send updates, send partial
    results, or flush data during long running jobs. It can also be
    used to break up a result so the worker does not need to buffer
    the entire result before sending in a WORK_COMPLETE packet.

    Arguments:
    - NULL byte terminated job handle.
    - Opaque data that is returned to the client.

WORK_WARNING

    This is sent to update the client with a warning. It acts just
    like a WORK_DATA response, but should be treated as a warning
    instead of normal response data.

    Arguments:
    - NULL byte terminated job handle.
    - Opaque data that is returned to the client.

WORK_STATUS

    This is sent to update the server (and any listening clients)
    of the status of a running job. The worker should send these
    periodically for long running jobs to update the percentage
    complete. The job server should store this information so a client
    who issued a background command may retrieve it later with a
    GET_STATUS request.

    Arguments:
    - NULL byte terminated job handle.
    - NULL byte terminated percent complete numerator.
    - Percent complete denominator.

WORK_COMPLETE

    This is to notify the server (and any listening clients) that
    the job completed successfully.

    Arguments:
    - NULL byte terminated job handle.
    - Opaque data that is returned to the client as a response.

WORK_FAIL

    This is to notify the server (and any listening clients) that
    the job failed.

    Arguments:
    - Job handle.

WORK_EXCEPTION

    This is to notify the server (and any listening clients) that
    the job failed with the given exception.

    Arguments:
    - NULL byte terminated job handle.
    - Opaque data that is returned to the client as an exception.

SET_CLIENT_ID

    This sets the worker ID in a job server so monitoring and reporting
    commands can uniquely identify the various workers, and different
    connections to job servers from the same worker.

    Arguments:
    - Unique string to identify the worker instance.

ALL_YOURS

    Not yet implemented. This looks like it is used to notify a job
    server that it is the only job server the worker is connected to,
    so a job can be given directly to this worker with a JOB_ASSIGN and
    no worker wake-up is required.

    Arguments:
    - None.

Worker Responses
----------------

These response types may only be sent to a worker:

NOOP

    This is used to wake up a sleeping worker so that it may grab a
    pending job.

    Arguments:
    - None.

NO_JOB

    This is given in response to a GRAB_JOB request to notify the
    worker there are no pending jobs that need to run.

    Arguments:
    - None.

JOB_ASSIGN

    This is given in response to a GRAB_JOB request to give the worker
    information needed to run the job. All communication about the
    job (such as status updates and completion response) should use
    the handle, and the worker should run the given function with
    the argument.

    Arguments:
    - NULL byte terminated job handle.
    - NULL byte terminated function name.
    - Opaque data that is given to the function as an argument.

JOB_ASSIGN_UNIQ

    This is given in response to a GRAB_JOB_UNIQ request and acts
    just like JOB_ASSIGN but with the client assigned unique ID.

    Arguments:
    - NULL byte terminated job handle.
    - NULL byte terminated function name.
    - NULL byte terminated unique ID.
    - Opaque data that is given to the function as an argument.

Administrative Protocol
-----------------------

The Gearman job server also supports a text-based protocol to pull
information and run some administrative tasks. This runs on the same
port as the binary protocol, and the server differentiates between
the two by looking at the first character. If it is a NULL (\0),
then it is binary; if it is non-NULL, then it attempts to parse it
as a text command. The following commands are supported:
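
The first-character check can be pictured with the following sketch. It
only illustrates the rule stated above and says nothing about how any
real server structures its input handling; classify is a hypothetical
name.

```python
def classify(buffer: bytes) -> str:
    """Decide which protocol the start of a connection buffer belongs to."""
    if not buffer:
        raise ValueError("need at least one byte to decide")
    # Binary packets always start with the NULL byte of "\0REQ".
    return "binary" if buffer[:1] == b"\0" else "text"
```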

workers

    This sends back a list of all workers, their file descriptors,
    their IPs, their IDs, and a list of registered functions they can
    perform. The list is terminated with a line containing a single
    '.' (period). The format is:

    FD IP-ADDRESS CLIENT-ID : FUNCTION ...

    Arguments:
    - None.

status

    This sends back a list of all registered functions. Next to
    each function is the number of jobs in the queue, the number of
    running jobs, and the number of capable workers. The columns are
    tab separated, and the list is terminated with a line containing
    a single '.' (period). The format is:

    FUNCTION\tTOTAL\tRUNNING\tAVAILABLE_WORKERS

    Arguments:
    - None.
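
A response to the status command could be parsed as in the sketch below.
The sample text in the usage lines is made up for illustration, and
parse_status is a hypothetical helper, not part of any Gearman tool.

```python
def parse_status(text: str) -> dict:
    """Parse a `status` response into {function: (total, running, workers)}."""
    rows = {}
    for line in text.splitlines():
        if line == ".":
            break  # a lone period terminates the list
        function, total, running, workers = line.split("\t")
        rows[function] = (int(total), int(running), int(workers))
    return rows


# Made-up sample response: two functions, then the terminator line.
sample = "reverse\t3\t1\t2\nresize\t0\t0\t1\n.\n"
queues = parse_status(sample)
```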

maxqueue

    This sets the maximum queue size for a function. If no size is
    given, the default is used. If one size is given, it is applied to
    jobs regardless of priority. If three sizes are given, the sizes
    are used when testing high-priority, normal, and low-priority jobs,
    respectively. A zero or negative size indicates no limit. This
    command sends back a single line with "OK".

    Arguments:
    - Function name.
    - Optional maximum queue size (to apply one maximum at all priorities), or
      three optional maximum queue sizes (to enforce for high-, normal-, and
      low-priority job submissions).

shutdown

    Shut down the server. If the optional "graceful" argument is used,
    close the listening socket and let all existing connections
    complete.

    Arguments:
    - Optional "graceful" mode.

version

    Send back the version of the server.

    Arguments:
    - None.

The Perl version also has a 'gladiator' command that uses the
'Devel::Gladiator' Perl module and is used for debugging.

Binary Protocol Example
-----------------------

This example will step through a simple interaction where a worker
connects and registers for a function named "reverse", the client
connects and submits a job for this function, and the worker performs
this job and responds with a result. This shows every byte that needs
to be sent over the wire in order for the job to be run to completion.

Worker registration:

Worker -> Job Server
    00 52 45 51                \0REQ       (Magic)
    00 00 00 01                1           (Packet type: CAN_DO)
    00 00 00 07                7           (Packet length)
    72 65 76 65 72 73 65       reverse     (Function)

Worker check for job:

Worker -> Job Server
    00 52 45 51                \0REQ       (Magic)
    00 00 00 09                9           (Packet type: GRAB_JOB)
    00 00 00 00                0           (Packet length)

Job Server -> Worker
    00 52 45 53                \0RES       (Magic)
    00 00 00 0a                10          (Packet type: NO_JOB)
    00 00 00 00                0           (Packet length)

Worker -> Job Server
    00 52 45 51                \0REQ       (Magic)
    00 00 00 04                4           (Packet type: PRE_SLEEP)
    00 00 00 00                0           (Packet length)

Client job submission:

Client -> Job Server
    00 52 45 51                \0REQ       (Magic)
    00 00 00 07                7           (Packet type: SUBMIT_JOB)
    00 00 00 0d                13          (Packet length)
    72 65 76 65 72 73 65 00    reverse\0   (Function)
    00                         \0          (Unique ID)
    74 65 73 74                test        (Workload)

Job Server -> Client
    00 52 45 53                \0RES       (Magic)
    00 00 00 08                8           (Packet type: JOB_CREATED)
    00 00 00 07                7           (Packet length)
    48 3a 6c 61 70 3a 31       H:lap:1     (Job handle)

Worker wakeup:

Job Server -> Worker
    00 52 45 53                \0RES       (Magic)
    00 00 00 06                6           (Packet type: NOOP)
    00 00 00 00                0           (Packet length)

Worker check for job:

Worker -> Job Server
    00 52 45 51                \0REQ       (Magic)
    00 00 00 09                9           (Packet type: GRAB_JOB)
    00 00 00 00                0           (Packet length)

Job Server -> Worker
    00 52 45 53                \0RES       (Magic)
    00 00 00 0b                11          (Packet type: JOB_ASSIGN)
    00 00 00 14                20          (Packet length)
    48 3a 6c 61 70 3a 31 00    H:lap:1\0   (Job handle)
    72 65 76 65 72 73 65 00    reverse\0   (Function)
    74 65 73 74                test        (Workload)

Worker response for job:

Worker -> Job Server
    00 52 45 51                \0REQ       (Magic)
    00 00 00 0d                13          (Packet type: WORK_COMPLETE)
    00 00 00 0c                12          (Packet length)
    48 3a 6c 61 70 3a 31 00    H:lap:1\0   (Job handle)
    74 73 65 74                tset        (Response)

Job server response to client:

Job Server -> Client
    00 52 45 53                \0RES       (Magic)
    00 00 00 0d                13          (Packet type: WORK_COMPLETE)
    00 00 00 0c                12          (Packet length)
    48 3a 6c 61 70 3a 31 00    H:lap:1\0   (Job handle)
    74 73 65 74                tset        (Response)

At this point, the worker would then ask for more jobs to run (the
"Worker check for job" state above), and the client could submit more
jobs. Note that the client connection is full duplex and can have
multiple jobs running over a single socket at the same time. Result
packets may not be sent in the same order the jobs were submitted,
and may instead be interleaved with other job result packets.
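
The worker's part of the exchange above, turning the JOB_ASSIGN response
into a WORK_COMPLETE request, can be sketched as follows. It handles only
the "reverse" function from the example and works on complete packets
already read from the socket; handle_job_assign is a hypothetical name.

```python
import struct

HEADER = struct.Struct(">4sII")  # magic, type, size (big-endian)
JOB_ASSIGN, WORK_COMPLETE = 11, 13


def handle_job_assign(packet: bytes) -> bytes:
    """Run a "reverse" job and return the WORK_COMPLETE request packet."""
    magic, ptype, size = HEADER.unpack(packet[:HEADER.size])
    if magic != b"\0RES" or ptype != JOB_ASSIGN:
        raise ValueError("not a JOB_ASSIGN response")
    data = packet[HEADER.size:HEADER.size + size]
    # Two NULL terminated arguments, then the opaque workload.
    handle, function, workload = data.split(b"\0", 2)
    if function != b"reverse":
        raise ValueError("unknown function %r" % (function,))
    result = handle + b"\0" + workload[::-1]
    return HEADER.pack(b"\0REQ", WORK_COMPLETE, len(result)) + result
```

Feeding it the JOB_ASSIGN bytes from the example yields the same
WORK_COMPLETE bytes the worker sends in the example.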