TODO 33 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890
  1. * Soon
  2. ** gnulib
  3. Bruno notes:
  4. > I haven't looked deeply, but it strikes me that gnulib/lib/bitset/array.c
  5. > does not make use of the 'ffsl' function, nor or the 'integer_length_l'
  6. > function. Maybe because in Bison, all bitsets are so dense that it does
  7. > not give a performance advantage?
  8. ** Cex
  9. *** Improve gnulib
  10. Don't do this (counterexample.c):
  11. // This is the fastest way to get the tail node from the gl_list API.
  12. gl_list_node_t
  13. list_get_end (gl_list_t list)
  14. {
  15. gl_list_node_t sentinel = gl_list_add_last (list, NULL);
  16. gl_list_node_t res = gl_list_previous_node (list, sentinel);
  17. gl_list_remove_node (list, sentinel);
  18. return res;
  19. }
  20. *** Ambiguous rewriting
  21. If the user is stupid enough to have equal rules, then the derivations are
  22. harder to read:
  23. Reduce/reduce conflict on tokens $end, "+", "⊕":
  24. 2 exp: exp "+" exp .
  25. 3 exp: exp "+" exp .
  26. Example exp "+" exp •
  27. First derivation exp ::=[ exp "+" exp • ]
  28. Example exp "+" exp •
  29. Second derivation exp ::=[ exp "+" exp • ]
  30. Do we care about this? In color, we use twice the same color here, but we
  31. could try to use the same color for the same rule.
  32. *** XML reports
  33. Show the counterexamples. This is going to be really hard and/or painful.
  34. Unless we play it dumb (little structure).
  35. ** Bistromathic
  36. - How about not evaluating incomplete lines when the text is not finished
  37. (as shells do).
  38. ** Questions
  39. *** Java
  40. - Should i18n be part of the Lexer? Currently it's a static method of
  41. Lexer.
  42. - is there a migration path that would allow to use TokenKinds in
  43. yylex?
  44. - define the tokens as an enum too.
  45. - promote YYEOF rather than EOF.
  46. ** YYerror
  47. https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/plural.y;h=a712255af4f2f739c93336d4ff6556d932a426a5;hb=HEAD
  48. should be updated to not use YYERRCODE. Returning an undef token is good
  49. enough.
  50. ** Java
  51. *** calc.at
  52. Stop hard-coding "Calc". Adjust local.at (look for FIXME).
  53. ** A dev warning for b4_
  54. Maybe we should check for m4_ and b4_ leaking out of the m4 processing, as
  55. Autoconf does. It would have caught over-quotation issues.
  56. ** doc
  57. I feel it's ugly to use the GNU style to declare functions in the doc. It
  58. generates tons of white space in the page, and may contribute to bad page
  59. breaks.
  60. ** consistency
  61. token vs terminal.
  62. ** api.token.raw
  63. The YYUNDEFTOK could be assigned a semantic value so that yyerror could be
  64. used to report invalid lexemes.
  65. ** push parsers
  66. Consider deprecating impure push parsers. They add a lot of complexity, for
  67. a bad feature. On the other hand, that would make it much harder to sit
  68. push parsers on top of pull parser. Which is currently not relevant, since
  69. push parsers are measurably slower.
  70. ** %define parse.error formatted
  71. How about pushing Bistromathic's yyreport_syntax_error as another standard
  72. way to generate the error message, and leave to the user the task of
  73. providing the message formats? Currently in bistro, it reads:
  74. const char *
  75. error_format_string (int argc)
  76. {
  77. switch (argc)
  78. {
  79. default: /* Avoid compiler warnings. */
  80. case 0: return _("%@: syntax error");
  81. case 1: return _("%@: syntax error: unexpected %u");
  82. // TRANSLATORS: '%@' is a location in a file, '%u' is an
  83. // "unexpected token", and '%0e', '%1e'... are expected tokens
  84. // at this point.
  85. //
  86. // For instance on the expression "1 + * 2", you'd get
  87. //
  88. // 1.5: syntax error: expected - or ( or number or function or variable before *
  89. case 2: return _("%@: syntax error: expected %0e before %u");
  90. case 3: return _("%@: syntax error: expected %0e or %1e before %u");
  91. case 4: return _("%@: syntax error: expected %0e or %1e or %2e before %u");
  92. case 5: return _("%@: syntax error: expected %0e or %1e or %2e or %3e before %u");
  93. case 6: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e before %u");
  94. case 7: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e or %5e before %u");
  95. case 8: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e or %5e or %6e before %u");
  96. }
  97. }
  98. The message would have to be generated in a string, and pushed to yyerror.
  99. Which will be a pain in the neck in yacc.c.
  100. If we want to do that, we should think very carefully about the syntax of
  101. the format string.
  102. ** yyclearin does not invoke the lookahead token's %destructor
  103. https://lists.gnu.org/r/bug-bison/2018-02/msg00000.html
  104. Rici:
  105. > Modifying yyclearin so that it calls yydestruct seems like the simplest
  106. > solution to this issue, but it is conceivable that such a change would
  107. > break programs which already perform some kind of workaround in order to
  108. > destruct the lookahead symbol. So it might be necessary to use some kind of
  109. > compatibility %define, or to create a new replacement macro with a
  110. > different name such as yydiscardin.
  111. >
  112. > At a minimum, the fact that yyclearin does not invoke the %destructor
  113. > should be highlighted in the documentation, since it is not at all obvious.
  114. ** Issues in i18n
  115. Les catégories d'avertissements incluent :
  116. conflicts-sr conflits S/R (activé par défaut)
  117. conflicts-rr conflits R/R (activé par défaut)
  118. dangling-alias l'alias chaîne n'est pas attaché à un symbole
  119. deprecated construction obsolète
  120. empty-rule règle vide sans %empty
  121. midrule-values valeurs de règle intermédiaire non définies ou inutilisées
  122. precedence priorité et associativité inutiles
  123. yacc incompatibilités avec POSIX Yacc
  124. other tous les autres avertissements (activé par défaut)
  125. all tous les avertissements sauf « dangling-alias » et « yacc »
  126. no-CATEGORY désactiver les avertissements dans CATEGORIE
  127. none désactiver tous les avertissements
  128. error[=CATEGORY] traiter les avertissements comme des erreurs
  129. Line -1 and -3 should mention CATEGORIE, not CATEGORY.
  130. * Bison 3.8
  131. ** Rewrite glr.cc
  132. Get rid of scaffolding in glr.c.
  133. ** Unit rules / Injection rules (Akim Demaille)
  134. Maybe we could expand unit rules (or "injections", see
  135. https://homepages.cwi.nl/~daybuild/daily-books/syntax/2-sdf/sdf.html), i.e.,
  136. transform
  137. exp: arith | bool;
  138. arith: exp '+' exp;
  139. bool: exp '&' exp;
  140. into
  141. exp: exp '+' exp | exp '&' exp;
  142. when there are no actions. This can significantly speed up some grammars.
  143. I can't find the papers. In particular the book 'LR parsing: Theory and
  144. Practice' is impossible to find, but according to 'Parsing Techniques: a
  145. Practical Guide', it includes information about this issue. Does anybody
  146. have it?
  147. ** clean up (Akim Demaille)
  148. Do not work on these items now, as I (Akim) have branches with a lot of
  149. changes in this area (hitting several files), and no desire to have to fix
  150. conflicts. Addressing these items will happen after my branches have been
  151. merged.
  152. *** lalr.c
  153. Introduce a goto struct, and use it in place of from_state/to_state.
  154. Rename states1 as path, length as pathlen.
  155. Introduce inline functions for things such as nullable[*rp - ntokens]
  156. where we need to map from symbol number to nterm number.
  157. There are probably a significant part of the relations management that
  158. should be migrated on top of a bitsetv.
  159. *** closure
  160. It should probably take a "state*" instead of two arguments.
  161. *** traces
  162. The "automaton" and "set" categories are not so useful. We should probably
  163. introduce lr(0) and lalr, just the way we have ielr categories. The
  164. "closure" function is too verbose, it should probably have its own category.
  165. "set" can still be used for summarizing the important sets. That would make
  166. tests easy to maintain.
  167. *** complain.*
  168. Rename these guys as "diagnostics.*" (or "diagnose.*"), since that's the
  169. name they have in GCC, clang, etc. Likewise for the complain_* series of
  170. functions.
  171. *** ritem
  172. states/nstates, rules/nrules, ..., ritem/nritems
  173. Fix the latter.
  174. * D programming language
  175. There's a number of features that are missing, here sorted in _suggested_
  176. order of implementation.
  177. When copying code from other skeletons, keep the comments exactly as they
  178. are. Keep the same variable names. If you change the wording in one place,
  179. do it in the others too. In other words: make sure to keep the
  180. maintenance *simple* by avoiding any gratuitous difference.
  181. ** Rename the D example
  182. Move the current content of examples/d into examples/d/simple.
  183. ** Create a second example
  184. Duplicate examples/d/simple into examples/d/calc.
  185. ** Add location tracking to d/calc
  186. Look at the examples in the other languages to see how to do that.
  187. ** yysymbol_name
  188. The SymbolKind is an enum. For a given SymbolKind we want to get its string
  189. representation. Currently it's a separate table in the parser that does
  190. that:
  191. /* Symbol kinds. */
  192. public enum SymbolKind
  193. {
  194. S_YYEMPTY = -2, /* No symbol. */
  195. S_YYEOF = 0, /* "end of file" */
  196. S_YYerror = 1, /* error */
  197. S_YYUNDEF = 2, /* "invalid token" */
  198. S_EQ = 3, /* "=" */
  199. ...
  200. S_input = 14, /* input */
  201. S_line = 15, /* line */
  202. S_exp = 16, /* exp */
  203. };
  204. ...
  205. /* YYTNAME[SYMBOL-NUM] -- String name of the symbol SYMBOL-NUM.
  206. First, the terminals, then, starting at \a yyntokens_, nonterminals. */
  207. private static immutable string[] yytname_ =
  208. [
  209. "\"end of file\"", "error", "\"invalid token\"", "\"=\"", "\"+\"",
  210. "\"-\"", "\"*\"", "\"/\"", "\"(\"", "\")\"", "\"end of line\"",
  211. "\"number\"", "UNARY", "$accept", "input", "line", "exp", null
  212. ];
  213. ...
  214. So to get a symbol kind, one runs `yytname_[yykind]`.
  215. Is there a way to attach this conversion to string to SymbolKind? In Java
  216. for instance, we have:
  217. public enum SymbolKind
  218. {
  219. S_YYEOF(0), /* "end of file" */
  220. S_YYerror(1), /* error */
  221. S_YYUNDEF(2), /* "invalid token" */
  222. ...
  223. S_input(16), /* input */
  224. S_line(17), /* line */
  225. S_exp(18); /* exp */
  226. private final int yycode_;
  227. SymbolKind (int n) {
  228. this.yycode_ = n;
  229. }
  230. ...
  231. /* YYNAMES_[SYMBOL-NUM] -- String name of the symbol SYMBOL-NUM.
  232. First, the terminals, then, starting at \a YYNTOKENS_, nonterminals. */
  233. private static final String[] yynames_ = yynames_init();
  234. private static final String[] yynames_init()
  235. {
  236. return new String[]
  237. {
  238. i18n("end of file"), i18n("error"), i18n("invalid token"), "!", "+", "-", "*",
  239. "/", "^", "(", ")", "=", i18n("end of line"), i18n("number"), "NEG",
  240. "$accept", "input", "line", "exp", null
  241. };
  242. }
  243. /* The user-facing name of this symbol. */
  244. public final String getName() {
  245. return yynames_[yycode_];
  246. }
  247. };
  248. which allows to write more naturally `yykind.getName()` rather than
  249. `yytname_[yykind]`. Is there something comparable in (idiomatic) D?
  250. ** Change the return value of yylex
  251. Historically people were allowed to return any int from the scanner (which
  252. is convenient and allows `return '+'` from the scanner). Akim tends to see
  253. this as an error, we should restrict the return values to TokenKind (not to
  254. be confused with SymbolKind).
  255. In the case of D, without the history, we have the choice to support or not
  256. `int`. If we want to _keep_ `int`, is there a way, say via introspection,
  257. to support both signatures of yylex? If we don't keep `int`, just move to
  258. TokenKind.
  259. ** Documentation
  260. Write documentation about D support in doc/bison.texi. Imitate the Java
  261. documentation. You should be more succinct IMHO.
  262. ** Complete Symbols
  263. The current interface from the scanner to the parser is somewhat clumsy: the
  264. token kind is returned by yylex, but the value and location are stored in
  265. the scanner. This reflects the fact that the implementation of the parser
  266. uses three variables to deal with each parsed symbol: its kind, its value,
  267. its location.
  268. So today the scanner of examples/d/calc.d (no locations) looks like:
  269. if (input.front.isNumber)
  270. {
  271. import std.conv : parse;
  272. semanticVal_.ival = input.parse!int;
  273. return TokenKind.NUM;
  274. }
  275. and the generated parser:
  276. /* Read a lookahead token. */
  277. if (yychar == TokenKind.YYEMPTY)
  278. {
  279. yychar = yylex ();
  280. yylval = yylexer.semanticVal;
  281. }
  282. The parser class should feature a `Symbol` type which binds together kind,
  283. value and location, and the scanner should be able to return an instance of
  284. that type. Something like
  285. if (input.front.isNumber)
  286. {
  287. import std.conv : parse;
  288. return parser.Symbol (TokenKind.NUM, input.parse!int);
  289. }
  290. ** Token Constructors
  291. In the previous example it is possible to mix incorrectly kinds and values,
  292. and for instance:
  293. return parser.Symbol (TokenKind.NUM, "Hello, World!\n");
  294. attaches a string value to NUM kind (wrong, of course). When
  295. api.token.constructor is set, in C++, Bison generated "token constructors":
  296. parser.make_NUM. parser.make_PLUS, parser.make_STRING, etc. The previous
  297. example becomes
  298. return parser.make_NUM ("Hello, World!\n");
  299. which would easily be caught by the type checker.
  300. ** Lookahead Correction
  301. Add support for LAC to the D skeleton. It should not be too hard: look how
  302. this is done in lalr1.cc, and mock it.
  303. ** Push Parser
  304. Add support for push parser. Do not start a nice skeleton, just enhance the
  305. current one to support push parsers. This is going to be a tougher nut to
  306. crack.
  307. First, you need to understand well how the push parser is expected to work.
  308. To this end:
  309. - read the doc
  310. - look at examples/c/pushcalc
  311. - create an example of a Java push parser.
  312. - have a look at the generated parser in Java, which has the advantage of
  313. being already based on a parser object, instead of just a function.
  314. The C case is harder to read, but it may help too. Keep in mind that
  315. because there's no object to maintain state, the C push parser uses some
  316. struct (yypstate) to preserve this state. We don't need this in D, the
  317. parser object will suffice.
  318. I think working directly on the skeleton to add push-parser support is not
  319. the simplest path. I suggest that you (1) transform a generated parser into
  320. a push parser by hand, and then (2) transform lalr1.d to generate such a
  321. parser.
  322. Use `git commit` frequently to make sure you keep track of your progress.
  323. *** (1.a) Prepare pull parser by hand
  324. Copy again one of the D examples into say examples/d/pushcalc. Also
  325. check-in the generated parser to facilitate experimentation.
  326. - find local variables of yyparse should become members of the parser object
  327. (so that we preserve state from one call to the next).
  328. - do it in your generated D parser. We don't need an equivalent for
  329. yypstate, because we already have it: that the parser object itself.
  330. - have your *pull*-parser (i.e., the good old yy::parser::parse()) work
  331. properly this way. Write and run tests. That's one of the reasons I
  332. suggest using examples/d/calc as a starting point: it already has tests,
  333. you can/should add more.
  334. At this point you have a pull-parser which you prepared to turn into a
  335. push-parser.
  336. *** (1.b) Turn pull parser into push parser by hand
  337. - look again at how push parsers are implemented in Java/C to see what needs
  338. to change in yyparse so that the control is inverted: parse() will
  339. be *given* the tokens, instead of having to call yylex itself. When I say
  340. "look at C", I think your best option are (i) yacc.c (look for b4_push_if)
  341. and (ii) examples/c/pushcalc.
  342. - rename parse() as push_parse(Symbol yyla) (or push_parse(TokenKind, Value,
  343. Location)) that takes the symbol as argument. That's the push parser we
  344. are looking for.
  345. - define a new parse() function which has the same signature as the usual
  346. pull-parser, that repeatedly calls the push_parse function. Something
  347. like this:
  348. int parse ()
  349. {
  350. int status = 0;
  351. do {
  352. status = this->push_parse (yylex());
  353. } while (status == YYPUSH_MORE);
  354. return status;
  355. }
  356. - show me that parser, so that we can validate the approach.
  357. *** (2) Port that into the skeleton
  358. - once we agree on the API of the push parser, implement it into lalr1.d.
  359. You will probaby need help on this regard, but imitation, again, should
  360. help.
  361. - have example/d/pushcalc work properly and pass tests
  362. - add tests in the "real" test suite. Do that in tests/calc.at. I can
  363. help.
  364. - document
  365. ** GLR Parser
  366. This is very ambitious. That's the final boss. There are currently no
  367. "clean" implementation to get inspiration from.
  368. glr.c is very clean but:
  369. - is low-level C
  370. - is a different skeleton from yacc.c
  371. glr.cc is (currently) an ugly hack: a C++ shell around glr.c. Valentin
  372. Tolmer is currently rewriting glr.cc to be clean C++, but he is not
  373. finished. There will be a lot a common code between lalr1.cc and glr.cc, so
  374. eventually I would like them to be fused into a single skeleton, supporting
  375. both deterministic and generalized parsing.
  376. It would be great for D to also support this.
  377. The basic ideas of GLR are explained here:
  378. https://www.codeproject.com/Articles/5259825/GLR-Parsing-in-Csharp-How-to-Use-The-Most-Powerful
  379. * Better error messages
  380. The users are not provided with enough tools to forge their error messages.
  381. See for instance "Is there an option to change the message produced by
  382. YYERROR_VERBOSE?" by Simon Sobisch, on bison-help.
  383. See also
  384. https://www.cs.tufts.edu/~nr/cs257/archive/clinton-jefferey/lr-error-messages.pdf
  385. https://research.swtch.com/yyerror
  386. http://gallium.inria.fr/~fpottier/publis/fpottier-reachability-cc2016.pdf
  387. * Modernization
  388. Fix data/skeletons/yacc.c so that it defines YYPTRDIFF_T properly for modern
  389. and older C++ compilers. Currently the code defaults to defining it to
  390. 'long' for non-GCC compilers, but it should use the proper C++ magic to
  391. define it to the same type as the C ptrdiff_t type.
  392. * Completion
  393. Several features are not available in all the back-ends.
  394. - lac: D, Java (easy)
  395. - push parsers: glr.c, glr.cc, lalr1.cc (not very difficult)
  396. - token constructors: Java, C, D (a bit difficult)
  397. - glr: D, Java (super difficult)
  398. * Bugs
  399. ** Autotest has quotation issues
  400. tests/input.at:1730:AT_SETUP([%define errors])
  401. ->
  402. $ ./tests/testsuite -l | grep errors | sed q
  403. 38: input.at:1730 errors
  404. * Short term
  405. ** Get rid of YYPRINT and b4_toknum
  406. Besides yytoknum is wrong when api.token.raw is defined.
  407. ** Better design for diagnostics
  408. The current implementation of diagnostics is ad hoc, it grew organically.
  409. It works as a series of calls to several functions, with dependency of the
  410. latter calls on the former. For instance:
  411. complain (&sym->location,
  412. sym->content->status == needed ? complaint : Wother,
  413. _("symbol %s is used, but is not defined as a token"
  414. " and has no rules; did you mean %s?"),
  415. quote_n (0, sym->tag),
  416. quote_n (1, best->tag));
  417. if (feature_flag & feature_caret)
  418. location_caret_suggestion (sym->location, best->tag, stderr);
  419. We should rewrite this in a more FP way:
  420. 1. build a rich structure that denotes the (complete) diagnostic.
  421. "Complete" in the sense that it also contains the suggestions, the list
  422. of possible matches, etc.
  423. 2. send this to the pretty-printing routine. The diagnostic structure
  424. should be sufficient so that we can generate all the 'format' of
  425. diagnostics, including the fixits.
  426. If properly done, this diagnostic module can be detached from Bison and be
  427. put in gnulib. It could be used, for instance, for errors caught by
  428. xgettext.
  429. There's certainly already something alike in GCC. At least that's the
  430. impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
  431. page:
  432. https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html
  433. ** Graphviz display code thoughts
  434. The code for the --graph option is over two files: print_graph, and
  435. graphviz. This is because Bison used to also produce VCG graphs, but since
  436. this is no longer true, maybe we could consider these files for fusion.
  437. An other consideration worth noting is that print_graph.c (correct me if I
  438. am wrong) should contain generic functions, whereas graphviz.c and other
  439. potential files should contain just the specific code for that output
  440. format. It will probably prove difficult to tell if the implementation is
  441. actually generic whilst only having support for a single format, but it
  442. would be nice to keep stuff a bit tidier: right now, the construction of the
  443. bitset used to show reductions is in the graphviz-specific code, and on the
  444. opposite side we have some use of \l, which is graphviz-specific, in what
  445. should be generic code.
  446. Little effort seems to have been given to factoring these files and their
  447. print{,-xml} counterpart. We would very much like to re-use the pretty format
  448. of states from .output for the graphs, etc.
  449. Since graphviz dies on medium-to-big grammars, maybe consider an other tool?
  450. ** push-parser
  451. Check it too when checking the different kinds of parsers. And be
  452. sure to check that the initial-action is performed once per parsing.
  453. ** m4 names
  454. b4_shared_declarations is no longer what it is. Make it
  455. b4_parser_declaration for instance.
  456. ** yychar in lalr1.cc
  457. There is a large difference bw maint and master on the handling of
  458. yychar (which was removed in lalr1.cc). See what needs to be
  459. back-ported.
  460. /* User semantic actions sometimes alter yychar, and that requires
  461. that yytoken be updated with the new translation. We take the
  462. approach of translating immediately before every use of yytoken.
  463. One alternative is translating here after every semantic action,
  464. but that translation would be missed if the semantic action
  465. invokes YYABORT, YYACCEPT, or YYERROR immediately after altering
  466. yychar. In the case of YYABORT or YYACCEPT, an incorrect
  467. destructor might then be invoked immediately. In the case of
  468. YYERROR, subsequent parser actions might lead to an incorrect
  469. destructor call or verbose syntax error message before the
  470. lookahead is translated. */
  471. /* Make sure we have latest lookahead translation. See comments at
  472. user semantic actions for why this is necessary. */
  473. yytoken = yytranslate_ (yychar);
  474. ** Get rid of fake #lines [Bison: ...]
  475. Possibly as simple as checking whether the column number is nonnegative.
  476. I have seen messages like the following from GCC.
  477. <built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
  478. ** Discuss about %printer/%destroy in the case of C++.
  479. It would be very nice to provide the symbol classes with an operator<<
  480. and a destructor. Unfortunately the syntax we have chosen for
  481. %destroy and %printer make them hard to reuse. For instance, the user
  482. is invited to write something like
  483. %printer { debug_stream() << $$; } <my_type>;
  484. which is hard to reuse elsewhere since it wants to use
  485. "debug_stream()" to find the stream to use. The same applies to
  486. %destroy: we told the user she could use the members of the Parser
  487. class in the printers/destructors, which is not good for an operator<<
  488. since it is no longer bound to a particular parser, it's just a
  489. (standalone symbol).
  490. * Various
  491. ** Rewrite glr.cc in C++ (Valentin Tolmer)
  492. As a matter of fact, it would be very interesting to see how much we can
  493. share between lalr1.cc and glr.cc. Most of the skeletons should be common.
  494. It would be a very nice source of inspiration for the other languages.
  495. Valentin Tolmer is working on this.
  496. ** yychar == YYEMPTY
  497. The code in yyerrlab reads:
  498. if (yychar <= YYEOF)
  499. {
  500. /* Return failure if at end of input. */
  501. if (yychar == YYEOF)
  502. YYABORT;
  503. }
  504. There are only two yychar that can be <= YYEOF: YYEMPTY and YYEOF.
  505. But I can't produce the situation where yychar is YYEMPTY here, is it
  506. really possible? The test suite does not exercise this case.
  507. This shows that it would be interesting to manage to install skeleton
  508. coverage analysis to the test suite.
  509. * From lalr1.cc to yacc.c
  510. ** Single stack
  511. Merging the three stacks in lalr1.cc simplified the code, prompted for
  512. other improvements and also made it faster (probably because memory
  513. management is performed once instead of three times). I suggest that
  514. we do the same in yacc.c.
  515. (Some time later): it's also very nice to have three stacks: it's more dense
  516. as we don't lose bits to padding. For instance the typical stack for states
  517. will use 8 bits, while it is likely to consume 32 bits in a struct.
  518. We need trustworthy benchmarks for Bison, for all our backends. Akim has a
  519. few things scattered around; we need to put them in the repo, and make them
  520. more useful.
  521. * Report
  522. ** Figures
  523. Some statistics about the grammar and the parser would be useful,
  524. especially when asking the user to send some information about the
  525. grammars she is working on. We should probably also include some
  526. information about the variables (I'm not sure for instance we even
  527. specify what LR variant was used).
  528. ** GLR
  529. How would Paul like to display the conflicted actions? In particular,
  530. what when two reductions are possible on a given lookahead token, but one is
  531. part of $default. Should we make the two reductions explicit, or just
  532. keep $default? See the following point.
  533. ** Disabled Reductions
  534. See 'tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
  535. what we want to do.
  536. ** Documentation
  537. Extend with error productions. The hard part will probably be finding
  538. the right rule so that a single state does not exhibit too many yet
  539. undocumented ''features''. Maybe an empty action ought to be
  540. presented too. Shall we try to make a single grammar with all these
  541. features, or should we have several very small grammars?
  542. * Extensions
  543. ** More languages?
  544. Well, only if there is really some demand for it.
  545. *** PHP
  546. https://github.com/scfc/bison-php/blob/master/data/lalr1.php
  547. *** Python
  548. https://lists.gnu.org/r/bison-patches/2013-09/msg00000.html and following
  549. ** Multiple start symbols
  550. Would be very useful when parsing closely related languages. The idea is to
  551. declare several start symbols, for instance
  552. %start stmt expr
  553. %%
  554. stmt: ...
  555. expr: ...
  556. and to generate parse(), parse_stmt() and parse_expr(). Technically, the
  557. above grammar would be transformed into
  558. %start yy_start
  559. %token YY_START_STMT YY_START_EXPR
  560. %%
  561. yy_start: YY_START_STMT stmt | YY_START_EXPR expr
  562. so that there are no new conflicts in the grammar (as would undoubtedly
  563. happen with yy_start: stmt | expr). Then adjust the skeletons so that this
  564. initial token (YY_START_STMT, YY_START_EXPR) be shifted first in the
  565. corresponding parse function.
  566. ** %include
  567. This is a popular demand. We already made many changes in the parser that
  568. should make this reasonably easy to implement.
  569. Bruce Mardle <marblypup@yahoo.co.uk>
  570. https://lists.gnu.org/r/bison-patches/2015-09/msg00000.html
  571. However, there are many other things to do before having such a feature,
  572. because I don't want a % equivalent to #include (which we all learned to
  573. hate). I want something that builds "modules" of grammars, and assembles
  574. them together, paying attention to keep separate bits separated, in pseudo
  575. name spaces.
  576. ** Push parsers
  577. There is demand for push parsers in Java and C++. And GLR I guess.
  578. ** Generate code instead of tables
  579. This is certainly quite a lot of work. See
  580. https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.4539.
  581. ** $-1
  582. We should find a means to provide an access to values deep in the
  583. stack. For instance, instead of
  584. baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
  585. we should be able to have:
  586. foo($foo) bar($bar) baz($bar): qux($qux) { $baz = $foo + $bar + $qux; }
  587. Or something like this.
  588. ** %if and the like
  589. It should be possible to have %if/%else/%endif. The implementation is
  590. not clear: should it be lexical or syntactic. Vadim Maslow thinks it
  591. must be in the scanner: we must not parse what is in a switched off
  592. part of %if. Akim Demaille thinks it should be in the parser, so as
  593. to avoid falling into another CPP mistake.
  594. (Later): I'm sure there's actually good case for this. People who need that
  595. feature can use m4/cpp on top of Bison. I don't think it is worth the
  596. trouble in Bison itself.
  597. ** XML Output
  598. There are couple of available extensions of Bison targeting some XML
  599. output. Some day we should consider including them. One issue is
  600. that they seem to be quite orthogonal to the parsing technique, and
  601. seem to depend mostly on the possibility to have some code triggered
  602. for each reduction. As a matter of fact, such hooks could also be
  603. used to generate the yydebug traces. Some generic scheme probably
  604. exists in there.
  605. XML output for GNU Bison and gcc
  606. http://www.cs.may.ie/~jpower/Research/bisonXML/
  607. XML output for GNU Bison
  608. http://yaxx.sourceforge.net/
  609. * Coding system independence
  610. Paul notes:
  611. Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
  612. 255). It also assumes that the 8-bit character encoding is
  613. the same for the invocation of 'bison' as it is for the
  614. invocation of 'cc', but this is not necessarily true when
  615. people run bison on an ASCII host and then use cc on an EBCDIC
  616. host. I don't think these topics are worth our time
  617. addressing (unless we find a gung-ho volunteer for EBCDIC or
  618. PDP-10 ports :-) but they should probably be documented
  619. somewhere.
  620. More importantly, Bison does not currently allow NUL bytes in
  621. tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
  622. the source code. This should get fixed.
  623. * Broken options?
  624. ** %token-table
  625. ** Skeleton strategy
  626. Must we keep %token-table?
  627. * Precedence
  628. ** Partial order
  629. It is unfortunate that there is a total order for precedence. It
  630. makes it impossible to have modular precedence information. We should
  631. move to partial orders (sounds like series/parallel orders to me).
  632. This is a prerequisite for modules.
  633. * Pre and post actions.
  634. From: Florian Krohm <florian@edamail.fishkill.ibm.com>
  635. Subject: YYACT_EPILOGUE
  636. To: bug-bison@gnu.org
  637. X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
  638. The other day I had the need for explicitly building the parse tree. I
  639. used %locations for that and defined YYLLOC_DEFAULT to call a function
  640. that returns the tree node for the production. Easy. But I also needed
  641. to assign the S-attribute to the tree node. That cannot be done in
  642. YYLLOC_DEFAULT, because it is invoked before the action is executed.
  643. The way I solved this was to define a macro YYACT_EPILOGUE that would
  644. be invoked after the action. For reasons of symmetry I also added
  645. YYACT_PROLOGUE. Although I had no use for that I can envision how it
  646. might come in handy for debugging purposes.
  647. All is needed is to add
  648. #if YYLSP_NEEDED
  649. YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
  650. #else
  651. YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  652. #endif
  653. at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
  654. I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
  655. to bison. If you're interested, I'll work on a patch.
  656. * Better graphics
  657. Equip the parser with a means to create the (visual) parse tree.
  658. -----
  659. # LocalWords: Cex gnulib gl Bistromathic TokenKinds yylex enum YYEOF EOF
  660. # LocalWords: YYerror gettext af hb YYERRCODE undef calc FIXME dev yyerror
  661. # LocalWords: Autoconf YYUNDEFTOK lexemes parsers Bistromathic's yyreport
  662. # LocalWords: const argc yacc yyclearin lookahead destructor Rici incluent
  663. # LocalWords: yydestruct yydiscardin catégories d'avertissements sr activé
  664. # LocalWords: conflits défaut rr l'alias chaîne n'est attaché un symbole
  665. # LocalWords: obsolète règle vide midrule valeurs de intermédiaire ou avec
  666. # LocalWords: définies inutilisées priorité associativité inutiles POSIX
  667. # LocalWords: incompatibilités tous les autres avertissements sauf dans rp
  668. # LocalWords: désactiver CATEGORIE traiter comme des erreurs glr Akim bool
  669. # LocalWords: Demaille arith lalr goto struct pathlen nullable ntokens lr
  670. # LocalWords: nterm bitsetv ielr ritem nstates nrules nritems yysymbol EQ
  671. # LocalWords: SymbolKind YYEMPTY YYUNDEF YYTNAME NUM yyntokens yytname sed
  672. # LocalWords: nonterminals yykind yycode YYNAMES yynames init getName conv
  673. # LocalWords: TokenKind semanticVal ival yychar yylval yylexer Tolmer hoc
  674. # LocalWords: Sobisch YYPTRDIFF ptrdiff Autotest YYPRINT toknum yytoknum
  675. # LocalWords: sym Wother stderr FP fixits xgettext fdiagnostics Graphviz
  676. # LocalWords: graphviz VCG bitset xml bw maint yytoken YYABORT deps
  677. # LocalWords: YYACCEPT yytranslate nonnegative destructors yyerrlab repo
  678. # LocalWords: backends stmt expr yy Mardle baz qux Vadim Maslow CPP cpp
  679. # LocalWords: yydebug gcc UCHAR EBCDIC gung PDP NUL Pre Florian Krohm utf
  680. # LocalWords: YYACT YYLLOC YYLSP yyval yyvsp yylen yyloc yylsp endif
  681. # LocalWords: ispell american
  682. Local Variables:
  683. mode: outline
  684. coding: utf-8
  685. fill-column: 76
  686. ispell-dictionary: "american"
  687. End:
  688. Copyright (C) 2001-2004, 2006, 2008-2015, 2018-2021 Free Software
  689. Foundation, Inc.
  690. This file is part of Bison, the GNU Compiler Compiler.
  691. Permission is granted to copy, distribute and/or modify this document
  692. under the terms of the GNU Free Documentation License, Version 1.3 or
  693. any later version published by the Free Software Foundation; with no
  694. Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
  695. Texts. A copy of the license is included in the "GNU Free
  696. Documentation License" file as part of this distribution.