|
1 | | -# Design and open questions about libsyntax. |
| 1 | +# Design and open questions about libsyntax |
| 2 | + |
| 3 | + |
| 4 | +The high-level description of the architecture is in RFC.md. You might |
| 5 | +also want to dig through https://github.com/matklad/fall/ which |
| 6 | +contains some pretty interesting stuff build using similar ideas |
| 7 | +(warning: it is completely undocumented, poorly written and in general |
| 8 | +not the thing which I recommend to study (yes, this is |
| 9 | +self-contradictory)). |
| 10 | + |
| 11 | +## Tree |
| 12 | + |
| 13 | +The centerpiece of this whole endeavor is the syntax tree, in the |
| 14 | +`tree` module. Open questions: |
| 15 | + |
| 16 | +- how to best represent errors, to take advantage of the fact that |
| 17 | + they are rare, but to enable fully-persistent style structure |
| 18 | + sharing between tree nodes? |
| 19 | + |
| 20 | +- should we make red/green split from Roslyn more pronounced? |
| 21 | + |
| 22 | +- one can layout nodes in a single array in such a way that children |
| 23 | + of the node form a continuous slice. Seems nifty, but do we need it? |
| 24 | + |
| 25 | +- should we use SoA or AoS for NodeData? |
| 26 | + |
| 27 | +- should we split leaf nodes and internal nodes into separate arrays? |
| 28 | + Can we use it to save some bits here and there? (leaves don't need |
| 29 | + first_child field, for example). |
| 30 | + |
| 31 | + |
| 32 | +## Parser |
| 33 | + |
| 34 | +The syntax tree is produced using a three-staged process. |
| 35 | + |
| 36 | +First, a raw text is split into tokens with a lexer. Lexer has a |
| 37 | +peculiar signature: it is an `Fn(&str) -> Token`, where token is a |
| 38 | +pair of `SyntaxKind` (you should have read the `tree` module and RFC |
| 39 | +by this time! :)) and a len. That is, lexer chomps only the first |
| 40 | +token of the input. This forces the lexer to be stateless, and makes |
| 41 | +it possible to implement incremental relexing easily. |
| 42 | + |
| 43 | +Then, the bulk of work, the parser turns a stream of tokens into |
| 44 | +stream of events. Not that parser **does not** construct a tree right |
| 45 | +away. This is done for several reasons: |
| 46 | + |
| 47 | +* to decouple the actual tree data structure from the parser: you can |
| 48 | + build any datastructre you want from the stream of events |
| 49 | + |
| 50 | +* to make parsing fast: you can produce a list of events without |
| 51 | + allocations |
| 52 | + |
| 53 | +* to make it easy to tweak tree structure. Consider this code: |
| 54 | + |
| 55 | + ``` |
| 56 | + #[cfg(test)] |
| 57 | + pub fn foo() {} |
| 58 | + ``` |
| 59 | + |
| 60 | + Here, the attribute and the `pub` keyword must be the children of |
| 61 | + the `fn` node. However, when parsing them, we don't yet know if |
| 62 | + there would be a function ahead: it very well might be a `struct` |
| 63 | + there. If we use events, we generally don't care about this *in |
| 64 | + parser* and just spit them in order. |
| 65 | + |
| 66 | +* (Is this true?) to make incremental reparsing easier: you can reuse |
| 67 | + the same rope data structure for all of the original string, the |
| 68 | + tokens and the events. |
| 69 | + |
| 70 | + |
| 71 | +The parser also does not know about whitespace tokens: it's the job of |
| 72 | +the next layer to assign whitespace and comments to nodes. However, |
| 73 | +parser can remap contextual tokens, like `>>` or `union`, so it has |
| 74 | +access to the text. |
| 75 | + |
| 76 | +And at last, the TreeBuilder converts a flat stream of events into a |
| 77 | +tree structure. It also *should* be responsible for attaching comments |
| 78 | +and rebalancing the tree, but it does not do this yet :) |
| 79 | + |
| 80 | + |
| 81 | +## Error reporing |
| 82 | + |
| 83 | +TODO: describe how stuff like `skip_to_first` works |
| 84 | + |
| 85 | + |
| 86 | +## Validator |
| 87 | + |
| 88 | +Parser and lexer accept a lot of *invalid* code intentionally. The |
| 89 | +idea is to post-process the tree and to proper error reporting, |
| 90 | +literal conversion and quick-fix suggestions. There is no |
| 91 | +design/implementation for this yet. |
| 92 | + |
| 93 | + |
| 94 | +## AST |
| 95 | + |
| 96 | +Nothing yet, see `AstNode` in `fall`. |
0 commit comments