-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ty] Implement docstring rendering to markdown #21550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| let mut output = String::new(); | ||
| let mut first_line = true; | ||
| let mut block_indent = 0; | ||
| let mut in_doctest = false; | ||
| let mut starting_literal = false; | ||
| let mut in_literal = false; | ||
| let mut in_any_code = false; | ||
| for untrimmed_line in docstring.lines() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the worst parser I've written in a long while so if you tell me "Aria for the love of god make a Parser type with methods and enums" I will. Writing it this way was conducive to experimentation and thinking about the format.
Diagnostic diff on typing conformance testsNo changes detected when running ty on typing conformance tests ✅ |
|
I spent a while salivating over the ruff docstring formatter as a basis for this, but I ultimately concluded it would be a nightmare for both codebases to try to share that logic, and it's not that much logic. |
|
|
The main risk of shipping this implementation is that the previous implementation was so wildly conservative that it "couldn't" ever mess up whereas insufficient escaping or getting confused could complete garble things and eat the docs. On balance though this just looks so much better I think the risk is worth it (and shipping it will get me a lot of bug reports about what to fix :) |
| let mut starting_literal = false; | ||
| let mut in_literal = false; | ||
| let mut in_any_code = false; | ||
| for untrimmed_line in docstring.lines() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want to use .universal_newlines here to get proper \r support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to signpost that better in the comments but this function assumes documentation_trim (in the same file) was already run, so our leading whitespace is only spaces, and our newlines are only \n (and unlike ruff we don't need to care about authentically preserving the flavour of whitespace).
crates/ty_ide/src/docstring.rs
Outdated
| /// | ||
| /// This code currently assumes docstrings are reStructuredText with significant | ||
| /// line-breaks and indentation. Some constructs like *italic*, **bold**, and `inline code` | ||
| /// are shared between the two formats. Other things like _italic_ are only markdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the two formats that you're referring to here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reStructuredText and markdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's not clear to me from the comment
| // Add this line's indentation. | ||
| // We could subtract the block_indent here but in practice it's uglier | ||
| for _ in 0..line_indent { | ||
| // If we're not in a codeblock use non-breaking spaces to preserve the indent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say why this is necessary, given that it gets rendered as markdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes so the idea here is that docstrings really like using indentation for structure that we do not (care to) parse but we also don't want markdown turning into code-blocks. By replacing leading whitespace with outside of codeblocks, we preserve the native indent of the docstring and stuff we don't understand still is rendered clearly and pretty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Even if I had a perfect parser for this format, the only clear way to render this to markdown, imo, is by rendering these blocks as indented)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think I misread it as in a code block. This makes a ton of sense for indents outside code blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should use the unicode code point directly so that we don't rely on the markdown renderer supporting HTML?
|
|
||
| // Add this line's indentation. | ||
| // We could subtract the block_indent here but in practice it's uglier | ||
| for _ in 0..line_indent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work for code using tab indentation (which our formatter supports).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(another thing handled by documentation_trim)
crates/ty_ide/src/docstring.rs
Outdated
| /// | ||
| /// This is by no means perfect but it handles a lot of common formats well. | ||
| /// | ||
| /// This code currently assumes docstrings are reStructuredText with significant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for docstrings using the numpy or google format? will they still be rendered correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| // First thing's first, add a newline to start the new line | ||
| if !first_line { | ||
| // If we're not in a codeblock, add trailing space to the line to authentically wrap it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand this. Why is it necessary to add trailing whitespace to any line? This seems very undesired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's part of the approach I'm taking where I want to preserve the wrapping/indentation of the original docstring.
It prevents e.g. param1 and param2 from being merged together as a single line in:
'param1' -- The first parameter description
'param2' -- The second parameter description
This is a continuation of param2 description.
'param3' -- A parameter without type annotation
| /// line-breaks and indentation. Some constructs like *italic*, **bold**, and `inline code` | ||
| /// are shared between the two formats. Other things like _italic_ are only markdown | ||
| /// and should be escaped. | ||
| fn render_markdown(docstring: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a link to reStructuredText
| in_literal = true; | ||
| in_any_code = true; | ||
| block_indent = line_indent; | ||
| // TODO: should we not be this aggressive? Let it autodetect? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-code-block
Defaulting to Python doesn't seem unreasonable but we should respect any override
| } | ||
|
|
||
| // If we're not in a codeblock and we see something that signals a literal block, start one | ||
| if !in_any_code && let Some(without_lit) = line.strip_suffix("::") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to play with restructuredText but the way I read this paragraph
Literal code blocks (ref) are introduced by ending a paragraph with the special marker ::. The literal block must be indented
is that what comes after is only a literal block if there's at least one blank line, and text is indented.
I think your code correctly handles the empty line. But it ends up pushing an empty markdown code block if the next paragraph isn't correctly indented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha yes you caught me this part of the parser is distinctly janky. I'll tweak it to make it Spec Compliant (although I can't help but wonder if being sloppy will be the name of the game since we need to render whatever we find).
| let mut output = String::new(); | ||
| let mut first_line = true; | ||
| let mut block_indent = 0; | ||
| let mut in_doctest = false; | ||
| let mut starting_literal = false; | ||
| let mut in_literal = false; | ||
| let mut in_any_code = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about mixing the parsing of docstrings with the formatting logic. It seems harder to follow. I'd be inclined to parse the docstring into an intermediate representation and then do a second pass to format it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dang, ok if the captain of the performance team wants more intermediate allocations and representations who am I to complain 😄
(I was half expecting to be told to remove some intermediate Strings)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol. I'm not too concerned here, given that we don't render markdown strings that frequently and having a more structured representation would ultimately be useful for Ruff too. Although I don't think we should aim to make it reusable for now.
crates/ty_ide/src/docstring.rs
Outdated
| if !in_any_code { | ||
| // This is plain text, apply some escapes for reST that don't map to markdown: | ||
| // * underscores aren't used for italics/bold in reST (don't mess up __dunders__) | ||
| output.push_str(&line.replace('_', "\\_")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found an in-retrospect-obvious place where this is broken: this mangles inline code __init__ (I kept seeing it without any code markup, so that was my primary concern)
…d-typevar * origin/main: (24 commits) [ty] Remove brittle constraint set reveal tests (#21568) [`ruff`] Catch more dummy variable uses (`RUF052`) (#19799) [ty] Use the same snapshot handling as other tests (#21564) [ty] suppress autocomplete suggestions during variable binding (#21549) Set severity for non-rule diagnostics (#21559) [ty] Add `with_type` convenience to display code (#21563) [ty] Implement docstring rendering to markdown (#21550) [ty] Reduce indentation of `TypeInferenceBuilder::infer_attribute_load` (#21560) Bump 0.14.6 (#21558) [ty] Improve debug messages when imports fail (#21555) [ty] Add support for relative import completions [ty] Refactor detection of import statements for completions [ty] Use dedicated collector for completions [ty] Attach subdiagnostics to `unresolved-import` errors for relative imports as well as absolute imports (#21554) [ty] support PEP 613 type aliases (#21394) [ty] More low-hanging fruit for inlay hint goto-definition (#21548) [ty] implement `TypedDict` structural assignment (#21467) [ty] Add more random TypeDetails and tests (#21546) [ty] Add goto for `Unknown` when it appears in an inlay hint (#21545) [ty] Add type definitions for `Type::SpecialForm`s (#21544) ...



Summary
This introduces a very bad and naive python-docstring-flavoured-reStructuredText to github-flavor-markdown translator. The main goal is to try to preserve a lot of the formatting and plaintext, progressively enhance the content when we find things we know about, and escape the text when we find things that might get corrupt.
Previously I'd broken this out into rendering each different format, but with this approach you don't really need to?
Test Plan
Lots of snapshot tests, also messed around in some random stdlib modules.