-
Notifications
You must be signed in to change notification settings - Fork 446
Replace attribute_manager with a new rdoc-inline-format parser #1559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
🚀 Preview deployment available at: https://0eb62010.rdoc-6cd.pages.dev (commit: 763f379) |
Remove complicated string-replacing/macro based parser and change to a normal parser that returns a structured data
1bb81c8 to
b8e0259
Compare
st0012
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the big rewrite 🙏 I couldn't finish review in one pass. I'll give it another look in the next few days.
lib/rdoc/markup/formatter.rb
Outdated
| next handle_TEXT(node) if String === node | ||
| case node[:type] | ||
| when :TIDYLINK | ||
| @in_tidylink = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be moved into handle_TIDYLINK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
I've move this to ToHtml#handle_TIDYLINK and renamed ivar to @in_tidylink_label.
@in_tidylink_label is true while processing tidylink label-part, but false while processing tidylink url-part.
Because rdoc-ref:... in tidylink label should be ignored (to avoid generating nested <a> tag) but rdoc-ref:... inside tidylink url should be handled.
lib/rdoc/markup/to_ansi.rb
Outdated
| attrs = @attributes.keys | ||
| if @current_attributes != attrs && @current_attributes != attrs.sort! | ||
| change_attribute(@current_attributes, attrs) | ||
| @current_attributes = attrs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be part of change_attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The method and ivar name was misleading.
@attributes is the current attributes to be applied to text
@current_attributes (renamed to @applied_attributes) is the styling applied/flushed to output as escape sequences.
on and off changes attribute
change_attributes (renamed to apply_attributes) will flush the buffered attribute changes
1b7488d to
9eac17b
Compare
RDoc::Markup::AttributeManager
RDoc parses block-level structure in
RDoc::Markup::Parserand parses inline styling inRDoc::Markup::AttributeManager.RDoc::Markup::AttributeManageris a string-replacing/macro based parser.It's very complicated, and its mechanism are the root cause of many bugs. We need to remove and replace it.
It converts to a flow/stream of string and styling changing operation.
Unfortunately, inline styles are structured data, output format (HTML) is also a structured data. There's an architecture mismatch.
Tidylink
{label}[url]should be a syntax rule, but it is currently handled in aregexp-based macro called
regexp_handling.Of course parsing nested structure such as styled tidylink label frequently fails. It's uncontrollable.
Solution
RDoc::Markup::AttributeManagerNew inline styling syntax
Tokens
+word+*word*_word_`word`and so on<br>,<tt>code_text</tt>,<code>code_text</code><i></i>,<b></b>,<em></em>,<s></s>,<del></del>{and closing}[url_part]word[url_part]Regexp handling macro
Matching with CROSSREF, RDOCREF, HYPERLINK regexp should be applied only to text nodes after parsing phase.
Parsed tree modication instead of text-node gsub is also another option to implement this.
Error recovery
RDoc format doesn't have syntax error. We need to make the behavior as similar before.
<a><b><c></a>will be<a><b><c></a>{<a><b>}[url]will be a tidylink with label<a><b><a></b>}</a>will be<a></b>}</a>{{inner}[url]}[<b>]</b>will be"{" + inner_tidylink + "}[" + bold("]")Simplified tidylink
RDoc was converting
a*_`+<b>c[foo]to<a href="foo">a*_`+<b>c</a>. This is terrible, it can't coexist with other syntaxes like*word*_word_+word+.We should restrict characters and recommend
{label}[url].In this pull request, only
Alphanumeric[url](should start with alphabet) is supported .Diff
Running rdoc in
ruby/ruby, there were 48 HTML files that had changes (with commit hash:763f379).There are too many diffs, but I think most of them are acceptable.
Some diff
List of all files with diffs:
Array.html
A {%w or %W string-array Literal}[rdoc-ref:...]<p>A array literal:</p><p>A %w or %W string-array Literal:</p>ERB.html
**Embedded Ruby**<strong>Embedded Ruby</strong><strong>Embedded <a href="Ruby.html"><code>Ruby</code></a></strong>Same diff in
IO.html,Net/HTTPHeader.html,OpenSSL/Cipher.htmlandSocket.htmlEncoding/Converter.html
"\uFFFD"“uFFFD”\uis unescaped“\uFFFD”Same diff in
PrettyPrint.html,Random/Formatter.htmlHash.html
hash same \hash keysame hash key\his unescapedsame \hash keyKernel.html
{\`command`}[rdoc-ref:Kernel#`]<a href="Kernel.html#method-i-60"><code>command</code></a><a href="Kernel.html#method-i-60">‘command`</a>Whether the entity's setgid bit is set.<td align="left">Whether the entity's setgid bit is set.</td>'<td align="left">Whether the entity’s setgid bit is set.</td>’Net/HTTP.html
{Proxy Using ENV['http_proxy']}[rdoc-ref:...]<a href="...">Proxy Using <a href="'http_proxy'">ENV</a></a></a></a>){Proxy Using <a href="'http_proxy'">ENV</a>}[<a href="...">...</a>]OpenSSL/SSL/SSLSocket.html
_remote\_host_ and _remote\_port_<em>remote</em>host_ and <em>remote</em>port__remote_host_ and _remote_port_OpenSSL/X509/Name.html
or ++1+ if++missing<code>+1</code>++1+OpenSSL.html
v2.0[http://www.rsa.com/rsalabs/node.asp?id=2127]<a href="http://www.rsa.com/rsalabs/node.asp?id=2127">v2.0</a>v2.0[<a href="http://www.rsa.com/rsalabs/node.asp?id=2127">www.rsa.com/rsalabs/node.asp?id=2127</a>]Regexp.html
(see {%r: Regexp Literals}[...])(see Regexp Literals):(see %r: Regexp Literals):{rdoc-ref:Regexp global variables}[rdoc-ref:Regexp@Global+Variables]<a href="Regexp.html#class-regexp-global-variables">Regexp global variables</a><a href="Regexp.html#class-regexp-global-variables">rdoc-ref:Regexp global variables</a>String.html
<b>+String+ +replacement+</b><a href="String.html"><code>String</code></a> <code>replacement</code></strong></strong>without opening<strong>(bug)<strong><a href="String.html"><code>String</code></a> <code>replacement</code></strong>{string literal}[rdoc-ref:syntax/literals.rdoc@String+Literals] or a {here document literal}[rdoc-ref:syntax/literals.rdoc@Here+Document+Literals]<p>string literal or a string literal</p><p>string literal or a here document literal</p>Same diff in
Regexp.htmlStringScanner.html
[stored string][1][stored <a href="1">string]</a>[stored string][1](and many more. give up listing)