

@dannobytes (contributor) commented Dec 12, 2025

PR App Fix CX-2603

🧰 Changes

Prevents `*`, `_`, and `#` characters from getting escaped when stringifying Markdown, to
avoid something like `**bold **` becoming `\*\*bold \*\*`.

Largely fixed by integrating this new plugin:
#1279

✅ What's working

Before/after screenshots.

🧬 QA & Testing


Not entirely certain this is enough to fix Markdown content that has trailing
spaces before the closing `**` failing to render, but I added some tests to
verify that our strip comments transformer at least leaves these alone.

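Is there any harm in never escaping these?

```js
handlers: {
  text(node, _, state, info) {
    // Don't escape special markdown characters like #, *, or _.
    if (/[#*_]/.test(node.value)) return node.value;
    // Otherwise keep the default behavior: escape whatever isn't safe here.
    return state.safe(node.value, info);
  },
},
```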
Contributor Author


tbh, i'm not entirely certain this is the correct way to solve this, and i'm not sure how to think through all the cases where it might cause a regression.

but i wanted to put something up as a starting point to iterate on.

i didn't see this make any noticeable difference in the readme app after linking it, but i was also having trouble getting anything in my MD repo to show up.

need to look into it a bit more and pair on it with someone, b/c i couldn't figure out why i wasn't seeing any console logs come through.

@dannobytes
Contributor Author

i added some better tests in 2ab0f3d to help pinpoint the hard problem here

split them up into two tests:

  1. allows compact headings with no whitespace delimiter
  2. allows leading/trailing spaces between bold/italic markers

the biggest problem here is that any "invalid" markdown (anything that doesn't match a construct in the CommonMark spec) gets converted into a plain text node by remark-parse.

for example, here's a simple md input along with the mdast text nodes that it gets converted to:

```md
#Blue
\# Literal
# Black
```
```js
{
  type: 'text',
  value: '#Blue\n# Literal',
  position: {
    start: { line: 2, column: 1, offset: 1 },
    end: { line: 3, column: 11, offset: 17 }
  }
}
{
  type: 'text',
  value: 'Black',
  position: {
    start: { line: 4, column: 3, offset: 20 },
    end: { line: 4, column: 8, offset: 25 }
  }
} "Black"
```

see how the first two lines of invalid MD get merged into a single text node instead of being split up. also notice that the backslash from `\#` isn't included in the node value; only the `#` is. this makes sense b/c it'll get escaped again when the tree is stringified.

if we then evaluate these text nodes to decide whether to return the raw value or a "safe" (i.e. escaped) value, how can we do that when we can't tell whether a given `#` was meant as a heading or as a backslash-escaped literal?
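One possible way to get that information back, sketched under an assumption: that the original Markdown string is available where the decision is made (the remark-stringify handlers only receive the node, so the source would have to be captured separately). remark-parse records `position.start.offset` and `position.end.offset` on each node, so slicing the source with them returns exactly what was typed, backslashes included. `rawSource` is a hypothetical helper, and the node below is the first text node from the dump above.

```js
// Hypothetical helper: return the exact source text a node was parsed from,
// using the character offsets remark-parse stores in node.position.
function rawSource(node, source) {
  const { start, end } = node.position;
  return source.slice(start.offset, end.offset);
}

// Input from the example above. It starts with a newline, which is why the
// positions begin on line 2. In a JS string, '\\#' is one backslash plus '#'.
const source = '\n#Blue\n\\# Literal\n# Black';

// First text node from the dump above.
const textNode = {
  type: 'text',
  value: '#Blue\n# Literal',
  position: {
    start: { line: 2, column: 1, offset: 1 },
    end: { line: 3, column: 11, offset: 17 },
  },
};

const raw = rawSource(textNode, source);
console.log(JSON.stringify(textNode.value)); // "#Blue\n# Literal" (backslash gone)
console.log(JSON.stringify(raw)); // "#Blue\n\\# Literal" (backslash kept)
```

The raw slice can also differ from `node.value` in ways other than escapes (for example, leading whitespace on paragraph continuation lines), so this is only a starting point.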

the same problem exists for bold/italic markers with leading/trailing spaces inside them:

```md
single line with **bold ** text and \*literal\* asterisks.

**bold**
**  leading**
**trailing  **
```

turns into this set of mdast text nodes:

```js
{
  type: 'text',
  value: 'single line with **bold ** text and *literal* asterisks.',
  position: {
    start: { line: 2, column: 1, offset: 1 },
    end: { line: 2, column: 59, offset: 59 }
  }
}
{
  type: 'text',
  value: '\n**  leading**\n**trailing  **',
  position: {
    start: { line: 4, column: 9, offset: 69 },
    end: { line: 6, column: 15, offset: 98 }
  }
}
{
  type: 'text',
  value: 'bold',
  position: {
    start: { line: 4, column: 3, offset: 63 },
    end: { line: 4, column: 7, offset: 67 }
  }
}
{
  type: 'text',
  value: '\n**  leading**\n**trailing  **',
  position: {
    start: { line: 4, column: 9, offset: 69 },
    end: { line: 6, column: 15, offset: 98 }
  }
}
```

how do you take a single text node like `single line with **bold ** text and *literal* asterisks.` and run a regex on it to decide whether to return a "safe" value or not?
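A regex over `node.value` can't answer that, since the backslashes are already gone by the time the handler sees the node. Given the raw source for the node, though (an assumption, e.g. sliced from the original input using `node.position` offsets), a simple scan can tell bare markers from escaped ones: in CommonMark a punctuation character is escaped when it is preceded by an odd number of backslashes, because `\\` is itself an escaped backslash. `unescapedMarkers` is a hypothetical helper name.

```js
// Hypothetical helper: list the `*`, `_` and `#` characters in a raw Markdown
// string that were NOT backslash-escaped. A character counts as escaped when
// it is preceded by an odd number of backslashes.
function unescapedMarkers(raw) {
  const found = [];
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (ch !== '*' && ch !== '_' && ch !== '#') continue;
    // Count the run of backslashes immediately before this character.
    let backslashes = 0;
    for (let j = i - 1; j >= 0 && raw[j] === '\\'; j--) backslashes++;
    if (backslashes % 2 === 0) found.push({ index: i, ch });
  }
  return found;
}

// Raw source of the first line in the bold/italic example above
// (in a JS string, '\\*' is one backslash plus '*').
const line = 'single line with **bold ** text and \\*literal\\* asterisks.';
const markers = unescapedMarkers(line);
// Only the two `**` runs are bare; the escaped asterisks at 37 and 46 are skipped.
console.log(markers.map((m) => m.index).join(', ')); // 17, 18, 24, 25
```

This only classifies characters; whether a bare `**` pair actually forms emphasis still depends on CommonMark's flanking rules, which a regex in a stringify handler doesn't model.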

it feels like we might be going about this the wrong way. i think we need to write a remark plugin, much like the remark-gfm plugin that adds GitHub-flavored markdown support, that covers all of ReadMe-flavored markdown and is used like this:

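```js
import { unified } from 'unified';
import remarkParse from 'remark-parse';

const file = unified()
  .use(remarkParse)
  .use(remarkReadMe) // the proposed ReadMe-flavored markdown plugin
  ...
```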

not sure if we have anything like this already, but this feels like the right approach instead of the processing we're doing in our remarkStringify handlers
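For reference, remark-gfm plugs into unified by pushing three kinds of extensions onto the processor's data: a micromark syntax extension (`micromarkExtensions`), an mdast-util-from-markdown extension (`fromMarkdownExtensions`), and an mdast-util-to-markdown extension (`toMarkdownExtensions`); remark-parse and remark-stringify read those lists. A `remarkReadMe` plugin could plausibly follow the same shape. The skeleton below assumes that design: `readmeSyntax`, `readmeFromMarkdown`, and `readmeToMarkdown` are hypothetical placeholders returning empty objects, and a stub object stands in for `unified()` so it runs without dependencies.

```js
// Hypothetical placeholders: a real plugin would return a micromark syntax
// extension, an mdast-util-from-markdown extension, and an
// mdast-util-to-markdown extension for ReadMe-flavored constructs.
const readmeSyntax = () => ({});
const readmeFromMarkdown = () => ({});
const readmeToMarkdown = () => ({});

// Skeleton modeled on how remark-gfm registers itself. In a real module this
// would be `export default function remarkReadMe(options) { ... }`.
function remarkReadMe() {
  const data = this.data();
  const micromarkExtensions =
    data.micromarkExtensions || (data.micromarkExtensions = []);
  const fromMarkdownExtensions =
    data.fromMarkdownExtensions || (data.fromMarkdownExtensions = []);
  const toMarkdownExtensions =
    data.toMarkdownExtensions || (data.toMarkdownExtensions = []);

  micromarkExtensions.push(readmeSyntax());
  fromMarkdownExtensions.push(readmeFromMarkdown());
  toMarkdownExtensions.push(readmeToMarkdown());
}

// Stub standing in for unified(): `processor.data()` returns the shared data
// record that the parser and stringifier read their extensions from.
const store = {};
const processor = { data: () => store };
remarkReadMe.call(processor);
console.log(Object.keys(store).join(', '));
// micromarkExtensions, fromMarkdownExtensions, toMarkdownExtensions
```

With the real packages it would be registered like any other plugin, e.g. `unified().use(remarkParse).use(remarkReadMe).use(remarkStringify)`, so both parsing and stringifying see the same ReadMe-flavored rules.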

@dannobytes
Contributor Author

adding a note that this was solved for mdx-ish with this new plugin:
#1279

note to self: try adding this plugin here as well.

@dannobytes
Contributor Author

okay, this should be ready to re-review @kevinports

one callout is that while this solves malformed emphasis markers (e.g. `**bold **`), it does not solve malformed headings like `#Header` with no whitespace after the `#`.
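For the compact-heading case, one untested direction is a pre-parse pass that inserts the missing space after a line-leading run of one to six `#` characters, so remark-parse sees an ATX heading. This is a hypothetical sketch (`normalizeCompactHeadings` is not an existing helper) with known limits: it only looks at unindented lines, any line starting with `#word` becomes a heading (including a line that happens to start with a hashtag), and the fence tracking is simplified (unindented fences only, closed by the same fence character).

```js
// Hypothetical pre-parse pass: turn "#Header" into "# Header" so that
// remark-parse sees an ATX heading. Lines inside fenced code are left alone.
function normalizeCompactHeadings(markdown) {
  let fence = null; // fence character ('`' or '~') we're currently inside, if any
  return markdown
    .split('\n')
    .map((line) => {
      const fenceMatch = line.match(/^(`{3,}|~{3,})/);
      if (fenceMatch) {
        if (fence === null) fence = fenceMatch[1][0];
        else if (fenceMatch[1][0] === fence) fence = null;
        return line;
      }
      if (fence !== null) return line;
      // 1-6 leading '#' immediately followed by something other than
      // whitespace or another '#'. An escaped "\#" never matches because the
      // line starts with a backslash, and 7+ '#' are left untouched.
      return line.replace(/^(#{1,6})(?=[^\s#])/, '$1 ');
    })
    .join('\n');
}

console.log(normalizeCompactHeadings('#Blue\n\\# Literal\n# Black'));
// # Blue
// \# Literal
// # Black
```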

@dannobytes requested a review from a team on January 22, 2026 00:42


3 participants