Commit Graph

17 Commits

Author SHA1 Message Date
divanoli
90974e1030 refactor(telegram): use snapshot for orphaned TLD offset clarity
Use explicit snapshot variable when checking tag positions in orphaned
TLD pass. While JavaScript's replace() doesn't mutate during iteration,
this makes intent explicit and adds test coverage for multi-TLD HTML.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:41:08 +03:00
divanoli
352398b9a5 fix(telegram): prevent orphaned TLD wrapping inside HTML tags
Code review fixes:

1. Orphaned TLD pass now checks if match is inside HTML tag
   - Uses lastIndexOf('<') vs lastIndexOf('>') to detect tag context
   - Skips wrapping when between < and > (inside attributes)
   - Prevents invalid HTML like <a href="...&<code>D.md</code>">

2. textMode: 'html' now trusts caller markup
   - Returns text unchanged instead of wrapping
   - Caller owns HTML structure in this mode

Tests added:
- 'does not wrap orphaned TLD inside href attributes'
- 'does not wrap orphaned TLD inside any HTML attribute'
- 'does not wrap in HTML mode (trusts caller markup)'
2026-02-05 12:23:24 +03:00
divanoli
94851de4f8 refactor(telegram): remove popular domain TLDs from file extension list
Remove .ai, .io, .tv, .fm from FILE_EXTENSIONS_WITH_TLD because:
- These are commonly used as real domains (x.ai, vercel.io, github.io)
- Rarely used as actual file extensions
- Users are more likely referring to websites than files

Keep: md, sh, py, go, pl (common file extensions, rarely intentional domains)
Keep: am, at, be, cc, co (less common as intentional domain references)

Update tests to reflect the change:
- Add test for supported extensions (.am, .at, .be, .cc, .co)
- Add test verifying popular TLDs stay as links
2026-02-05 12:03:43 +03:00
divanoli
38a94ec5fe fix(telegram): catch orphaned single-letter TLD patterns
When text like 'R&D.md' doesn't match the main file pattern (because &
breaks the character class), the 'D.md' part can still be auto-linked
by Telegram as a domain (https://d.md/).

Add second pass to catch orphaned TLD patterns like 'D.md', 'R.io', 'X.ai'
that follow non-alphanumeric characters and wrap them in <code> tags.

Pattern: ([^a-zA-Z0-9]|^)([A-Za-z]\.(?:extensions))(?=[^a-zA-Z0-9/]|$)

Tests added:
- 'wraps orphaned TLD pattern after special character' (R&D.md → R&<code>D.md</code>)
- 'wraps orphaned single-letter TLD patterns' (X.ai, R.io)
2026-02-05 11:54:09 +03:00
divanoli
5431591cfa fix(telegram): add escapeHtml and escapeRegex for defense in depth
Code review fixes:
1. Escape filename with escapeHtml() before inserting into <code> tags
   - Prevents HTML injection if regex ever matches unsafe chars
   - Defense in depth (current regex already limits to safe chars)

2. Escape extensions with escapeRegex() before joining into pattern
   - Prevents regex breakage if extensions contain metacharacters
   - Future-proofs against extensions like 'c++' or 'd.ts'

Add tests documenting regex safety boundaries:
- Filenames with special chars (&, <, >) don't match
- Only [a-zA-Z0-9_.\-./] chars are captured
2026-02-05 11:47:50 +03:00
divanoli
8a5453e3e7 fix(telegram): use regex literal and depth counters for tag tracking
Code review fixes:
1. Replace RegExp constructor with regex literal for autoLinkedAnchor
   - Avoids double-escaping issues with \s
   - Uses backreference \1 to match href=label pattern directly

2. Replace boolean toggles with depth counters for tag nesting
   - codeDepth, preDepth, anchorDepth track nesting levels
   - Correctly handles nested tags like <pre><code>...</code></pre>
   - Prevents wrapping inside any level of protected tags

Add 4 tests for edge cases:
- Nested code tags (depth tracking)
- Multiple anchor tags in sequence
- Auto-linked anchor with backreference match
- Anchor with different href/label (no match)
2026-02-05 11:36:58 +03:00
divanoli
99311daaed fix(telegram): prevent URL previews for file refs with TLD extensions
Two layers were causing spurious link previews for file references like
`README.md`, `backup.sh`, `main.go`:

1. **markdown-it linkify** converts `README.md` to
   `<a href="http://README.md">README.md</a>` (.md = Moldova TLD)
2. **Telegram auto-linker** treats remaining bare text as URLs

## Changes

### Primary fix: suppress auto-linkified file refs in buildTelegramLink
- Added `isAutoLinkedFileRef()` helper that detects when linkify auto-
  generated a link from a bare filename (href = "http://" + label)
- Rejects paths with domain-like segments (dots in non-final path parts)
- Modified `buildTelegramLink()` to return null for these, so file refs
  stay as plain text and get wrapped in `<code>` by the wrapper

### Safety-net: de-linkify in wrapFileReferencesInHtml
- Added pre-pass that catches auto-linkified anchors in pre-rendered HTML
- Handles edge cases where HTML is passed directly (textMode: "html")
- Reuses `isAutoLinkedFileRef()` logic — no duplication

### Bug fixes discovered during review
- **Fixed `isClosing` bug (line 169)**: the check `match[1] === "/"`
  was wrong — the regex `(<\/?)}` captures `<` or `</`, so closing
  tags were never detected. Changed to `match[1] === "</"`. This was
  causing `inCode/inPre/inAnchor` to stay stuck at true after any
  opening tag, breaking file ref wrapping after closing tags.
- **Removed double `wrapFileReferencesInHtml` call**: `renderTelegramHtmlText`
  was calling `markdownToTelegramHtml` (which wraps) then wrapping again.

### Test coverage (+12 tests, 26 total)
- `.sh` filenames (original issue #6932 mentioned backup.sh)
- Auto-linkified anchor replacement
- Auto-linkified path anchor replacement
- Explicit link preservation (different label)
- File ref after closing anchor tag (exercises isClosing fix)
- Multiple file types in single message
- Real URL preservation
- Explicit markdown link preservation
- File ref after real URL in same message
- Chunked output file ref wrapping

Closes #6932
2026-02-05 10:47:39 +03:00
divanoli
70f73e6f8d fix(telegram): auto-wrap file references with TLD extensions to prevent URL previews
Telegram's auto-linker aggressively treats filenames like HEARTBEAT.md,
README.md, main.go, script.py as URLs and generates domain registrar previews.

This fix adds comprehensive protection for file extensions that share TLDs:
- High priority: .md, .go, .py, .pl, .ai, .sh
- Medium priority: .io, .tv, .fm, .am, .at, .be, .cc, .co

Implementation:
- Added wrapFileReferencesInHtml() in format.ts
- Runs AFTER markdown→HTML conversion
- Tokenizes HTML to respect tag boundaries
- Skips content inside <code>, <pre>, <a> tags (no nesting issues)
- Applied to all rendering paths: renderTelegramHtmlText, markdownToTelegramHtml,
  markdownToTelegramChunks, and delivery.ts fallback

Addresses review comments:
- P1: Now handles chunked rendering paths correctly
- P2: No longer wraps inside existing code blocks (token-based parsing)
- No lookbehinds used (broad Node compatibility)

Includes comprehensive test suite in format.wrap-md.test.ts

AI-assisted: true
2026-02-04 15:02:54 +03:00
cpojer
f06dd8df06 chore: Enable "experimentalSortImports" in Oxfmt and reformat all imorts. 2026-02-01 10:03:47 +09:00
cpojer
5ceff756e1 chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
Peter Steinberger
de2d986008 fix: render Telegram media captions 2026-01-24 03:39:25 +00:00
Peter Steinberger
b77e730657 fix: add per-channel markdown table conversion (#1495) (thanks @odysseus0) 2026-01-23 18:39:25 +00:00
George Zhang
a1413a011e feat(telegram): convert markdown tables to bullet points (#1495)
Tables render poorly in Telegram (pipes stripped, whitespace collapses).
This adds a 'tableMode' option to markdownToIR that converts tables to
nested bullet points, which render cleanly on mobile.

- Add tableMode: 'flat' | 'bullets' to MarkdownParseOptions
- Track table state during token rendering
- Render tables as bullet points with first column as row labels
- Apply bold styling to row labels for visual hierarchy
- Enable tableMode: 'bullets' for Telegram formatter

Closes #TBD
2026-01-23 18:00:51 +00:00
George Pickett
232c512502 Format: apply oxfmt fixes 2026-01-15 01:27:16 +00:00
Peter Steinberger
bd7d362d3b refactor: unify markdown formatting pipeline 2026-01-15 00:31:07 +00:00
Peter Steinberger
c379191f80 chore: migrate to oxlint and oxfmt
Co-authored-by: Christoph Nakazawa <christoph.pojer@gmail.com>
2026-01-14 15:02:19 +00:00
Peter Steinberger
2140caaf67 fix: telegram html formatting (#435, thanks @RandyVentures) 2026-01-08 02:34:32 +01:00