feat(ptx): Implement -S (sentence-regexp) mode with dual-mode architecture and comprehensive test coverage #8915
… and implement -S option
GNU testsuite comparison:
f74ffd9 to c76de6d
@Misakait please keep in mind that humans are reviewing PRs, not AI...
Hi @sylvestre,

Thank you very much for your review and valuable feedback. I sincerely apologize for the size of this PR. I understand that large PRs place a significant burden on maintainers, and I struggled for a long time trying to split it into smaller pieces, but found it nearly impossible. The reason is that these changes are, by their nature, highly architecturally coupled.

In response to your first point, I'd like to explain my process and the dependency chain, and I would be extremely grateful for your advice on how to split this, or for any alternative refactoring path I should follow. As a newcomer, this is an area I find very challenging.

My primary goal was to implement the -S (sentence-regexp) mode. To achieve this, I discovered that a series of tightly linked changes were necessary:
This is essentially what my first commit does. My difficulty in splitting it is that these changes are an interdependent chain. A PR that only modifies

My second commit is dedicated to fixing the numerous compatibility bugs that the new architecture revealed. I could not bring myself to submit a PR full of bugs. This second commit primarily refactors the

Regarding the on-the-fly test generation, you are absolutely right. I will work on modifying that.

Thank you for taking the time to read this long explanation. I have many questions and would be very grateful for any advice you can offer to help me move forward. Thank you!
This PR significantly enhances the ptx utility by implementing the -S (sentence-regexp) option and introducing a robust dual-mode architecture that can handle both traditional line-based processing and context-aware stream processing.
Architecture Overview
optimized for each mode
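To make the line-mode versus stream-mode distinction concrete, here is a minimal sketch in Python (chosen for brevity; it is not the PR's Rust code). The function names `split_contexts` and `keyword_entries`, and the `SENTENCE_END` and `WORD` patterns, are hypothetical and exist only for illustration. The default patterns are simplified and are not claimed to match GNU ptx's defaults. The sketch assumes that line mode treats every input line as one context, while stream mode lets a sentence regexp decide where a context ends, even across line breaks.

```python
import re

# Hypothetical, simplified patterns for illustration only; these are NOT
# the defaults used by GNU ptx or by the PR.
SENTENCE_END = re.compile(r"[.?!]\s+")
WORD = re.compile(r"\w+")


def split_contexts(text, sentence_regexp=None):
    """Split input into contexts.

    Line mode (sentence_regexp is None): each non-blank line is a context.
    Stream mode: a context ends wherever sentence_regexp matches, so one
    context may span several input lines.
    """
    if sentence_regexp is None:
        return [line for line in text.splitlines() if line.strip()]

    contexts = []
    start = 0
    for match in sentence_regexp.finditer(text):
        chunk = text[start:match.end()]
        if chunk.strip():
            # Collapse internal whitespace, including newlines.
            contexts.append(" ".join(chunk.split()))
        start = match.end()

    # Keep any trailing text that has no terminator.
    rest = text[start:]
    if rest.strip():
        contexts.append(" ".join(rest.split()))
    return contexts


def keyword_entries(context, word_regexp=WORD):
    """Return (before, keyword, after) for every keyword in one context."""
    return [
        (context[:m.start()], m.group(), context[m.end():])
        for m in word_regexp.finditer(context)
    ]
```

With the input `"The quick fox\njumps. The dog\nsleeps.\n"`, calling `split_contexts(text)` keeps the three physical lines as contexts, whereas `split_contexts(text, SENTENCE_END)` yields the two sentences, each joined across the line break. Passing a different `word_regexp` to `keyword_entries` loosely mirrors what a `-W` override does to keyword selection.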
Key Features Implemented
-S, --sentence-regexp=REGEXP Support:
-r (references) is used with the new stream mode (-S).
-W (word-regexp) option is correctly respected by all internal logic, including performance optimizations.
-A (auto-reference) line number generation in both line and stream modes.
head, tail, before, and after fields.
maintain compatibility
Motivation
The -S option is one of the most important features of GNU ptx, enabling intelligent
context-aware indexing beyond simple line-by-line processing. This implementation
provides:
Testing
All new functionality is covered by extensive tests with expected outputs matching GNU ptx exactly. The implementation has been verified against:
Breaking Changes
None. This enhances existing functionality while maintaining backward compatibility.
Related Issues