Description
First of all, thanks for developing and maintaining a great library.
Summary of Issue
We've identified a performance regression when upgrading from hyperscan 0.7.x to 0.8.0 in our use case. While 0.8.0 successfully resolves a memory leak present in 0.7.7, the throughput degradation makes it difficult to adopt the fix in production.
Environment
- Package: hyperscan (PyPI)
- Versions tested: 0.7.7, 0.7.19, 0.8.0
- Python: 3.12
- OS: Linux (Ubuntu on WSL2)
- Workload: High-volume pattern matching
Details
Until recently we were using hyperscan 0.7.7, but we discovered a memory leak during processing that was traced to hyperscan. We then saw that you had already identified and fixed the issue, and that the current stable version was 0.8.0.
Upgrading to 0.8.0 successfully resolved the memory leak. Memory now properly releases between iterations, which is great.
However, after upgrading to 0.8.0, average scan time in our benchmarks rose from about 3.3 ms to 44 ms, roughly a 13x drop in throughput.
Testing 0.7.19: We tested this intermediate version and found:
- Good throughput (comparable to 0.7.7)
- Memory leak absent or significantly reduced
- Could be a viable alternative, but we prefer using the latest stable release
Benchmark Results (using 50 patterns, 500KB documents):
| Version | Avg Time/Scan | Throughput | Memory Leak | Status |
|---|---|---|---|---|
| 0.7.7 | 3.3 ms | 148.5 MB/s | Present | Unusable |
| 0.7.19 | 3.2 ms | 154.3 MB/s | Fixed | Acceptable |
| 0.8.0 | 44 ms | 11.1 MB/s | Fixed | Slower |
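For illustration, numbers of this kind could be produced by a harness along the following lines. This is a minimal sketch, not the script behind the table above: `benchmark` and `make_hyperscan_scan` are hypothetical names, 1 MB is assumed to mean 1,000,000 bytes, and the `Database.compile` / `Database.scan` calls follow the usage shown in the python-hyperscan README, which is assumed to be the same across the versions tested.

```python
import time


def benchmark(scan, document, iterations=100):
    """Return (avg_ms_per_scan, throughput_mb_per_s) for a scan callable."""
    scan(document)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(iterations):
        scan(document)
    elapsed = time.perf_counter() - start
    avg_s = elapsed / iterations
    megabytes = len(document) / 1_000_000  # assumption: 1 MB = 10^6 bytes
    return avg_s * 1000, megabytes / avg_s


def make_hyperscan_scan(patterns):
    """Compile byte-string patterns once and return a scan(document) callable."""
    import hyperscan  # imported lazily; requires the hyperscan package from PyPI

    db = hyperscan.Database()
    db.compile(
        expressions=patterns,
        ids=list(range(len(patterns))),
        elements=len(patterns),
        flags=[0] * len(patterns),
    )

    def on_match(pattern_id, start, end, flags, context):
        # Collect matches in the list passed as context; returning None continues the scan.
        context.append((pattern_id, start, end))

    def scan(document):
        matches = []
        db.scan(document, match_event_handler=on_match, context=matches)
        return matches

    return scan
```

Running `benchmark(make_hyperscan_scan(patterns), document)` under each installed version, with the same 50 patterns and the same 500KB document, would give directly comparable per-scan times.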
Our Use Case
Context: We maintain a Python package for pattern matching across thousands of documents, some of which can be large.
Workload characteristics:
- Combined databases: Hundreds of regex patterns compiled into a single hyperscan database
- Large documents: 2-20MB text files
- Batch processing: Thousands of documents sequentially with the same compiled database
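The workload above can be sketched roughly as follows: one compiled database is reused for every document, and resident memory is sampled periodically so that a leak shows up as steady growth. This is a sketch under assumptions: `scan_batch` and `current_rss_bytes` are hypothetical helpers, `scan` stands for any callable that wraps `Database.scan` on an already compiled database, and the RSS reader relies on `/proc/self/statm`, so it is Linux-only.

```python
import os


def current_rss_bytes():
    """Resident set size of this process, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def scan_batch(scan, documents, sample_every=100, rss=current_rss_bytes):
    """Scan documents one after another, sampling memory every N documents.

    Returns a list of (documents_processed, rss_bytes) tuples.
    """
    samples = []
    for count, document in enumerate(documents, start=1):
        scan(document)  # same compiled database for every document
        if count % sample_every == 0:
            samples.append((count, rss()))
    return samples
```

If memory is released between iterations, the sampled RSS values should level off after the first few samples rather than climb with the document count.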
Questions
- Is this a known issue? Are there documented performance differences between 0.7.x and 0.8.0?
- Is there a potential workaround?
Happy to help
We're happy to:
- Provide the minimal reproducible example used to generate the throughput stats above
- Run more benchmarks
- Test proposed patches or fixes
- Collaborate on debugging and investigation
Thank you for maintaining python-hyperscan and for fixing the memory leak present in earlier versions. We're hoping this issue can help identify opportunities to improve 0.8.0 performance while maintaining the stability and memory improvements.