Performance regression in hyperscan 0.8.0 vs 0.7.x #253

First, thanks for developing and maintaining a great library.

## Summary of Issue

We've identified a performance regression when upgrading from hyperscan 0.7.x to 0.8.0 in our use case: in our benchmark, average scan time rose from about 3.3 ms to 44 ms (roughly 13x slower). While 0.8.0 resolves a memory leak present in 0.7.7, the throughput degradation makes it difficult to adopt the fix in production.

## Environment

- **Package**: `hyperscan` (PyPI)
- **Versions tested**: 0.7.7, 0.7.19, 0.8.0
- **Python**: 3.12
- **OS**: Linux (Ubuntu on WSL2)
- **Workload**: High-volume pattern matching

## Details

Until recently we were using hyperscan 0.7.7, but we discovered a memory leak during processing and traced it to hyperscan. By then the leak had already been identified and resolved upstream, and the current stable version was 0.8.0.

Upgrading to 0.8.0 resolved the memory leak. Memory is now released properly between iterations, which is great.

However, after upgrading to 0.8.0, we observed a substantial throughput drop in our benchmarks, from roughly 150 MB/s to about 11 MB/s.

**Testing 0.7.19**: We tested this intermediate version and found:
- Good throughput (comparable to 0.7.7)
- Memory leak absent or significantly reduced
- Could be a viable alternative, but we prefer using the latest stable release

**Benchmark Results** (50 patterns, 500 KB documents):

| Version | Avg Time/Scan | Throughput | Memory Leak | Status     |
|---------|---------------|------------|-------------|------------|
| 0.7.7   | 3.3 ms        | 148.5 MB/s | Present     | Unusable   |
| 0.7.19  | 3.2 ms        | 154.3 MB/s | Fixed       | Acceptable |
| 0.8.0   | 44 ms         | 11.1 MB/s  | Fixed       | Slower     |
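For anyone checking the numbers, here is a small sketch of how throughput and the slowdown factor follow from the average scan time. It assumes 500 KB means 500 × 1024 bytes and 1 MB means 1024 × 1024 bytes; that assumption reproduces the 0.8.0 row, while the 0.7.x rows come out only approximately because the reported times are rounded. The function names are ours, for illustration only.

```python
def throughput_mb_s(doc_bytes: int, avg_ms: float) -> float:
    """Throughput in MB/s (1 MB = 1024 * 1024 bytes) for one scan of doc_bytes."""
    return doc_bytes / (1024 * 1024) / (avg_ms / 1000.0)


def slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return candidate_ms / baseline_ms


# Assumption: "500KB" in the benchmark means 500 * 1024 bytes.
DOC_BYTES = 500 * 1024

# Average scan times as reported in the table above (rounded values).
reported_ms = {"0.7.7": 3.3, "0.7.19": 3.2, "0.8.0": 44.0}

for version, avg_ms in reported_ms.items():
    print(f"{version}: {throughput_mb_s(DOC_BYTES, avg_ms):.1f} MB/s")

print(f"0.8.0 vs 0.7.7: {slowdown(reported_ms['0.7.7'], reported_ms['0.8.0']):.1f}x slower")
```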

## Our Use Case

**Context**: We maintain a Python package for pattern matching across thousands of documents, some of which can be large.

**Workload characteristics:**
- **Combined databases**: Hundreds of regex patterns compiled into a single hyperscan database
- **Large documents**: 2-20 MB text files
- **Batch processing**: Thousands of documents sequentially with the same compiled database
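To make the workload shape concrete, here is a minimal timing harness sketch. It is not the exact script behind the table above. It takes any scan callable and runs it sequentially over a batch of documents, mirroring the "compile once, scan many" pattern. So that it runs without hyperscan installed, the standard-library `re` module stands in as the scanner; with python-hyperscan the callable would instead wrap the compiled database's `scan` method. The patterns, document contents, and helper names are all hypothetical.

```python
import re
import time
from typing import Callable, Iterable


def benchmark_scan(scan: Callable[[bytes], object],
                   documents: Iterable[bytes],
                   repeats: int = 3) -> dict:
    """Scan every document `repeats` times and report timing statistics."""
    docs = list(documents)
    n_scans = len(docs) * repeats
    total_bytes = sum(len(d) for d in docs) * repeats

    start = time.perf_counter()
    for _ in range(repeats):
        for doc in docs:
            scan(doc)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)

    return {
        "scans": n_scans,
        "avg_ms_per_scan": elapsed / n_scans * 1000.0,
        "throughput_mb_s": total_bytes / (1024 * 1024) / elapsed,
    }


# Stand-in scanner: several patterns combined into one compiled object,
# built once and reused for every document.
patterns = [rb"error\d+", rb"warn(?:ing)?", rb"id=[0-9a-f]{8}"]
combined = re.compile(b"|".join(b"(?:" + p + b")" for p in patterns))


def re_scan(data: bytes) -> int:
    """Count non-overlapping matches of any pattern in `data`."""
    return sum(1 for _ in combined.finditer(data))


# Hypothetical documents; real inputs would be the 2-20 MB text files.
documents = [b"id=deadbeef warning error42 " * 1000 for _ in range(5)]

result = benchmark_scan(re_scan, documents)
print(result)
```

Running the same harness under each installed hyperscan version, with the same patterns and documents, should give directly comparable figures.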

## Questions

1. Is this a known issue? Are there documented performance differences between 0.7.x and 0.8.0?
2. Is there a potential workaround?

## Happy to help

We're happy to:
- Provide the minimal reproducible example used to generate the throughput stats above
- Run more benchmarks
- Test proposed patches or fixes
- Collaborate on debugging and investigation

Thank you for maintaining python-hyperscan and for fixing the memory leak present in earlier versions. We're hoping this issue can help identify opportunities to improve 0.8.0 performance while maintaining the stability and memory improvements.

