Describe the enhancement requested
Databases like ClickHouse support Bloom filters built on the tokens present in a string rather than on the whole string itself.
https://clickhouse.com/docs/optimize/skipping-indexes#bloom-filter-types
I suggest that Apache Parquet support this type of Bloom filter to speed up token matching and SQL `LIKE`-style operations.
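
To make the idea concrete, here is a minimal Python sketch of a token-based Bloom filter. Everything in it is an assumption for illustration only: the `TokenBloomFilter` class, its tokenizer (lowercased runs of ASCII letters and digits), and the salted BLAKE2b hashing are hypothetical. It is not Parquet's split-block Bloom filter format and not ClickHouse's implementation.

```python
import hashlib
import re


class TokenBloomFilter:
    """Toy Bloom filter over the tokens of strings (hypothetical, for illustration)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    @staticmethod
    def tokenize(text):
        # Assumption: a token is a maximal run of ASCII letters/digits, lowercased.
        return re.findall(r"[a-z0-9]+", text.lower())

    def _positions(self, token):
        # Derive num_hashes bit positions from salted BLAKE2b digests.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                token.encode("utf-8"), digest_size=8, salt=bytes([i])
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, text):
        # Insert every token of the string, not the string as a whole.
        for token in self.tokenize(text):
            for pos in self._positions(token):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain_token(self, token):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(token.lower())
        )


# One filter per (hypothetical) row group; a reader could skip any row group
# whose filter rejects the searched token.
row_group = ["Connection timeout while reading", "disk full on /var"]
bf = TokenBloomFilter()
for value in row_group:
    bf.add(value)
print(bf.might_contain_token("timeout"))  # True: Bloom filters have no false negatives
```

With a filter like this, a predicate that searches for a whole token could skip row groups where `might_contain_token` returns `False`. A pattern that matches only part of a token would not benefit from this tokenization and would need a different scheme, such as n-grams.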