A high-performance, asynchronous web scraper built with Python around a modular "Fetcher-Parser-Storage" architecture. It focuses on systems efficiency, using HTTPX for concurrent requests and Selectolax for low-latency HTML parsing.

Skip06/webScrapper

Architecture

Fetcher: Asynchronous HTTP/2 networking using HTTPX for connection pooling.

Parser: Low-latency HTML extraction using the C-based Selectolax engine.

Models: Strict data validation and schema enforcement via Pydantic.

Package manager: uv (Rust-based package manager and dependency resolver)

Automation: Makefile (Task orchestration)

Logging: Loguru (Structured system logs)
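
The way the stages listed above hand data to each other can be sketched as follows. This is a minimal, standard-library-only illustration, not the repository's actual code: `asyncio` with canned HTML stands in for the HTTPX fetcher, `html.parser` stands in for Selectolax, and a frozen dataclass stands in for a Pydantic model. The names `Page`, `fetch`, `parse`, and `scrape` are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Page:
    """Validated record (stand-in for a Pydantic model)."""
    url: str
    title: str

    def __post_init__(self):
        # Reject records that do not match the expected schema.
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"not an http(s) URL: {self.url!r}")
        if not self.title:
            raise ValueError("empty title")


class _TitleParser(HTMLParser):
    """Collects the text inside <title> (stand-in for a Selectolax query)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse(url, html):
    """Parser stage: turn raw HTML into a validated Page."""
    parser = _TitleParser()
    parser.feed(html)
    return Page(url=url, title=parser.title.strip())


async def fetch(url, pages):
    """Fetcher stage (hypothetical): returns canned HTML, no network I/O."""
    await asyncio.sleep(0)  # yield to the event loop as a real request would
    return pages[url]


async def scrape(urls, pages, max_concurrency=5):
    """Run fetch and parse for every URL with bounded concurrency."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(url):
        async with sem:
            html = await fetch(url, pages)
        return parse(url, html)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(u) for u in urls))
```

Because each stage only depends on the previous stage's output, a stand-in can be replaced by the real library without changing the shape of `scrape`.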

1. Installation

git clone https://github.com/Skip06/kernel-scraper.git
cd kernel-scraper
uv sync 

2. Execution

make run    # Execute the scraper
make clean  # Purge __pycache__ and local data
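
For readers without `make`, the cleanup step described above could be approximated in Python. This is a sketch under assumptions: the repository's Makefile is not shown here, so the function name `clean` and the `data` directory name are hypothetical, and the real target may remove different paths.

```python
import shutil
from pathlib import Path


def clean(root, data_dir="data"):
    """Remove every __pycache__ directory under root, plus root/data_dir.

    data_dir is an assumed name for the scraper's local output directory.
    Returns the list of directories that were actually removed.
    """
    root = Path(root)
    # Collect targets first so the tree is not modified while it is walked.
    targets = [p for p in root.rglob("__pycache__") if p.is_dir()]
    local_data = root / data_dir
    if local_data.is_dir():
        targets.append(local_data)

    removed = []
    for path in targets:
        # A target may already be gone if it was nested inside another one.
        if path.exists():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Calling `clean(".")` from the project root would then play roughly the same role as `make clean`.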
