Crawling at scale
High-volume extraction that stays up under load: millions of requests, shifting targets, fragile sites, and the chaos of the open web — with observability so you know when something breaks.
Crawling · unblocking · systems
Technical Lead and Architect at Crawlbase. Shipping spiders, bypass layers, and pipelines that survive the messy open web.
On the consulting side: helping teams ship data pipelines that touch billions of pages. Not just fetchers and parsers, but the full stack that clears blocks: WAFs, fingerprints, CAPTCHAs, proxy routing, and the kind of targeted bypass work that turns a hard “no” into a reliable feed.
End-to-end work: I build spiders and the systems that unblock them — custom layers for hard targets, pragmatic bypass paths when the work is allowed, proxy orchestration, and clean structured delivery you can plug straight into prod.
Cumulative numbers across Crawlbase operations and consulting projects. Need crawlers, unblocking, or both? Hit Say hello.
Crawlers, unblocking, and the glue in between — day job, consulting, and side experiments.
Spiders and resilient fetchers that ship real data: parsing, scheduling, retries, and the hard edge where bot-detection actually hurts — not demo scripts that stop at hello world.
The systems that get you past the wall: WAFs, fingerprint and session tricks, smarter proxy and header strategy, CAPTCHA paths where allowed — and pragmatic custom layers when an off-the-shelf scraper is not enough.
Models plus agentic flows: tools, planners, and retries that summarize messy pages, structure extracted data, and run multi-step crawl and research tasks without a human in every loop.
Glue code, internal tools, and HTTP APIs in whatever stack fits: wiring crawlers, unblocking jobs, and backends so teams can trigger, observe, and trust the pipeline.
Ingest, clean, enrich, store, and deliver, so scraped and automated output lands in queues, warehouses, or products people actually use.

About
I’ve spent the better part of a decade building crawling systems from the ground up, but the interesting part is rarely the spider itself: it’s the layers that unblock stubborn targets, adapt when defenses change, and still hand you clean, structured data.
At Crawlbase, I lead architecture for large-scale crawling and anti-bot mitigation: proxy orchestration, bypass and recovery paths, retry logic, and pipelines that tie fetch, unblock, and deliver together. Before that, I shipped complete projects for clients — e-commerce, real estate, finance, travel — end-to-end: spiders, custom unblocking work, automations, scheduling, and production-ready datasets.
On the side, I experiment with AI agents and agentic workflows — where models genuinely help extraction and research, vs. where they’re noise on top of a broken fetch path.
From teams who needed data out the door — crawlers and the systems that clear the blocks.
“We needed 40M product pages crawled weekly across six markets. Jamal built the spider fleet, hardening layer for the worst anti-bot markets, and delivered clean CSVs to our S3 bucket on schedule. Our data team finally stopped complaining.”
“Our stack was failing on 60% of targets after Cloudflare updates. Jamal didn't just patch crawlers — he rebuilt the unblocking pipeline in two weeks: fingerprinting, smarter retries, residential fallback, and custom logic where generic tools died. Failure rate dropped to under 3%.”
“Jamal ships the full loop: scheduling, monitoring, alerts, and API delivery. But the difference was the blocking work — when everyone else shrugged at 403s, he had a system. Fourteen months, zero missed deliveries.”