Crawling · unblocking · systems

I build crawlers and unblocking systems at scale, and I play with AI.

Technical Lead and Architect at Crawlbase. Shipping spiders, bypass layers, and pipelines that survive the messy open web.

On the consulting side: helping teams ship data pipelines that touch billions of pages. Not only fetchers and parsers, but the full stack that tackles blocks: WAFs, fingerprints, CAPTCHAs, proxy routing, and the kind of targeted bypass work that turns a hard “no” into a reliable feed.

Billions of pages, one stubborn engineer

End-to-end work: I build spiders and the systems that unblock them — custom layers for hard targets, pragmatic bypass paths when the work is allowed, proxy orchestration, and clean structured delivery you can plug straight into prod.

  • ··· Pages crawled & parsed across client projects
  • ··· HTTP & unblocking journeys: retries, fingerprints, redirects & proxies
  • ··· AI extraction, clean-up & structured output passes

Cumulative numbers across Crawlbase operations and consulting projects. Need crawlers, unblocking, or both? Hit “Say hello.”

What I’m usually doing

Crawlers, unblocking, and the glue in between — day job, consulting, and side experiments.

  • Crawling at scale

    High-volume extraction that stays up under load: millions of requests, shifting targets, fragile sites, and the chaos of the open web — with observability so you know when something breaks.

  • Crawlers & fetchers

    Spiders and resilient fetchers that ship real data: parsing, scheduling, retries, and the hard edge where bot detection actually hurts, not demo scripts that stop at hello world.

  • Unblocking & bypass systems

    The systems that get you past the wall: WAFs, fingerprint and session tricks, smarter proxy and header strategy, CAPTCHA paths where allowed, and pragmatic custom layers when an off-the-shelf scraper is not enough. The escalation idea is sketched in Ruby after this list.

  • AI & agents

    Models plus agentic flows: tools, planners, and retries that summarize messy pages, turn them into structured extractions, and run multi-step crawl and research tasks without a human in every loop.

  • Automations & APIs

    Glue code, internal tools, and HTTP APIs in whatever stack fits: wiring crawlers, unblocking jobs, and backends so teams can trigger, observe, and trust the pipeline.

  • Data pipelines

    Ingest, clean, enrich, store, and deliver, so scraped and automated output lands in queues, warehouses, or products people actually use. A Sidekiq-shaped sketch of that delivery chain follows below.
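
To make the unblocking side concrete, here is a minimal Ruby sketch of an escalation ladder over net/http. It is a sketch under assumptions, not production code: the rung names, the blocked? heuristic, and the proxy endpoints (dc.pool.example, res.pool.example) are illustrative placeholders rather than Crawlbase internals, and a real ladder adds backoff, session reuse, and fingerprinting well beyond headers.

```ruby
require "net/http"
require "uri"

# Illustrative escalation ladder: each rung costs more than the last,
# so we only climb when the previous attempt looks blocked.
LADDER = [
  { name: "plain" },
  { name: "browser-ish",
    headers: { "User-Agent" => "Mozilla/5.0 (X11; Linux x86_64)",
               "Accept-Language" => "en-US,en;q=0.9" } },
  { name: "datacenter",  proxy: "dc.pool.example:8000" },  # placeholder pool
  { name: "residential", proxy: "res.pool.example:8000" }, # placeholder pool
].freeze

# Crude heuristic: status codes and challenge markers that usually
# mean "come back looking more like a browser".
def blocked?(response)
  return true if %w[403 429 503].include?(response.code)
  response.body.to_s.match?(/captcha|challenge|access denied/i)
end

def fetch_with_escalation(url)
  uri = URI(url)
  LADDER.each do |rung|
    proxy_host, proxy_port = rung[:proxy]&.split(":")
    http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port&.to_i)
    http.use_ssl = (uri.scheme == "https")
    http.open_timeout = http.read_timeout = 15
    request = Net::HTTP::Get.new(uri)
    (rung[:headers] || {}).each { |key, value| request[key] = value }
    response = http.request(request)
    return { body: response.body, via: rung[:name] } unless blocked?(response)
  rescue Net::OpenTimeout, Net::ReadTimeout, SystemCallError
    next # a dead rung is just a signal to climb
  end
  nil # every rung failed: time for a custom bypass layer
end
```

The via field is the observability hook: targets usually start demanding higher rungs before they start failing outright, so a shifting rung distribution is an early warning that defenses changed.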
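
And the delivery side, as a Sidekiq-shaped sketch: each stage is a small idempotent job that does one thing and enqueues the next, so a retry replays only the stage that failed. The transforms here are toy stand-ins; real stages parse, enrich against reference data, and upsert into a warehouse, queue, or API.

```ruby
require "sidekiq"
require "time"

class CleanPage
  include Sidekiq::Job
  sidekiq_options queue: "transform", retry: 3

  def perform(record)
    record["title"] = record["title"].to_s.strip # toy clean-up step
    EnrichRecord.perform_async(record)
  end
end

class EnrichRecord
  include Sidekiq::Job
  sidekiq_options queue: "transform", retry: 3

  def perform(record)
    record["enriched_at"] ||= Time.now.utc.iso8601 # toy enrichment step
    DeliverRecord.perform_async(record)
  end
end

class DeliverRecord
  include Sidekiq::Job
  sidekiq_options queue: "deliver", retry: 10

  def perform(record)
    # Real sinks: a warehouse upsert, a Kafka topic, an S3 drop, or an HTTP API.
    Sidekiq.logger.info("delivered: #{record["url"]}")
  end
end

# Kick off one record:
# CleanPage.perform_async({ "url" => "https://example.com/p/1", "title" => " Widget " })
```

Splitting stages across queues ("transform", "deliver") keeps retry budgets and concurrency independent, which matters when delivery is slow but cleaning is cheap.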

Jamal Awad — portrait sketch
Jamal Awad · Tech Lead & Architect

About

Crawlers, blocks, and the systems in between

I’ve spent the better part of a decade building crawling systems from the ground up, but the interesting part is rarely the spider itself: it’s the layers that unblock stubborn targets, adapt when defenses change, and still hand you clean, structured data.

At Crawlbase, I lead architecture for large-scale crawling and anti-bot mitigation: proxy orchestration, bypass and recovery paths, retry logic, and pipelines that tie fetch, unblock, and deliver together. Before that, I shipped complete projects for clients — e-commerce, real estate, finance, travel — end-to-end: spiders, custom unblocking work, automations, scheduling, and production-ready datasets.

On the side, I experiment with AI agents and agentic workflows — where models genuinely help extraction and research, vs. where they’re noise on top of a broken fetch path.

  • 10+ years in web crawling & data extraction
  • Billions of pages — and countless unblock paths
  • Full stack — spider, bypass, delivery
  • Remote — working with teams worldwide

What people say

From teams who needed data out the door — crawlers and the systems that clear the blocks.

We needed 40M product pages crawled weekly across six markets. Jamal built the spider fleet and a hardening layer for the worst anti-bot markets, and delivered clean CSVs to our S3 bucket on schedule. Our data team finally stopped complaining.

Head of Data, EU e-commerce platform · 40M+ pages/week project

Our stack was failing on 60% of targets after Cloudflare updates. Jamal didn't just patch crawlers — he rebuilt the unblocking pipeline in two weeks: fingerprinting, smarter retries, residential fallback, and custom logic where generic tools died. Failure rate dropped to under 3%.

CTO, price intelligence startup · Unblocking & pipeline rescue

Jamal ships the full loop: scheduling, monitoring, alerts, and API delivery. But the difference was the blocking work — when everyone else shrugged at 403s, he had a system. Fourteen months, zero missed deliveries.

VP Engineering, real estate data company · End-to-end crawl & unblock infrastructure

Stack & focus areas

Ruby · Sidekiq · Redis · PostgreSQL · MySQL · MongoDB · Elasticsearch · Kafka · Docker · Kubernetes · Ansible · Chef Workstation · AWS · REST APIs · GraphQL · Scrapy · Selenium · LLMs & agents · Systems architecture · Self-hosted infrastructure · Internal tooling · Workflow automation