Slow down rogue web scrapers on my website and still use Varnish

Imagine there are scrapers crawling my website. How can I ban them and still whitelist Google bots?

I think I can find the IP ranges of Google bots, and I am thinking of using Redis to store every access for the day; if I see too many requests from the same IP in a short time -> ban.

My stack is Ubuntu Server, Node.js and Express.

The main problem I see is that this detection sits behind Varnish, so the Varnish cache would have to be disabled. Any better ideas, or good thoughts?
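
Roughly what I have in mind for the Redis part, as a minimal sketch: it assumes the ioredis client and an Express app, and the window and threshold values are only placeholders.

const express = require('express');
const Redis = require('ioredis');

const app = express();
const redis = new Redis(); // assumes Redis running on localhost:6379

// Behind Varnish, req.ip would be Varnish's address; trust the
// X-Forwarded-For header so req.ip reflects the real client.
app.set('trust proxy', true);

const WINDOW_SECONDS = 60;  // placeholder window length
const MAX_REQUESTS = 120;   // placeholder request limit per window

app.use(async (req, res, next) => {
  const key = 'hits:' + req.ip;

  // Count the request; start the expiry timer on the first hit.
  const hits = await redis.incr(key);
  if (hits === 1) {
    await redis.expire(key, WINDOW_SECONDS);
  }

  if (hits > MAX_REQUESTS) {
    return res.status(429).send('Too many requests');
  }
  next();
});

app.listen(3000); // Varnish would sit in front of this port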

You could stop the crawler using robots.txt:

User-agent: BadCrawler
Disallow: /

This solution works if the crawler follows the robots.txt specification.

You can use a Varnish ACL [1]; it may be a bit harder to maintain than in Apache, but it will surely work:

acl bad_boys {
  "203.0.113.0"/24; // Your evil range
  "198.51.100.66";  // Another evil IP
}

// ...

sub vcl_recv {
  if (client.ip ~ bad_boys) {
    error 403 "Forbidden";
  }
  // ...
}

// ...

You can also whitelist: use the User-Agent header or other techniques to make sure it isn't Googlebot before you block it... but I would defend myself in Varnish rather than in Apache.
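
The User-Agent alone is easy to spoof, though. The check Google documents is a reverse DNS lookup on the client IP followed by a forward lookup that must resolve back to the same IP. A rough sketch of that check in Node.js, since that is your stack (the function name is just illustrative):

const dns = require('dns').promises;

// Verify a claimed Googlebot: the reverse DNS name must end in
// googlebot.com or google.com, and the forward lookup of that name
// must return the same IP.
async function isGooglebot(ip) {
  try {
    const hostnames = await dns.reverse(ip);
    for (const host of hostnames) {
      if (host.endsWith('.googlebot.com') || host.endsWith('.google.com')) {
        const addresses = await dns.resolve4(host);
        if (addresses.includes(ip)) {
          return true;
        }
      }
    }
  } catch (err) {
    // DNS failure: treat as not verified
  }
  return false;
}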

[1] https://www.varnish-cache.org/docs/3.0/reference/vcl.html#acls