Method for Detecting & Validating Drupal Sites.

This guide documents a reproducible process for detecting and scanning Drupal sites. I have broken the process down into small steps so that anyone can follow it and reproduce the results reliably.

To begin, I started with a list of known Drupal sites I had previously gathered, in order to check that this methodology is in fact accurate. The tool we will use for detecting sites is called webanalyze. It is a command-line port of the popular tool Wappalyzer, and it lets us scan websites and see what they are built with from within the terminal, where the output can be parsed and worked with. The only prerequisite is a valid list of domains/subdomains saved as hosts.txt.

Side note: on the first run of webanalyze you may encounter an app.json error. To resolve this, run the following commands.

$ sudo apt-get update && sudo apt-get upgrade
$ webanalyze -update

Once you have met the prerequisites and webanalyze is working, identifying sites built with Drupal is only a few commands away. For the purposes of this demonstration, the list of domains/subdomains is contained in a text file called hosts.txt.
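
Before starting a long scan it can help to confirm both prerequisites up front. The following is only a minimal sketch: the function name check_prereqs is hypothetical and not part of webanalyze, and it assumes the host list holds one entry per line.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; check_prereqs is an illustrative name only.
check_prereqs() {
    local hosts_file="${1:-hosts.txt}"

    # webanalyze must be reachable through PATH.
    if ! command -v webanalyze >/dev/null 2>&1; then
        echo "webanalyze not found in PATH" >&2
        return 1
    fi

    # The host list must exist and must not be empty.
    if [ ! -s "$hosts_file" ]; then
        echo "$hosts_file is missing or empty" >&2
        return 1
    fi

    # Report how many non-empty lines (hosts) will be scanned.
    echo "ok: $(grep -c . "$hosts_file") hosts in $hosts_file"
}
```

Calling check_prereqs with no argument checks hosts.txt in the current directory; a non-zero return status means one of the prerequisites is not met.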

while IFS= read -r targ; do webanalyze -worker 16 -host "$targ" | tee "webanalyze-$(echo "$targ" | sed 's/:\/\//./g' | sed 's/\/$//g').txt"; done < hosts.txt

This command runs each domain/subdomain through webanalyze and saves the results in a newly created text file named after that domain/subdomain, so the output for each scanned site is easy to parse.
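
To make the file naming concrete, the sketch below isolates the two sed substitutions from the loop above. The helper name host_to_filename is hypothetical and exists only for illustration, and the example hosts are placeholders.

```shell
# Hypothetical helper that mirrors the two sed substitutions used in the scan loop.
host_to_filename() {
    # "://" is replaced with "." and a single trailing "/" is removed.
    printf 'webanalyze-%s.txt\n' "$(printf '%s' "$1" | sed -e 's/:\/\//./g' -e 's/\/$//g')"
}

host_to_filename 'https://example.com/'    # webanalyze-https.example.com.txt
host_to_filename 'http://sub.example.org'  # webanalyze-http.sub.example.org.txt
```

Note that only the scheme separator and a trailing slash are handled, so an entry containing a path (further "/" characters) would not produce a usable file name.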


Now that the initial scan is done, we need to refine the results so that we only see Drupal sites. The first command gives us a list of Drupal sites with their corresponding versions.

grep -iH drupal webanalyze-*.txt | sed -E 's/^webanalyze-(https?\.)?//; s/\.txt:/:/' | cut -d ":" -f 1-2 | sort -u > valid-drupal-domains.txt

The second, optional command gives us just the list of Drupal sites, without their versions. Note that it writes to the same valid-drupal-domains.txt file, so it replaces the versioned list.

grep -iH drupal webanalyze-*.txt | sed -E 's/^webanalyze-(https?\.)?//; s/\.txt:/:/' | cut -d ":" -f 1 | sort -u > valid-drupal-domains.txt

At this point you will have a list of the domains/subdomains from your “hosts.txt” list that run Drupal, which lets you quickly identify and mitigate vulnerabilities disclosed for specific versions of Drupal across a large number of sites.
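
As a rough illustration of that last step, the sketch below pulls out the hosts reporting a given Drupal major version. It rests on assumptions: the function name drupal_hosts_with_major is hypothetical, and each line of the versioned list is assumed to look like "example.com:    Drupal, 7 (CMS)". webanalyze may report no version at all for a site, and such lines are skipped.

```shell
# Hypothetical helper; assumes lines of the form "<host>:    Drupal, <version> (CMS)".
drupal_hosts_with_major() {
    local major="$1" file="${2:-valid-drupal-domains.txt}"
    awk -F':' -v major="$major" '
        {
            # Keep only the text between "Drupal," and the opening parenthesis.
            version = $2
            sub(/^.*[Dd]rupal,[ \t]*/, "", version)
            sub(/[ \t]*\(.*$/, "", version)
            # Match the bare major number or anything starting with "<major>.".
            if (version == major || index(version, major ".") == 1) print $1
        }
    ' "$file" | sort -u
}
```

For example, drupal_hosts_with_major 7 prints the hosts from valid-drupal-domains.txt whose reported version is 7 or 7.x. The result is only as reliable as the version string webanalyze detected, so treat it as a starting point for manual verification.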