This guide is designed to give the most accurate methodology for detecting and vulnerability scanning WordPress sites en masse. I have broken this guide into its three main categories so you can easily reproduce it.
This is the most crucial part of this methodology. Before I made this document I noticed some discrepancies within WPscan that give us inaccurate and therefore unreliable results. Before we start this process you will need to get a list of hosts/vhosts, which we will save in a file called "hosts.txt". We will also be using the tool meg with a custom "paths.txt" list so we can more accurately identify which sites are actually built on WordPress.
Once you have meg installed and have obtained a list of hosts/vhosts, we will make the paths.txt file for meg to use, which should look like the following.
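A plausible paths.txt, reconstructed from the WordPress paths this guide filters on later (wp-content, wp-includes, wp-settings.php, wp-config.php, xmlrpc.php); adjust to taste:

```shell
# paths.txt: common WordPress files/directories that fingerprint an install.
# Reconstructed from the paths grepped later in this guide.
cat > paths.txt << 'EOF'
/wp-content/
/wp-includes/
/wp-settings.php
/wp-config.php
/xmlrpc.php
EOF
```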
Once you have the prerequisites, it’s time to run our first command in the process.
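The meg command below passes a "$UA" variable that this guide assumes is already set; meg's --header flag takes a full "Name: value" string, so a hypothetical definition (any browser-like User-Agent will do) is:

```shell
# Hypothetical User-Agent header for meg's --header flag.
UA="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
export UA
```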
$ meg -c 50 -v --header "$UA" paths.txt hosts.txt meg-scan-results
The output of this command will be saved in the directory meg-scan-results. From here I like to move the index file within that directory to a separate directory called refinement so we can process the scan results.
Now that we have our index file from the command above, we will run it through a series of commands to refine our list before passing it to WPscan; if we don't, we will receive numerous false positives.
$ cat index | grep "wp-content" | grep "301 Moved" | cut -d " " -f 2-5 | tee wp-content.txt
$ cat index | grep "wp-includes" | grep "301 Moved" | cut -d " " -f 2-5 | tee wp-includes.txt
$ cat index | grep "wp-settings.php" | grep "500 Internal" | cut -d " " -f 2-6 | tee wp-settings.txt
$ cat index | grep "wp-config.php" | grep "200 OK" | cut -d " " -f 2-6 | tee wp-config.txt
$ cat index | grep "xmlrpc.php" | grep "301 Moved" | cut -d " " -f 2-6 | tee wp-xmlrpc.txt
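For context on what these pipelines extract: each line of meg's index file has the form `output-file URL (status line)`, so the `cut -d " " -f 2-…` calls keep the URL and status while dropping the local output path. A sketch against a fabricated index line (hypothetical host):

```shell
# A fabricated line in meg's index format: output path, URL, status.
line='meg-scan-results/example.com/0a1b2c3d http://example.com/wp-content/ (301 Moved Permanently)'
# Same pipeline as the wp-content refinement step above:
echo "$line" | grep "wp-content" | grep "301 Moved" | cut -d " " -f 2-5
# prints: http://example.com/wp-content/ (301 Moved Permanently)
```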
Now that we have grabbed everything we are looking for from the index file produced by the meg scan, we need to gather only the domains and sort them so we don't have repeats. This can easily be accomplished with the command listed below.
$ cat wp* | cut -d "/" -f 1-3 | sort -u | tee domains.txt
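This works because splitting a URL on "/" leaves `scheme://host` in fields 1-3; everything after the host is discarded, and sort -u deduplicates. For example (hypothetical host):

```shell
# Fields on "/": 1="http:", 2="" (between the slashes), 3="example.com".
echo 'http://example.com/wp-content/ (301 Moved Permanently)' | cut -d "/" -f 1-3
# prints: http://example.com
```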
Side note: I specifically named all of the files from our refinement process to start with "wp" for this command, to make it much less tedious to obtain all of the unique hosts.
The result of this command is the list of domains we will use for scanning. I prefer to move the newly created domains.txt into a directory called Scanning.
Now that you have finished the Identification & Refinement process, we can move to the Scanning phase. In this stage we will run the domains.txt file through WPscan, then filter the output to give us a list of accurately identified WordPress sites.
Before you run anything, make sure you have moved domains.txt into the Scanning directory. To keep everything clean, we will also make a sub-folder called scan-results.
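Assuming the directory names used in this guide, the setup amounts to:

```shell
# Directory names taken from the guide; adjust to taste.
mkdir -p Scanning/scan-results
mv domains.txt Scanning/ 2>/dev/null || true  # no-op if already moved
```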
In order to eliminate any remaining false positives, we will run our domains.txt through WPscan with the following command:
$ cat domains.txt | while read targ; do wpscan --disable-tls-checks --ignore-main-redirect --url "$targ" -o "scan-results/WPscan-$(echo "$targ" | sed 's/\//_/g' | sed 's/\:/_/g').txt"; done
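The sed substitutions exist only to turn each URL into a safe output filename by replacing "/" and ":" with "_". For example, with a hypothetical target:

```shell
# "/" and ":" cannot appear in the filename, so both become "_".
targ='https://example.com:8443'
echo "$targ" | sed 's/\//_/g' | sed 's/\:/_/g'
# prints: https___example.com_8443
```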
This is going to take a while to run depending on how large the domains.txt file is, because we are running WPscan on each individual domain to validate which domains are truly running WordPress; at this point domains.txt still contains false positives.
Once the first WPscan run is finished, the scan-results directory will contain a huge list of .txt files, one per domain. We need to filter these down to only the domains that were validated by WPscan. This can be done with the following command:
$ cat WPscan-* | grep -E '^.+\+.+URL:' | grep -v Effective | cut -d " " -f 3 | tee WPscan-detected-sites.txt
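To see why this filter works: WPscan's report typically prints a "[+] URL: …" line for each confirmed site and a "[+] Effective URL: …" line when it followed a redirect; the pipeline keeps the former, drops the latter, and cuts out just the URL. A sketch against fabricated WPscan output (hypothetical host and IP):

```shell
# Two fabricated lines in WPscan's report format:
printf '%s\n' \
  '[+] URL: http://example.com/ [93.184.216.34]' \
  '[+] Effective URL: http://example.com/blog/' \
  | grep -E '^.+\+.+URL:' | grep -v Effective | cut -d " " -f 3
# prints: http://example.com/
```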
At this point you now have an accurately identified list of sites running WordPress.