It may be possible to find your content via Google hacking, otherwise known as Google Dorks, but this is one form of SEO it's wise to avoid. The more you can get into the mind (and tools) of a hacker, the better protected you will be. This article provides some Google Dorks basics to help you understand and hopefully avoid the threat.
Despite the name, Google hacking works with any search engine, from DuckDuckGo to Bing. Dorks are search strings that surface content that shouldn’t be seen, found in places that shouldn’t be searchable, and they cause havoc.
A dork is simply a query string fed to a search engine, which returns every indexed page that matches it. Try it: It’s easy to add the name of your own domain or IP address(es) to see if your organization has unwittingly exposed sensitive information. The results can be stunning (and not in a good way).
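As a minimal sketch of that self-check, the snippet below prepends a `site:` restriction to a dork and URL-encodes the result. The helper name `build_dork_url` and the domain `example.com` are placeholders for illustration, not part of any tool; substitute a domain you own.

```python
from urllib.parse import quote_plus


def build_dork_url(dork: str, domain: str) -> str:
    """Return a search URL that restricts `dork` to a single domain."""
    # The site: operator limits results to pages on the given domain.
    query = f"site:{domain} {dork}"
    # quote_plus percent-encodes reserved characters and turns spaces into '+'.
    return "https://www.google.com/search?q=" + quote_plus(query)


# example.com is a placeholder; only test domains you are authorized to test.
url = build_dork_url("filetype:sql intext:password", "example.com")
print(url)
```

Pasting the printed URL into a browser runs the restricted search, so any hits belong to your own site rather than someone else's.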
The search strings can be found in the Exploit Database, a wealth of resources for basic pen testing, exploits for patched (and unpatched) systems, and lots of code. It's important to note that, in many jurisdictions, it’s not legal to hack anyone but yourself.
I ran a few of the search strings to see what I might find. Suffice it to say there seem to be a lot of junior programmers and students out there who are leaving the door completely open on their directories.
For example, the query “https://www.google.com/search?q=filetype:sql%20intext:password%20|%20pass%20|%20passwd%20intext:username%20intext:INSERT%20INTO%20`users`%20VALUES” looks for SQL dump files that contain user names and passwords. Almost all of the results dutifully provided user names and passwords, including for students at prestigious universities like Harvard and MIT.
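The percent-encoded URL is hard to read, so here is a small sketch that decodes it back into the search operators it contains. It uses only the Python standard library; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# The query URL quoted above, split across lines for readability.
url = (
    "https://www.google.com/search?q=filetype:sql%20intext:password"
    "%20|%20pass%20|%20passwd%20intext:username%20intext:INSERT%20INTO"
    "%20`users`%20VALUES"
)

# parse_qs decodes %20 back to spaces and returns a list of values per key.
readable = parse_qs(urlsplit(url).query)["q"][0]
print(readable)
```

Decoded, the string asks for files of type `sql` whose text contains a password-like word, the word `username`, and an `INSERT INTO` statement for a `users` table, which is what a database dump of account records tends to look like.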
You may ask, “How did I get on this list?” The answer is simple, but the remedy may not be. Search engine crawlers follow every link they can reach, as deep as they can go, and not every crawler honors instructions to stay out.
Such instructions live in robots.txt, .htaccess and other files that set boundaries for search engines. The robots.txt file is only a request, though: compliance is voluntary, and the file is frequently ignored, legally or not.
The .htaccess file is used by Apache web hosts (and some compatible servers) as a boundary for what the web server application will serve; Nginx does not read .htaccess files and sets equivalent rules in its own configuration. A search engine can go around a web application, or even through it, if the security foundation underneath the web folders permits a web crawler to do so. Crawlers will go as deep as they can until they hit a wall, then gleefully go on to the next folder until they’re done. This is their job; they’re search and index engines, and they work by the thousands, 24/7.
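To see what a well-behaved crawler would conclude from a robots.txt file, the sketch below uses Python's standard `urllib.robotparser`. The rules and paths are made up for illustration. Keep in mind that this only models crawlers that choose to obey the file; the file itself blocks nothing, and it is publicly readable, so it also tells anyone which paths you would rather hide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /backups/
Disallow: /admin/
"""

parser = RobotFileParser()
# parse() accepts the file's lines, so no network request is needed.
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, agent: str = "*") -> bool:
    """Return True if a compliant crawler may fetch `path`."""
    return parser.can_fetch(agent, path)


print(is_allowed("/backups/users.sql"))
print(is_allowed("/index.html"))
```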
You can go through the Exploit Database and append your own site-specific information to see if you’ve inadvertently set permissions or other security fundamentals incorrectly. Note that the particular query I used produced thousands of hits, and it's just one of hundreds of well-known search strings that reveal sensitive information.
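Searching from the outside can be paired with a check from the inside. The sketch below walks a web root and lists files with risky extensions that are readable by every user on the system. The suffix list, the function name `find_exposed_files`, and the "world-readable" test are assumptions chosen for illustration; whether a file is actually served depends on your web server's configuration, so treat the output as a list of things to review rather than a verdict.

```python
import os
import stat

# Illustrative, not exhaustive: file types that rarely belong in a web root.
RISKY_SUFFIXES = (".sql", ".bak", ".env", ".log")


def find_exposed_files(web_root: str) -> list[str]:
    """Return risky files under `web_root` that any user can read."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            if not name.lower().endswith(RISKY_SUFFIXES):
                continue
            full_path = os.path.join(dirpath, name)
            mode = os.stat(full_path).st_mode
            # S_IROTH is the "readable by others" permission bit.
            if mode & stat.S_IROTH:
                exposed.append(os.path.relpath(full_path, web_root))
    return sorted(exposed)
```

Calling `find_exposed_files("/var/www/html")` (a common but not universal web root path) returns paths relative to that directory, which you can then move out of the web root or lock down.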
Even junior coders can take the exploits and automate them, harvesting long lists of what are most likely mistakes in systems security settings.
The average site gets queried dozens, even thousands, of times a day, depending on its potential target value. What might hackers find at your IP address? It's better to use Google hacking to find out, and fix the issues, before they do.