Search Tools for Profiling
Search engines have always been a hacker’s best friend.
Our personal favorite is Google. Here are some of the basic techniques we employ when taking a search engine–based approach to web application profiling (the following examples are based on Google’s syntax):
• Search for a specific web site using “site:www.victim.com” (with the quotation marks) to look for URLs that contain www.victim.com.
• Search for pages related to a specific web site using related:www.victim.com to return pages that Google considers similar to www.victim.com.
• Examine the “cached” results that pull the web page’s contents out of Google’s archive. Thus, you can view a particular page on a site without leaving the comfort of www.google.com. It’s like a superproxy!
• Investigate search results links called similar pages. These work like the “related” keyword noted earlier.
• Examine search results containing newsgroup postings to see if any relevant information has been posted about the site. This might include users complaining about login difficulties or administrators asking for help about software components.
• Make sure to search using just the domain name, such as site:victim.com. This can return search results such as “mail.victim.com” or “beta.victim.com”, revealing subdomains you might otherwise miss.
• To locate specific file types use the filetype operator, such as “filetype:swf”, which will filter the results to only include Flash SWF files that contain the corresponding keywords of your search.
You can read more in our post on information gathering using Google dorks.
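When profiling many targets, the operators above can be combined programmatically before you paste them into a search engine. A minimal sketch in Python; the domain and the particular operator combinations are illustrative, not an exhaustive dork list:

```python
# Build Google dork query strings by combining search operators
# with a target domain. The domain and keywords are illustrative.

def build_dorks(domain):
    """Return a list of search queries that profile a target domain."""
    return [
        f"site:{domain}",                     # pages indexed under the domain
        f"site:{domain} -site:www.{domain}",  # subdomains other than www
        f"related:{domain}",                  # pages the engine deems related
        f"site:{domain} filetype:swf",        # Flash SWF files on the site
        f"site:{domain} inurl:login",         # likely authentication pages
    ]

for query in build_dorks("victim.com"):
    print(query)
```

Each string can then be submitted manually, which keeps you inside the search engine's terms of service.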
Open Source Intelligence
Beyond Google there are other search engines with a specific focus that can be invaluable in finding specific information. Whether you want to find information on a person or inquire about public records, chances are a specialized search engine has been made to find what you desire.
A tool that automates much of the effort in gathering such information is Maltego. Defined as an open source intelligence-gathering tool, Maltego helps visualize the relationships among people, organizations, web sites, Internet infrastructure, and many other links.
Maltego can aid in information gathering, and it can find affiliations between components within an organization. Even with information as simple as a domain name or an IP address, it can query publicly available records to discover connections. A complete list of the queries the tool can perform can be found at http://ctas.paterva.com/view/Category:Transforms.
Robots.txt
The robots.txt file contains a list of directories that search engines such as Google are supposed to index or ignore. The file might even be in Google's cache, or you can retrieve it from the site itself:
$ getit.sh www.victim.com /robots.txt
The Disallow entries instruct a cooperative spidering tool to ignore those directories; attack tools rarely comply. The point is that a robots.txt file provides an excellent snapshot of the directory structure, and maybe even some clear pointers toward misconfigurations that can be exploited later.
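Since robots.txt is just a list of User-agent and Disallow directives, extracting that directory snapshot is trivial to script. A sketch using only the Python standard library; the sample file content is hypothetical:

```python
# Extract the Disallow entries from a robots.txt body.
# The sample content below is hypothetical.

sample = """\
User-agent: *
Disallow: /admin/
Disallow: /Copyright/
Disallow: /cgi-bin/
"""

def disallowed_paths(robots_body):
    """Return every path named in a Disallow directive."""
    paths = []
    for line in robots_body.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(sample))  # directories the site asked spiders to skip
```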
Automated Web Crawling
We’ve noted that one of the most fundamental and powerful techniques used in profiling is the mirroring of the entire application to a local copy that can be scrutinized slowly and carefully. We call this process web crawling, and web crawling tools are an absolute necessity when it comes to large-scale web security assessments.
Some other key positives of web crawling include the following:
• Spares tons of manual labor!
• Provides an easily browseable, locally cached copy of all web application components, including static pages, executables, forms, and so on.
• Enables easy global keyword searches on the mirrored content (think “password” and other tantalizing search terms).
• Provides a high-level snapshot that can easily reveal things such as naming conventions used for directories, files, and parameters.
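The global keyword search mentioned above is easy to run once the mirror is on disk. A sketch; the keyword list is a placeholder, and in practice grep or findstr does the same job:

```python
# Search every file under a local mirror for interesting keywords.
# The keyword list is a placeholder; extend it to taste.
import os

KEYWORDS = ("password", "passwd", "secret", "todo")

def grep_mirror(root):
    """Yield (path, line_number, line) for each keyword hit in the mirror."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for number, line in enumerate(fh, 1):
                        if any(k in line.lower() for k in KEYWORDS):
                            yield path, number, line.rstrip()
            except OSError:
                continue  # unreadable file; skip it
```

Pointing grep_mirror at the directory a crawler produced turns the whole site into a searchable corpus.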
As powerful as web crawling is, it is not without its drawbacks. Here are some things that it doesn’t do very well:
• Forms Crawlers, being automated, often don’t deal well with filling in web forms designed for human interaction.
• Complex flows Usually, crawling illustrates logical relationships among directories, files, and so on. But some sites with unorthodox layouts may defy simple interpretation by a crawler and require that a human manually click through the site.
• State problems Attempting to crawl an area within a web site that requires web-based authentication is problematic. Most crawlers run into big trouble when they’re asked to maintain logged-in status during the crawl.
• Broken HTML/HTTP Many crawlers try to follow the HTTP and HTML specifications when reviewing an application, but a major roadblock is that virtually no web application strictly follows the specifications. A link that is technically broken may still work in one browser but not in another.
• Web services As more applications are designed as loosely coupled series of services, it will become more difficult for traditional web crawlers to determine relationships and trust boundaries among domains. Many modern web applications rely on a web-based API to provide data to their clients. Traditional crawlers will not be able to execute and map an API properly without explicit instructions on how execution should be performed.
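To make the mechanics concrete: at its core a crawler is just link extraction plus a same-host queue, and every drawback above is a place where that simple loop breaks down. A minimal sketch using only the standard library; the fetch function is deliberately a stub, since a real crawler would issue HTTP requests and immediately face the form, state, and broken-HTML problems listed:

```python
# Skeleton of a same-host web crawler: extract anchors from each page,
# queue unseen same-host URLs. fetch(url) is supplied by the caller and
# stubbed in tests; a real crawler would perform HTTP requests here.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, limit=100):
    """Breadth-first crawl of same-host links. fetch(url) -> HTML string."""
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen
```

Everything a production crawler adds (cookie jars for login state, form heuristics, tolerant HTML parsing, API awareness) is an attempt to patch the gaps this skeleton leaves open.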
Web Crawling Tools
Here are our favorite tools to help automate the grunt work of the application survey. They are essentially spiders: point them at a URL, then sit back and watch them create a mirror of the site on your system. Remember, this will not be a functional replica of the target site with ASP source code and database calls; it is simply a complete collection of every available link within the application. These tools perform most of the grunt work of collecting files.
Lynx Lynx is a text-based web browser found on most UNIX systems. Its -dump option is useful for its “References” section, which lists every link on the requested page.
$ lynx -dump https://www.victim.com > homepage
$ cat homepage
...text removed for brevity...
References
1. http://www.victim.com/signup?lang=en
2. http://www.victim.com/help?lang=en
3. http://www.victim.com/faq?lang=en
4. http://www.victim.com/menu/
5. http://www.victim.com/preferences?anon
6. http://www.victim.com/languages
7. http://www.victim.com/images/
If you want to see the HTML source instead of the formatted page, use the -source option. Two other options, -crawl and -traversal, will gather the formatted HTML and save it to files.
Wget Wget (www.gnu.org/software/wget/wget.html) is a command-line tool for Windows and UNIX that will download the contents of a web site. Its usage is simple:
$ wget -r www.victim.com
--18:17:30--  http://www.victim.com/
           => `www.victim.com/index.html'
Connecting to www.victim.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 21,924 [text/html]

    0K .......... .......... .  100% @ 88.84 KB/s

18:17:31 (79.00 KB/s) - `www.victim.com/index.html' saved [21924/21924]

Loading robots.txt; please ignore errors.
--18:17:31--  http://www.victim.com/robots.txt
           => `www.victim.com/robots.txt'
Connecting to www.victim.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 458 [text/html]

    0K  100% @ 22.36 KB/s

...(continues for entire site)...
The -r or --recursive option instructs wget to follow every link on the home page. This will create a www.victim.com directory and populate that directory with every HTML file and directory wget finds for the site. A major advantage of wget is that
it follows every link possible. Thus, it will download the output for every argument that the application passes to a page.
Some sites may require more advanced options, such as support for proxies and HTTP basic authentication. Sites protected by basic authentication can be spidered with:
$ wget -r --http-user=dwayne --http-password=woodelf \
> https://www.victim.com/secure/
--22:12:13--  https://www.victim.com/secure/
           => `www.victim.com/secure/index.html'
Connecting to www.victim.com:443... connected!
HTTP request sent, awaiting response... 200 OK
Length: 251 [text/html]

    0K  100% @ 21.19 KB/s

...continues for entire site...
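Under the hood, HTTP basic authentication is just a base64-encoded user:pass pair carried in a request header, which is why wget can replay it on every request of the crawl. A small sketch of the encoding; the credentials are the illustrative ones from the wget example:

```python
# Show how a --http-user/--http-password pair becomes an HTTP
# Authorization header. Credentials are illustrative.
import base64

def basic_auth_header(user, password):
    """Return the Authorization header value for HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

print(basic_auth_header("dwayne", "woodelf"))
```

Note that over plain HTTP this header is trivially decodable by anyone on the wire, one reason the example crawl targets an https:// URL.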
Burp Suite Spider Burp Suite is a set of attack tools that includes a utility for mapping applications. Rather than following links, submitting forms, and parsing responses manually, you can let the Burp Spider gather this information automatically to help identify
potentially vulnerable functionality in the web application.
Teleport Pro Of course, for Windows users there is always a GUI option. Teleport Pro (www.tenmax.com/teleport/pro/home.htm) brings a graphical interface to the functionality of wget and adds sifting tools for gathering information.
Black Widow Black Widow extends the capability of Teleport Pro by providing an interface for searching and collecting specific information. Another benefit is that you can download the files to a directory on your hard drive, where they are easier to search with tools such as grep and findstr.
Common Web Application Profiles
Oracle Application Server
Most Oracle applications contain a main subfolder called /pls/. This /pls/ folder is actually Oracle’s PL/SQL gateway module, and everything that follows it is a call into the database. To see how, take a look at a URL of this form (the query-string arguments are omitted here):

http://www.victim.com/pls/Index/CATALOG.PROGRAM_TEXT_RPT?...

In this example, /pls/ is the PL/SQL gateway; /Index/ is the Database Access Descriptor; and CATALOG is a PL/SQL package containing the PROGRAM_TEXT_RPT procedure, which accepts the parameters on the rest of the URL.
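Because that structure is fixed, a PL/SQL gateway URL can be decomposed mechanically when triaging a list of discovered links. A sketch; the example URL is illustrative, assembled from the components described above:

```python
# Decompose an Oracle PL/SQL gateway URL into its components:
# the /pls/ gateway, the Database Access Descriptor (DAD), and the
# package.procedure call. The example URL is illustrative.
from urllib.parse import urlparse

def parse_pls_url(url):
    """Return the DAD, package, and procedure, or None if not a /pls/ URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0].lower() != "pls":
        return None                      # not a PL/SQL gateway URL
    dad, call = parts[1], parts[2]
    package, _, procedure = call.partition(".")
    return {"dad": dad, "package": package, "procedure": procedure}

print(parse_pls_url(
    "http://www.victim.com/pls/Index/CATALOG.PROGRAM_TEXT_RPT?arg=value"))
```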
REFERENCES & FURTHER READING

Relevant Vendor Bulletins
Internet Information Server Returns IP Address in HTTP Header (Microsoft Knowledge Base)

Web Server/App Firewalls
Teros application firewalls: http://www.teros.com
F5 TrafficShield Application Firewall: http://www.f5.com

Web Search Engines
Google: http://www.google.com

Web Crawling Tools
Offline Explorer Pro: http://www.metaproducts.com
Burp Suite: http://portswigger.net

General References
HTML 4.01 FORM specification
PHP scripting language: http://www.php.net/
The File Extension Source, a database of file extensions and the programs that use them

(Reference: Hacking Exposed Web Applications, 3rd Edition)