The purpose of surveying the application is to generate a complete picture of the content, components, function, and flow of the web site in order to gather clues about where underlying vulnerabilities might be.
Depending on your level of experience, you should be able to recognize quickly what language the site is written in, basic site structure, use of dynamic content, and so on.
This section presents a basic approach to web application profiling, comprising the following key tasks:
• Manual inspection
• Search tools
• Automated crawling
• Common web application profiles
I. Manual Inspection
The first thing we usually do to profile an application is a simple click-through. Become familiar with the site, look for all the menus, and watch the directory names in the URL change as you navigate.
Web applications are complex. They may contain a dozen files, or they may contain a dozen well-populated directories. Therefore, documenting the application’s structure in a well-ordered manner helps you track insecure pages and provides a necessary reference for piecing together an effective attack.
1. Documenting the Application
The first step is to open a text editor and store information about every page in the application. We suggest documenting things such as:
• Page name
• Full path to the page
• Does the page require authentication?
• Does the page require SSL/TLS?
• GET/POST arguments
A partially completed matrix may look similar to the following table:
|Page|Path|Auth?|SSL?|GET/POST|Comments|
|Index.html|/|N|N|||
|Login.asp|/login/|N|Y|POST password|Main auth page|
|Company.html|/about/|N|N||Company info|
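If you prefer to keep the matrix in a machine-readable form, a minimal Python sketch might look like the following. The `PageRecord` class, its field names, and the CSV layout are our own illustrative assumptions; they simply mirror the columns suggested above.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class PageRecord:
    """One row of the profiling matrix (field names are illustrative)."""
    page: str
    path: str
    auth: bool = False   # does the page require authentication?
    ssl: bool = False    # does the page require SSL/TLS?
    args: str = ""       # GET/POST arguments observed
    comments: str = ""


def to_csv(records):
    """Render the records as CSV text with a header row."""
    out = io.StringIO()
    names = [f.name for f in fields(PageRecord)]
    writer = csv.DictWriter(out, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return out.getvalue()


# The same three pages shown in the sample matrix.
matrix = [
    PageRecord("Index.html", "/"),
    PageRecord("Login.asp", "/login/", ssl=True,
               args="POST password", comments="Main auth page"),
    PageRecord("Company.html", "/about/", comments="Company info"),
]
print(to_csv(matrix))
```

A spreadsheet works just as well; the point is to record the same columns for every page.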
Some other information you should consider recording in your matrix/flowchart includes the following:
• Statically and dynamically generated pages
• Directory structure
• Common file extensions
• Common files
• Helper files
• Java classes and applets
• Flash and Silverlight objects
• HTML source code
• Query strings and parameters
• Common cookies
• Backend access points
2. Statically and Dynamically Generated Pages
Static pages are the generic .html files usually relegated to FAQs and contact information. They may lack functionality to attack with input validation tests, but the HTML source may contain comments or information. At the very least, contact information reveals e-mail addresses and usernames. Dynamically generated pages (.asp, .jsp, .php, etc.) are more interesting. Record a short comment for interesting pages such as “administrator functions,” “user profile information,” or “cart view.”
As we noted earlier, as you manually profile an application, it’s a good idea to mirror the structure and content of the application to local disk. For example, if www.victim.com has an /include/database.inc file, then create a top-level directory called “www.victim.com” and a subdirectory called “include”, and place the database.inc file in the include directory. The text-based browser, lynx, can accelerate this process:
# mkdir www.victim.com
# cd www.victim.com
# lynx -dump www.victim.com/index.html > index.html
Netcat is even better because it will also dump the server headers:
# mkdir www.victim.com
# cd www.victim.com
# echo -e "GET /index.html HTTP/1.0\n\n" | \
> nc -vv www.victim.com 80 > index.html
www.victim.com [192.168.33.101] 80 (http) open
sent 27, rcvd 2683: NOTSOCK
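The netcat request above can also be approximated in Python. The sketch below is only one possible way to do it: the function names are ours, and, unlike the transcript, it uses CRLF line endings and adds a Host header, which is an assumption about what the target server expects.

```python
import socket


def build_request(path, host):
    """Build a minimal HTTP/1.0 GET request with CRLF line endings."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def split_response(raw):
    """Split a raw response into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def fetch_raw(host, path, port=80, timeout=10):
    """Send the request over a plain TCP socket and return the raw bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(path, host))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

A call such as `split_response(fetch_raw("www.victim.com", "/index.html"))` would then give you the status line, the server headers, and the page body as separate values.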
You can download dynamically generated pages with the getit scripts as long as the page does not require a POST request. This is an important feature because the contents of some pages vary greatly depending on the arguments they receive. Here’s another example; this time getit.sh retrieves the output of the same menu.asp page, but for two different users:
# getit.sh www.victim.com \
> /main/menu.asp?userID=002 > menu.002.asp
www.victim.com [192.168.33.101] 80 (http) open
sent 40, rcvd 3654: NOTSOCK
# getit.sh www.victim.com \
> /main/menu.asp?userID=007 > menu.007.asp
www.victim.com [192.168.33.101] 80 (http) open
sent 40, rcvd 5487: NOTSOCK
Keep in mind the naming convention that the site uses for its pages. The naming convention provides an insight into the programmers’ mindset. If you found a page called UserMenu.asp, chances are that a page called AdminMenu.asp also exists.
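The UserMenu.asp to AdminMenu.asp guess can be turned into a small generator of candidate names. This is a rough sketch under our own assumptions: the role-word list is hypothetical, and it only handles names that start with a capitalized role word.

```python
import re

# Hypothetical role words to swap in; extend the list for the site at hand.
ROLE_WORDS = ["User", "Admin", "Manager", "Test"]


def sibling_names(filename, roles=ROLE_WORDS):
    """Swap a leading role word in a page name for the other role words."""
    match = re.match(r"(?P<role>[A-Z][a-z]+)(?P<rest>.+)", filename)
    if not match or match.group("role") not in roles:
        return []
    rest = match.group("rest")
    return [role + rest for role in roles if role != match.group("role")]


print(sibling_names("UserMenu.asp"))
```

Each candidate still has to be requested from the server to see whether it exists.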
3. Directory Structure
The structure of a web application will usually provide a unique signature. Examining things as seemingly trivial as directory structure, file extensions, naming conventions used for parameter names or values, and so on, can reveal clues that will immediately identify what application is running. Obtaining the directory structure for the public portion of the site is trivial. After all, the application is designed to be surfed. However, don’t stop at the parts visible through the browser and the site’s menu selections. The web server may have directories for administrators, old versions of the site, backup directories, data directories, or other directories that are not referenced in any HTML code.
Other common directories to check include these:
• Directories that have supposedly been secured, either through SSL, authentication, or obscurity: /admin/ /secure/ /adm/
• Directories that contain backup files or log files: /.bak/ /backup/ /back/ /log/ /logs/ /archive/ /old/
• Personal Apache directories: /~root/ /~bob/ /~cthulhu/
• Directories for include files: /include/ /inc/ /js/ /global/ /local/
• Directories used for internationalization: /de/ /en/ /1033/ /fr/
Attempting to enumerate the directory structure can be an arduous process, but the getit scripts can help whittle down any directory tree. Web servers return a status code other than 404 when a GET request is made to a directory that exists on the server. The code might be 200, 302, or 401, but as long as it isn’t a 404 you’ve discovered a directory. The technique is simple:
# getit.sh www.victim.com /isapi
www.victim.com [192.168.230.219] 80 (http) open
HTTP/1.1 302 Object Moved
Location: http://tk421/isapi/
Server: Microsoft-IIS/5.0
Content-Type: text/html
Content-Length: 148
<title>Document Moved</title>
<h1>Object Moved</h1>This document may be found <a HREF="http://tk-421/isapi/">here</a>
sent 22, rcvd 287: NOTSOCK
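The same "anything but 404" check can be scripted. The sketch below is a minimal illustration, not a replacement for the getit scripts: the function names and the short candidate list are our assumptions, and `http_status` relies on the fact that Python's `http.client` does not follow redirects, so a 302 is reported as a 302.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# A few of the common directories listed earlier; extend as needed.
CANDIDATE_DIRS = ["/admin/", "/secure/", "/backup/", "/logs/", "/include/", "/old/"]


def http_status(url, timeout=10):
    """Return the status code for a GET request, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


def probe_directories(base_url, candidates, get_status=http_status):
    """Return {path: status} for every candidate that does not answer 404."""
    found = {}
    for path in candidates:
        status = get_status(urljoin(base_url, path))
        if status != 404:
            found[path] = status
    return found
```

Passing `get_status` as a parameter makes it easy to swap in a different fetcher. Keep in mind that some servers answer 200 for every path through a custom error page, in which case the status code alone tells you nothing.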
Another tool that can reduce time and effort when traversing a web application for hidden folders is OWASP DirBuster. DirBuster is a multithreaded Java application that is designed to brute-force directories and files on a web server. Based on a user-supplied dictionary file, DirBuster will attempt to crawl the application and guess at non-linked directories and files with a specific extension.
4. Common File Extensions
File extensions are a great indicator of the nature of an application. File extensions are used to determine the type of file, either by language or its application association. File extensions also tell web servers how to handle the file.
Common File Extensions and the Application or Technology That Typically Uses Them:
|Application/Technology|Common File Extension|
|ColdFusion|.cfm|
|ASP.NET|.aspx|
|Lotus Domino|.nsf|
|ASP|.asp|
|WebSphere|.d2w|
|PeopleSoft|.GPL|
|BroadVision|.do|
|Oracle App Server|.show|
|Perl|.pl|
|CGI|.cgi|
|Python|.py|
|PHP|.php/.php3/.php4|
|SSI|.shtml|
|Java|.jsp/.java|
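The table above maps directly onto a lookup. Here is a small sketch, with a function name of our choosing, that guesses the technology from the extension in a URL. Treat the result as a hint only, since servers can map any extension to any handler.

```python
import posixpath
from urllib.parse import urlsplit

# Extension-to-technology hints, copied from the table above.
EXTENSION_HINTS = {
    ".cfm": "ColdFusion",
    ".aspx": "ASP.NET",
    ".nsf": "Lotus Domino",
    ".asp": "ASP",
    ".d2w": "WebSphere",
    ".gpl": "PeopleSoft",
    ".do": "BroadVision",
    ".show": "Oracle App Server",
    ".pl": "Perl",
    ".cgi": "CGI",
    ".py": "Python",
    ".php": "PHP",
    ".php3": "PHP",
    ".php4": "PHP",
    ".shtml": "SSI",
    ".jsp": "Java",
    ".java": "Java",
}


def guess_technology(url):
    """Return the likely technology for a URL, or None if the extension is unknown."""
    path = urlsplit(url).path          # drops the query string
    ext = posixpath.splitext(path)[1].lower()
    return EXTENSION_HINTS.get(ext)


print(guess_technology("http://www.victim.com/main/menu.asp?userID=002"))
```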
5. Common Files
Most software installations come with a number of well-known files, such as readme, install, and license files.
By searching every folder and subfolder in a site, you might just hit on plenty of useful information that will tell you what applications and versions are running and a nice URL that will lead you to a download page for software and updates.
6. Helper Files
• Cascading Style Sheets CSS files (.css) instruct the browser on how to format text. They rarely contain sensitive information, but enumerate them anyway.
• XML Style Sheets Applications are turning to XML for data presentation. Style sheets (.xsl) define the document structure for XML requests and formatting. They tend to have a wealth of information, often listing database fields or referring to other helper files.
• JavaScript Files Much of a site’s JavaScript (.js) is embedded in the actual HTML file, but individual files also exist. Applications use JavaScript for many tasks, including session handling. In addition to enumerating these files, it is important to note what types of functions the file contains.
• Include Files On IIS systems, include files (.inc) often control database access or contain variables used internally by the application. Programmers love to place database connection strings in this file, password and all!
• The “Others” References to ASP, PHP, Perl, text, and other files might be in the HTML source.
Try common file suffixes and directives:
.asp .css .file .htc .htw .inc #include .js .php .pl script .txt virtual .xsl
# getit.sh www.victim.com /tb/tool.php > tool.php
www.victim.com [192.168.189.113] 80 (http) open
# grep js tool.php
var ss_path = "aw/pics/js/"; // and path to the files
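The grep above finds one suffix at a time. As a rough alternative, the sketch below pulls every src/href reference and server-side include directive out of a saved page and keeps those that end in a helper-file suffix. The regular expressions and the suffix tuple are our own simplifications and will miss references built dynamically in script code.

```python
import re

# Helper-file suffixes drawn from the list above.
HELPER_SUFFIXES = (".js", ".css", ".inc", ".xsl", ".htc", ".txt")

ATTR_RE = re.compile(r"""(?:src|href)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
INCLUDE_RE = re.compile(
    r"""#include\s+(?:file|virtual)\s*=\s*["']([^"']+)["']""", re.IGNORECASE
)


def find_helper_files(html):
    """Return referenced helper files in order of first appearance."""
    refs = ATTR_RE.findall(html) + INCLUDE_RE.findall(html)
    helpers = []
    for ref in refs:
        path = ref.split("?", 1)[0].lower()   # ignore any query string
        if path.endswith(HELPER_SUFFIXES) and ref not in helpers:
            helpers.append(ref)
    return helpers
```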
7. HTML Source Code
HTML source code can contain numerous juicy tidbits of information.
HTML Comments The most obvious place attackers look is in HTML comments, special sections of source code where the authors often place informal remarks that can be quite revealing. The <!-- characters mark the start of every basic HTML comment.
Sifting through comments can reveal even more interesting information, including:
• Filename-like comments You will typically see plenty of comments with template filenames tucked in them. Download them and review the template code. You never know what you might find.
• Old code Look for links that might be commented out. They could point to an old portion of the web site that could contain security holes. Or maybe the link points to a file that once worked, but now, when you attempt to access it, a very revealing error message is displayed.
• Auto-generated comments A lot of comments that you might see are automatically generated by web content software. Take the comment to a search engine and see what other sites turn up those same comments. Hopefully, you’ll discover what software generated the comments and learn useful information.
• The obvious We’ve seen things like entire SQL statements, database passwords, and actual notes left for other developers in files such as IRC chat logs within comments.
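Collecting comments from a mirrored page is easy to automate with Python's standard `html.parser` module. The sketch below is one simple way to do it; the class and function names are ours, and comments inside script blocks are not picked up.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Gather the text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between <!-- and -->
        self.comments.append(data.strip())


def extract_comments(html):
    """Return a list of comment bodies in document order."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments
```

Running `extract_comments` over each mirrored file gives you a list to review for template filenames, commented-out links, and generator signatures.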
Other HTML Source Nuggets Don’t stop at comment separators. HTML source has all kinds of hidden treasures. Try searching for a few of these strings:
SQL Select Insert #include #exec Password Database Connect //
Some final thoughts on HTML source-sifting: the rule of thumb is to look for anything that might contain information that you don’t yet know. When you see some weird-looking string of random numbers within comments on every page of the site, look into it. Those random numbers could belong to a media management application that might have a web-accessible interface. The tiniest amount of information in web assessments can bring the biggest breakthroughs. So don’t let anything slide by you, no matter how insignificant it may seem at first.
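The string search suggested above can be done with grep, or with a few lines of Python if you want line numbers and the matched keywords together. This is a sketch with a function name of our choosing; note that "//" also matches every absolute URL, so expect noise from that keyword.

```python
# Search strings from the list above, lowercased for case-insensitive matching.
KEYWORDS = ["sql", "select", "insert", "#include", "#exec",
            "password", "database", "connect", "//"]


def grep_keywords(source, keywords=KEYWORDS):
    """Return (line_number, matched_keywords, line) for each line with a hit."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        lowered = line.lower()
        matched = [k for k in keywords if k in lowered]
        if matched:
            hits.append((number, matched, line.strip()))
    return hits
```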
Forms are the backbone of any web application.
When manually inspecting an application, note every page with an input field. You can find most of the forms by a click-through of the site. However, visual confirmation is not enough. Once again, you need to go to the source. For our command-line friends who like to mirror the entire site and use grep, start by looking for the simplest indicator of a form, its tag. Remember to escape the < character since it has special meaning on the command line:
# getit.sh www.victim.com /index.html | grep -i \<form
www.victim.com [192.168.33.101] 80 (http) open
sent 27, rcvd 2683: NOTSOCK
<form name="gs" method="GET" action="/search">
So when inspecting a page’s form, make notes about all of its aspects:
• Method Does it use GET or POST to submit data? GET requests are easier to manipulate on the URL.
• Action What script does the form call? What scripting language was used (.pl, .sh, .asp)? If you ever see a form call a script with a .sh extension (shell script), mark it. Shell scripts are notoriously insecure on web servers.
• Maxlength Are input restrictions applied to the input field? Length restrictions are trivial to bypass.
• Hidden Was the field supposed to be hidden from the user? What is the value of the hidden field? These fields are trivial to modify.
• Autocomplete Is the autocomplete attribute applied? Why? Does the input field ask for sensitive information?
• Password Is it a password field? What is the corresponding login field?
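The checklist above can be filled in automatically for each mirrored page. The following sketch uses Python's standard `html.parser` to record each form's method and action along with the type, value, maxlength, and autocomplete setting of its input fields. The class name and the dictionary layout are our own assumptions, and only input elements are covered (not select or textarea).

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect each form's method, action, and input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self.current = None   # the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.current = {
                "method": (attrs.get("method") or "GET").upper(),
                "action": attrs.get("action"),
                "fields": [],
            }
            self.forms.append(self.current)
        elif tag == "input" and self.current is not None:
            self.current["fields"].append({
                "name": attrs.get("name"),
                "type": (attrs.get("type") or "text").lower(),
                "value": attrs.get("value"),
                "maxlength": attrs.get("maxlength"),
                "autocomplete": attrs.get("autocomplete"),
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self.current = None


def inspect_forms(html):
    """Return a list of form descriptions found in the HTML."""
    inspector = FormInspector()
    inspector.feed(html)
    inspector.close()
    return inspector.forms
```

Hidden fields and password fields then show up as entries whose type is "hidden" or "password", which makes them easy to filter into your notes.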
Query Strings and Parameters
Perhaps the most important part of a given URL is the query string, the part following the question mark (in most cases) that indicates some sort of arguments or parameters being fed to a dynamic executable or library within the application. An example is shown here:
/search.cgi?searchTerm=test
This shows the parameter searchTerm with the value test being fed to the search.cgi executable on this site.
You can manipulate parameter values to attempt to impersonate other users, obtain restricted data, run arbitrary system commands, or execute other actions not intended by the application developers.
Fingerprinting Query Strings: The query string usually follows the question mark, but in complex and customized applications this rule does not always apply. So one of the first things that you need to do is to identify the paths, filenames, and parameters.
Common Query String Structure
|/file.xxx?paramname=paramvalue||Simple, standard URL parameter structure|
|/folder/filename/paramname=paramvalue||Filename here looks like a folder.|
|/folder/file/paramname&paramvalue||Equal sign is represented by &.|
|/folder/(SessionState)/file/paramvalue||Session state kept in the URL—it’s hard to determine where a file, folder, or parameter starts or ends.|
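To illustrate the first two structures in the table, here is a small sketch that collects parameters both from the regular query string and from path segments shaped like name=value. The function name is ours, and the other structures (such as & used in place of =, or session state in the path) would need application-specific handling.

```python
from urllib.parse import urlsplit, parse_qsl


def extract_params(url):
    """Return (path_segments, params) for a URL.

    Parameters come from the regular query string and from any path
    segment shaped like name=value.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    segments = []
    for segment in parts.path.split("/"):
        if not segment:
            continue
        name, sep, value = segment.partition("=")
        if sep:
            params.append((name, value))   # parameter hidden in the path
        else:
            segments.append(segment)
    return segments, params


print(extract_params("/folder/filename/paramname=paramvalue"))
```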
Analyzing Query Strings and Parameters Collecting query strings and parameters is a complicated task that is rarely the same between two applications. As you collect the variable names and values, watch for certain trends. We’ll use the following example (again) to illustrate some of these important trends:
There are three interesting things about these parameters:
• The resultPage value is equal to the search term. Anything that takes user input and does something other than what it was intended for is a good prospect for security issues.
• The name resultPage brings some questions to mind. If the value of this parameter does not look like a URL, perhaps it is being used to create a file or to tell the application to load a file named with this value.
• The thing that really grabs our attention, however, is db=/templates/db/archive.db, which we’ll discuss next.
Attack Attempts and Implications
|db=/../../../../etc/passwd||File retrieval possible? Pass in boot.ini or some other file if it’s win32.|
|db=/templates/db/||Can we get a directory listing or odd error?|
|db=/templates/db/%00||Use the NULL byte trick to grab a directory listing or other odd errors.|
|db=/templates/db/junk.db||What happens when we pass in an invalid database name?|
|db=|ls or db=|dir||Attempt to use the old Perl pipe trick.|
|db=||Always try blank.|
|db=*||If we use *, will it search all the databases in the configuration?|
|db=/search.cgi||What happens if we give it an existing filename on the web site? Might dump source code?|
|http://www.site.com/templates/db/archive.db||Can we just download the DB file directly?|
|http://www.site.com/templates/db/||Can we retrieve a directory listing?|
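Generating the db= variations from the table by hand gets tedious, so a short helper can build the URLs for you. The sketch below is an illustration only: the function name is ours, the example URL in the usage note is hypothetical, and the `safe` characters are chosen so that a value such as %00 is sent as written rather than double-encoded.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Test values for the db parameter, taken from the table above.
DB_TEST_VALUES = [
    "/../../../../etc/passwd",
    "/templates/db/",
    "/templates/db/%00",
    "/templates/db/junk.db",
    "|ls",
    "|dir",
    "",
    "*",
    "/search.cgi",
]


def vary_parameter(url, name, values):
    """Yield one URL per test value, replacing only the named parameter."""
    parts = urlsplit(url)
    original = parse_qsl(parts.query, keep_blank_values=True)
    for value in values:
        params = [(k, value if k == name else v) for k, v in original]
        # Leave /, %, |, * and . unescaped so the payload is sent as written.
        query = urlencode(params, safe="/%|*.")
        yield urlunsplit((parts.scheme, parts.netloc, parts.path,
                          query, parts.fragment))
```

For instance, `list(vary_parameter("http://www.site.com/search.cgi?searchTerm=test&db=/templates/db/archive.db", "db", DB_TEST_VALUES))` would produce one request URL per row, ready to feed to whatever fetcher you use.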
Here are some other common query string/parameter “themes” that might indicate potentially vulnerable application logic:
• User identification
• Session identification
• Database queries
• Encoded/encrypted values
• Boolean arguments
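As a first pass over a long list of collected parameters, a crude classifier can flag which of these themes each one might belong to. Everything in the sketch below is a heuristic of our own: the name patterns, the Boolean word list, and the Base64 check are assumptions, and both false positives and misses are to be expected.

```python
import base64
import binascii
import re


def looks_base64(value):
    """True if value is plausibly Base64: right alphabet, padded length, decodable."""
    if len(value) < 8 or len(value) % 4:
        return False
    if not re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", value):
        return False
    try:
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def classify_value(name, value):
    """Return a list of rough theme labels for one parameter."""
    labels = []
    lowered = name.lower()
    if re.search(r"user|uid|account", lowered):
        labels.append("user identification")
    if re.search(r"sess|sid|token", lowered):
        labels.append("session identification")
    if (re.search(r"\b(select|insert|update|delete)\b", value, re.IGNORECASE)
            or re.search(r"db|query|table", lowered)):
        labels.append("database query")
    if value.lower() in {"true", "false", "0", "1", "yes", "no", "on", "off"}:
        labels.append("boolean argument")
    if looks_base64(value):
        labels.append("encoded value")
    return labels
```

Anything the classifier flags still deserves a manual look; an ordinary eight-letter word, for example, passes the Base64 check.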
In the next post, we will discuss Search Tools for Profiling in detail.