Tuesday, August 7, 2012

How to use HP WebInspect to scan only a part of a web application

I saw a thread online regarding this topic, and realized my response would be too large and look silly stuffed into the post comments.  So I will rewrite and expand my response here.  No offense meant, Rohit.

WebInspect is highly configurable for whatever situation you may be encountering.  Please do not feel that you have to perform all of these configurations.  They are just there for when you need them.


Question:  Can anyone explain the way to scan only a certain part of a web application using WebInspect?

Answer:  There are many ways to "shape" your scan with WebInspect (currently at version 9.20), depending on what you are faced with and your end-goal.  I will review them from most common to lesser known (and least used).


Restrict To Folder:

This feature is not found in the scan settings, but rather on page one of the Scan Wizard.  Enabling this option presents the user with three sub-options, defined in the Help guide as follows.

  • Directory only - WebInspect will crawl and/or audit only the URL you specify. For example, if you select this option and specify a URL of www.mycompany.com/one/two/, WebInspect will assess only the "two" directory.
  • Directory and subdirectories - WebInspect will begin crawling and/or auditing at the URL you specify, but will not access any directory that is higher in the directory tree.
  • Directory and parent directories - WebInspect will begin crawling and/or auditing at the URL you specify, but will not access any directory that is lower in the directory tree.
A common error with the Restrict To Folder feature is not realizing that the Starting URL field defines the anchor point for the chosen Restriction.  The specific folder is identified by the final portion of the Starting URL that is enclosed in slash marks ("/").  This means that the Starting URLs "../folder1/folder2/" and "../folder1/folder2/index.html" both anchor to "folder2", while the path "../folder1/folder2" anchors to "folder1".
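
To make that anchoring rule concrete, here is a minimal Python sketch (my own illustration, not WebInspect's actual implementation) that derives the anchor folder from a Starting URL:

    from urllib.parse import urlparse

    def anchor_folder(starting_url):
        # The anchor is the last path segment enclosed in slashes; anything
        # after the final "/" (a file name, or nothing) is ignored.
        path = urlparse(starting_url).path
        return path.rsplit("/", 2)[-2]

    print(anchor_folder("http://www.example.com/folder1/folder2/"))            # folder2
    print(anchor_folder("http://www.example.com/folder1/folder2/index.html"))  # folder2
    print(anchor_folder("http://www.example.com/folder1/folder2"))             # folder1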



Session Exclusions:

Let's say that rather than focusing on one area, you wish to omit it from being scanned.  For example, the /manuals/ folder within any default Apache installation is rife with samples and assorted junk text that will add time to your scan without appreciable results.  For that scenario, open the Session Exclusions scan settings panel and add an Exclusion such as (URL contains "/manuals/"), and the scan should complete faster.  I had an international client use this technique to split a very large site, divided into three languages (English, Chinese, and Arabic), into three separate scans.  Because their site structure was largely segregated by language, scanning with Session Exclusions for the other two languages kept each scan targeted to a single language area.  Post-scan, they were able to combine the three scans into one report.

The Session Exclusions settings were expanded in WebInspect 9.10 or 9.20.  Besides simple keywords or parts of a URI path, the user can also exclude by a variety of Targets (POST parameter, Query parameter, Status Code, etc.) and a variety of Matching styles (Contains, Regex, etc.).
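
As a rough illustration of how such rules behave (this is my own Python model, not WebInspect's engine), an exclusion pairs a Target with a Matching style and a pattern:

    import re

    def is_excluded(url, rules):
        # Each rule: (target, matching_style, pattern).  Only the URL target
        # is modeled here; WebInspect also supports POST/Query parameters,
        # Status Code, and others.
        for target, style, pattern in rules:
            if target != "URL":
                continue
            if style == "contains" and pattern in url:
                return True
            if style == "regex" and re.search(pattern, url):
                return True
        return False

    rules = [("URL", "contains", "/manuals/")]
    print(is_excluded("http://www.example.com/manuals/intro.html", rules))  # True
    print(is_excluded("http://www.example.com/app/login.aspx", rules))      # False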


Scan Log:

As an added feature, when using Restrict To Folder or Session Exclusions, you may want to watch the Scan Log tab found at the bottom of the WebInspect UI (Summary Information pane).  This area displays informational messages when discovered pages are removed from the testing coverage due to one of your scan configurations.


Scan Methods:

Stepping out of the settings, the Scan Wizard itself offers a variety of methods for controlling the scan, currently found on page one of the wizard.

Crawl-Only:  This will perform only a Discovery of the target, perhaps with some passive auditing of keywords seen in the traffic.   When it completes (or is Paused), the user can deselect the undesired pages or folders in the Site Tree and then proceed to the Audit phase by pressing the Audit button found in the toolbar area.  Bear in mind that each folder must be deselected individually; to deselect entire branches, use the right-click menu for additional options.

Audit-Only:  Switching from the default Crawl-and-Audit to Audit-Only will prevent the crawl, or discovery, of the rest of the website.

Manual Step-Mode:  This option turns off automated Crawling.  WebInspect turns itself into a localhost proxy and spawns an instance of IE.   The user performs the discovery phase by hand, by browsing.  When finished, return to WebInspect and click the Finish button found at the top of the Site Tree.  You then have an opportunity to deselect undesired folders or branches in the Site Tree (right-click menu!), and then proceed to the Audit phase by clicking the Audit button found in the toolbar area.

List-Driven Scan:  This is a different style from the automated scan, in which the crawl engine is provided a list of known URLs at the outset of the scan.  This list can be an XML file listing all of the web root files, harvested from the target server by its administrator, or a simple TXT file with one full URL per line.  This style of scan can be used with the Audit-Only method to attack only the pages in the list, or with Crawl-and-Audit to force-feed the crawl engine with that list as its input.
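
For the TXT form, the format is just one full URL per line.  Here is a hypothetical Python helper (the web root path and base URL are assumptions for illustration) that a server administrator might use to harvest the web root into such a list:

    import os

    WEB_ROOT = "/var/www/html"            # assumed document root
    BASE_URL = "http://www.example.com"   # assumed site base URL

    # Walk the web root and write one full URL per line.
    with open("url_list.txt", "w") as out:
        for dirpath, _, filenames in os.walk(WEB_ROOT):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), WEB_ROOT)
                out.write(BASE_URL + "/" + rel.replace(os.sep, "/") + "\n")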

Workflow-Driven Scan: Identical to the List-Driven scan, except that the input is a pre-recorded "Start Macro", a browser capture of your desired business process.  Using this Macro feeds the crawler (or the Audit-Only attack) the recorded sessions, and the scan continues from there.


Scan Policy - Audit Only:

The Audit-Only scan method sounds good, but it is not foolproof.  Currently, most of the scan Policies you might use for your Audit will perform a variety of forced-browsing and discovery checks.  To completely disable this possible expansion of your scan target, you will need to make a custom copy of your desired scan Policy.  I will detail that in a separate posting.


Filters:

The defaults for Session Exclusions include a variety of exclusions where the URI may indicate a logout page, such as "exit", "logoff", and "logout".  But what if the offending data does not fit the Session Exclusion model, such as dynamically named folders?  For those, you could avoid the specific data by defining an HTTP Request filter that replaces the offending value in real time.  With a regular expression Filter, you could dynamically alter all live submissions such as "/pda2789/" (regex="\/pda(\d+)\/") to "/pdareplaced/", a nonsensical value that the server will not recognize or honor, thereby keeping that area out of the scan.
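
To verify such an expression before loading it into a Filter, you can try the substitution in plain Python (this merely mimics the replacement; it is not the WebInspect filter engine itself):

    import re

    pattern = r"/pda(\d+)/"   # equivalent to the escaped form \/pda(\d+)\/ above
    request_line = "GET /pda2789/report.aspx HTTP/1.1"
    print(re.sub(pattern, "/pdareplaced/", request_line))
    # GET /pdareplaced/report.aspx HTTP/1.1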


~~~~ Habeas Data
