Class: Arachni::Analyzer

Inherits:
Object
  • Object
Includes:
UI::Output
Defined in:
lib/analyzer.rb

Overview

Analyzer class

Analyzes HTML code, extracting forms, links and cookies depending on user options.

It grabs all element attributes, not just URLs and variables.
All URLs are converted to absolute and URLs outside the domain are ignored.
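
The URL handling described above can be sketched with Ruby's standard `uri` library. This is only an illustration: `absolutize` and `same_domain?` are hypothetical stand-ins, not the Analyzer's own `to_absolute` and `in_domain?`, whose implementations are not shown on this page.

```ruby
require 'uri'

# Hypothetical helper: resolve +href+ against the URL of the page it was
# found on. Returns nil when the reference is missing or unparsable.
def absolutize( page_url, href )
    return nil if href.nil?
    URI.join( page_url, href ).to_s
rescue URI::Error
    nil
end

# Hypothetical helper: true when both URLs point at the same host.
def same_domain?( page_url, link_url )
    URI.parse( page_url ).host == URI.parse( link_url ).host
end

page = 'http://example.com/dir/index.html'
link = absolutize( page, '../login.php?id=1' )

# Only links that stay on the audited host would be kept.
kept = same_domain?( page, link ) ? link : nil
```

With these inputs the relative reference resolves to `http://example.com/login.php?id=1`, which is on the same host and is therefore kept.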

Forms

Form analysis uses both regular expressions and the Nokogiri parser
so that it can handle badly written HTML, such as unclosed
tags and overlapping tags.

In order to ease audits, in addition to parsing forms into data structures
like “select” and “option”, all auditable inputs are put under the
“auditable” key.

Links

Links are extracted using the Nokogiri parser.

Cookies

Cookies are extracted from the HTTP headers and parsed by WEBrick::Cookie.

@author: Anastasios “Zapotek” Laskos

                                     <tasos.laskos@gmail.com>
                                     <zapotek@segfault.gr>

@version: 0.1-pre

Instance Attribute Summary

Instance Method Summary

Methods included from UI::Output

#debug!, #debug?, #only_positives!, #only_positives?, #print_debug, #print_debug_backtrace, #print_debug_pp, #print_error, #print_info, #print_line, #print_ok, #print_status, #print_verbose, #verbose!, #verbose?

Constructor Details

- (Analyzer) initialize(opts)

Constructor
Instantiates Analyzer class with user options.

Parameters:

  • (Options) opts

    user options

# File 'lib/analyzer.rb', line 95

def initialize( opts )
    @url = ''
    @opts = opts
    @structure = Hash.new
    @structure['forms']   = []
    @structure['links']   = []
    @structure['cookies'] = []
    @structure['headers'] = []
    
    @cookies = []
end

Instance Attribute Details

- (Array<Hash <String, String> >) cookies (readonly)

Array of extracted cookies

Returns:

  • (Array<Hash <String, String> >)


# File 'lib/analyzer.rb', line 74

def cookies
  @cookies
end

- (Array<Hash <String, String> >) forms (readonly)

Array of extracted HTML forms

Returns:

  • (Array<Hash <String, String> >)


# File 'lib/analyzer.rb', line 62

def forms
  @forms
end

- (Array<String>) headers (readonly)

Array of valid HTTP headers

Returns:

  • (Array<String>)


# File 'lib/analyzer.rb', line 80

def headers
  @headers
end

- (Array<Hash <String, String> >) links (readonly)

Array of extracted HTML links

Returns:

  • (Array<Hash <String, String> >)


# File 'lib/analyzer.rb', line 68

def links
  @links
end

- (Options) opts (readonly)

Options instance

Returns:

  • (Options)


# File 'lib/analyzer.rb', line 87

def opts
  @opts
end

- (Hash<String, Hash<Array, Hash>>) structure (readonly)

Structure of the HTML elements in Hash format

Returns:

  • (Hash<String, Hash<Array, Hash>>)


# File 'lib/analyzer.rb', line 56

def structure
  @structure
end

- (String) url

The url of the page

Returns:

  • (String)

    the url of the page



# File 'lib/analyzer.rb', line 50

def url
  @url
end

Instance Method Details

- (Array<Hash <String, String> >) get_cookies(headers)

Extracts cookies from HTTP headers

Parameters:

  • (String) headers

    HTTP headers

Returns:

  • (Array<Hash <String, String> >)

    of cookies



# File 'lib/analyzer.rb', line 301

def get_cookies( headers )
    cookies = WEBrick::Cookie.parse_set_cookies( headers )
    
    cookies_arr = []

    cookies.each_with_index {
        |cookie, i|
        cookies_arr[i] = Hash.new

        cookie.instance_variables.each {
            |var|
            value = cookie.instance_variable_get( var ).to_s
            value.strip!
            
            key = normalize_name( var )
            val = value.gsub( /[\"\\\[\]]/, '' )

            cookies_arr[i][key] = val
        }
        
        # detect when a cookie has been updated and discard the old one
        # (the block variable is named 'old' so it does not shadow 'cookie')
        @cookies.reject!{ |old| old['name'] == cookies_arr[i]['name'] }
        
    }

    return cookies_arr
end
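
The update rule used above (a newly received cookie replaces a stored cookie of the same name) can be sketched in isolation with plain hashes. `merge_cookies` is a hypothetical helper written for this illustration; it assumes each cookie is a Hash with a 'name' key, as in the arrays returned by get_cookies.

```ruby
# Hypothetical helper: merge freshly parsed cookies into the known ones,
# letting a fresh cookie replace any known cookie with the same name.
def merge_cookies( known, fresh )
    fresh_names = fresh.map { |cookie| cookie['name'] }

    # drop stale copies, then append the new values
    known.reject { |cookie| fresh_names.include?( cookie['name'] ) } + fresh
end

known = [
    { 'name' => 'sid',  'value' => 'old' },
    { 'name' => 'lang', 'value' => 'en' }
]
fresh = [ { 'name' => 'sid', 'value' => 'new' } ]

merged = merge_cookies( known, fresh )
```

Here `merged` holds two cookies: the untouched 'lang' cookie and the 'sid' cookie with its new value.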

- (Array<Hash <String, String> >) get_forms(html)

TODO: Add support for radio buttons.

Extracts forms from HTML document

Parameters:

  • (String) html

Returns:

  • (Array<Hash <String, String> >)

    array of forms

# File 'lib/analyzer.rb', line 193

def get_forms( html )

    elements = []

    begin
        
        #
        # This imitates Firefox's behavior when it comes to
        # broken/unclosed form tags
        #
        
        # get properly closed forms
        forms = html.scan( /<form(.*?)<\/form>/ixm ).flatten
        
        # now remove them from html...
        forms.each {
            |form|
            html = html.gsub( form, '' )
        }
        
        # and get unclosed forms.
        forms |= html.scan( /<form (.*)(?!<\/form>)/ixm ).flatten
        
    rescue StandardError => e
        print_error( "Error: Couldn't get forms from '" + @url +
        "' [" + e.to_s + "]" )
        # return an empty Array to match the documented return type
        return []
    end

    forms.each {
        |form|

        # guard against forms whose attributes could not be parsed
        attrs = get_form_attrs( form ) || {}

        # fall back to the page URL when the form has no action
        action = attrs['action'] || @url.to_s
        attrs['action'] = to_absolute( action )

        # default to 'post' when no method is given, otherwise normalize case
        attrs['method'] = ( attrs['method'] || 'post' ).downcase

        # skip forms that submit outside the audited domain
        next if !in_domain?( URI.parse( attrs['action'] ) )

        element = { 'attrs' => attrs }
        element['textarea'] = get_form_textareas( form )
        element['select']   = get_form_selects( form )
        element['input']    = get_form_inputs( form )

        # merge the form elements to make auditing easier
        element['auditable'] =
            merge_select_with_input(
                element['input'] | element['textarea'],
                element['select'] )

        # only fully processed, in-domain forms reach the result
        elements << element
    }

    elements
end
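
The two-pass idea behind the scan (collect properly closed forms first, then treat any remaining opening tag as an unclosed form) can be tried out on its own. The sketch below is a simplified illustration, not the method's exact regular expressions: `scan_forms` is a hypothetical helper, and it assumes an unclosed form runs to the end of the document.

```ruby
# Hypothetical helper: return the raw markup of every form in +html+,
# closed forms first, followed by any unclosed form.
def scan_forms( html )
    # pass 1: forms that have a matching closing tag
    closed = html.scan( /<form\b.*?<\/form>/im )

    # remove them so pass 2 cannot match them a second time
    rest = closed.inject( html ) { |doc, form| doc.sub( form, '' ) }

    # pass 2: an opening tag that is still present was never closed
    unclosed = rest.scan( /<form\b.*/im )

    closed + unclosed
end

html = '<form action="/a"><input name="x"></form><p>hi</p>' +
       '<form action="/b"><input name="y">'

forms = scan_forms( html )
```

For this input `forms` contains two entries: the closed form posting to "/a" and the unclosed form posting to "/b".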

- (Hash) get_headers

Returns a list of valid auditable HTTP header fields.

It’s more of a placeholder method; it doesn’t actually analyze anything.
It’s a long shot that any of these will be vulnerable, but better safe than sorry.

Returns:

  • (Hash)

    HTTP header fields



# File 'lib/analyzer.rb', line 165

def get_headers( )
    return {
        'accept'          => 'text/html,application/xhtml+xml,application' +
            '/xml;q=0.9,*/*;q=0.8',
        'accept-charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'accept-language' => 'en-gb,en;q=0.5',
        'accept-encoding' => 'gzip;q=1.0,deflate;q=0.6,identity;q=0.3',
        'from'       => @opts.authed_by,
        'user-agent' => @opts.user_agent,
        'referer'    => @url,
        'pragma'     => 'no-cache'
    }
end

- (Hash) get_link_vars(link)

Extracts variables and their values from a link

Parameters:

  • (String) link

Returns:

  • (Hash)

    name=>value pairs

# File 'lib/analyzer.rb', line 338

def get_link_vars( link )
    if !link then return {} end

    var_string = link.split( /\?/ )[1]
    if !var_string then return {} end

    var_hash = Hash.new
    var_string.split( /&/ ).each {
        |pair|
        # split on the first '=' only, so values containing '=' are kept whole
        name, value = pair.split( '=', 2 )
        var_hash[name] = value
    }

    var_hash

end
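
To see what kind of Hash this produces, the same splitting logic can be run standalone. `link_vars` below is a hypothetical copy written for illustration; it assumes the query string is everything after the first '?' and that pairs are separated by '&'. No URL decoding is performed, matching the method above.

```ruby
# Hypothetical standalone version of the query-string splitting.
def link_vars( link )
    return {} if link.nil?

    # everything after the first '?' is treated as the query string
    query = link.split( '?', 2 )[1]
    return {} if query.nil?

    query.split( '&' ).each_with_object( {} ) { |pair, vars|
        # split on the first '=' only; a pair without '=' maps to nil
        name, value = pair.split( '=', 2 )
        vars[name] = value
    }
end

vars = link_vars( 'http://example.com/page.php?id=1&lang=en&flag' )
```

In this example `vars['id']` is "1", `vars['lang']` is "en", and `vars['flag']` is nil because the pair carries no value.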

- (Array<Hash <String, String> >) get_links(html)

Extracts links from HTML document

Parameters:

  • (String) html

Returns:

  • (Array<Hash <String, String> >)

    of links

# File 'lib/analyzer.rb', line 277

def get_links( html )
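    links = []
    get_elements_by_name( 'a', html ).each {
        |link|

        # skip anchors without an href before trying to resolve them
        if !link['href'] then next end

        link['href'] = to_absolute( link['href'] )

        if !link['href'] then next end
        if( exclude?( link['href'] ) ) then next end
        if( !include?( link['href'] ) ) then next end
        if !in_domain?( URI.parse( link['href'] ) ) then next end

        link['vars'] = get_link_vars( link['href'] )

        # appending avoids nil holes for the links that were skipped
        links << link
    }

    # return the filtered links rather than the result of the iteration
    links
end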

- (Hash<String, Hash<Array, Hash>>) run(url, html, headers)

Runs the Analyzer and extracts forms, links and cookies

Parameters:

  • (String) url

    the url of the HTML code, mainly used for debugging

  • (String) html

    HTML code to be analyzed

  • (Hash) headers

    HTTP headers

Returns:

  • (Hash<String, Hash<Array, Hash>>)

    HTML elements



# File 'lib/analyzer.rb', line 116

def run( url, html, headers )

    @url = url

    msg = "["

    elem_count = 0
    if @opts.audit_forms
        @structure['forms'] = get_forms( html )
        elem_count += form_count = @structure['forms'].length
        msg += "Forms: #{form_count}\t"
    end

    if @opts.audit_links
        @structure['links'] = get_links( html )
        elem_count += link_count = @structure['links'].length
        msg += "Links: #{link_count}\t"
    end

    if @opts.audit_cookies
        # append the freshly parsed cookies and drop exact duplicates
        @cookies.concat( get_cookies( headers['set-cookie'].to_s ) )
        @cookies.uniq!
        @structure['cookies'] = @cookies
            
        elem_count += cookie_count =  @structure['cookies'].length
        msg += "Cookies: #{cookie_count}\t"
    end

    if @opts.audit_headers
        @structure['headers'] = get_headers( )
        elem_count += header_count = @structure['headers'].length
        msg += "Headers: #{header_count}"
    end
    
    msg += "]\n\n"
    print_verbose( msg ) if !only_positives?

    return @structure
end