Class: Anemone::Page

Inherits:
Object
Defined in:
lib/anemone/page.rb

Overview

Overrides Anemone’s Page class methods:
o in_domain?( uri ): adds support for subdomain crawling
o links(): adds support for frame and iframe src URLs

@author: Anastasios “Zapotek” Laskos <tasos.laskos@gmail.com> <zapotek@segfault.gr>

@version: 0.1-pre

Instance Method Details

- (Object) doc

Nokogiri document for the HTML body



# File 'lib/anemone/page.rb', line 50

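def doc
  # Return the memoized document if the body has already been parsed.
  return @doc if @doc

  # Earlier variant, which also required html? to be true:
  # @doc = Nokogiri::HTML( @body ) if @body && html? rescue nil
  @doc = Nokogiri::HTML( @body ) if @body rescue nil
end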

- (String) extract_domain(url)

Extracts the domain from a URI object

Parameters:

  • (URI) url

Returns:

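  • (String) the last two dot-separated labels of the host; note that the code returns false when the URI has no host and true when the host has a single label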


# File 'lib/anemone/page.rb', line 78

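def extract_domain( url )
    # No host (e.g. a relative URI): nothing to extract.
    return false if !url.host

    splits = url.host.split( /\./ )

    # Single-label host such as "localhost".
    return true if splits.length == 1

    # Keep the last two labels, e.g. "www.example.com" -> "example.com".
    splits[-2] + "." + splits[-1]
end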
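
The return values of extract_domain are easiest to see on a few sample URIs. The following is a minimal standalone sketch, not part of Anemone or Arachni: last_two_labels is a hypothetical name for a copy of the logic above, and the example URLs are made up. It also shows that the "last two labels" rule is only a rough approximation for multi-part suffixes such as co.uk.

```ruby
require 'uri'

# Hypothetical standalone copy of the extract_domain logic, for illustration.
def last_two_labels(uri)
  return false unless uri.host        # relative URI: no host component
  labels = uri.host.split('.')
  return true if labels.length == 1   # single-label host, e.g. "localhost"
  labels.last(2).join('.')            # keep only the last two labels
end

puts last_two_labels(URI('http://www.example.com/a')).inspect   # "example.com"
puts last_two_labels(URI('http://shop.example.co.uk/')).inspect # "co.uk"
puts last_two_labels(URI('http://localhost/')).inspect          # true
puts last_two_labels(URI('/relative/path')).inspect             # false
```

Because every single-label host maps to true, two different single-label hosts compare as equal when the results are matched against each other.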

- (Boolean) in_domain?(uri)

Returns true if uri is in the same domain as the page, returns false otherwise.

The added code enables optional subdomain crawling.

Returns:

  • (Boolean)


# File 'lib/anemone/page.rb', line 63

def in_domain?( uri )
    if( Arachni::Options.instance. )
        return extract_domain( uri ) ==  extract_domain( @url )
    end

    uri.host == @url.host
end
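
The difference between the two comparison modes can be sketched outside the class. This is an illustrative example under stated assumptions: same_site? and domain_of are hypothetical helpers, and the follow_subdomains argument is a made-up stand-in for the option check in in_domain?, not the actual Arachni option name.

```ruby
require 'uri'

# Hypothetical helper: last two labels of the host, as extract_domain does
# for hosts with two or more labels.
def domain_of(uri)
  uri.host.to_s.split('.').last(2).join('.')
end

# follow_subdomains is a hypothetical flag standing in for the option check.
def same_site?(uri, page_url, follow_subdomains)
  # Subdomain crawling: compare only the trailing domain.
  return domain_of(uri) == domain_of(page_url) if follow_subdomains

  # Default: the hosts must match exactly.
  uri.host == page_url.host
end

page = URI('http://www.example.com/index.html')
link = URI('http://blog.example.com/post/1')

puts same_site?(link, page, false) # false: hosts differ
puts same_site?(link, page, true)  # true: both end in example.com
```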

- (Object) links

Returns an array of the distinct A tag HREFs and (i)frame SRCs found on the page.

The original links() method takes care of A tags; the added code takes care of (i)frame SRCs.



# File 'lib/anemone/page.rb', line 32

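def links
    # Start from the A tag HREFs collected by the original method.
    @links = old_links
    return @links if !doc

    # Add the SRC of every frame and iframe that resolves to an in-domain URL.
    doc.css('frame', 'iframe').each do |a|
        u = a.attributes['src'].content rescue nil
        next if u.nil? or u.empty?
        abs = to_absolute(URI(u)) rescue next
        @links << abs if in_domain?(abs)
    end

    @links.uniq!
    @links
end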