Class: Anemone::Page
- Inherits: Object
  - Object
    - Anemone::Page
- Defined in: lib/anemone/page.rb
Overview
Overrides Anemone’s Page class methods:
o in_domain?( uri ): adds support for subdomain crawling
o links(): adds support for frame and iframe src URLs
@author: Anastasios “Zapotek” Laskos <tasos.laskos@gmail.com> <zapotek@segfault.gr>
@version: 0.1-pre
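The override technique described here (keep the original method under an alias, then redefine it to extend the result) can be sketched as follows. This is a minimal illustration on a hypothetical Page class, not Anemone's real code; frame_sources is an invented stand-in for the frame/iframe extraction.

```ruby
# Hypothetical stand-in for the original class (not Anemone's real code).
class Page
  def links
    ['http://example.com/a']
  end
end

# Reopen the class, keep the original method under an alias, then redefine it.
class Page
  alias_method :old_links, :links

  # Invented helper standing in for the (i)frame SRC extraction.
  def frame_sources
    ['http://example.com/frame', 'http://example.com/a']
  end

  # The new links() builds on the original result and removes duplicates.
  def links
    (old_links + frame_sources).uniq
  end
end

Page.new.links      # ["http://example.com/a", "http://example.com/frame"]
Page.new.old_links  # ["http://example.com/a"]
```

The alias has to be created before the method is redefined; otherwise old_links would point at the new definition.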
Instance Method Summary
- (Object) doc
Nokogiri document for the HTML body.
- (String) extract_domain(url)
Extracts the domain from a URI object.
- (Boolean) in_domain?(uri)
Returns true if uri is in the same domain as the page, returns false otherwise.
- (Object) links (also: #old_links)
Array of distinct A tag HREFs and (i)frame SRCs from the page.
The original links() method takes care of A tags and the added code takes care of (i)frame SRCs.
Instance Method Details
- (Object) doc
Nokogiri document for the HTML body
    # File 'lib/anemone/page.rb', line 50
    def doc
        return @doc if @doc
        # @doc = Nokogiri::HTML( @body ) if @body && html? rescue nil
        @doc = Nokogiri::HTML( @body ) if @body rescue nil
    end
- (String) extract_domain(url)
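Extracts the domain (the last two labels of the host name) from a URI object.
Returns false if the URI has no host and true if the host consists of a single label.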
    # File 'lib/anemone/page.rb', line 78
    def extract_domain( url )
        if !url.host then return false end
        splits = url.host.split( /\./ )
        if splits.length == 1 then return true end
        splits[-2] + "." + splits[-1]
    end
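To see what the method returns for different inputs, the logic can be copied into a standalone function and run against Ruby's standard URI library. This is a sketch for experimentation only; the example URLs are made up.

```ruby
require 'uri'

# Standalone copy of the logic shown above, for experimentation only.
def extract_domain( url )
  return false if !url.host                 # no host at all (e.g. a relative URI)
  splits = url.host.split( /\./ )
  return true if splits.length == 1         # single-label host such as "localhost"
  splits[-2] + "." + splits[-1]             # keep only the last two labels
end

extract_domain( URI( 'http://www.example.com/path' ) )  # "example.com"
extract_domain( URI( 'http://localhost/' ) )            # true
extract_domain( URI( '/relative/path' ) )               # false
extract_domain( URI( 'http://www.example.co.uk/' ) )    # "co.uk"
```

The last call shows a limitation of keeping only the last two labels: hosts under multi-part suffixes such as co.uk all reduce to the suffix itself.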
- (Boolean) in_domain?(uri)
Returns true if uri is in the same domain as the page, returns false otherwise.
The added code enables optional subdomain crawling.
    # File 'lib/anemone/page.rb', line 63
    def in_domain?( uri )
        if( Arachni::Options.instance.follow_subdomains )
            return extract_domain( uri ) == extract_domain( @url )
        end
        uri.host == @url.host
    end
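The effect of the subdomain option can be illustrated with a standalone sketch. Here follow_subdomains is passed as a plain argument instead of being read from Arachni::Options, and domain_of and same_domain? are hypothetical names that mirror extract_domain and in_domain? for illustration.

```ruby
require 'uri'

# Hypothetical helper mirroring extract_domain: last two host labels,
# false when there is no host, true for a single-label host.
def domain_of( uri )
  return false if !uri.host
  splits = uri.host.split( /\./ )
  return true if splits.length == 1
  splits[-2] + "." + splits[-1]
end

# Hypothetical counterpart of in_domain?; the real method reads
# Arachni::Options.instance.follow_subdomains and compares against @url.
def same_domain?( page_url, uri, follow_subdomains )
  return domain_of( uri ) == domain_of( page_url ) if follow_subdomains
  uri.host == page_url.host
end

page = URI( 'http://www.example.com/' )
sub  = URI( 'http://blog.example.com/post' )

same_domain?( page, sub, false )  # false: hosts differ
same_domain?( page, sub, true )   # true: both reduce to "example.com"

# Two different single-label hosts both reduce to true, so they compare equal.
same_domain?( URI( 'http://localhost/' ), URI( 'http://intranet/' ), true )  # true
```

The last call suggests that, with subdomain following enabled, any two single-label hosts are treated as the same domain, which may or may not be intended.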
- (Object) links Also known as: old_links
Array of distinct A tag HREFs and (i)frame SRCs from the page.
The original links() method takes care of A tags and the added code takes care of (i)frame SRCs.
    # File 'lib/anemone/page.rb', line 32
    def links
        @links = old_links
        return @links if !doc

        doc.css('frame', 'iframe').each do |a|
            u = a.attributes['src'].content rescue nil
            next if u.nil? or u.empty?
            abs = to_absolute(URI(u)) rescue next
            @links << abs if in_domain?(abs)
        end

        @links.uniq!
        @links
    end
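The merging step (skip empty values, resolve to an absolute URL, filter by domain, de-duplicate) can be tried without Nokogiri using only the standard URI library. In this sketch, merge_frame_links is a hypothetical function, the src values are assumed to have been extracted from the frame and iframe tags already, URI.join stands in for to_absolute, and a plain host comparison stands in for in_domain?.

```ruby
require 'uri'

# base:  URI of the page being processed
# links: URIs already collected from A tags
# srcs:  raw src attribute values from frame/iframe tags (assumed pre-extracted)
def merge_frame_links( base, links, srcs )
  merged = links.dup
  srcs.each do |u|
    next if u.nil? || u.empty?
    begin
      abs = URI.join( base.to_s, u )       # stand-in for to_absolute
    rescue URI::Error
      next                                 # skip values that are not valid URIs
    end
    merged << abs if abs.host == base.host # stand-in for in_domain?
  end
  # De-duplicate by string form (a choice made for this sketch).
  merged.uniq { |l| l.to_s }
end

base  = URI( 'http://example.com/dir/page.html' )
links = [ URI( 'http://example.com/a' ) ]
srcs  = [ 'frame.html', '', nil, 'http://other.org/x', '/a' ]

merge_frame_links( base, links, srcs ).map( &:to_s )
# ["http://example.com/a", "http://example.com/dir/frame.html"]
```

The relative src is resolved against the page URL, the off-domain src is dropped, and the src that duplicates an existing A tag link is removed.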