Web Reconnaissance

… is the foundation of a thorough security assessment and involves systematically and meticulously collecting information about a target website or web application.

Some primary goals:

  • Identifying Assets
  • Discovering Hidden Information
  • Analysing the Attack Surface
  • Gathering Intelligence

In active recon, the attacker directly interacts with the target system to gather information:

| Technique | Example | Description | Tools | Risk of Detection |
|---|---|---|---|---|
| Port Scanning | using Nmap to scan a web server for open ports | identifying open ports and services running on the target | Nmap, Masscan, Unicornscan | HIGH: Direct interaction with the target can trigger IDS and firewalls |
| Vulnerability Scanning | running Nessus against a web application to check for SQLi flaws or XSS vulns | probing the target for known vulns, such as outdated software or misconfigurations | Nessus, OpenVAS, Nikto | HIGH: Vulnerability scanners send exploit payloads that security solutions can detect |
| Network Mapping | using traceroute to determine the path packets take to reach the target server, revealing potential network hops and infrastructure | mapping the target’s network topology, including connected devices and their relationships | Traceroute, Nmap | MEDIUM to HIGH: Excessive or unusual network traffic can raise suspicion |
| Banner Grabbing | connecting to a web server on port 80 and examining the HTTP banner to identify the web server software and version | retrieving information from banners displayed by services running on the target | Netcat, curl | LOW: Banner grabbing typically involves minimal interaction, but it can still be logged |
| OS Fingerprinting | using Nmap’s OS detection capabilities (-O) to determine if the target is running Windows, Linux, or another OS | identifying the OS running on the target | Nmap, Xprobe2 | LOW: OS fingerprinting is usually passive, but some advanced techniques can be detected |
| Service Enumeration | using Nmap’s service version detection (-sV) to determine if a web server is running Apache 2.4.50 or Nginx 1.18.0 | determining the specific versions of services running on open ports | Nmap | LOW: Similar to banner grabbing, service enumeration can be logged but is less likely to trigger alerts |
| Web Spidering | running a web crawler like Burp Spider or OWASP ZAP Spider to map out the structure of a website and discover hidden resources | crawling the target website to identify web pages, directories, and files | Burp Suite Spider, OWASP ZAP Spider, Scrapy | LOW to MEDIUM: Can be detected if the crawler’s behaviour is not carefully configured to mimic legitimate traffic |

In passive recon, information about the target is gathered without directly interacting with it:

| Technique | Example | Description | Tools | Risk of Detection |
|---|---|---|---|---|
| Search Engine Queries | searching Google for “[Target Name] Employees” to find employee information or social media profiles | utilising search engines to uncover information about the target, including websites, social media profiles, and news articles | Google, DuckDuckGo, Bing, Shodan, … | VERY LOW: Search engine queries are normal internet activity and unlikely to trigger alerts |
| WHOIS Lookup | performing a WHOIS lookup on a target domain to find the registrant’s name, contact information, and name servers | querying WHOIS databases to retrieve domain registration details | whois command-line tool, online WHOIS lookup services | VERY LOW: WHOIS queries are legitimate and do not raise suspicion |
| DNS | using dig to enumerate subdomains of a target domain | analysing DNS records to identify subdomains, mail servers, and other infrastructure | dig, nslookup, host, dnsenum, fierce, dnsrecon | VERY LOW: DNS queries are essential for internet browsing and are not typically flagged as suspicious |
| Web Archive Analysis | using the Wayback Machine to view past versions of a target website to see how it has changed over time | examining historical snapshots of the target’s website to identify vulnerabilities or hidden information | Wayback Machine | VERY LOW: Accessing archived versions of a website is a normal activity |
| Social Media Analysis | searching LinkedIn for employees of a target organisation to learn about their roles, responsibilities, and potential social engineering targets | gathering information from social media platforms like LinkedIn, Twitter, and Facebook | LinkedIn, Twitter, Facebook, specialised OSINT tools | VERY LOW: Accessing public social media profiles is not considered intrusive |
| Code Repos | searching GitHub for code snippets or repos related to the target that might contain sensitive information or code vulnerabilities | analysing publicly accessible code repos like GitHub for exposed credentials or vulns | GitHub, GitLab | VERY LOW: Code repos are meant for public access, and searching them is not suspicious |

WHOIS

… is a widely used query and response protocol designed to access databases that store information about registered internet resources.

Example:

d41y@htb[/htb]$ whois inlanefreight.com

[...]
Domain Name: inlanefreight.com
Registry Domain ID: 2420436757_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.registrar.amazon
Registrar URL: https://registrar.amazon.com
Updated Date: 2023-07-03T01:11:15Z
Creation Date: 2019-08-05T22:43:09Z
[...]

A WHOIS record typically contains:

  • Domain Name: domain name itself
  • Registrar: company where the domain was registered
  • Registrant Contact: person or organization that registered the domain
  • Administrative Contact: person responsible for managing the domain
  • Technical Contact: person handling technical issues related to the domain
  • Creation and Expiration Dates: when the domain was registered and when it’s set to expire
  • Name Servers: servers that translate the domain name into an IP address
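The fields above appear in raw WHOIS output as simple `Key: value` lines, so they can be collected with a few lines of Python. This is a minimal sketch, not a complete WHOIS parser: the function name `parse_whois` is hypothetical, and it assumes the `Key: value` layout seen in the examples in these notes (registries differ in their formatting).

```python
def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines from raw WHOIS output into a dict of lists."""
    record: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, elisions like '[...]', and footer/comment lines.
        if not line or line.startswith((">>>", "[", "%", "#")):
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and value.strip():
            # Keys such as 'Name Server' repeat, so every key maps to a list.
            record.setdefault(key.strip(), []).append(value.strip())
    return record


sample = """\
Domain Name: inlanefreight.com
Registrar WHOIS Server: whois.registrar.amazon
Updated Date: 2023-07-03T01:11:15Z
Creation Date: 2019-08-05T22:43:09Z
"""
parsed = parse_whois(sample)
print(parsed["Creation Date"])
```

Storing lists rather than single strings keeps repeated fields (several name servers, several domain status lines) intact.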

Facebook Example:

d41y@htb[/htb]$ whois facebook.com

   Domain Name: FACEBOOK.COM
   Registry Domain ID: 2320948_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.registrarsafe.com
   Registrar URL: http://www.registrarsafe.com
   Updated Date: 2024-04-24T19:06:12Z
   Creation Date: 1997-03-29T05:00:00Z
   Registry Expiry Date: 2033-03-30T04:00:00Z
   Registrar: RegistrarSafe, LLC
   Registrar IANA ID: 3237
   Registrar Abuse Contact Email: abusecomplaints@registrarsafe.com
   Registrar Abuse Contact Phone: +1-650-308-7004
   Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
   Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Domain Status: serverDeleteProhibited https://icann.org/epp#serverDeleteProhibited
   Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
   Domain Status: serverUpdateProhibited https://icann.org/epp#serverUpdateProhibited
   Name Server: A.NS.FACEBOOK.COM
   Name Server: B.NS.FACEBOOK.COM
   Name Server: C.NS.FACEBOOK.COM
   Name Server: D.NS.FACEBOOK.COM
   DNSSEC: unsigned
   URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of whois database: 2024-06-01T11:24:10Z <<<

[...]
Registry Registrant ID:
Registrant Name: Domain Admin
Registrant Organization: Meta Platforms, Inc.
[...]

Domain Name System (DNS)

… acts as the internet’s GPS, guiding your online journey from memorable landmarks (domain names) to precise numerical coordinates (IP addresses).

DNS Workflow

flowchart LR
    A[Checks Cache]
    B[IP Found]
    C[Sends DNS Query to Resolver]
    D[Checks Cache]
    E[Recursive Lookup]
    F[Root Name Server]
    G[TLD Name Server]
    H[Authoritative Name Server]
    I[Returns IP to Computer]
    J[Connects to Website]

    subgraph my_computer[My Computer]
        style my_computer fill:#f0f8ff, stroke:#000000, stroke-width:2px, color:black
        A --> B
        B --> |Yes| J
        B --> |No| C
        C --> D
    end

    subgraph dns_resolver[DNS Resolver]
        style dns_resolver fill:#fffacd, stroke:#000000, stroke-width:2px, color:black
        D --> |No| E
        D --> |Yes| I
    end

    E --> F
    F --> G
    G --> H
    H --> I
    I --> J

  1. Computer asks for directions
  2. DNS Resolver checks its map
  3. Root name server points the way
  4. TLD name server narrows it down
  5. Authoritative name server delivers the address
  6. DNS Resolver returns the Information
  7. Computer connects
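The seven steps can be mimicked with a toy model in which plain dictionaries stand in for the root, TLD, and authoritative servers. Everything here is made up for illustration (the server labels, the `resolve` function, the single record); a real resolver speaks the DNS protocol over the network instead of reading dictionaries.

```python
# Toy stand-ins for the DNS hierarchy; all names and values are illustrative.
ROOT = {"com": "tld-com"}                                # root: TLD -> TLD server
TLD = {"tld-com": {"example.com": "ns1.example.com"}}    # TLD server: zone -> authoritative server
AUTHORITATIVE = {"ns1.example.com": {"www.example.com": "192.0.2.1"}}

cache: dict[str, str] = {}


def resolve(hostname: str) -> tuple[str, str]:
    """Return (ip, source), where source is 'cache' or 'recursive lookup'."""
    if hostname in cache:                       # steps 1-2: check the cache first
        return cache[hostname], "cache"
    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]               # step 3: root points to the TLD server
    zone = ".".join(labels[-2:])
    auth_server = TLD[tld_server][zone]         # step 4: TLD server names the authoritative server
    ip = AUTHORITATIVE[auth_server][hostname]   # step 5: authoritative server delivers the address
    cache[hostname] = ip                        # step 6: the answer is cached and returned
    return ip, "recursive lookup"


print(resolve("www.example.com"))  # first call walks the hierarchy
print(resolve("www.example.com"))  # second call is answered from the cache
```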

Hosts-File

… is a simple text file used to map hostnames to IP addresses, providing a manual method of domain name resolution that bypasses the DNS process. While DNS automates the translation of domain names to IP addresses, the hosts-file allows for direct, local overrides. This can be particularly useful for development, troubleshooting, or blocking websites. It is located in:

| Linux | /etc/hosts |
| Windows | C:\Windows\System32\drivers\etc\hosts |

… and can look like this example:

127.0.0.1       localhost
192.168.1.10    devserver.local
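Because the format is just "IP address followed by one or more hostnames", reading it programmatically is straightforward. The sketch below uses a hypothetical helper `parse_hosts` and works on a string rather than the real file; the "first entry wins" rule for duplicate names is an assumption chosen for the example.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname found in hosts-file text to its IP address."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()              # first field is the IP, the rest are names
        for name in names:
            mapping.setdefault(name, ip)       # assumption: the first entry for a name wins
    return mapping


HOSTS = """\
127.0.0.1       localhost
192.168.1.10    devserver.local   # local override for development
"""
print(parse_hosts(HOSTS))
```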

Key DNS Concepts

Key concepts:

| DNS Concept | Example | Description |
|---|---|---|
| Domain Name | www.example.com | a human-readable label for a website or other internet resource |
| IP Address | 192.0.2.1 | a unique numerical identifier assigned to each device connected to the internet |
| DNS Resolver | your ISP’s DNS server or a public resolver like Google DNS | a server that translates domain names into IP addresses |
| Root Name Server | there are 13 root servers worldwide, named A-M: a.root-servers.net | the top-level servers in the DNS hierarchy |
| TLD Name Server | Verisign for .com, PIR for .org | servers responsible for specific top-level domains |
| Authoritative Name Server | often managed by hosting providers or domain registrars | the server that holds the actual IP address for a domain |
| DNS Record Types | A, AAAA, CNAME, MX, NS, TXT, … | different types of information stored in DNS |

DNS Record Types:

| Record Type | Full Name | Zone File Example | Description |
|---|---|---|---|
| A | Address Record | www.example.com. IN A 192.0.2.1 | maps a hostname to its IPv4 address |
| AAAA | IPv6 Address Record | www.example.com. IN AAAA 2001:db8:85a3::8a2e:370:7334 | maps a hostname to its IPv6 address |
| CNAME | Canonical Name Record | blog.example.com. IN CNAME webserver.example.net. | creates an alias for a hostname, pointing it to another hostname |
| MX | Mail Exchange Record | example.com. IN MX 10 mail.example.com. | specifies the mail server(s) responsible for handling email for the domain |
| NS | Name Server Record | example.com. IN NS ns1.example.com. | delegates a DNS zone to a specific authoritative name server |
| TXT | Text Record | example.com. IN TXT "v=spf1 mx -all" | stores arbitrary text information, often used for domain verification or security policies |
| SOA | Start of Authority Record | example.com. IN SOA ns1.example.com. admin.example.com. 2023060301 10800 3600 604800 86400 | specifies administrative information about a DNS zone, including the primary name server, responsible person’s email, and other parameters |
| SRV | Service Record | _sip._udp.example.com. IN SRV 10 5 5060 sipserver.example.com. | defines the hostname and port number for specific services (fields: priority, weight, port, target) |
| PTR | Pointer Record | 1.2.0.192.in-addr.arpa. IN PTR www.example.com. | used for reverse DNS lookups, mapping an IP address to a hostname |
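The owner name of a PTR record is the IP address with its octets reversed under `in-addr.arpa` (or, for IPv6, the reversed nibbles under `ip6.arpa`). Python's standard `ipaddress` module can derive that name; the wrapper `ptr_name` below is only a hypothetical convenience that appends the trailing dot used in zone files.

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the fully qualified reverse-lookup name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(ip).reverse_pointer + "."


print(ptr_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa.
```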

DNS Tools

| Tool | Key Features | Use Cases |
|---|---|---|
| dig | versatile DNS lookup tool that supports various query types and detailed output | manual DNS queries, zone transfers, troubleshooting DNS issues, and in-depth analysis of DNS records |
| nslookup | simpler DNS lookup tool, primarily for A, AAAA, and MX records | basic DNS queries, quick checks of domain resolution and mail server records |
| host | streamlined DNS lookup tool with concise output | quick checks of A, AAAA, and MX records |
| dnsenum | automated DNS enumeration tool, dictionary attacks, bruteforcing, zone transfers | discovering subdomains and gathering DNS information efficiently |
| fierce | DNS recon and subdomain enumeration tool with recursive search and wildcard detection | user-friendly interface for DNS recon, identifying subdomains and potential targets |
| dnsrecon | combines multiple DNS recon techniques and supports various output formats | comprehensive DNS enumeration, identifying subdomains, and gathering DNS records for further analysis |
| theHarvester | OSINT tool that gathers information from various sources, including DNS records | collecting email addresses, employee information, and other data associated with a domain from multiple sources |

DNS Zones

In the DNS, a zone is a distinct part of the domain namespace that a specific entity or administrator manages. For example, example.com and all its subdomains would typically belong to the same DNS zone.

Primary DNS Server

The primary DNS server holds the original zone file, which contains all authoritative information for a domain, and is responsible for administering this zone. The DNS records of a zone can only be edited on the primary DNS server, which then updates the secondary DNS servers.

Secondary DNS Server

Secondary DNS servers contain read-only copies of the zone file from the primary DNS server. These servers compare their data with the primary DNS server at regular intervals and thus serve as backup servers. This is useful because, without them, a failure of the primary name server would make name resolution for the zone impossible. To establish connections anyway, users would have to know the IP addresses of the servers they want to contact.

DNS Zone File

The zone file, a text file residing on a DNS Server, defines the resource records within this zone, providing crucial information for translating domain names into IP addresses.

Example:

$TTL 3600 ; Default Time-To-Live (1 hour)
@       IN SOA   ns1.example.com. admin.example.com. (
                2024060401 ; Serial number (YYYYMMDDNN)
                3600       ; Refresh interval
                900        ; Retry interval
                604800     ; Expire time
                86400 )    ; Minimum TTL

@       IN NS    ns1.example.com.
@       IN NS    ns2.example.com.
@       IN MX 10 mail.example.com.
www     IN A     192.0.2.1
mail    IN A     198.51.100.1
ftp     IN CNAME www.example.com.

This file defines the authoritative name servers (NS records), the mail server (MX record), and the IP addresses (A records) for various hosts within the example.com domain, plus an alias (CNAME record) that points ftp to www.

A distinction is also made between the primary zone (master zone) and the secondary zone (slave zone). The secondary zone on the secondary DNS server serves as a substitute for the primary zone on the primary DNS server if the primary DNS server becomes unreachable. Creating a copy of the primary zone and transferring it from the primary DNS server to the secondary DNS server is called a “zone transfer”.

Zone files can only be updated on the primary DNS server, which then updates the secondary DNS servers. Each zone can have only one primary DNS server but an unlimited number of secondary DNS servers.

DNS Zone Transfer

  1. Zone Transfer Request (AXFR)
  2. SOA Record Transfer
  3. DNS Records Transmission
  4. Zone Transfer Complete
  5. Acknowledgement (ACK)

In the early days of the internet, allowing any client to request a zone transfer from a DNS server was common practice. This open approach simplified administration but opened a gaping security hole. It meant that anyone, including malicious actors, could ask a DNS server for a complete copy of its zone file, which contains a wealth of sensitive information.

Awareness of this vulnerability has grown, and most DNS server administrators have mitigated the risk. Modern DNS servers are typically configured to allow zone transfers only to trusted secondary servers, ensuring that sensitive zone data remains confidential.

Dig example:

d41y@htb[/htb]$ dig axfr @nsztm1.digi.ninja zonetransfer.me

; <<>> DiG 9.18.12-1~bpo11+1-Debian <<>> axfr @nsztm1.digi.ninja zonetransfer.me
; (1 server found)
;; global options: +cmd
zonetransfer.me.	7200	IN	SOA	nsztm1.digi.ninja. robin.digi.ninja. 2019100801 172800 900 1209600 3600
zonetransfer.me.	300	IN	HINFO	"Casio fx-700G" "Windows XP"
zonetransfer.me.	301	IN	TXT	"google-site-verification=tyP28J7JAUHA9fw2sHXMgcCC0I6XBmmoVi04VlMewxA"
zonetransfer.me.	7200	IN	MX	0 ASPMX.L.GOOGLE.COM.
...
zonetransfer.me.	7200	IN	A	5.196.105.14
zonetransfer.me.	7200	IN	NS	nsztm1.digi.ninja.
zonetransfer.me.	7200	IN	NS	nsztm2.digi.ninja.
_acme-challenge.zonetransfer.me. 301 IN	TXT	"6Oa05hbUJ9xSsvYy7pApQvwCUSSGgxvrbdizjePEsZI"
_sip._tcp.zonetransfer.me. 14000 IN	SRV	0 0 5060 www.zonetransfer.me.
14.105.196.5.IN-ADDR.ARPA.zonetransfer.me. 7200	IN PTR www.zonetransfer.me.
asfdbauthdns.zonetransfer.me. 7900 IN	AFSDB	1 asfdbbox.zonetransfer.me.
asfdbbox.zonetransfer.me. 7200	IN	A	127.0.0.1
asfdbvolume.zonetransfer.me. 7800 IN	AFSDB	1 asfdbbox.zonetransfer.me.
canberra-office.zonetransfer.me. 7200 IN A	202.14.81.230
...
;; Query time: 10 msec
;; SERVER: 81.4.108.41#53(nsztm1.digi.ninja) (TCP)
;; WHEN: Mon May 27 18:31:35 BST 2024
;; XFR size: 50 records (messages 1, bytes 2085)
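Transfer output like this is easier to triage once it is split into fields. The sketch below parses dig-style record lines (name, TTL, class, type, data) from text that has already been captured; the function name `parse_dig_records` is hypothetical, and the sample is a shortened copy of the output above.

```python
from collections import Counter


def parse_dig_records(output: str) -> list[tuple[str, int, str, str]]:
    """Return (name, ttl, type, rdata) for every record line in dig output."""
    records = []
    for line in output.splitlines():
        if not line.strip() or line.startswith(";"):   # skip blanks and dig comment lines
            continue
        parts = line.split(None, 4)                    # name, ttl, class, type, rdata
        if len(parts) == 5 and parts[1].isdigit() and parts[2] == "IN":
            name, ttl, _cls, rtype, rdata = parts
            records.append((name, int(ttl), rtype, rdata))
    return records


SAMPLE = """\
; <<>> DiG 9.18.12-1~bpo11+1-Debian <<>> axfr @nsztm1.digi.ninja zonetransfer.me
zonetransfer.me. 7200 IN SOA nsztm1.digi.ninja. robin.digi.ninja. 2019100801 172800 900 1209600 3600
zonetransfer.me. 7200 IN A 5.196.105.14
zonetransfer.me. 7200 IN NS nsztm1.digi.ninja.
canberra-office.zonetransfer.me. 7200 IN A 202.14.81.230
;; XFR size: 50 records (messages 1, bytes 2085)
"""

records = parse_dig_records(SAMPLE)
by_type = Counter(rtype for _, _, rtype, _ in records)
a_hosts = {name: rdata for name, _, rtype, rdata in records if rtype == "A"}
print(by_type)
print(a_hosts)
```

Grouping by record type gives a quick overview, and the A records yield a hostname-to-IP list for later recon steps.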

DNS Security

Many companies have already recognized the DNS’s vulnerabilities and try to close this gap with dedicated DNS servers, regular scans, and vulnerability assessment software. Beyond that, more and more companies recognize the value of the DNS as an active line of defense, embedded in a comprehensive, in-depth security concept.

This makes sense because the DNS is part of every network connection. The DNS is uniquely positioned in the network to act as a central control point that decides whether a request is benign or malicious.

DNS threat intelligence can be integrated with open-source and other threat intelligence feeds. Analytics systems such as EDR and SIEM can then provide a holistic, situation-based picture of the security posture. DNS security services support the coordination of incident response by sharing IOCs and IOAs with other security technologies such as firewalls, network proxies, endpoint security, Network Access Control, and vulnerability scanners.

DNSSEC

Another mechanism used to secure DNS is Domain Name System Security Extensions (DNSSEC), designed to ensure the authenticity and integrity of data transmitted through the DNS by securing resource records with digital signatures. DNSSEC ensures that the DNS data has not been manipulated and that it originates from the expected source. Private keys are used to sign the resource records digitally. Resource records can be signed several times with different private keys, for example, to replace keys before they expire.

Private Keys

The DNS server that manages a zone to be secured signs the resource records it sends using a private key known only to it. Each zone has its own zone keys, each consisting of a private and a public key. DNSSEC specifies a new resource record type, RRSIG, which contains the signature of the respective DNS record. These signatures have a specific validity period and are provided with a start and end date.

Public Key

Recipients of the data can use the public key to verify the signature. For the DNSSEC security mechanisms to work, DNSSEC must be supported by both the provider of the DNS information and the requesting client system. The requesting clients verify the signatures using the publicly known key of the DNS zone. If the check is successful, the response has not been manipulated and the information comes from the requested source.

Subdomains

Beneath the surface of a primary domain lies a potential network of subdomains. For instance, a company might use example.com as the primary domain, but also blog.example.com for its blog, shop.example.com for its shop, or mail.example.com for its email services.

Active Subdomain Enumeration

… involves directly interacting with the target domain’s DNS servers to uncover subdomains. One method is attempting a DNS zone transfer, where a misconfigured server might inadvertently leak a complete list of subdomains. However, due to tightened security measures, this is rarely successful.

A more common active technique is brute-force enumeration, which involves systematically testing a list of potential subdomain names against a target domain. Tools like dnsenum, ffuf, and gobuster can automate this process, using wordlists of common subdomain names or custom-generated lists based on specific patterns.

Passive Subdomain Enumeration

… relies on external sources of information to discover subdomains without directly querying the target’s DNS servers. One valuable resource is Certificate Transparency (CT) logs, public repos of SSL/TLS certificates. These certificates often include a list of associated subdomains in their Subject Alternative Name (SAN) field, providing a treasure trove of potential targets.

Another approach involves utilising search engines like Google or DuckDuckGo. By employing specialised search operators, you can filter results to show only subdomains related to the target domain.
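Once certificate data has been collected from a CT log search, pulling subdomains out of it is a filtering job. The sketch below assumes each entry is a dictionary with a `name_value` field holding newline-separated names, which is how crt.sh JSON results are commonly shaped; treat that field name, the helper `subdomains_from_ct`, and the sample data as assumptions for illustration.

```python
def subdomains_from_ct(entries: list[dict], domain: str) -> list[str]:
    """Return the unique subdomains of `domain` found in CT log entries."""
    found = set()
    suffix = "." + domain.lower()
    for entry in entries:
        # Assumption: names from the certificate sit in 'name_value', one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().removeprefix("*.")   # wildcard certs: keep the base name
            if name.endswith(suffix):                        # keeps only true subdomains
                found.add(name)
    return sorted(found)


# Made-up entries standing in for a downloaded CT search result.
entries = [
    {"name_value": "www.example.com\n*.dev.example.com"},
    {"name_value": "example.com"},
    {"name_value": "mail.example.com\nwww.example.com"},
    {"name_value": "evil-example.com"},
]
print(subdomains_from_ct(entries, "example.com"))
```

Matching on `"." + domain` rather than on the bare domain avoids false hits such as `evil-example.com`.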

Subdomain Bruteforcing

… is a powerful active subdomain discovery technique that leverages pre-defined lists of potential subdomain names. The process breaks down into four steps:

  1. Wordlist Selection
    • General-Purpose
    • Targeted
    • Custom
  2. Iteration and Querying
  3. DNS Lookup
  4. Filtering and Validation
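The four steps map onto a short loop: take a wordlist, build candidate names, look each one up, and keep only the names that resolve. The following is a minimal sketch rather than a replacement for the dedicated tools; `brute_force`, `system_lookup`, and the `wildcard_ip` filter are hypothetical names, and the lookup function is injectable so the logic can be tried without touching a real DNS server.

```python
import socket
from typing import Callable, Iterable, Optional


def system_lookup(hostname: str) -> Optional[str]:
    """Resolve via the system resolver; return None if the name does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        return None


def brute_force(
    domain: str,
    words: Iterable[str],                                    # step 1: the chosen wordlist
    lookup: Callable[[str], Optional[str]] = system_lookup,
    wildcard_ip: Optional[str] = None,
) -> dict[str, str]:
    """Return {subdomain: ip} for every candidate that resolves."""
    hits: dict[str, str] = {}
    for word in words:
        word = word.strip()
        if not word:
            continue
        candidate = f"{word}.{domain}"                       # step 2: iterate and build the name
        ip = lookup(candidate)                               # step 3: DNS lookup
        if ip is not None and ip != wildcard_ip:             # step 4: filter and validate
            hits[candidate] = ip
    return hits


# Offline demonstration with a fake resolver (the addresses are made up).
fake_dns = {"www.example.com": "192.0.2.1", "mail.example.com": "198.51.100.1"}
print(brute_force("example.com", ["www", "dev", "mail"], lookup=fake_dns.get))
```

Passing `wildcard_ip` lets the caller discard answers from a wildcard DNS record, which would otherwise make every candidate look valid.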

Some tools to bruteforce subdomains are:

  • dnsenum
  • fierce
  • dnsrecon
  • amass
  • assetfinder
  • puredns

Dnsenum example:

d41y@htb[/htb]$ dnsenum --enum inlanefreight.com -f  /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt 

dnsenum VERSION:1.2.6

-----   inlanefreight.com   -----


Host's addresses:
__________________

inlanefreight.com.                       300      IN    A        134.209.24.248

[...]

Brute forcing with /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt:
_______________________________________________________________________________________

www.inlanefreight.com.                   300      IN    A        134.209.24.248
support.inlanefreight.com.               300      IN    A        134.209.24.248
[...]


done.

Virtual Hosts

At the core of virtual hosting is the ability of web servers to distinguish between multiple websites or applications sharing the same IP address. This is achieved by leveraging the HTTP Host header, a piece of information included in every HTTP request sent by a web browser.

Key difference to subdomains:

  • Subdomains
    • are extensions of a main domain name
    • typically have their own DNS records, pointing to either the same IP address as the main or a different one
    • can be used to organise different sections or services of a website
  • Virtual Hosts
    • are configurations within a web server that allow multiple websites or apps to be hosted on a single server
    • can be associated with top-level domains or subdomains
    • each can have its own separate configuration, enabling precise control over how requests are handled

VHosts can also be configured to use different domains, not just subdomains:

# Example of name-based virtual host configuration in Apache
<VirtualHost *:80>
    ServerName www.example1.com
    DocumentRoot /var/www/example1
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example2.org
    DocumentRoot /var/www/example2
</VirtualHost>

<VirtualHost *:80>
    ServerName www.another-example.net
    DocumentRoot /var/www/another-example
</VirtualHost>

Server VHost Lookup

VHost workflow

  1. Browser Requests a Website
  2. Host Header Reveals the Domain
  3. Web Server Determines the Virtual Host
  4. Serving the Right Content
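The lookup in steps 2 to 4 can be imitated in a few lines: read the Host header from the raw request and use it as a key into the configured virtual hosts. This is a simplified model, not how any particular web server is implemented; the mapping reuses the names from the Apache example in these notes, and falling back to the first virtual host when nothing matches is an assumption modelled on Apache's default behaviour.

```python
VHOSTS = {
    "www.example1.com": "/var/www/example1",
    "www.example2.org": "/var/www/example2",
    "www.another-example.net": "/var/www/another-example",
}
DEFAULT_ROOT = "/var/www/example1"  # assumption: the first vhost acts as the fallback


def select_document_root(raw_request: str) -> str:
    """Pick the document root for a raw HTTP request based on its Host header."""
    host = ""
    for line in raw_request.split("\r\n")[1:]:   # skip the request line
        if not line:                             # an empty line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower()
            break
    hostname = host.rsplit(":", 1)[0] if ":" in host else host   # drop an optional port
    return VHOSTS.get(hostname, DEFAULT_ROOT)


request = "GET / HTTP/1.1\r\nHost: www.example2.org\r\n\r\n"
print(select_document_root(request))
```

The same idea explains virtual host discovery: sending requests to one IP address with different Host values and comparing the responses reveals which names the server knows.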

Types of Virtual Hosting

  • Name-Based Virtual Hosting
    • relies solely on the HTTP Host header
    • most common and flexible method
    • requires the web server to support name-based virtual hosting
    • can have limitations with certain protocols like SSL/TLS
  • IP-Based Virtual Hosting
    • assigns a unique IP address to each website hosted on the server
    • server determines which website to serve based on the IP address to which the request was sent
    • doesn’t rely on the Host header
    • can be used with any protocol
    • offers better isolation between websites
  • Port-Based Virtual Hosting
    • different websites are associated with different ports on the same IP address
    • not as common or user-friendly as name-based virtual hosting
    • might require users to specify the port number in the URL

Virtual Host Discovery Tools

  • gobuster
  • Feroxbuster
  • ffuf

Gobuster example:

d41y@htb[/htb]$ gobuster vhost -u http://inlanefreight.htb:81 -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt --append-domain
===============================================================
Gobuster v3.6
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@firefart)
===============================================================
[+] Url:             http://inlanefreight.htb:81
[+] Method:          GET
[+] Threads:         10
[+] Wordlist:        /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt
[+] User Agent:      gobuster/3.6
[+] Timeout:         10s
[+] Append Domain:   true
===============================================================
Starting gobuster in VHOST enumeration mode
===============================================================
Found: forum.inlanefreight.htb:81 Status: 200 [Size: 100]
[...]
Progress: 114441 / 114442 (100.00%)
===============================================================
Finished
===============================================================

Fingerprinting

… focuses on extracting technical details about the technologies powering a website or web application. The digital signatures of web servers, operating systems, and software components can reveal critical information about a target’s infrastructure and potential security weaknesses.

Fingerprinting serves as a cornerstone of web recon for several reasons:

  • Targeted Attacks
  • Identifying Misconfigurations
  • Prioritising Targets
  • Building a Comprehensive Profile

Techniques

  • Banner Grabbing
    • involves analysing the banners presented by web servers and other services
    • banners often reveal the server software, version numbers, and other details
  • Analysing HTTP Headers
    • HTTP headers contain a wealth of information
    • the Server header typically discloses the web server software, while the X-Powered-By header might reveal additional technologies like scripting languages or frameworks
  • Probing for Specific Responses
    • sending specially crafted requests can elicit unique responses that reveal specific technologies or versions
  • Analysing Page Content
    • a page’s content, including its structure, scripts, and other elements, can often provide clues about the underlying technologies

Tools

| Tool | Description | Features |
|---|---|---|
| Wappalyzer | browser extension and online service for website technology profiling | identifies a wide range of web technologies, including CMSs, frameworks, analytics tools, and more |
| BuiltWith | web technology profiler that provides detailed reports on a website’s technology stack | offers both free and paid plans with varying levels of detail |
| WhatWeb | command-line tool for website fingerprinting | uses a vast database of signatures to identify various web technologies |
| Nmap | versatile network scanner that can be used for various recon tasks, including service and OS fingerprinting | can be used with scripts (NSE) to perform more specialised fingerprinting |
| Netcraft | offers a range of web security services, including website fingerprinting and security reporting | provides detailed reports on a website’s technology, hosting provider, and security posture |
| wafw00f | command-line tool specifically designed for identifying Web Application Firewalls (WAFs) | helps determine if a WAF is present and, if so, its type and configuration |

Banner Grabbing example:

d41y@htb[/htb]$ curl -I inlanefreight.com # could have just used '-L'

HTTP/1.1 301 Moved Permanently
Date: Fri, 31 May 2024 12:07:44 GMT
Server: Apache/2.4.41 (Ubuntu)
Location: https://inlanefreight.com/
Content-Type: text/html; charset=iso-8859-1

d41y@htb[/htb]$ curl -I https://inlanefreight.com

HTTP/1.1 301 Moved Permanently
Date: Fri, 31 May 2024 12:12:12 GMT
Server: Apache/2.4.41 (Ubuntu)
X-Redirect-By: WordPress
Location: https://www.inlanefreight.com/
Content-Type: text/html; charset=UTF-8

d41y@htb[/htb]$ curl -I https://www.inlanefreight.com

HTTP/1.1 200 OK
Date: Fri, 31 May 2024 12:12:26 GMT
Server: Apache/2.4.41 (Ubuntu)
Link: <https://www.inlanefreight.com/index.php/wp-json/>; rel="https://api.w.org/"
Link: <https://www.inlanefreight.com/index.php/wp-json/wp/v2/pages/7>; rel="alternate"; type="application/json"
Link: <https://www.inlanefreight.com/>; rel=shortlink
Content-Type: text/html; charset=UTF-8
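Header-based fingerprinting boils down to picking a handful of revealing headers out of responses like these. The sketch below works on response text that has already been captured (for example from `curl -I`); the helper name `fingerprint` and the short list of headers it watches are choices made for this example, not an exhaustive set.

```python
FINGERPRINT_HEADERS = ("server", "x-powered-by", "x-redirect-by")


def fingerprint(raw_headers: str) -> dict[str, str]:
    """Extract technology-revealing headers from a raw HTTP response head."""
    found: dict[str, str] = {}
    for line in raw_headers.splitlines()[1:]:     # skip the status line
        name, sep, value = line.partition(":")    # header names are case-insensitive
        if sep and name.strip().lower() in FINGERPRINT_HEADERS:
            found[name.strip().lower()] = value.strip()
    return found


RESPONSE = """\
HTTP/1.1 301 Moved Permanently
Date: Fri, 31 May 2024 12:12:12 GMT
Server: Apache/2.4.41 (Ubuntu)
X-Redirect-By: WordPress
Location: https://www.inlanefreight.com/
Content-Type: text/html; charset=UTF-8
"""
print(fingerprint(RESPONSE))
```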

WAF example:

d41y@htb[/htb]$ wafw00f inlanefreight.com

                ______
               /      \
              (  W00f! )
               \  ____/
               ,,    __            404 Hack Not Found
           |`-.__   / /                      __     __
           /"  _/  /_/                       \ \   / /
          *===*    /                          \ \_/ /  405 Not Allowed
         /     )__//                           \   /
    /|  /     /---`                        403 Forbidden
    \\/`   \ |                                 / _ \
    `\    /_\\_              502 Bad Gateway  / / \ \  500 Internal Error
      `_____``-`                             /_/   \_\

                        ~ WAFW00F : v2.2.0 ~
        The Web Application Firewall Fingerprinting Toolkit
    
[*] Checking https://inlanefreight.com
[+] The site https://inlanefreight.com is behind Wordfence (Defiant) WAF.
[~] Number of requests: 2

Nikto example:

d41y@htb[/htb]$ nikto -h inlanefreight.com -Tuning b

- Nikto v2.5.0
---------------------------------------------------------------------------
+ Multiple IPs found: 134.209.24.248, 2a03:b0c0:1:e0::32c:b001
+ Target IP:          134.209.24.248
+ Target Hostname:    www.inlanefreight.com
+ Target Port:        443
---------------------------------------------------------------------------
+ SSL Info:        Subject:  /CN=inlanefreight.com
                   Altnames: inlanefreight.com, www.inlanefreight.com
                   Ciphers:  TLS_AES_256_GCM_SHA384
                   Issuer:   /C=US/O=Let's Encrypt/CN=R3
+ Start Time:         2024-05-31 13:35:54 (GMT0)
---------------------------------------------------------------------------
+ Server: Apache/2.4.41 (Ubuntu)
+ /: Link header found with value: ARRAY(0x558e78790248). See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link
+ /: The site uses TLS and the Strict-Transport-Security HTTP header is not defined. See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security
+ /: The X-Content-Type-Options header is not set. This could allow the user agent to render the content of the site in a different fashion to the MIME type. See: https://www.netsparker.com/web-vulnerability-scanner/vulnerabilities/missing-content-type-header/
+ /index.php?: Uncommon header 'x-redirect-by' found, with contents: WordPress.
+ No CGI Directories found (use '-C all' to force check all possible dirs)
+ /: The Content-Encoding header is set to "deflate" which may mean that the server is vulnerable to the BREACH attack. See: http://breachattack.com/
+ Apache/2.4.41 appears to be outdated (current is at least 2.4.59). Apache 2.2.34 is the EOL for the 2.x branch.
+ /: Web Server returns a valid response with junk HTTP methods which may cause false positives.
+ /license.txt: License file found may identify site software.
+ /: A Wordpress installation was found.
+ /wp-login.php?action=register: Cookie wordpress_test_cookie created without the httponly flag. See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies
+ /wp-login.php:X-Frame-Options header is deprecated and has been replaced with the Content-Security-Policy HTTP header with the frame-ancestors directive instead. See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options
+ /wp-login.php: Wordpress login found.
+ 1316 requests: 0 error(s) and 12 item(s) reported on remote host
+ End Time:           2024-05-31 13:47:27 (GMT0) (693 seconds)
---------------------------------------------------------------------------
+ 1 host(s) tested

Crawling

… often called spidering, is the automated process of systematically browsing the World Wide Web. It follows links from one page to another, collecting information.

Example:

  1. Homepage
    ├── link1
    ├── link2
    └── link3

  2. link1 Page
    ├── Homepage
    ├── link2
    ├── link4
    └── link5

  3. and so on …

Breadth-first-crawling

… prioritizes exploring a website’s width before going deep. It starts by crawling all the links on the seed page, then moves on to the links found on those pages, and so on. This is useful for getting a broad overview of a website’s structure and content.

Depth-first-crawling

… prioritizes depth over breadth. It follows a single path of links as far as possible before backtracking and exploring other paths. This can be useful for finding specific content or reaching deep into a website’s structure.
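
The difference between the two strategies comes down to the data structure that holds the pages still to visit: a queue gives breadth-first order, a stack gives depth-first order. The following is a minimal sketch on an in-memory link graph, not a real crawler. The `SITE` dictionary and the `crawl` function are hypothetical names; the graph mirrors the Homepage/link1 example above, with the assumption that the remaining pages have no outgoing links.

```python
from collections import deque

# Hypothetical link graph: page -> links found on that page.
SITE = {
    "Homepage": ["link1", "link2", "link3"],
    "link1": ["Homepage", "link2", "link4", "link5"],
    "link2": [],
    "link3": [],
    "link4": [],
    "link5": [],
}


def crawl(graph, seed, breadth_first=True):
    """Return the pages in the order they are visited."""
    frontier = deque([seed])
    seen = set()
    visited = []
    while frontier:
        # FIFO (queue) -> breadth-first, LIFO (stack) -> depth-first.
        page = frontier.popleft() if breadth_first else frontier.pop()
        if page in seen:
            continue
        seen.add(page)
        visited.append(page)
        links = graph.get(page, [])
        # For the stack, push links reversed so the first link is followed first.
        frontier.extend(links if breadth_first else reversed(links))
    return visited


bfs_order = crawl(SITE, "Homepage", breadth_first=True)
dfs_order = crawl(SITE, "Homepage", breadth_first=False)
print("BFS:", bfs_order)
print("DFS:", dfs_order)
```

Breadth-first finishes all links of the homepage before touching link4 and link5, while depth-first follows link1 down to its own links before returning to link3.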

Extracting Valuable Information

  • Links (Internal and External)
    • fundamental building blocks of the web, connecting pages within a website and to other websites
  • Comments
    • comment sections on blogs, forums, or other interactive pages can be a goldmine of information
  • Metadata
    • refers to data about data
    • in the context of web pages, it includes information like page titles, descriptions, keywords, author names, and dates
  • Sensitive Files
    • web crawlers can be configured to actively search for sensitive files that might be inadvertently exposed on a website

Some popular web crawlers:

  • Burp Suite Spider
  • OWASP ZAP
  • Scrapy
  • Apache Nutch
  • ReconSpider
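
As a rough illustration of how a crawler can pull links and metadata out of a single page, the sketch below uses only Python's standard library. The class name `PageInfo`, the sample HTML, and the `example.com` URLs are made up for the example; a real crawler such as Scrapy does considerably more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageInfo(HTMLParser):
    """Collect internal links, external links, and <meta name=...> values."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.internal = set()
        self.external = set()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            url = urljoin(self.base_url, attrs["href"])
            same_host = urlparse(url).netloc == urlparse(self.base_url).netloc
            (self.internal if same_host else self.external).add(url)
        elif tag == "meta" and attrs.get("name") and "content" in attrs:
            self.meta[attrs["name"].lower()] = attrs["content"]


# Hypothetical page content, used instead of a live request.
SAMPLE_HTML = """
<html><head>
  <meta name="author" content="Jane Doe">
  <meta name="description" content="Sample page">
</head><body>
  <a href="/about">About</a>
  <a href="https://other.example/">Partner</a>
</body></html>
"""

info = PageInfo("https://www.example.com/")
info.feed(SAMPLE_HTML)
print(info.internal, info.external, info.meta)
```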

robots.txt

… is a simple text file placed in the root directory of a website. It follows the Robots Exclusion Standard, a set of guidelines for how web crawlers should behave when visiting a website. The file contains instructions in the form of “directives” that tell bots which parts of the website they can and cannot crawl.

Structure

The robots.txt file follows a straightforward structure, with each set of instructions, or “record”, separated by a blank line. Each record consists of two main components:

  1. User-Agent
    • specifies which crawler or bot the following rules apply to
    • a “*” indicates that the rules apply to all bots
  2. Directives
    • these lines provide specific instructions to the identified user-agent

Common directives:

| Directive | Example | Description |
| --- | --- | --- |
| Disallow | Disallow: /admin/ | specifies paths or patterns that the bot should not crawl |
| Allow | Allow: /public/ | explicitly permits the bot to crawl specific paths or patterns, even if they fall under a broader Disallow rule |
| Crawl-delay | Crawl-delay: 10 | sets a delay between successive requests from the bot to avoid overloading the server |
| Sitemap | Sitemap: https://www.example.com/sitemap.xml | provides the URL to an XML sitemap for more efficient crawling |

Full robots.txt example:

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

User-agent: Googlebot
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
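
The example file above can be checked with Python's built-in `urllib.robotparser`. This is only a sketch: the user-agent name `ExampleBot` is hypothetical, and the plain list of Disallow paths at the end is a simple line filter rather than a full parser. From a recon point of view, that list is the interesting part, since it hints at locations the site owner would rather not have crawled.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt example from above, embedded as a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

User-agent: Googlebot
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "https://www.example.com"
# "ExampleBot" has no record of its own, so the "*" record applies.
admin_allowed = parser.can_fetch("ExampleBot", base + "/admin/")
public_allowed = parser.can_fetch("ExampleBot", base + "/public/index.html")
googlebot_delay = parser.crawl_delay("Googlebot")
sitemaps = parser.site_maps()  # available since Python 3.8

# Simple line filter: collect every Disallow path as a recon hint.
disallowed_paths = [
    line.split(":", 1)[1].strip()
    for line in ROBOTS_TXT.splitlines()
    if line.lower().startswith("disallow:")
]

print(admin_allowed, public_allowed, googlebot_delay, sitemaps, disallowed_paths)
```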

Well-Known URIs

The .well-known standard, defined in RFC 8615, serves as a standardized directory within a website’s root domain. This designated location, typically accessible via the /.well-known/ path on a web server, centralizes a website’s critical configuration files and information related to its services, protocols, and security mechanisms.

The Internet Assigned Numbers Authority (IANA) maintains a registry of .well-known URIs, each serving a specific purpose defined by various specifications and standards. Some examples:

| URI Suffix | Description |
| --- | --- |
| security.txt | contains contact information for security researchers to report vulnerabilities |
| change-password | provides a standard URL for directing users to a password change page |
| openid-configuration | defines configuration details for OpenID Connect, an identity layer on top of the OAuth 2.0 protocol |
| assetlinks.json | used for verifying ownership of digital assets associated with a domain |
| mta-sts.txt | specifies the policy for SMTP MTA Strict Transport Security to enhance email security |
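
Because these files live at predictable locations, checking for them is easy to script. The sketch below builds the candidate URLs for the suffixes in the table and offers an optional status check. The names `well_known_urls` and `probe` are hypothetical, and `example.com` is a placeholder target. Note that an MTA-STS policy is normally served from the `mta-sts` subdomain of the mail domain, so that entry may need a different host in practice.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Suffixes taken from the table above.
WELL_KNOWN_SUFFIXES = [
    "security.txt",
    "change-password",
    "openid-configuration",
    "assetlinks.json",
    "mta-sts.txt",
]


def well_known_urls(base_url):
    """Map each suffix to its full /.well-known/ URL on the given site."""
    root = urljoin(base_url, "/.well-known/")
    return {suffix: urljoin(root, suffix) for suffix in WELL_KNOWN_SUFFIXES}


def probe(url, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:
        return None


urls = well_known_urls("https://www.example.com/some/page")
for suffix, url in urls.items():
    print(suffix, "->", url)
# To actually check a target you are authorized to test:
#   status = probe(urls["security.txt"])
```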

Search Engines

… serve as your guides in the vast landscape of the internet, helping you navigate the seemingly endless expanse of information. Beyond their primary function of answering everyday queries, search engines also hold a treasure trove of data that can be invaluable for web recon and information gathering.

Search Operators

… are like search engines’ secret codes. These special commands and modifiers unlock a new level of precision and control allowing you to pinpoint specific types of information amidst the vastness of the indexed web.

Here are some of them.

OffSec maintains the Exploit Database, which includes the Google Hacking Database (GHDB), a large collection of Google dorks for many different purposes.
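
To make the idea of operators concrete, the sketch below assembles a query from a few widely supported ones (`site:`, `filetype:`, `inurl:`, `intitle:`) and turns it into a search URL. The helper name `build_dork` and the target `example.com` are assumptions for illustration, and the exact behaviour of each operator depends on the search engine.

```python
from urllib.parse import quote_plus


def build_dork(site=None, filetype=None, inurl=None, intitle=None, terms=""):
    """Combine common search operators into a single query string."""
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file extension
    if inurl:
        parts.append(f"inurl:{inurl}")        # term must appear in the URL
    if intitle:
        parts.append(f'intitle:"{intitle}"')  # phrase must appear in the title
    if terms:
        parts.append(terms)
    return " ".join(parts)


query = build_dork(site="example.com", filetype="pdf")
search_url = "https://www.google.com/search?q=" + quote_plus(query)
login_query = build_dork(site="example.com", inurl="login")
print(query)
print(search_url)
print(login_query)
```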

Web Archives

With the Internet Archive’s Wayback Machine, you have a unique opportunity to revisit the past and explore the digital footprints of websites as they once were.

It can help with:

  • uncovering hidden assets and vulns
  • tracking changes and identifying patterns
  • gathering intel
  • stealthy recon
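
Archived URLs can also be listed programmatically through the Wayback Machine's CDX API. The following is a sketch under assumptions: the function names are hypothetical, the parameter choices are just one reasonable combination, and the sample rows are invented to show the response shape (a JSON array whose first row is the header) rather than real archive data.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, limit=50):
    """Build a CDX query listing archived URLs for a domain and its subdomains."""
    params = {
        "url": domain,
        "matchType": "domain",       # include subdomains
        "output": "json",
        "fl": "timestamp,original",  # only return these two fields
        "collapse": "urlkey",        # one row per unique URL
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(rows):
    """Turn CDX JSON rows (header row first) into Wayback snapshot URLs."""
    if not rows:
        return []
    header, *records = rows
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in records]


# Invented sample rows, standing in for a parsed API response.
sample_rows = [
    ["timestamp", "original"],
    ["20200101000000", "http://example.com/"],
]
query_url = cdx_query_url("example.com")
snapshots = snapshot_urls(sample_rows)
print(query_url)
print(snapshots)
```

Since the requests go to the archive and not to the target, this kind of lookup leaves no trace in the target's logs.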

Automating Recon

… can significantly enhance efficiency and accuracy, allowing you to gather information at scale and identify potential vulns more rapidly.

Recon Frameworks

… aim to provide a complete suite of tools for web recon. Some are:

  • FinalRecon
  • Recon-ng
  • theHarvester
  • SpiderFoot
  • OSINT Framework

FinalRecon example:

d41y@htb[/htb]$ ./finalrecon.py --headers --whois --url http://inlanefreight.com

 ______  __   __   __   ______   __
/\  ___\/\ \ /\ "-.\ \ /\  __ \ /\ \
\ \  __\\ \ \\ \ \-.  \\ \  __ \\ \ \____
 \ \_\   \ \_\\ \_\\"\_\\ \_\ \_\\ \_____\
  \/_/    \/_/ \/_/ \/_/ \/_/\/_/ \/_____/
 ______   ______   ______   ______   __   __
/\  == \ /\  ___\ /\  ___\ /\  __ \ /\ "-.\ \
\ \  __< \ \  __\ \ \ \____\ \ \/\ \\ \ \-.  \
 \ \_\ \_\\ \_____\\ \_____\\ \_____\\ \_\\"\_\
  \/_/ /_/ \/_____/ \/_____/ \/_____/ \/_/ \/_/

[>] Created By   : thewhiteh4t
 |---> Twitter   : https://twitter.com/thewhiteh4t
 |---> Community : https://twc1rcle.com/
[>] Version      : 1.1.6

[+] Target : http://inlanefreight.com

[+] IP Address : 134.209.24.248

[!] Headers :

Date : Tue, 11 Jun 2024 10:08:00 GMT
Server : Apache/2.4.41 (Ubuntu)
Link : <https://www.inlanefreight.com/index.php/wp-json/>; rel="https://api.w.org/", <https://www.inlanefreight.com/index.php/wp-json/wp/v2/pages/7>; rel="alternate"; type="application/json", <https://www.inlanefreight.com/>; rel=shortlink
Vary : Accept-Encoding
Content-Encoding : gzip
Content-Length : 5483
Keep-Alive : timeout=5, max=100
Connection : Keep-Alive
Content-Type : text/html; charset=UTF-8

[!] Whois Lookup : 

   Domain Name: INLANEFREIGHT.COM
   Registry Domain ID: 2420436757_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.registrar.amazon.com
   Registrar URL: http://registrar.amazon.com
   Updated Date: 2023-07-03T01:11:15Z
   Creation Date: 2019-08-05T22:43:09Z
   Registry Expiry Date: 2024-08-05T22:43:09Z
   Registrar: Amazon Registrar, Inc.
   Registrar IANA ID: 468
   Registrar Abuse Contact Email: abuse@amazonaws.com
   Registrar Abuse Contact Phone: +1.2024422253
   Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
   Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Name Server: NS-1303.AWSDNS-34.ORG
   Name Server: NS-1580.AWSDNS-05.CO.UK
   Name Server: NS-161.AWSDNS-20.COM
   Name Server: NS-671.AWSDNS-19.NET
   DNSSEC: unsigned
   URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/


[+] Completed in 0:00:00.257780

[+] Exported : /home/htb-ac-643601/.local/share/finalrecon/dumps/fr_inlanefreight.com_11-06-2024_11:07:59

Certificate Transparency Logs

… are public, append-only ledgers that record the issuance of SSL/TLS certificates. Whenever a Certificate Authority (CA) issues a new certificate, it must submit it to multiple CT logs. Independent organisations maintain these logs, which are open for anyone to inspect.

You can think of CT logs as a global registry of certificates. They provide a transparent and verifiable record of every SSL/TLS certificate issued for a website. This transparency serves several crucial purposes:

  • Early Detection of Rogue Certificates
  • Accountability for Certificate Authorities
  • Strengthening the Web PKI

CT Logs and Web Recon

CT logs offer a unique advantage in subdomain enumeration compared to other methods. Unlike brute-forcing or wordlist-based approaches, which rely on guessing or predicting subdomain names, CT logs provide a definitive record of certificates issued for a domain and its subdomains. This means you’re not limited by the scope of your wordlist or the effectiveness of your brute-forcing algorithm. Instead, you gain access to a historical and comprehensive view of a domain’s subdomains, including those that might not be actively used or easily guessable.

Furthermore, CT logs can unveil subdomains associated with old or expired certificates. These subdomains might host outdated software or configurations, making them potentially vulnerable to exploitation.

In essence, CT logs provide a reliable and efficient way to discover subdomains without the need for exhaustive brute-forcing or relying on the completeness of wordlists. They offer a unique window into a domain’s history and can reveal subdomains that might otherwise remain hidden, significantly enhancing your recon capabilities.

Two popular options for searching CT logs are crt.sh and Censys.

crt.sh lookup example:

d41y@htb[/htb]$ curl -s "https://crt.sh/?q=facebook.com&output=json" | jq -r '.[]
 | select(.name_value | contains("dev")) | .name_value' | sort -u
 
*.dev.facebook.com
*.newdev.facebook.com
*.secure.dev.facebook.com
dev.facebook.com
devvm1958.ftw3.facebook.com
facebook-amex-dev.facebook.com
facebook-amex-sign-enc-dev.facebook.com
newdev.facebook.com
secure.dev.facebook.com
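
The same lookup can be sketched in Python with the standard library. The function names are hypothetical and the sample entries are invented to show the shape of the JSON (one object per certificate, with `name_value` holding one or more names separated by newlines). Unlike the jq filter above, which prints the whole `name_value` of each matching entry, this version filters name by name.

```python
import json
import urllib.request
from urllib.parse import quote


def fetch_ct_entries(domain, timeout=30):
    """Download the crt.sh JSON results for a domain (performs a network request)."""
    url = f"https://crt.sh/?q={quote(domain)}&output=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def names_containing(entries, keyword):
    """Return the sorted, de-duplicated names that contain the keyword."""
    names = set()
    for entry in entries:
        # One certificate can list several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip()
            if keyword in name:
                names.add(name)
    return sorted(names)


# Invented sample entries; replace with fetch_ct_entries("example.com") for live data.
sample_entries = [
    {"name_value": "dev.example.com\nwww.example.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "dev.example.com"},
]
dev_names = names_containing(sample_entries, "dev")
print("\n".join(dev_names))
```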