How to find all broken links of the AUT in Selenium with Java?

In the last tutorial, we discussed running test automation with a headless browser. Whether the browser is physically installed on the system or headless, WebDriver loads the pages behind the links it visits, and a test can fail when one of those links returns an error. For example, if a page does not exist, the server responds with a 404 error; many other errors are possible. HTTP status codes are grouped into five classes, 1XX, 2XX, 3XX, 4XX and 5XX, where each X stands for a digit. If a test fails because a page is not reachable, we say that the page has a broken link. So, in today’s agenda, we will discuss a technique to find all the broken links of the application under test (AUT). It will help us design our test execution sheet based on the availability of the testable application.

What are broken links?

Broken links are links that are inactive, down or otherwise unreachable at the time they are fetched. A request for such a link returns an error instead of the expected page. Each error carries a specific HTTP status code, which tells us what kind of failure made the link broken.

For example, if you fetch a link and get a 502 status code, it means Bad Gateway: a server acting as a gateway or proxy received an invalid response from the upstream server.

How to check all the broken links of any web page?

The first option is to do it manually: we open each link and decide whether it is active or broken. This becomes tedious and cumbersome when we need to check a large set of links.

So we replace the tedious manual activity with an automated one. We use Java's HTTP support (java.net.HttpURLConnection in the sample program later in this tutorial) to send a request to each link and read the response, and we learn whether the links are broken in a fraction of the time. Isn’t it a smart move?
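As a minimal sketch of that idea, the class below sends a HEAD request to a single link with java.net.HttpURLConnection and returns the status code. The class name LinkStatus, the method statusOf, the five-second timeouts and the -1 return value for links that cannot be requested at all are assumptions made for this illustration; they are not part of the original program.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: asks a server for the status code of one link. */
final class LinkStatus {

    /** Returns the HTTP status code, or -1 if the link cannot be requested over HTTP. */
    static int statusOf(String link) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(link).openConnection();
            connection.setRequestMethod("HEAD"); // headers only, no response body
            connection.setConnectTimeout(5000);  // assumed timeout, in milliseconds
            connection.setReadTimeout(5000);     // assumed timeout, in milliseconds
            return connection.getResponseCode();
        } catch (IOException | ClassCastException e) {
            // Malformed URL, network failure, or a non-HTTP scheme such as mailto:
            return -1;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // https://example.com/ is a placeholder; pass a real link as the first argument.
        String link = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(link + " -> " + statusOf(link));
    }
}
```

A HEAD request is used because only the status code is needed, not the page content. Some servers answer HEAD differently from GET, so a suspicious result may be worth rechecking with a GET request.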

List of HTTP status codes

Following are some of the HTTP status codes. They tell you the category of the error behind each broken link you identify.

  • 101 – Switching protocols
  • 200 – OK
  • 201 – Created
  • 204 – No Content
  • 301 – Moved Permanently
  • 302 – Found (temporary redirect)
  • 307 – Temporary Redirect
  • 400 – Bad Request
  • 401 – Unauthorized
  • 403 – Forbidden
  • 404 – Not Found
  • 408 – Request Timeout
  • 414 – URI Too Long
  • 500 – Internal Server Error
  • 502 – Bad Gateway
  • 503 – Service Unavailable
  • 504 – Gateway Timeout

This is not a complete list of HTTP status codes, but these are the ones most useful for automation testing.
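The grouping of the codes into classes can be sketched in a few lines of Java. The class name StatusCategory and its two methods are hypothetical names chosen for this example, and the rule "400 and above means broken" follows the convention used in this tutorial.

```java
/** Hypothetical helper that maps an HTTP status code to its class. */
final class StatusCategory {

    /** Returns the class of the code based on its first digit. */
    static String describe(int statusCode) {
        if (statusCode < 100 || statusCode > 599) {
            return "Unknown";
        }
        switch (statusCode / 100) {
            case 1:  return "Informational";
            case 2:  return "Success";
            case 3:  return "Redirection";
            case 4:  return "Client error";
            default: return "Server error"; // only 5 is left in the 100..599 range
        }
    }

    /** Treats every 4XX and 5XX code as a broken link. */
    static boolean isBroken(int statusCode) {
        return statusCode >= 400;
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 502};
        for (int code : samples) {
            System.out.println(code + " -> " + describe(code)
                    + (isBroken(code) ? " (broken)" : ""));
        }
    }
}
```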

How to find broken links of the web page in Selenium?

Now we will see how to implement the identification of broken links of the application under test.

Here the question is: how do we do it?

We first open the web page and collect all the URLs found on it. Then we filter the collected URLs, ignoring empty and null values and URLs that lead to third-party websites. This leaves only the direct URLs that belong to the test URL. Now we send an HTTP request to each URL and record the response code. Based on the received response code, we can identify the broken links.
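The filtering step on its own can be sketched without a browser. In the example below, the class LinkFilter, the method sameSiteLinks and the example.com addresses are hypothetical; in a real run the list of href values would come from WebDriver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: keeps only the links that belong to the site under test. */
final class LinkFilter {

    static List<String> sameSiteLinks(List<String> hrefs, String baseUrl) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            if (href == null || href.isEmpty()) {
                continue; // anchor without a usable href
            }
            if (!href.startsWith(baseUrl)) {
                continue; // third-party website (or a different scheme or host)
            }
            result.add(href);
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder values standing in for the href attributes read from the page.
        List<String> hrefs = Arrays.asList(
                "https://example.com/about",
                null,
                "",
                "https://other.example.org/page");
        System.out.println(sameSiteLinks(hrefs, "https://example.com/"));
    }
}
```

The prefix check is simple but strict: the same site reached through a different scheme (http instead of https) or without the www prefix would be treated as third-party, so the base URL should match the form the page actually uses in its links.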

The following sample program implements the complete flow discussed above. Let’s have a look.

package Test;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;


public class FindBrokenLinksDemo {

	public static void main(String[] args) throws InterruptedException {
		
		String testUrl = "http://www.theavinashmishra.com/";
		
		HttpURLConnection urlCon = null;
		
		int statusCode = 200; //It means in general URL is OK
		
		System.setProperty("webdriver.chrome.driver", "C:\\Selenium\\chromedriver.exe");
	  	 
		WebDriver driver  = new ChromeDriver();
		  
	  	driver.get(testUrl);
		  
	    driver.manage().window().maximize();
	    
	    Thread.sleep(2000);
		
	  	List<WebElement> linkList = driver.findElements(By.tagName("a"));
	  	List<String> validURL = new ArrayList<String>();
	  	List<String> brokenURL = new ArrayList<String>();
	  	
	  	for(WebElement e : linkList){
	  		String urls = e.getAttribute("href");
	  		
	  		
	  		if(!(urls==null) && !urls.isEmpty()){
	  			
	  			//Look for the same domain URL only
	  			
	  			if(urls.startsWith(testUrl)){
	  				
	  				try{
	  					URL url = new URL(urls);
	  					urlCon = (HttpURLConnection)(url.openConnection());
	  					urlCon.setRequestMethod("HEAD");
	  					urlCon.connect();
	  					statusCode = urlCon.getResponseCode();
	  					if(statusCode>=500){
	  						//5XX: checked first, otherwise the >=400 test below would also match it
	  						System.out.println("This is server side error, hence, page is broken "+
	  					                         "\n Broken URL is- "+url+
	  					                         " \n Its status code is- "+statusCode);
	  						brokenURL.add(urls);
	  					}else if(statusCode>=400){
	  						//4XX
	  						System.out.println("This is client side error, hence, page is broken "+
	  					                         "\n Broken URL is- "+url+
	  					                         " \n Its status code is- "+statusCode);
	  						brokenURL.add(urls);
	  					}else {
	  						System.out.println("This is valid URL- "+urls);
	  						validURL.add(urls);
	  					}
	  				}catch(MalformedURLException e0){
	  					e0.printStackTrace();
	  				}catch(IOException e1){
	  					e1.printStackTrace();
	  				}catch(Exception e3){
	  					e3.printStackTrace();
	  				}
	  			}	
	  		}
	  		
	  	}
	  	
	  	//quit() closes every window and ends the WebDriver session
	  	driver.quit();

	}

}

Here is the console output.

[Image: Broken links list]

That’s all about finding all the broken links of the application under test with Selenium WebDriver. If you have any queries, please write them in the comment section below. Please join our Facebook group for quick updates on test automation.
