WEB-SPIDERS TASK DESCRIPTION
WHAT COULD GO WRONG?
USED MATERIALS
Hyperlinks allow for navigation on the internet. Web pages can link to other pages, and one important question is which other pages a given page is linked with. The resulting link structure can be visualized as a directed graph: the nodes of the graph denote the individual web pages, while the arrows connecting the nodes represent hyperlinks referring from one page to another.
In order to develop such an application, one needs to go through the entire HTML code of the page behind a given link, look for tags with a link argument, and extract each of them. This step is the most important one; the subsequent steps merely manipulate the extracted URLs. Since we need not only the URLs of a single page but also the URLs of all pages connected to it, we have to repeat this extraction for each of them. Finally, the gathered links are merged and passed to an application that builds a graph from them. This is a very rough outline of what has to be done in order to build a web spider.
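The extraction step described above can be sketched with Python's standard-library HTML parser. This is a minimal illustration, not a complete spider: the class name and the sample HTML snippet are made up for the example, and a real spider would also fetch the page over the network and resolve relative URLs.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # An <a> tag's attributes arrive as (name, value) pairs;
        # keep only non-empty href values.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical HTML snippet standing in for a downloaded page:
parser = LinkExtractor()
parser.feed('<html><body><a href="https://example.com">Example</a></body></html>')
print(parser.links)  # ['https://example.com']
```

For the second stage, the same extraction would simply be applied again to every URL collected here.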
The following chapters give step-by-step instructions on how to build a web spider and describe the kinds of problems that may appear along the way.
Web-Spiders task description
Develop a web spider, that is, an application that takes the URL of a web page as input and extracts all hyperlinks of this page. For each web page that can be reached by these hyperlinks, all hyperlinks are again to be extracted. The result should be visualized as a directed graph. For the visualization you can use the Graphviz tool, which takes a graph description in the dot format as input and draws the graph as an SVG image. The following picture shows the HTML code of the site www.mail.de and the...