Abstract: Tools for Web content analysis are generally based on hit counts. In the Web content analyser, we provide an agent for the website that performs text analysis. The result of this analysis is the word count, i.e. the total number of words in the given text, and the word density, i.e. the total number of occurrences of a specific word in the text.
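The two measures defined above can be sketched in a few lines of Python. This is only a minimal illustration, not the agent itself: the function names `tokenize`, `word_count` and `word_density` are hypothetical, and the tokenisation rule (lower-cased runs of letters, digits, apostrophes and hyphens) is an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-cased words (assumed rule: letters, digits, ' and -)."""
    return re.findall(r"[A-Za-z0-9'-]+", text.lower())


def word_count(text):
    """Total number of words in the given text."""
    return len(tokenize(text))


def word_density(text, word):
    """Number of occurrences of one specific word in the text."""
    return Counter(tokenize(text))[word.lower()]


sample = "The course list shows each course and the course co-ordinator."
print(word_count(sample))              # 10
print(word_density(sample, "course"))  # 3
```

Dividing `word_density` by `word_count` would give the density as a fraction of the text, should a relative measure be preferred.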
For example, a college website contains information about each department. Each department offers different courses during the academic year. These courses are uploaded to the college website along with their corresponding co-ordinators. Each co-ordinator's name is hyperlinked so that an interested student visiting the website can navigate to the co-ordinator's profile. The hyperlink displays the profile of that professor. If no link is available, the system generates a report. The information about the courses and the co-ordinators is stored in look-up tables, and these tables are mapped to each other.
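The mapped look-up tables and the missing-link report could look roughly as follows. This is a sketch under stated assumptions: the tables are modelled as in-memory dictionaries, and the course names, identifiers, URLs and the function `missing_profile_report` are all hypothetical.

```python
# Look-up table 1 (hypothetical data): course -> co-ordinator id
courses = {
    "Data Structures": "c01",
    "Operating Systems": "c02",
    "Networks": "c03",
}

# Look-up table 2 (hypothetical data): co-ordinator id -> profile record
coordinators = {
    "c01": {"name": "Prof. A", "profile_url": "/staff/a.html"},
    "c02": {"name": "Prof. B", "profile_url": None},
}


def missing_profile_report(courses, coordinators):
    """List every course whose co-ordinator has no usable profile link."""
    report = []
    for course, coordinator_id in sorted(courses.items()):
        record = coordinators.get(coordinator_id)
        if record is None:
            report.append((course, coordinator_id, "no co-ordinator record"))
        elif not record.get("profile_url"):
            report.append((course, coordinator_id, "no profile link"))
    return report


for row in missing_profile_report(courses, coordinators):
    print(row)
```

Keeping the two tables separate and joining them on the co-ordinator id means a profile link is stored once, even when one co-ordinator handles several courses.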
The current feedback system for the website is not efficient enough. Through the Web snapper, we introduce a feedback system that helps a user or visitor provide multiple types of feedback to the administrator of the website. We provide the visitor with a feedback form that asks for details such as the visitor's name and the type of feedback to be given: messages, suggestions, problems, GUI-related issues, etc.
The form is kept very specific. At the end of the form there is a facility for attaching screenshots that show the problem or difficulty the visitor has come across. The submitted form is stored in the website's database. The administrator is provided with a login ID and password, and can log in to check the various types of feedback received for the website. We provide a facility to categorise the feedback by type, which helps the administrator reply to each item of feedback.
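The categorisation step can be illustrated with a short sketch. The record layout, the field names and the set of type labels below are assumptions made for illustration; the database storage and the administrator login are deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Assumed type labels, taken from the kinds of feedback listed in the text
FEEDBACK_TYPES = ("message", "suggestion", "problem", "gui_issue")


@dataclass
class Feedback:
    name: str                              # visitor's name
    kind: str                              # one of FEEDBACK_TYPES
    text: str                              # the feedback itself
    screenshot_path: Optional[str] = None  # optional attached screenshot


def categorise(items):
    """Group feedback records by type so they can be reviewed together."""
    grouped = defaultdict(list)
    for item in items:
        if item.kind not in FEEDBACK_TYPES:
            raise ValueError(f"unknown feedback type: {item.kind!r}")
        grouped[item.kind].append(item)
    return dict(grouped)


inbox = [
    Feedback("Asha", "problem", "Login page fails", "shots/login.png"),
    Feedback("Ravi", "suggestion", "Add a search box"),
    Feedback("Meera", "problem", "Course page is blank"),
]
by_type = categorise(inbox)
print({kind: len(items) for kind, items in by_type.items()})
```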
The main scope of this project is to analyse text based on the content of a web page. We also add features such as ‘Duplication Detection’, which detects duplicated content, and detection of ‘Broken Links’.
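Both features can be sketched briefly. The text does not say how duplication is detected, so the approach below (hashing each text block after normalising whitespace and case, which finds exact duplicates only) is an assumption, as are all function names. The broken-link check takes the reachability test as a parameter so that it can be tried without network access; `http_reachable` is one possible test built on the standard library.

```python
import hashlib
import re
import urllib.error
import urllib.request


def normalise(text):
    """Collapse whitespace and lower-case so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(blocks):
    """Return (first_index, duplicate_index) pairs for repeated text blocks."""
    seen = {}
    duplicates = []
    for index, block in enumerate(blocks):
        digest = hashlib.sha256(normalise(block).encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append((seen[digest], index))
        else:
            seen[digest] = index
    return duplicates


def http_reachable(url, timeout=5):
    """One possible reachability test: a HEAD request that succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, ValueError, OSError):
        return False


def find_broken_links(links, is_reachable=http_reachable):
    """Return the links for which the reachability test fails."""
    return [url for url in links if not is_reachable(url)]


blocks = [
    "Welcome to the college.",
    "Courses are listed below.",
    "welcome   to the College.",
]
print(find_duplicates(blocks))  # [(0, 2)]
```

Near-duplicate content, such as paragraphs that differ by a few words, would need a similarity measure rather than an exact hash.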