Web mining
Encyclopedia

Web mining - is the application of data mining
Data mining
Data mining , a relatively young and interdisciplinary field of computer science is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems...

 techniques to discover patterns from the Web
World Wide Web
The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...

.
According to analysis targets, web mining can be divided into three different types, which are Web usage mining, Web content mining and Web structure mining.

Web usage mining

Web usage mining is the process of extracting useful information from server logs i.e users history.
Web usage mining is the process of finding out what users are looking for on the Internet
Internet
The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite to serve billions of users worldwide...

. Some users might be looking at only textual data, whereas some others might be interested in multimedia data.

Web structure mining

Web structure mining is the process of using graph theory to analyze the node and connection structure of a web site. According to the type of web structural data, web structure mining can be divided into two kinds:

1. Extracting patterns from hyperlinks in the web: a hyperlink
Hyperlink
In computing, a hyperlink is a reference to data that the reader can directly follow, or that is followed automatically. A hyperlink points to a whole document or to a specific element within a document. Hypertext is text with hyperlinks...

 is a structural component that connects the web page to a different location.

2. Mining the document structure: analysis of the tree-like structure of page structures to describe HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

 or XML
XML
Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

 tag usage.

Pros

Web mining essentially has many advantages which makes this technology attractive to corporations including the government agencies. This technology has enabled ecommerce to do personalized marketing, which eventually results in higher trade volumes. The government agencies are using this technology to classify threats and fight against terrorism. The predicting capability of the mining application can benefits the society by identifying criminal activities. The companies can establish better customer relationship by giving them exactly what they need. Companies can understand the needs of the customer better and they can react to customer needs faster.
The companies can find, attract and retain customers; they can save on production costs by utilizing the acquired insight of customer requirements. They can increase profitability by target pricing based on the profiles created. They can even find the customer who might default to a competitor the company will try to retain the customer by providing promotional offers to the specific customer, thus reducing the risk of losing a customer or customers.

Cons

Web mining, itself, doesn’t create issues, but this technology when used on data of personal nature might cause concerns. The most criticized ethical issue involving web mining is the invasion of privacy. Privacy is considered lost when information concerning an individual is obtained, used, or disseminated, especially if this occurs without their knowledge or consent. The obtained data will be analyzed, and clustered to form profiles; the data will be made anonymous before clustering so that there are no personal profiles. Thus these applications de-individualize the users by judging them by their mouse clicks. De-individualization, can be defined as a tendency of judging and treating people on the basis of group characteristics instead of on their own individual characteristics and merits.
Another important concern is that the companies collecting the data for a specific purpose might use the data for a totally different purpose, and this essentially violates the user’s interests.
The growing trend of selling personal data as a commodity encourages website owners to trade personal data obtained from their site. This trend has increased the amount of data being captured and traded increasing the likeliness of one’s privacy being invaded. The companies which buy the data are obliged make it anonymous and these companies are considered authors of any specific release of mining patterns. They are legally responsible for the contents of the release; any inaccuracies in the release will result in serious lawsuits, but there is no law preventing them from trading the data.

Some mining algorithms might use controversial attributes like sex, race, religion, or sexual orientation to categorize individuals. These practices might be against the anti-discrimination legislation. The applications make it hard to identify the use of such controversial attributes, and there is no strong rule against the usage of such algorithms with such attributes. This process could result in denial of service or a privilege to an individual based on his race, religion or sexual orientation, right now this situation can be avoided by the high ethical standards maintained by the data mining company.
The collected data is being made anonymous so that, the obtained data and the obtained patterns cannot be traced back to an individual. It might look as if this poses no threat to one’s privacy, actually many extra information can be inferred by the application by combining two separate unscrupulous data from the user.

External links


Books

  • Jesus Mena, "Data Mining Your Website", Digital Press, 1999
  • Soumen Chakrabarti, "Mining the Web: Analysis of Hypertext and Semi Structured Data", Morgan Kaufmann, 2002
  • Bing Liu, "Web Data Mining: Exploring Hyperlinks, Contents and Usage Data", Springer, 2007
  • Advances in Web Mining and Web Usage Analysis 2005 - revised papers from 7 th workshop on Knowledge Discovery on the Web, Olfa Nasraoui, Osmar Zaiane, Myra Spiliopoulou, Bamshad Mobasher, Philip Yu, Brij Masand, Eds., Springer Lecture Notes in Artificial Intelligence, LNAI 4198, 2006
  • Web Mining and Web Usage Analysis 2004 - revised papers from 6 th workshop on Knowledge Discovery on the Web, Bamshad Mobasher, Olfa Nasraoui, Bing Liu, Brij Masand, Eds., Springer Lecture Notes in Artificial Intelligence, 2006
  • Mike Thelwall, "Link Analysis: An Information Science Approach", 2004, Academic Press

Bibliographic references

  • Baraglia, R. Silvestri, F. (2007) "Dynamic personalization of web sites without user intervention", In Communication of the ACM 50(2): 63-67
  • Cooley, R. Mobasher, B. and Srivastave, J. (1997) “Web Mining: Information and Pattern Discovery on the World Wide Web” In Proceedings of the 9th IEEE International Conference on Tool with Artificial Intelligence
  • Cooley, R., Mobasher, B. and Srivastava, J. “Data Preparation for Mining World Wide Web Browsing Patterns”, Journal of Knowledge and Information System, Vol.1, Issue. 1, pp. 5–32, 1999
  • Kohavi, R., Mason, L. and Zheng, Z. (2004) “Lessons and Challenges from Mining Retail E-commerce Data” Machine Learning, Vol 57, pp. 83–113
  • Lillian Clark, I-Hsien Ting, Chris Kimble, Peter Wright, Daniel Kudenko (2006)"Combining ethnographic and clickstream data to identify user Web browsing strategies" Journal of Information Research, Vol. 11 No. 2, January 2006
  • Eirinaki, M., Vazirgiannis, M. (2003) "Web Mining for Web Personalization", ACM Transactions on Internet Technology, Vol.3, No.1, February 2003
  • Mobasher, B., Cooley, R. and Srivastava, J. (2000) “Automatic Personalization based on web usage Mining” Communications of the ACM, Vol. 43, No.8, pp. 142–151
  • Mobasher, B., Dai, H., Kuo, T. and Nakagawa, M. (2001) “Effective Personalization Based on Association Rule Discover from Web Usage Data” In Proceedings of WIDM 2001, Atlanta, GA, USA, pp. 9–15
  • Nasraoui O., Petenes C., "Combining Web Usage Mining and Fuzzy Inference for Website Personalization", in Proc. of WebKDD 2003 – KDD Workshop on Web mining as a Premise to Effective and Intelligent Web Applications, Washington DC, August 2003, p. 37
  • Nasraoui O., Frigui H., Joshi A., and Krishnapuram R., “Mining Web Access Logs Using Relational Competitive Fuzzy Clustering”, Proceedings of the Eighth International Fuzzy Systems Association Congress, Hsinchu, Taiwan, August 1999
  • Nasraoui O., “World Wide Web Personalization,” Invited chapter in “Encyclopedia of Data Mining and Data Warehousing”, J. Wang, Ed, Idea Group, 2005
  • Pierrakos, D., Paliouras, G., Papatheodorou, C., Spyropoulos C. D. (2003) “Web usage mining as a tool for personalization: a survey”, User modelling and user adapted interaction journal, Vol.13, Issue 4, pp. 311–372
  • I-Hsien Ting, Chris Kimble, Daniel Kudenko (2005)"A Pattern Restore Method for Restoring Missing Patterns in Server Side Clickstream Data"
  • I-Hsien Ting, Chris Kimble, Daniel Kudenko (2006)"UBB Mining: Finding Unexpected Browsing Behaviour in Clickstream Data to improve a Web Site’s Design"
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK