Two recent research papers shed some light on mining Web content. Imagine putting your hand in the Yangtze River and trying to catch a sturgeon minnow between two and three inches long. This is akin to running a simple keyword search and then perusing each result one by one to judge its relevance (one’s mind making the semantic correlations needed to narrow down the relevant results). The challenge is to devise a tool that moves this “semantic sifting” earlier in the process, thereby making it more efficient to find relevant results.
Jean-Pierre Norguet et al. discuss semantic analysis of website usage and how to apply this analysis to ongoing website development. Norguet’s approach combined web server log files, site content records, content calls by browsers, and TCP/IP packets. The Norguet team then ran these through an ontology-based OLAP tool. The result was a visual representation of interest values for particular categories of content. This representation demonstrated that, regardless of how widely a category is present across a website, interest value indicators provide valuable insight into consumer use patterns. Norguet argues that displaying interest values visually allows for intuitive decision-making that maps and responds to consumer interests more accurately.
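To make the distinction between a category’s presence on a site and the interest it attracts more concrete, here is a minimal Python sketch. It is not Norguet’s method or tool: the page-to-category mapping, the view log, and the function names are all hypothetical, and "interest" is simply assumed to be a category’s share of page views.

```python
from collections import Counter

# Hypothetical site: each page is assigned one category (an assumption;
# a real ontology would allow hierarchies and multiple categories).
SITE_PAGES = {
    "/news/a": "News",
    "/news/b": "News",
    "/products/c": "Products",
}

# Hypothetical page views, e.g. as extracted from a web server log.
VIEWS = ["/products/c", "/products/c", "/products/c", "/news/a"]


def category_presence(site_pages):
    """Share of the site's pages that belong to each category."""
    counts = Counter(site_pages.values())
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def category_interest(views, site_pages):
    """Share of page views that landed on each category."""
    counts = Counter(site_pages[page] for page in views if page in site_pages)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


presence = category_presence(SITE_PAGES)
interest = category_interest(VIEWS, SITE_PAGES)
for category in presence:
    print(category, round(presence[category], 2), round(interest.get(category, 0.0), 2))
```

In this toy data, News covers most of the site’s pages while Products draws most of the views, which is the kind of gap between presence and interest that a visual display would surface.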
Michelle L. Gregory et al. explored a framework that allows users to map blog entries, query result sets, understand themes, and see how blog content changes over time. To analyze 7,000 blog entries chosen at random, Gregory modified a tool called IN-SPIRE, which uses semantic indexing, among other things, to categorize result sets. In addition to the powerful filtering and querying capabilities explored, Gregory demonstrated how one can use this tool to build multi-lingual analyses using one’s native language. The team also delved into the realm of affect analysis. They showed a powerful visual representation of positive versus negative feelings about a particular blog topic (taking the pulse of a slice of the blogosphere on a particular topic).
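The idea of tallying positive versus negative feeling about a topic can be sketched in a few lines of Python. This is only an illustration under strong assumptions: it uses a tiny hand-made word list and simple substring matching for the topic, which is not how IN-SPIRE or Gregory’s team performed affect analysis. All names and sample entries below are hypothetical.

```python
import re

# Tiny hypothetical affect lexicon (an assumption for illustration only).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "angry"}


def affect_counts(text):
    """Count positive and negative lexicon words in one text."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive, negative


def topic_pulse(entries, topic):
    """Total positive and negative counts over entries that mention the topic."""
    totals = {"positive": 0, "negative": 0}
    for entry in entries:
        if topic.lower() in entry.lower():
            positive, negative = affect_counts(entry)
            totals["positive"] += positive
            totals["negative"] += negative
    return totals


# Made-up blog entries for the example.
ENTRIES = [
    "I love the new phone, great camera",
    "The phone battery is terrible",
    "Great weather today",
]

pulse = topic_pulse(ENTRIES, "phone")
print(pulse)
```

The two totals are the raw material for a positive-versus-negative chart; a real system would need a much richer lexicon and handling of negation, sarcasm, and other languages.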
Some immediate applications of these types of analyses, whether in one’s native language or across a multi-lingual website, are improving web product development and mapping political sentiments or sentiments about one’s own or a competitor’s product.