Text Mining Research: A Survey

ABSTRACT

Text mining, also known as knowledge discovery from text or document information mining, refers to the process of extracting interesting patterns from very large text corpora in order to discover knowledge. Text mining is an interdisciplinary field involving information retrieval, text understanding, information extraction, clustering, categorization, visualization, database technology, machine learning, and data mining. Regarded by many as the next wave of knowledge discovery, text mining has very high commercial value. This paper presents a general framework for text mining consisting of two stages: text refining, which transforms unstructured text documents into an intermediate form, and knowledge distillation, which deduces patterns or knowledge from the intermediate form. I then explain two text refining methods, information retrieval and information extraction. Next, I survey different document representation methods and algorithms, compare them, and discuss some of their advantages and limitations.

I then survey state-of-the-art text mining approaches, products, and applications, aligning them according to their text refining and knowledge distillation functions as well as the intermediate form they adopt. Finally, I highlight the upcoming challenges of text mining and the opportunities it offers, and give a short conclusion.
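The two-stage framework described above can be illustrated with a minimal sketch. It assumes a bag-of-words intermediate form and uses simple term co-occurrence counting as the knowledge distillation step; both choices, along with the names refine, distill, STOPWORDS, and min_support, are hypothetical and chosen only for illustration, not taken from any particular system surveyed here.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "from"}


def refine(document):
    """Text refining: map raw text to a bag-of-words intermediate form."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def distill(intermediate_forms, min_support=2):
    """Knowledge distillation: keep term pairs that co-occur in at
    least min_support documents."""
    pair_counts = Counter()
    for bag in intermediate_forms:
        # Each distinct term pair is counted once per document.
        for pair in combinations(sorted(bag), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


corpus = [
    "Text mining extracts patterns from text.",
    "Data mining extracts patterns from databases.",
    "Information retrieval finds documents.",
]
bags = [refine(doc) for doc in corpus]   # stage 1: text refining
patterns = distill(bags)                 # stage 2: knowledge distillation
print(patterns)
```

In this toy corpus, the only term pairs shared by two documents are those among "extracts", "mining", and "patterns". A real system would substitute a richer intermediate form and a more capable mining algorithm at each stage.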

1. INTRODUCTION

Text mining, also known as text data mining [25] or knowledge discovery from textual databases [19], is an emerging technology for analyzing large collections of unstructured documents in order to extract interesting and non-trivial patterns or knowledge. It can be viewed as a step beyond data mining or knowledge discovery from (structured) databases [17; 58].

Because written text is the most natural form for storing and exchanging information, text mining has very high commercial potential. In fact, a recent study indicated that 80% of a company's information was...
