Dan Schlegel

436F646521

Artificial Intelligence is a subfield of computer science devoted to the construction of extremely complex programs that do not work.

Prospective Sources

From preliminary library research, these sources look as if they will be helpful in the project at hand.

"From the Stacks"

P: Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain (O'Reilly and Munakata)
This book focuses on modeling mental processes using artificial neural networks. Much of the work is original to O'Reilly and Munakata and is very interesting. It does not specifically cover SOMs, but since SOMs are a kind of ANN, they share some characteristics with the networks it describes.

P: A Self-organizing Semantic Map for Information Retrieval (Lin, Soergel, Marchionini)
This paper uses a Kohonen Feature Map to organize similar titles of AI literature. It is a good introduction to implementation, but it is very simple in that it doesn't need to abstract the data at all before the data is given to the map for organization.

P: Introduction to Multidimensional Scaling : Theory, Methods, and Applications (Susan S. Schiffman, M. Lance Reynolds, Forrest W. Young)
Multidimensional scaling is a technique that allows us to organize high dimensional data spatially as a map, even when the dimensionality of the data is unknown. The book also looks at computational solutions to the problem, but since they date from circa 1980, they may not be especially helpful.

S: Self-Organizing Maps (Teuvo Kohonen)
This book talks about the mathematical and biological basis for SOMs, and discusses the algorithm in detail. I am currently waiting for this book to arrive via ILLiad.

S: Modern Information Retrieval (Baeza-Yates and Ribeiro-Neto)
This book contains a great deal of material on structured data collections, data mining, and metadata. It covers the standard methods for preprocessing, storing, and searching data.

S: Web Mining : Applications and Techniques (Anthony Scime, ed)
This book contains a very good definition of what exactly metadata is and how it can be generated. It also contains some techniques for categorizing data by topic and for other generalizations based on metadata. Since web data, like file data, is of extremely high dimensionality, this will help me determine how to initially reduce that dimensionality.

Sources from the Web

P: WEBSOM - Self-organizing maps of document collections (Kaski, Honkela, Lagus, and Kohonen)
WEBSOM strives to organize large collections of documents using self organizing maps. This paper gives an excellent overview of different strategies for creating a self organizing map from a document. This will help my research immensely, as these concepts are easily transferable from text documents to metadata from documents.

P: Internet Categorization and Search: A Self-Organizing Approach (Chen, Schuffels, and Orwig)
This article discusses a web server that automatically self organizes the content on it. The authors had problems similar to the ones I will be facing, namely reducing extremely high dimensional data to a state in which it can be organized efficiently.

P: Self Organization of a Massive Document Collection (Kohonen, Kaski, Lagus, Salojarvi, Honkela, Paatero, and Saarela)
This paper, co-written by Teuvo Kohonen, who developed SOMs, has an interesting approach to organizing large document collections. Like other papers, this one discusses preprocessing of data, then iteratively creating larger SOMs from that data. It also discusses a browsable user interface, which I would like to implement myself once my project reaches maturity.

P: Neural Information Retrieval: An Experimental Study of Clustering and Browsing of Document Collections with Neural Networks (Jakub Zavrel)
This paper talks about unsupervised neural networks, and specifically SOMs, in data categorization. It is interesting because it cites a number of works that I have already read and has a format similar to what I intend to write. It serves more as a large literature review of the state of the art than as new research, though.

P: The SOMLib Digital Library System (Andreas Rauber and Dieter Merkl)
SOMLib is a digital library system that goes one step beyond organization. It allows for keyword generation based on topological location in a map, giving the user a neat browsable system. It also discusses some techniques of problem size reduction by using small SOMs for different sets of data, then self organizing the results of those.

S: Data Exploration Using Self Organizing Maps (Samuel Kaski)
This is Samuel Kaski's PhD thesis on Self Organizing Maps. It is a really interesting paper that discusses in detail the actual steps needed to design and create a self organizing map. It covers preprocessing, computation of the maps, how to choose a good map, and how to interpret and analyze the results. The paper also contains some case studies, including one on full text search.

S: The self-organizing map (Teuvo Kohonen)
This paper presents the generalized mathematical model on which the SOM is based. It also presents some interesting ideas about improving results ("blurring" when finding the best matching unit, or BMU, and the Gaussian function usually used for finding the neighbors), as well as some speed improvements (such as the Batch Map used in the Matlab SOM toolkit).
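
To make the BMU and Gaussian neighborhood ideas concrete, here is a minimal sketch of one online SOM training step. It is my own illustration and is not taken from Kohonen's paper or from the Matlab toolkit: the function names, the rectangular grid, the squared Euclidean distance, and the fixed learning_rate and sigma are all assumptions, and the "blurring" and Batch Map variants are not shown.

```python
import math
import random


def make_map(rows, cols, dim, seed=0):
    """Build a rows x cols grid of random weight vectors (hypothetical helper)."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_step(weights, x, learning_rate, sigma):
    """Pull every node toward x, scaled by a Gaussian of its grid distance to the BMU."""
    bmu_r, bmu_c = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_dist_sq = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
            # Neighborhood strength: 1.0 at the BMU, falling off with grid distance.
            h = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))
            for i in range(len(w)):
                w[i] += learning_rate * h * (x[i] - w[i])
    return bmu_r, bmu_c
```

In a full training run, train_step would be called once per input vector over many passes, with learning_rate and sigma shrinking over time so that the map settles.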

S: An Ensemble of SOM Networks for Document Organization and Retrieval (Apostolos Georgakis, Haibo Li, and Mihaela Gordan)
This paper is a neat comparison of SOM techniques, including one called a Bagging SOM. This technique classifies the data in many different ways, into what the authors call bags. By combining these different "predictors," as they call them, the authors obtain better results.

S: Web Content Management by Self-Organization
This paper takes a different approach to document organization. It aims to generate topic headers into which large numbers of documents fall (a la Yahoo's directory system). The authors call this a Topological Organization of Content system, and it is an interesting approach for a web based SOM system.

S: Web document clustering - a feasibility study
This document first discusses clustering methods and shows their feasibility. It does not discuss SOMs per se, but the preprocessing needed is largely the same.
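
As a rough illustration of the kind of preprocessing several of these sources describe, the sketch below turns short texts such as titles into fixed-length term-count vectors that a map could take as input. It is only an assumption about what such a step might look like, not the method of any paper listed here: the tiny stop word list, the min_doc_count cutoff, and all function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "to"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_vocabulary(documents, min_doc_count=1):
    """Keep terms that appear in at least min_doc_count documents, sorted by name."""
    doc_counts = Counter()
    for doc in documents:
        doc_counts.update(set(tokenize(doc)))
    return sorted(t for t, n in doc_counts.items() if n >= min_doc_count)


def to_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in vocabulary]


titles = [
    "Self-Organizing Maps",
    "Self-organizing maps of document collections",
    "Web Mining",
]
vocabulary = build_vocabulary(titles, min_doc_count=2)
vectors = [to_vector(t, vocabulary) for t in titles]
```

Raising min_doc_count is one crude way to shrink the vocabulary, and with it the dimensionality of the vectors, before they are given to a map.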


Also see the bibliography here

And the current notes here

Outline: here

Demos: 1, 2

Presentation: here

Copyright © Dan Schlegel 2007