CALD Software Repository

 

CALD has gathered a wide variety of commercial and publicly available software that contains algorithms for learning classifications and for clustering. The commercial software (Clementine, Darwin, IBM Intelligent Miner, Model 1, SAS Enterprise Miner, and SGI Mineset) has been donated by Software Affiliates of CALD. (Darwin and SGI Mineset have not yet arrived, but will be available soon.) These programs may be used by anyone in the CALD community for educational or research purposes; they should not be used for any commercial purpose, although they may be used for blitz projects with CALD Corporate Partners. The tables below list the software and its basic functions and features, and are followed by brief descriptions of each program, which may be reached by clicking on the name of a program. Features that are present are marked with an X in the table; some entries also carry a letter referring to a footnote below the table that gives further information. The tables also indicate whether the software runs under Unix or Windows NT (abbreviated as NT). The Software Affiliates hope to learn how their software could be improved and to hear about successful applications that they can publicize. If you have suggestions or have found a program useful, please send e-mail to ps7z@andrew.cmu.edu.

 

Unix Access

 

The software that runs under Unix has been installed in CALD directories on the AFS file system. The specific location for each program can be found by clicking on the name of the program. Anyone with a CS account or an Andrew account can execute these programs. Those using an Andrew account must type "cklog cs.cmu.edu" before executing a program.

Windows NT Access

 

The software that runs on Windows NT has been installed in the CALD laboratory (Wean Hall 4616) on machines CALD-1 through CALD-4. These machines are all dual boot (Linux and Windows NT). CALD-2 and CALD-4 have been designated as machines that the user at the console can switch to Windows NT at any time, and they should be used before CALD-1 and CALD-3. The programs can be executed from the Start menu (except for Intelligent Miner, whose execution instructions can be found by clicking on the name of the program). To get an account on CALD-1 through CALD-4, write to diane@cs.cmu.edu.

Documentation

 

The available hard-copy documentation, which covers all of the commercial software, is in the CALD laboratory (Wean Hall 4616). All programs, with or without hard-copy documentation, have on-line help.

Demonstrations and Help

 

If you would be interested in a demonstration of Clementine, IBM Intelligent Miner, Model 1, or SAS Enterprise Miner (or, when they arrive, SGI Mineset or Darwin), please write to me at ps7z@andrew.cmu.edu. I am familiar with the basics of these programs, and would be happy to try to answer questions or to give information about the technical support services of the individual software vendors.

 

 

 Classification Software

| Features | Bayes Knowledge Discoverer (Unix) | Clementine (NT) | Darwin (NT) | IBM Intelligent Miner (NT) | Microsoft Bayes Network (NT) | MLC++ (Unix) | Model 1 (NT) | SAS Enterprise Miner (NT) | SGI Mineset (NT) | SNNS (Unix) | Tetrad (Unix) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple Bayes | | | | | | X | X | | | | |
| Decision Trees | | X a | X b | X | | X c | X d | X | X e | | |
| Logistic Regression | | | | | | | X | X | | | |
| Linear Regression | | X | | | | X f | X | X | X f | | |
| Neural Networks | | X | X | X | | X | X | X | | X | |
| Rule Builders | | X | | | | X g | | | | | |
| Association Rule Builder | | X | | X | | | | | | | |
| Decision Table | | | | | | X | | | X | | |
| Radial Basis Functions | | X | | X | | | | | | | |
| Instance Based | | | | | | X h | | | | | |
| Linear Discriminators | | | | | | X i | | | | | |
| Memory-based | | | X | | | | | | | | |
| Bayesian Networks | X | | | | X | | | | | | X |
| Time Series | | | | X | | | | | | | |

a. C5.0 with boosting
b. CART
c. ID3, MC4, C4.5, T2
d. CHAID, CART
e. Option Trees
f. Regression Trees
g. OneR
h. IB
i. Perceptron, Winnow

 

Clustering Software

| Features | Autoclass III | Clementine (NT) | Darwin (NT) | IBM Intelligent Miner (NT) | Model 1 (NT) | SAS Enterprise Miner (NT) | SGI Mineset (NT) |
|---|---|---|---|---|---|---|---|
| Miscellaneous | | | | | X a | X a | |
| Kohonen Networks | | X | | X | | | |
| Kmeans clustering | | X | X | | | | X b |
| Bayesian | X | | | | | | |

a. unspecified method
b. single and iterative

 

Miscellaneous Software

| Features | DB2 | SAS |
|---|---|---|
| Statistical | | X |
| Database | X | |

 

Autoclass III

CALD Location: /afs/cs.cmu.edu/project/cald-1/autoclass-c/autoclass

"The program AUTOCLASS III, Automatic Class Discovery from Data, uses Bayesian probability theory to provide a simple and extensible approach to problems such as classification and general mixture separation. Its theoretical basis is free from ad hoc quantities, and in particular free of any measures which alter the data to suit the needs of the program. As a result, the elementary classification model used lends itself easily to extensions."

Link to Autoclass III Web Page

 

Bayes Knowledge Discoverer

CALD Location: /afs/cs.cmu.edu/project/cald-1/ramoni/bkd/bin/bkd

"Bayesian Knowledge Discoverer (BKD) is a computer program able to learn Bayesian Belief Networks from (possibly incomplete) databases. BKD is based on a new estimation method called Bound and Collapse and it has been developed within the Bayesian Knowledge Discovery project." It supports parameter estimation, model selection, goal-oriented propagation, discretization, and the handling of missing values, and it has a graphical user interface.

Link to Bayes Knowledge Discoverer Web Page

 

Clementine

CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)

Clementine is based on a visual programming interface that links data access, manipulation, and visualization together with machine learning (decision tree induction and neural networks), using a graphical 'building block' approach to develop applications. It can access data from files or databases and offers a number of machine learning algorithms, including the C5.0 algorithm for creating decision trees, optionally with boosting. Trained rules and networks can be exported as C source code. Clementine also defines a standard that allows users to write code so that programs written outside of Clementine can be used through the Clementine interface.

Link to Clementine Web Page

 

Darwin

CALD Location: Coming Soon

Darwin uses wizards to guide users through the process of assembling the data, building models and interpreting results. The models can also be generated in C++ or Java to be used outside of Darwin. It allows users to integrate features of Darwin into other applications and decision support tools.

Link to Darwin Web Page

 

DB2

CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)

DB2 is IBM's database manager.

Link to DB2 Web Page

 

IBM Intelligent Miner

CALD Location: CALD-2, CALD-4 (when running Windows NT)

Intelligent Miner provides a number of different tools for creating classification and clustering models, for preprocessing data, and for applying models. It can read data from a DB2 database or an ASCII file. To access this program you will need to get instructions and a password from ps7z@andrew.cmu.edu.

Link to IBM Intelligent Miner Web Page

 

Microsoft Bayes Network

CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)

"Microsoft Bayes Network allows the creation, assessment and evaluation of Bayesian belief networks… It supports the following operations: loading and storing of belief networks in textual form, creation and modification of networks through the addition of nodes and arcs, assessment of discrete probabilities, evaluation of belief networks using exact clique-tree propagation methods, decision-theoretic troubleshooting and recommendations, asymmetric assessment, and single-decision influence diagrams."

Link to Microsoft Bayes Network Web Page

 

MLC++

CALD Location: /afs/cs.cmu.edu/project/cald-4/mlc++/mlclogout

MLC++ provides a wide variety of learning algorithms with a common interface. It allows for discretization, bagging, variable selection, and boosting. It provides the algorithms for SGI Mineset. Its advantage over Mineset is that it is more flexible and has more algorithms. Its disadvantage is that the interface is much more primitive, and it lacks Mineset's powerful visualization tools.

Link to MLC++ Web Page

 

Model 1

CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)

Model 1 provides a wide variety of methods for data access, pre-processing, analysis, optimization, and validation. It automatically applies many different classification algorithms to a given problem. It also automatically tries many different subsets of variables, and bins variables in a variety of different ways. It provides wizards for guiding the user through a problem. It can read data either from ASCII files or from a database.

Link to Model 1 Web Page

 

SAS

CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)

SAS is a powerful statistical program that provides a large suite of statistical and visualization tools, as well as supporting SQL. SAS Enterprise Miner provides a graphical interface to some of the tools that are in SAS.

Link to SAS Web Page

 

SAS Enterprise Miner

CALD Location: CALD-1, CALD-2, CALD-3, CALD-4 (when running Windows NT)

Enterprise Miner provides a graphical interface to the SAS statistical program. It allows the user to pre-process variables in a variety of ways, and then create and test models of the data.

Link to SAS Enterprise Miner Web Page

 

SGI Mineset

CALD Location: Coming Soon.

Mineset provides many powerful visualization tools and a simple interface. It uses interactive three-dimensional images, colors, shapes, and animation to allow the user to visually explore both complex models and data. It can read data from ASCII files and from databases, with direct access to Oracle, Informix, Sybase, and flat-file data. It also provides an easy-to-use interface to many of the algorithms in MLC++, and it allows boosting.

Link to SGI Mineset Web Page

 

SNNS

CALD Location: /afs/cs.cmu.edu/project/cald-1/snn/SNNSv4.1

SNNS is a very flexible and powerful tool for creating neural networks. It includes a graphical interface, a wide variety of propagation algorithms, and allows the user to set many different parameters.

Link to SNNS Web Page

 

Tetrad

CALD Location: /afs/cs.cmu.edu/project/cald-1/tetrad/tetrad

Tetrad II is a multi-module program that assists in the construction of causal explanations for sample data and their use in prediction. With continuous variables the program will aid in the search for "path models" or "structural equation models;" with discrete data the program will construct and update a Bayes network from sample data and user knowledge of the domain; the program includes Monte Carlo facilities. Proofs of the asymptotic correctness of all but one of the search modules are available in P. Spirtes, C. Glymour and R. Scheines, Causation, Prediction and Search, Springer Lecture Notes in Statistics, 1993.

Link to Tetrad Web Page