Abstract

This paper investigates the application of neural networks to aid the selection of the most appropriate heuristic procedure for the resource-constrained project scheduling problem with cash flows (RCPSP). This is a difficult combinatorial optimization problem with wide application. Many heuristic procedures have been developed for this problem, with differing performance characteristics in different problem environments. This paper deals with the complex task of choosing the most relevant heuristic or heuristic category for a given instance of the problem. We apply neural networks to induce the relationship between project parameters and heuristic performance to guide the selection under different project environments. We also compare the results of the neural network approach with those from traditional statistical procedures. An innovative feature of our approach is the integration of statistical and optimization methods with neural networks to address data preprocessing, thereby improving the performance of the neural network. We demonstrate that neural network methodology can be employed both to extract information about project conditions and to provide predictions for novel cases. Extensive experimentation with network topologies and learning parameters indicates that this approach has significant promise in identifying categories of heuristics that are appropriate for a given instance of the problem, rather than selecting a single best heuristic. The methods investigated in this article have the potential to be useful in many project management applications, as well as in other hard combinatorial optimization problems where the difficulty of solving problems to optimality has resulted in the development of a number of heuristics.

The resource-constrained project scheduling problem (RCPSP) is concerned with the scheduling of a collection of precedence-related activities subject to resource constraints. When significant levels of cash flows are present over the duration of the project in the form of expenses and payments, determining a schedule that maximizes the Net Present Value (NPV) of the cash flows is an appropriate objective [3]. This is a hard combinatorial optimization problem [8] for which many heuristic procedures have been developed to obtain good solutions [2, 6, 21, 26]. However, it is not clear which procedures produce the best schedules and highest NPV under different scheduling environments. Hence, project managers have difficulty deciding which scheduling rule to use. Once a rule is selected, it will generally be used throughout the project, so the selection of an inappropriate heuristic may result in large monetary losses. Little work has been done on investigating the dependence of the various heuristic rules on project parameters for the RCPSP with cash flows. A major difficulty has been the lack of success in achieving a meaningful set of summary measures which characterize the problem. Furthermore, different heuristics perform well under different conditions, i.e., no single heuristic dominates in all problem settings, and the project parameters which may contribute to heuristic performance are not readily apparent. Currently, the only way to choose a heuristic that exhibits good performance is to apply each heuristic to the problem and observe the outcome [5]; such ad hoc methods consume significant time and effort.
This paper integrates several methodologies to address the issue of selecting heuristics in order to accomplish intelligent scheduling in the project management domain. The study aims to: (1) construct a set of summary measures that characterize relevant information about projects; (2) design a robust neural-network-based procedure to support the process of heuristic selection; and (3) compare the generalization performance of neural networks with that of some traditional statistical procedures. Neural networks are a promising vehicle for exploring the performance of these heuristics with respect to their dependence on the project parameters. They have been generally successful in pattern recognition and classification [30], thus providing a useful tool in the classification and selection process [12, 18, 28]. Previous studies have shown encouraging results on the application of neural networks to combinatorial optimization problems such as the traveling salesman and vehicle routing problems [12, 18, 29]. Padman [20] conducted an exploratory study, with encouraging results, on predicting project scheduling heuristic performance using neural networks. In this paper, we elaborate on that study with significant emphasis on problem representation, preprocessing of input and output information, and extensive experimentation with neural network architectures. We have designed and conducted a series of experiments with a neural network system using a collection of problems from [21]. A systematic preprocessing approach consisting of various statistical analyses, optimization techniques, and feature detection networks has been used to enhance system performance. We experimented with a single neural network system as well as a modular system to train and test the data. We also compared the results from applying neural networks with those from multivariate regression and discriminant analysis. Empirical results show that the neural network is successful in selecting categories of appropriate heuristics for an instance of the RCPSP, rather than selecting a single heuristic. The methodology is generalizable and adaptable to other difficult combinatorial optimization problems as well.

The paper is organized as follows. In Section 1, we describe the resource-constrained project scheduling problem. In Section 2, we present the various data preprocessing techniques and data representation schemes employed in this study. Section 3 describes the experimental design and our computational experience with determining the parameters of the neural network. In Section 4, we discuss the experiments and results. Section 5 summarizes the conclusions.

1. Resource-Constrained Project Scheduling Problem with Cash Flows

1.1 Problem Statement

Resource-constrained project scheduling with cash flows finds wide application in environments such as research and development, manufacturing, and construction. As described in [26], a mathematical formulation of the problem with exponential discounting is as follows:

$$\max \sum_{i=1}^{n} c_i e^{-\alpha t_i}$$

subject to

$$t_j - t_i \ge d_{ij} \quad \text{for every activity } (i, j),$$
$$R(S_t) \le K, \quad t = 1, \ldots, T.$$

In the above formulation, $n$ is the number of nodes in the network, $c_i$ ($i = 1, \ldots, n$) represents the cash flow associated with the event (node) $i$, and $\alpha$ is the discount rate. The variable $t_i$ represents the scheduled time of each event $i$. The activities in the project network are the arcs $(i, j)$, each with duration $d_{ij}$. $R(S_t)$ represents the resource requirements for the set $S_t$ of activities in progress at time $t$ ($t = 1, \ldots, T$), where $T$ denotes the duration of the project. $K$ is the resource limit vector.
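To make the discounted objective concrete, the following minimal sketch (Python; the function and example values are illustrative, not code or data from the study) evaluates the NPV of a candidate schedule:

```python
import math

def npv(cash_flows, event_times, alpha):
    """NPV of a schedule: sum over events of c_i * exp(-alpha * t_i)."""
    return sum(c * math.exp(-alpha * t)
               for c, t in zip(cash_flows, event_times))

# Toy example: two outflows followed by a final payment, with a
# continuously compounded discount rate of 10% per period.
print(npv(cash_flows=[-100.0, -50.0, 600.0],
          event_times=[0.0, 2.0, 5.0],
          alpha=0.10))
```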
The objective is to obtain the optimal time for scheduling each task such that the Net Present Value (NPV) of the project is maximized. Figure 1 depicts an example of a project network with durations and resource requirements represented on the arcs (each activity requires three types of resources) and cash inflows (positive numbers) and outflows (negative numbers) at the nodes. Given a resource limit vector of [12, 12, 12] and assuming a discount rate of 10%, the maximum NPV for this example is $547, with the scheduled time for each event given by the value of $t_i$ at the event nodes in the figure. As the project size increases, determining this optimal schedule becomes impractical. Hence, considerable effort has been devoted to the development of efficient heuristic procedures to solve the RCPSP with cash flows [2, 6, 21, 26].

FIGURE 1. Example Project Network

Recognizing the need to compare these heuristic procedures, previous research has utilized regression models to infer the relationship between heuristic performance and project parameters. For example, for the RCPSP with the minimum duration objective, multiple regression models provide general guidance for heuristic selection [5, 23]. Davis [5] employed multiple regression models to predict the percentage increase in critical path duration for a particular sequencing rule (the minimum late finish time heuristic) in the RCPSP. The results demonstrate that a statistical relationship exists between project network and resource parameters and the heuristic rule selected for the problem. Patterson [23] made an extensive study of project scheduling summary measures and investigated eight heuristic rules along with three objective functions, also using multiple regression models to predict heuristic performance. While both studies demonstrate the possibility of predicting relative heuristic performance from project summary measures, they did not take cash flow features into account. Russell [26] used simple linear regression analysis to test and compare six heuristic scheduling rules for the RCPSP with cash flows. Based on an analysis of 80 problems, he concluded that the level of resource constrainedness has a significant impact on heuristic performance. In this paper, we examine all the parameters that have been identified in the literature for the resource-constrained project scheduling problem. To analyze the effect of these problem parameters on heuristic performance, we utilize sixteen heuristic rules from Padman [21] that were developed for the RCPSP with cash flows. In the following, we describe these parameters and heuristic rules.

1.2 Project Descriptors

In order to classify and predict scheduling rules, it is necessary to identify project summary measures as predictors that characterize the problem. Pascoe [22] first introduced the concept of summary measures for the categorization of heuristics, which was later utilized by Johnson [10], Davis [5] and Patterson [23]. Overall, we identified more than 40 project scheduling characteristics from the literature which may be relevant to heuristic performance. We eliminated several of those variables due to multi-collinearity effects, finally selecting a set of 30 variables. We thus assume that project networks can be characterized by a vector of these 30 variables, which measure the complexity of a given project scheduling problem along several dimensions. These summary measures are used in subsequent neural network training.
As shown in Table I, they consist of categories such as project size parameters (SIZE), parameters based on critical path analysis (CPL-BASED), resource-based parameters, cash flow parameters, and project network shape (SHAPE).

TABLE I. Project Characteristics

SIZE
  NNODE          Number of nodes to be scheduled
  NARC           Number of arcs (activities) to be scheduled
  NDUMMY         Number of dummy activities
  XDUR           Average activity duration
  XDENSITY       Average activity density
  COMPLEXITY     Project complexity

CPL-BASED
  CPL            Critical path length
  NSLACK         Total slack of all activities
  PCTSLACK       Percent of activities possessing positive total slack
  XFREESLK       Average total slack per activity
  TSLACK_R       Total slack ratio
  XSLACK_R       Average slack ratio
  PDENSITY       Project density - total
  TFREESLK       Free slack of all activities (Johnson)
  NFREESLK       Number of activities possessing positive free slack
  PCTFREESLK     Percent of activities possessing positive free slack
  XFREESLK       Average free slack per activity
  PDENSITY_F     Project density - free (Pascoe)

RESOURCE
  UTIL1          Utilization of resource 1 (R1)
  UTIL2          Utilization of resource 2 (R2)
  UTIL3          Utilization of resource 3 (R3)
  XUTIL          Average resource utilization
  TCON1          Resource constrainedness over time for R1
  TCON2          Resource constrainedness over time for R2
  TCON3          Resource constrainedness over time for R3
  XCON_TM        Average resource constrainedness over time

CASH FLOW-BASED
  PROFIT_MARGIN  Profit margin
  FREQ_P         Frequency of payment
  INT_RATE       Interest rate

SHAPE
  NSTR           Network structure

1.3 Heuristic Rules

Given the number of heuristics developed in the literature to address the scheduling of resource-constrained projects with cash flows, we focus in this paper on the study by Padman et al. [21], who developed nine optimization-guided heuristic rules designed to exploit different aspects of the project scheduling problem. These rules utilize information from a relaxed optimization model obtained by ignoring the resource constraints in the RCPSP [25]. These priority dispatching rules were embedded in a single-pass greedy algorithm and applied to a set of randomly generated problems describing 144 different project environments, and their performance was compared with that of seven other heuristics from the literature. A brief description of the rules is provided in Table II. These sixteen heuristic rules are based on information such as opportunity cost, dual price and cash flow weight, and can thus be categorized into six distinct groups. Based on the ordering of the heuristics in Table II, heuristics 1, 2, 3, 10, 12, 13, 14, 15 and 16 are the nine optimization-based heuristics proposed in [21]. Within this large group, heuristics 1, 2 and 3 are rules based on opportunity cost; heuristics 10, 12 and 15 are rules based on dual prices; and heuristics 13, 14 and 16 are rules based on cash flow weight. The duration-based rules are heuristics 4 and 5, which employ information derived from the Critical Path Method (CPM) [13]. Heuristic 11 is a cash flow weight rule proposed by Baroum et al. [2] that does not use optimization-based methods. Heuristics 6, 7 and 8 are target scheduling rules proposed by Russell [26]. Heuristic 9 is a random heuristic. This categorization of heuristics is useful for output preprocessing, which is discussed in Section 2.

TABLE II. Heuristic Priority Rules and Descriptions

OPPORTUNITY COST
  1. OCS/LAN   Activities are selected for scheduling in ascending order of the opportunity cost of scheduling. Activities with zero tardiness penalties are scheduled according to lowest activity number (LAN).
  2. OCR/LAN   Activities are scheduled in ascending order of the opportunity cost of resources, with zero-tardiness-penalty activities scheduled according to LAN.
  3. NOC/LAN   Activities are scheduled in increasing order of net opportunity cost, which balances the cost of delaying some activities against scheduling others. Zero-tardiness-penalty activities are scheduled according to LAN.

DURATION-BASED
  4. LFT/LAN   Priority is given to the activities with the minimum latest finish time as determined by critical path analysis. LAN is used to break ties.
  5. MS/LAN    Select the activity with the smallest slack, where the critical path is continuously revised as each activity is scheduled. LAN is used to break ties.

TARGET SCHEDULING
  6. TS/LAN    Select the activity with the maximum difference between its current early finish time and the optimal finish time given by the relaxation model. LAN is used to break ties.
  7. DUAL/TS   Activities with the highest dual price are selected for scheduling. Ties are broken using the TS rule.
  8. TS/DUAL   Select activities with the maximum difference between current early finish time and the optimal finish time given by the relaxation model. Maximum dual price is used to break ties.

RANDOM
  9. RAND-50   Eligible activities are selected at random for scheduling, with the best of 50 replications reported.

DUAL PRICE
  10. MTP/LAN  Activities are selected from the schedule queue in descending order of their tardiness penalties. Activities with zero tardiness penalties are scheduled according to LAN.
  12. LTP/LAN  Activities are scheduled in ascending order of tardiness penalties. Activities with zero tardiness penalties are scheduled according to LAN.
  15. MTP/ET   Same as MTP/LAN, except that activities with zero tardiness penalties are scheduled according to the early time rule.

CASH FLOW WEIGHT
  11. CFW/LAN  Select the activity with the highest cash flow weight for scheduling. LAN is used to break ties.
  13. CFW-CFW  Use the highest cash flow weight activities for scheduling. Activities with zero tardiness penalties are then scheduled using highest cash flow weights.
  14. CFW-OCC  Use cash flow weights for scheduling from the first queue, and the opportunity cost of cash flows for scheduling from the second queue.
  16. TSCFW    Schedule activities based on highest cash flow weight.

1.4 Experimental Data

The data set consists of 1,440 randomly generated project networks with resource constraints and cash flows. These 1,440 problems were generated from six experimental parameters: project size (SIZE), project network structure (SHAPE), resource constrainedness (AUF), interest rate (CC), profit margin (PM), and frequency of progress payments (FP). These six parameters form a significant subset of the project descriptors presented in Table I, and each was varied over several levels in constructing the project networks. Project size is characterized by the number of activities involved: all small project networks have 48 activities, while the medium-size problems have 110 activities. There are three types of resources associated with each activity. The degree of resource constrainedness is characterized by AUF (average utilization factor) [14], which varies from a low level of 1.0 to a medium level of 1.5 and a high level of 2.0.
Network shape varies over three levels: balanced, skewed to the right, and skewed to the left. There are three cash flow parameters. The interest rate varies over two levels, from a low of 10% to a high of 20% annually. The frequency of progress payments is also set at two levels: at the low level, on average every seventh activity yields a positive progress payment upon its completion, while at the high level every third activity does. The profit margin, reflected by the final payment for the completed project, is set at two levels: the low level is approximately equal to the sum of the cash outflows for the project plus 30%, and the high level to the sum of the outflows plus 50%. A full factorial experiment was conducted, resulting in 144 different scheduling environments and 1,440 problems with 10 replicates in each environment. Figure 2 depicts a data flow diagram of the problem that links the data, problem characteristics, and heuristics in the following manner. A set of resource-constrained project scheduling problems with cash flows is generated based on the six parameters using a data generator. For each problem, a summary measure consisting of 30 variables is derived from the project network specifications; this summary measure serves as the vector of teaching input. For each of the 1,440 problems, the sixteen heuristic procedures are applied and sixteen corresponding NPVs are generated; the NPVs serve as the vector of teaching output. Before the teaching inputs and outputs are fed into the neural network for training and testing, a preprocessing module containing various preprocessing techniques is used to extract and refine relevant information about the problem and heuristics.

FIGURE 2. Data Flow Diagram

2. Data Preprocessing and Data Representation

After the data is collected, many techniques can be employed to analyze it. Dumping the whole data set into a neural network without careful inspection and preprocessing will not work for nontrivial tasks; the effectiveness of a neural network depends largely on good data preparation [17]. An innovative feature of this study is the investigation of diverse input and output preprocessing strategies and data representation issues that can enhance neural network performance. This section describes the strategies used to preprocess the inputs and outputs. We also discuss the representation issue, i.e., the encoding of the data fed into the neural network.

2.1 Data Statistics

As discussed in Section 1.2, we derived 30 independent variables to describe the project features which may contribute to heuristic performance. We call this set of input data N30. The table in Appendix A presents summary statistics for the 1,440 cases of the 30 variables.

2.2 Input Preprocessing

It is desirable to reduce the input dimensionality even further from 30 variables, so that neural networks with fewer nodes and connection weights suffice to capture the mappings between inputs and outputs. This helps to eliminate noise and redundant information and also facilitates neural network training and testing [1]. Two different approaches were taken to reduce input dimensionality: the statistical technique of principal component analysis (PCA) and a neural-network-based self-organizing feature extraction network.
2.2.1 Principal Component Analysis (PCA)

Principal component analysis (PCA) is a multivariate statistical procedure concerned with explaining the observed variance-covariance structure, where each principal component is a linear combination of the original variables [24]. One of the main purposes of PCA is to summarize the data for dimensionality reduction while preserving most of the original information. PCA was performed on the 30 variables with normalized values; the normalization procedure is discussed further in Section 2.4. The results show that the first six principal components collectively explain 86% of the total sample variance and thus provide a good summary of the 30 variables. Consequently, a reduction of the data from 1,440 observations on 30 variables to 1,440 observations on six principal components is reasonable. We call this set of reduced input data (with six input variables) PCA6.

2.2.2 Feature Extraction Network (FEN)

Neural networks have a self-organizing ability that can be used to extract the critical features of a problem and thus reduce input dimensionality [15]. Oja [19] showed that a simplified neuron model can serve as a principal component analyzer. A three-layer symmetric fan-in/fan-out network was developed for feature extraction [16, 18]. Figure 3 depicts the architecture of the feature extraction network.

FIGURE 3. Feature Extraction Network

In this network, the same 30 inputs also serve as the teaching outputs. The six hidden nodes in the middle layer serve as feature extractors, and the condensed information is stored in this layer of the network. The activation values of the six hidden units are continuous values in the range (-0.5, 0.5). The number of hidden nodes in the middle layer was derived in part through experimentation; it was also motivated by the fact that six parameters (SIZE, SHAPE, AUF, PM, FP and CC) were used to randomly generate the project networks. We used back-propagation to train the network until the average training error decreased below 0.01. The six units in the hidden layer act as inputs (FEN6) to the generalization network, which is discussed in Section 3.2.
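As an illustration of the two reduction schemes, the sketch below (Python with NumPy and scikit-learn; the random stand-in data, the tanh activation and the training settings are assumptions, not the study's original code) condenses a 1,440 x 30 input matrix to six PCA components and trains a 30-6-30 autoassociative network whose hidden activations play the role of FEN6:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

# Stand-in for the real data: 1,440 problems x 30 summary measures,
# already normalized to [-0.5, 0.5].
rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(1440, 30))

# PCA6: project onto the first six principal components (reported in the
# text to explain about 86% of total sample variance for the real data).
pca = PCA(n_components=6)
PCA6 = pca.fit_transform(X)
print("variance explained:", pca.explained_variance_ratio_.sum())

# FEN6: a 30-6-30 autoassociative network in which the inputs double as
# the teaching outputs; the six hidden activations are the condensed features.
fen = MLPRegressor(hidden_layer_sizes=(6,), activation="tanh", solver="sgd",
                   learning_rate_init=0.01, momentum=0.9,
                   max_iter=2000).fit(X, X)
# Recover the hidden-layer activations with a manual forward pass (tanh here,
# rather than the paper's (-0.5, 0.5)-ranged activation function).
FEN6 = np.tanh(X @ fen.coefs_[0] + fen.intercepts_[0])
```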
2.3 Output Preprocessing

The net present values (NPVs) from all sixteen heuristics were used as the teaching output. One significant observation about the output data was that some sets of heuristics behaved quite similarly, with performance very close to each other, so there were no distinct differences among the sixteen NPVs. This can be traced back to the original design of the heuristics: as indicated in Section 1.3, some of the heuristics were proposed based on the same criterion and share common properties. This similarity could prevent the neural network from learning to differentiate among the sixteen heuristics. However, with an appropriate categorization that groups the heuristics by their similarities in performance, the differences among the categories should be more significant, enabling the neural network to learn better. To facilitate learning, we employed two methods: the first uses expert knowledge to categorize the heuristic procedures; the second derives clusters by formulating and solving an integer programming model for cluster analysis.

2.3.1 Heuristic Categorization by Expert

As discussed previously, the sixteen heuristics can be grouped into six categories such that the heuristics within a category share some common properties. We call this clustering the categorization of heuristics based on expert knowledge.

TABLE III. Expert Grouping of Heuristics

Category   Heuristics
1          1, 2, 3
2          10, 12, 15
3          11, 13, 14, 16
4          4, 5
5          6, 7, 8
6          9

Category 1 is the set of opportunity cost rules, heuristics 1, 2 and 3. Category 2 is based on dual price and includes heuristics 10, 12 and 15. Category 3 consists of the cash flow weight rules, heuristics 11, 13, 14 and 16. The duration-based rules, heuristics 4 and 5, make up category 4. Category 5 is the set of target scheduling rules, heuristics 6, 7 and 8. The last category consists of the random heuristic, rule 9.

2.3.2 Cluster Analysis

Cluster analysis using an integer programming model was applied to the distance matrix of the sixteen NPVs. First, a 16 x 16 correlation matrix was computed from the 1,440 cases on each of the 16 NPVs; the distance matrix was then obtained from the correlation matrix as 1 - correlation. Having computed the distance between all pairs of heuristics, we partitioned the 16 heuristics into N (N = 4, 5, 6) clusters. Each cluster is represented by a median which serves as the central point of the cluster, so that the distance between the heuristics within each cluster is minimized. The objective is to find the optimal selection of cluster medians and assignment of the remaining heuristics to the medians so that the heuristics in a cluster are most similar; in making this selection and assignment, the sum of the total distances is minimized. The integer programming (IP) model for this problem [27] is shown in Appendix B. The model was implemented in GAMS [4] and run on a VAX 11/780.

When N = 4, all sixteen heuristics are grouped into four clusters, shown in the first panel of Table IV. Cluster 1 consists of heuristics 1, 2, 3 and 10; it is interesting to note that heuristic 10 (MTP/LAN) is closer to the opportunity cost rules in NPV performance than to the dual price rules. The second cluster consists of the duration-based rules. The third cluster is dominated by the target scheduling rules plus the random heuristic. The last cluster consists of both dual price rules and cash flow weight rules. When N = 5 (second panel of Table IV), categories 1, 2 and 5 are the same as categories 1, 2 and 4 in the 4-cluster case, while the third category of the 4-cluster result is divided into two sub-categories. This may be indicative of the dissimilarity among the heuristics in category 3, compared to the other categories, when forced to separate into five clusters. When N = 6 (last panel of Table IV), the fifth category is further divided into two sub-categories, one for dual price rules and the other for cash flow weight rules, further illustrating the order and level of dissimilarity among the heuristics in each category.

TABLE IV. Heuristic Categorization by Cluster Analysis

4 clusters:
Group   Median   Rules
1       1        1, 2, 3, 10
2       4        4, 5
3       9        6, 7, 8, 9, 11
4       13       12, 13, 14, 15, 16

5 clusters:
Group   Median   Rules
1       1        1, 2, 3, 10
2       4        4, 5
3       7        7, 11
4       8        6, 8, 9
5       13       12, 13, 14, 15, 16

6 clusters:
Group   Median   Rules
1       1        1, 2, 3, 10
2       4        4, 5
3       6        6, 8, 9
4       11       7, 11
5       12       12, 15
6       13       13, 14, 16
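The clustering step can be sketched as follows (Python; PuLP is assumed here as the solver interface, standing in for the original GAMS implementation, and the constraint set mirrors the model in Appendix B):

```python
import numpy as np
import pulp

def cluster_heuristics(npv_matrix, n_clusters):
    """Partition heuristics into n_clusters with the median-based IP model.
    npv_matrix: (problems x 16) array of heuristic NPVs."""
    dist = 1.0 - np.corrcoef(npv_matrix, rowvar=False)   # 16 x 16, d = 1 - corr
    H = range(dist.shape[0])
    prob = pulp.LpProblem("heuristic_clustering", pulp.LpMinimize)
    y = pulp.LpVariable.dicts("median", H, cat="Binary")
    x = pulp.LpVariable.dicts("assign", (H, H), cat="Binary")
    # Minimize total within-cluster distance to the cluster medians.
    prob += pulp.lpSum(dist[i][j] * x[i][j] for i in H for j in H)
    for i in H:
        for j in H:
            prob += x[i][j] <= y[j]                          # assign only to medians
        prob += y[i] + pulp.lpSum(x[i][j] for j in H) == 1   # median or assigned
    for j in H:
        prob += pulp.lpSum(x[i][j] for i in H) >= y[j]       # medians are non-empty
    prob += pulp.lpSum(y[j] for j in H) == n_clusters        # exactly N clusters
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {j: [i for i in H if i == j or pulp.value(x[i][j]) > 0.5]
            for j in H if pulp.value(y[j]) > 0.5}
```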
An interesting observation is that the cluster analysis results are consistent with the expert categorization of heuristics. Comparing the 6-cluster analysis with the expert categorization, the two are similar except for three differences: (1) heuristic 10, a dual price rule, is clustered with the opportunity cost rules (category 1), whereas it belongs to category 2 in the expert categorization; (2) the random heuristic (heuristic 9) is clustered with the target scheduling rules, whereas it forms a separate category in the expert categorization; and (3) one target scheduling rule (heuristic 7) is clustered with a cash flow weight rule (heuristic 11), whereas the expert categorization places heuristic 11 with the other cash flow weight rules and heuristic 7 with the other target scheduling rules.

After categorizing the heuristics, exactly one value must be chosen from each group as the candidate output value for that group. We considered two approaches: the first takes the mean NPV of all the heuristics in each cluster, and the second takes the NPV of the median of each cluster as the candidate value. The relative merits of each approach are discussed in Section 4.

2.4 Data Representation

The coding scheme and variable scaling employed in constructing an appropriate network are also critical to the success of neural network training. We investigated various strategies for variable encoding and scaling, and examined the effects of different representation schemes on the performance of the neural network by evaluating both continuous and discrete representations of the inputs. The 30 summary measures are all continuous-valued and were normalized to the range between -0.5 and 0.5 for fast convergence [17]. The six inputs from PCA and from the feature extraction network are also continuous-valued. As indicated in Section 1.4, all 1,440 problems in the data set were generated based on six parameters set at two or three levels: interest rate, profit margin, frequency of payment, resource utilization, network topology and problem size. To test the hypothesis that there exists a functional mapping between these six parameters and the performance of the heuristics (as measured by project NPV), we used these six variables as inputs to the neural network, with a discrete representation for the different levels. For example, the parameter measuring resource constrainedness has three levels, and we use -0.5, 0 and +0.5 to represent the low, middle and high levels respectively. [31] describes this representation in greater detail. With regard to the output representation, a continuous-valued vector was taken as the teaching output. For each instance of the problem, each of the 16 heuristics was run to obtain the NPVs [21]. These NPVs were normalized with respect to the best and the worst values: the best performing heuristic was given a value of 1.0 and the worst 0.0, with the remaining NPVs ranging between these limits, each value indicating the distance from the best performing heuristic.
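A minimal sketch of the two scalings just described (illustrative Python; only the stated target ranges come from the text, the helper names are assumptions):

```python
import numpy as np

def scale_inputs(X):
    """Min-max scale each summary measure (column) into [-0.5, 0.5]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo) - 0.5

def scale_outputs(npvs):
    """For each problem instance (row), map the best heuristic's NPV to 1.0
    and the worst to 0.0; the rest measure distance from the best."""
    lo = npvs.min(axis=1, keepdims=True)
    hi = npvs.max(axis=1, keepdims=True)
    return (npvs - lo) / (hi - lo)
```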
Figure 4 gives a general overview of the data preprocessing and representation scheme. In summary, four types of input were fed into the neural network. The primary inputs are the problem characteristics described by the derived summary measures, represented by two encoding schemes: a continuous representation scheme (with thirty variables) and a discrete representation scheme (with six variables). Two different input preprocessing techniques, PCA and FEN, were independently applied to reduce the input dimensionality from thirty to six; the resulting inputs use a continuous-valued representation. The output of the neural network was represented by a vector of normalized NPVs. Two output preprocessing techniques were applied to the outputs, reducing the output dimensionality from sixteen to four, five or six. In the next section, we describe the design of our experiments resulting from the different representation schemes and the different types of preprocessing. Results from training and testing the networks are analyzed and discussed in Section 4.

FIGURE 4. Data Preprocessing and Representation Scheme

3. Experimental Design

The back-propagation learning paradigm, along with its variant called quick-prop, was employed in this study [7]. We conducted extensive experimental analysis with network topology and learning parameters to derive the best results.

3.1 Learning Parameters

Although many neural network applications have been shown to be successful in the literature, the conditions under which good performance can be obtained are not yet clear. We started with experiments on a benchmark network, 30-6-16 (a neural network with 30 input nodes, 6 hidden nodes and 16 output nodes), to investigate the effect of the learning parameters (learning rate, momentum, initial weight range) on performance. It is impractical to conduct exhaustive experimentation over all possible combinations due to the wide ranges of the parameter settings. Therefore, a series of exploratory experiments was first conducted to find the most promising ranges, followed by extensive experimentation within those ranges.

Learning Rate: The learning rate is an important factor in controlling the gradient descent, and finding an optimal value is difficult. A small learning rate slows convergence, so training may take much longer than with a higher rate. A large learning rate may help the network escape from local minima, but the network may go through large oscillations and never converge. In this study, learning rates from 0.001 to 0.25 were explored, with 0.001, 0.01, 0.05, 0.1, 0.15 and 0.25 as the values tested.

Momentum: Another parameter controlling convergence speed is momentum. During the calculation of the weight-change value, a fraction of the previous change (the momentum term) is added; the main idea is to keep the weight changes moving in the same direction. Momentum is usually set in the range between 0.1 and 0.9, and experience indicates that 0.9 is a reasonable choice [7]. In addition, we chose three other levels, 0.0, 0.5 and 0.7, for the experiments.

Initial Weight Range: Weights are generally initialized to small random values, e.g., in the range (-0.1, 0.1). In this study, we tested initial weight ranges of (-0.5, 0.5), (-0.7, 0.7) and (-1.0, 1.0).

Experiments indicated that a learning rate of 0.01, a momentum of 0.9 and an initial weight range of (-0.7, 0.7) generally gave the better results.
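For concreteness, a sketch of training the benchmark 30-6-16 network under these settings (scikit-learn and random stand-in data are assumptions; the study used a quick-prop simulator written in C, so this only mirrors the plain back-propagation variant):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in data: 1,015 training cases, 30 scaled inputs, 16 normalized NPVs.
rng = np.random.default_rng(0)
X_train = rng.uniform(-0.5, 0.5, (1015, 30))
Y_train = rng.uniform(0.0, 1.0, (1015, 16))

# Benchmark 30-6-16 network trained by back-propagation (SGD); note that
# scikit-learn does not expose the initial-weight range directly.
net = MLPRegressor(hidden_layer_sizes=(6,),
                   activation="logistic",
                   solver="sgd",
                   learning_rate_init=0.01,  # best learning rate found
                   momentum=0.9,             # best momentum found
                   max_iter=300)             # convergence took ~200-300 epochs
net.fit(X_train, Y_train)
```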
3.2 Single Generalization Network (SGN)

Once we found a set of reasonably good learning parameters, we experimented with different network configurations. While the learning parameters affect the convergence of the network, the configuration is one of the factors that determines its generalization capability [16]. A network configuration is characterized by the arrangement of processing units and weight connections. In Section 2, we discussed the representation of the inputs and outputs, which determines the size of the input and output layers of the neural network. Unfortunately, no prior estimate can be made of the optimal internal representation, i.e., the features of the hidden layer, for a given task. Previous studies [9] have shown that three layers are sufficient for generalization; adding more hidden layers may only affect the convergence of the network. The single generalization network architecture therefore included the input and output layers and a single hidden layer. We designed a total of 32 experiments, which varied over the number of inputs, the number of outputs (resulting from the input and output preprocessing methods) and the number of hidden nodes.

FIGURE 5. Combinations of Inputs and Outputs

As illustrated in Figure 5, the four types of input representation are the continuous representation derived from the 30 summary measures (N30), the PCA reduction (PCA6), the feature-extraction-network reduction (FEN6) and the binary representation (BIN6). There are eight output representations: all sixteen heuristics (O16); 4, 5 and 6 clusters with the mean NPV as candidate value (I4A, I5A and I6A); 4, 5 and 6 clusters with the median NPV as candidate value (I4M, I5M and I6M); and the six-group categorization by the expert (EXP). For each network, the number of hidden nodes was derived empirically to get the best generalization results. The use of FEN6 as input resulted in a special combined network system consisting of a feature extraction network and a generalization network, as shown in Figure 6. In Section 4.2, we report and analyze our experimental results on the single generalization networks.

FIGURE 6. Combined Feature Extraction and Generalization Networks

3.3 Multiple Generalization Network (MGN)

We also investigated a multi-level neural network system, motivated by the success of modular neural network systems in the literature [12, 18]. In contrast to the single generalization network, which is good at selecting a cluster of heuristics, this multi-level approach enables the selection of a single best heuristic. Two stages are involved: the first chooses the right cluster, and the second selects a specific heuristic from that cluster. Figure 7 illustrates the multi-level generalization network system. The top level is the cluster identification network, and the bottom level consists of multiple generalization networks that select the most appropriate heuristic for the given instance of the problem.

· Cluster Identification Network

This network consists of 30 inputs, twelve hidden nodes (derived through experimentation), and 4 (also 5 and 6) outputs. The four output units, for example, correspond to the four clusters of the sixteen heuristics obtained from cluster analysis, and are used to establish the four most distinct classes. The neural network is trained to classify each observation into one of the pre-selected groups and thereby select exactly one of the generalization networks to be used at the next level. The output representation for the cluster identification network is a vector of four values in binary representation. For example, if an instance of the project scheduling problem falls in the first cluster/category (i.e., choose heuristic 1, 2, 3 or 10), then the desired output vector is of the form (1, 0, 0, 0). After training, the network is tested on the testing set, and the largest of the four output activations is used to select a generalization network at the next level.
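This two-stage routing can be sketched as follows (illustrative Python; `cluster_net` and `generalization_nets` are assumed trained models with a `predict` method, not objects from the study):

```python
import numpy as np

def select_heuristic(x, cluster_net, generalization_nets, cluster_members):
    """Route instance x through the cluster identification network, then
    let that cluster's generalization network pick a single heuristic."""
    cluster = int(np.argmax(cluster_net.predict(x[None, :])[0]))
    scores = generalization_nets[cluster].predict(x[None, :])[0]
    return cluster_members[cluster][int(np.argmax(scores))]

# For the 4-cluster system, the membership map would be:
# cluster_members = {0: [1, 2, 3, 10], 1: [4, 5],
#                    2: [6, 7, 8, 9, 11], 3: [12, 13, 14, 15, 16]}
```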
For example, if the final output vector for an instance is (0.9687, 0.0003, 0.0096, 0.3607) and the desired output vector is (1, 0, 0, 0), then generalization network 1 is chosen. Note that only part of the testing sample is classified correctly; only the correctly classified testing patterns proceed to the next level of the network.

FIGURE 7. Multiple Generalization Network System (4-cluster)

· Generalization Network

The bottom level of the multi-level network system consists of multiple generalization networks. As shown in Figure 7, there are four generalization networks corresponding to the four clusters of heuristics. Each generalization network takes the original 30 summary measures as its input units, has a hidden layer with the number of nodes derived through experimentation, and has one output node per heuristic in its cluster. The output vector of each generalization network serves as the heuristic recommendation vector. Generalization Network I has four outputs representing heuristics 1, 2, 3 and 10; Generalization Network II has two outputs for heuristics 4 and 5; Generalization Network III has five outputs for heuristics 6, 7, 8, 9 and 11; and Generalization Network IV has five outputs representing heuristics 12, 13, 14, 15 and 16. Each of these four generalization networks was trained and tested using the relevant problems from the set of 1,440 cases. We also experimented with the six categories provided by cluster analysis and by expert categorization. The results of these experiments are presented in Section 4.3.

4. Experimental Results and Discussion

4.1 Measures of Network Performance

The degree of convergence (measured by the average training and testing errors) and the ability to generalize are commonly used measures of neural network performance. We report the experimental results based on prediction accuracy on both the training and the testing data. Prediction accuracy is calculated by comparing the index of the highest desired output with the index of the highest actual output. If the two indices are equal for an instance, that is, the best heuristic recommended by the neural network is the same as the best heuristic in the real data, it is considered a first hit. If the heuristic recommended by the neural network is instead the second best in the real data, it is a second hit, and so on. Prediction accuracy is thus the percentage of first hits in the training and testing data sets.
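The hit-rate metric can be computed as in the following sketch (illustrative Python, not the study's code):

```python
import numpy as np

def hit_rates(desired, actual, k=3):
    """Fraction of instances whose network-recommended choice (argmax of
    actual) is the best, second-best, ..., k-th-best choice in desired."""
    recommended = actual.argmax(axis=1)
    # Rank of each output per instance: 0 = best (highest desired value).
    ranks = (-desired).argsort(axis=1).argsort(axis=1)
    hit_rank = ranks[np.arange(len(desired)), recommended]
    return [float((hit_rank == r).mean()) for r in range(k)]

# First-, second- and third-hit rates over a test set:
# first, second, third = hit_rates(Y_test, net.predict(X_test))
```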
All of the neural networks were trained on 1,015 training samples randomly selected from the 1,440 problem instances described in Section 1, and the resulting networks were tested on the remaining 425 samples. Neural network performance was evaluated based on prediction accuracy as well as the sum of the first-hit, second-hit and third-hit percentages on the testing data. To limit the error from overtraining, where the network merely memorizes the training data, training and testing were performed iteratively throughout the experiments. Testing was performed every 10 epochs, so that generalization could be observed during training, along with how the accuracy of the neural network improved with the number of epochs. Nearly all the networks converged in about 200 to 300 epochs, with average training and testing errors between 0.04 and 0.09; once the improvement in error diminished, training was terminated. All of the training and testing were performed on a DEC 5000 using a revised version of the quick-prop simulator (in C) [7].

4.2 Results and Discussion on Single Generalization Networks

A total of 32 experiments were conducted with single generalization networks. The neural network we employed was a multi-layered, fully connected feed-forward network: since there was no prior knowledge to justify omitting specific connections, every node was fully connected to the nodes in its immediately preceding and succeeding layers, with no connections across layers. Experiments were conducted on different network configurations. The input layer contained either six or 30 nodes, the hidden layer between three and 18 nodes, and the output layer four, five, six or 16 nodes. In each case, we measured the prediction accuracy. The detailed results are presented in the tables below. The network topology is given in the leftmost column; TRR and TSS represent the prediction accuracies for the training and testing data respectively, with the TSS columns showing the generalization performance on the testing set.

TABLE V. Single Generalization Network with 16 Outputs (O16)

             1st-hit     2nd-hit     3rd-hit     Best three
Topology     TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-16     26   22     18   19     14   15     58   56
PCA6-4-16     21   17     13   16     12   12     46   45
FEN6-4-16     19   18     16   13     10   11     45   42
BIN6-4-16     23   21     14   15     10   10     47   46

In general, when the selection process includes all sixteen heuristics, as shown in Table V, the network with the 30 mixed-continuous inputs, the 16 normalized NPVs as outputs and a hidden layer with 12 nodes performed relatively better, selecting one of the best three heuristics with about 58% success on the training data and 56% success on the testing data. For the same output structure, the network with the 6 inputs obtained from PCA and 4 hidden nodes selected one of the best three heuristics with 45% success; the network with the 6 inputs obtained from the feature extraction network was successful in only 42% of the problems; and the network with the binary representation of the 6 inputs had 46% success. When the selection process is restricted to categories of heuristics, as shown in Tables VI through IX, with the categories derived through cluster analysis or expert opinion, the results are significantly different. The networks with four and five clusters in the output perform marginally better than the network with six clusters, irrespective of whether the mean or the median value of the heuristics in each cluster is chosen as the candidate output value. The prediction accuracy for choosing one of the best three categories ranges from 95% to 98% for four and five clusters, and from 85% to 92% for six clusters, when the mean value is chosen; the results are worse by about 3% to 5% when the median value is chosen. Comparing the results from cluster analysis and expert categorization on a six-output network, the expert grouping produces marginally better prediction accuracies.

TABLE VI. Single Generalization Network with 4 Clusters (Mean Approach)

              1st-hit     2nd-hit     3rd-hit     Best three
Topology      TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-i4a     67   67     25   26      7    5     99   98
PCA6-4-i4a     62   67     27   23      8    7     97   97
FEN6-4-i4a     63   66     26   23      8    7     97   96
BIN6-4-i4a     61   66     27   22      8    7     96   95
TABLE VII. Single Generalization Network with 5 Clusters (Mean Approach)

              1st-hit     2nd-hit     3rd-hit     Best three
Topology      TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-i5a     53   57     28   29     16   11     97   97
PCA6-4-i5a     49   52     28   28     18   15     95   95
FEN6-4-i5a     50   52     29   28     16   14     95   94
BIN6-4-i5a     52   54     29   29     14   12     95   95

TABLE VIII. Single Generalization Network with 6 Clusters (Mean Approach)

              1st-hit     2nd-hit     3rd-hit     Best three
Topology      TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-i6a     46   45     29   29     17   17     92   91
PCA6-4-i6a     34   36     30   29     19   22     83   87
FEN6-4-i6a     38   40     29   28     18   17     85   85
BIN6-4-i6a     37   40     32   31     18   16     87   87

TABLE IX. Single Generalization Network with Expert Categorization (EXP)

              1st-hit     2nd-hit     3rd-hit     Best three
Topology      TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-6       49   48     27   31     15   12     91   91
PCA6-4-6       44   48     28   26     16   11     88   85
FEN6-4-6       43   46     29   30     15   11     87   87
BIN6-4-6       42   45     27   27     20   15     89   87

In general, the network with all 30 inputs performed slightly better than the networks resulting from the other input preprocessing methods; one possible explanation is that the N30 summary measures retain the most information. The results from the remaining three preprocessing methods were very close, and their ability to accurately classify novel instances appears quite comparable. The results are fairly consistent across the experiments, indicating that feature extraction by neural network and information condensation by PCA are both reasonable procedures for dimensionality reduction; overall neural network performance is not affected much by input preprocessing. Output preprocessing has a more significant impact on neural network performance. For example, the network with 30 inputs, twelve hidden nodes and sixteen outputs correctly predicted one of the best three heuristics with 56% accuracy on the testing data. With the 16 outputs clustered into four groups, the network with the same 30 inputs, twelve hidden nodes and four categorical outputs selected one of the best three heuristic categories with 98% success, and with a first-hit rate of about 67%: an increase of about 40 percentage points in prediction accuracy. For output preprocessing, the mean method is generally better than the median method. However, it should be emphasized that the improvement in prediction accuracy is achieved for selecting categories of heuristics, not for selecting a single best heuristic.

4.3 Results and Discussion on Multiple Generalization Networks

As detailed earlier, MGN involves two levels of neural network training. The top level is the cluster identification network. Experiments were conducted on this network with the four and six clusters generated by cluster analysis and the six categories generated by the expert. In the case of the 4-cluster network, for example, training continued until the total error was less than 0.09; the classification accuracy was 73% for the training set and 68% for the testing set. The detailed results of the experiments on the multiple generalization networks are shown in Tables X through XII. At the bottom level, with four generalization networks, the networks FCG2 and FCG3 (Four Cluster Generalization Networks 2 and 3) show better performance in terms of prediction accuracy. The performance of FCG2 is the best, since there are only two heuristics to select from in that network. FCG3 also performs relatively well.
This is probably due to the relative ease of differentiating among the heuristics in that network, which is made up of some target scheduling heuristics, a random heuristic and a cash flow weight heuristic. The performance of FCG4 (Four Cluster Generalization Network 4) is very poor because the heuristics in that category are extremely close to each other in NPV. The performance of the six-network system (Table XI) is quite comparable to that based on expert categorization (Table XII). In comparison with SCG1 (Six Cluster Generalization Network 1), the performance of EG1 (Expert Generalization Network 1) is better on the training set for the first hit, but not on the testing set; its overall performance is slightly better. Notice that there are only three heuristics in EG1, while there are four in SCG1. The performance of the six generalization networks (whether by cluster analysis or by expert categorization) is significantly better than that of the four generalization networks. As the heuristics are spread over a larger number of networks, the number of heuristics in each cluster/network becomes smaller, enabling the neural network to learn to differentiate better. This result is in contrast to that for the single generalization network, where performance improves as the number of clusters decreases.

TABLE X. Results on Four Generalization Networks

                               1st-hit     2nd-hit     3rd-hit     Best three
Net   Configuration  #Tr/#Ts   TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
FCG1  30-12-4        169/52     60    4     20   83     19    8     99   95
FCG2  30-12-2        25/9      100   89      0   11      /    /    100  100
FCG3  30-12-5        39/11      85   55     13   18      2    9    100   82
FCG4  30-12-5        500/217    42   37     13   13     14   19     69   69

TABLE XI. Results on Six Generalization Networks

                               1st-hit     2nd-hit     3rd-hit     Best three
Net   Configuration  #Tr/#Ts   TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
SCG1  30-12-4        94/24      63   33     23   46     14    8    100   87
SCG2  30-12-3        26/1       88  100     12    0      /    /    100  100
SCG3  30-12-2        109/29    100   52      0   48      /    /    100  100
SCG4  30-12-2        21/8      100   88      0   12      /    /    100  100
SCG5  30-12-2        251/100   100   70      0   30      /    /    100  100
SCG6  30-12-3        199/39     70   36     29   36      1   28    100  100

TABLE XII. Results on Expert Categorization with Six Generalization Networks

                               1st-hit     2nd-hit     3rd-hit     Best three
Net   Configuration  #Tr/#Ts   TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
EG1   30-12-3        119/41    100   24      0   39      0   37    100  100
EG2   30-12-2        26/3      100  100      /    /      /    /    100  100
EG3   30-12-3        19/5      100   80      0   20      /    /    100  100
EG4   30-12-3        127/47     81   55     16   36      3    9    100  100
EG5   30-12-4        208/75     48   31     13   12     18   36     79   79
EG6   30-12-1        123/40    100  100      /    /      /    /    100  100

4.4 Multivariate Regression (MR) and Discriminant Analysis (DA)

For comparison, we ran multivariate regression (MR) and discriminant analysis (DA) models [11] on the training data and tested the models on the test data. The multivariate regression model captures the relationship between multiple dependent variables and a set of independent variables. Let $Y_1, \ldots, Y_q$ represent the NPVs of each of the 16 heuristics (or the mean of each cluster of heuristics), and let $X_1, \ldots, X_{30}$ represent the project characteristics. Then the multivariate regression model can be specified as

$$Y_k = \beta_{0k} + \beta_{1k} X_1 + \cdots + \beta_{30,k} X_{30} + \epsilon_k, \qquad k = 1, \ldots, q,$$

where $\epsilon_k$ represents the error term.
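Such a multivariate regression can be fitted by ordinary least squares, as in this sketch (Python/NumPy; illustrative, not the original analysis):

```python
import numpy as np

# X: n x 30 project characteristics (a column of ones adds the intercept);
# Y: n x q normalized NPVs (q = 16, or 4/5/6 after output clustering).
def fit_multivariate_regression(X, Y):
    X1 = np.hstack([np.ones((len(X), 1)), X])
    B, *_ = np.linalg.lstsq(X1, Y, rcond=None)   # (31 x q) coefficient matrix
    return B

def predict(B, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ B
```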
The results of multivariate regression are presented in Table XIII. The leftmost column gives the names of the models; there are five models, each with 30 independent variables, with the number of dependent variables ranging from 4 to 16. MR(O16), for example, is the regression model with 16 dependent variables. The prediction accuracies on both the training set and the testing set are given in the table. As the table shows, the model with the smallest number of dependent variables, MR(I4A), performed the best, while the model with the largest number of dependent variables, MR(O16), performed the worst. The regression model based on expert categorization, MR(EXP), did marginally better than the six-cluster model MR(I6A).

TABLE XIII. Results on Multivariate Regression

            1st-hit     2nd-hit     3rd-hit     Best three
Model       TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
MR (O16)     26   22     17   17     15   12     58   49
MR (I4A)     64   64     26   26      8    6     98   96
MR (I5A)     55   56     27   26     14   14     96   96
MR (I6A)     44   46     28   28     16   14     88   88
MR (EXP)     49   52     25   26     17   12     91   90

The other multivariate technique we employed was discriminant analysis (DA). DA is concerned with separating a set of observations and classifying new observations into previously defined groups. Five linear discriminant models were constructed and tested. The results are presented in Table XIV, where only the first-hit rate is reported. Once again, as the number of classes increases, the classification accuracy decreases.

TABLE XIV. Results on Discriminant Analysis

            Classification Rate
Model       TRR    TSS
DA (O16)     25     16
DA (I4A)     54     52
DA (I5A)     47     45
DA (I6A)     44     40
DA (EXP)     49     47

4.5 Discussion

The results in Tables XIII and XIV were compared with those from the single generalization networks shown in the first rows of Tables V through IX in Section 4.2. Interestingly, the first-hit rate for the MR(O16) model is exactly the same as that of the N30-12-16 network. In addition, the neural network's overall performance (sum of first, second and third hits) on the testing set was marginally higher than that of the regression models in all the experiments considered in this study. Figures 8 through 11 highlight the comparison between multivariate regression and the single generalization network on each of the first, second and third hit rates. The performance of discriminant analysis was consistently worse than that of the multivariate regression models, making DA the weakest of the classification approaches in this case.

FIGURE 8. Comparison of Generalization Performance (16 Heuristics)
FIGURE 9. Comparison of Generalization Performance (4 Clusters)
FIGURE 10. Comparison of Generalization Performance (5 Clusters)
FIGURE 11. Comparison of Generalization Performance (6 Clusters)

In evaluating the effectiveness of the neural network system, one measure of particular interest is the distance between the best heuristic's result and the result provided by the neural network when it does not choose the best heuristic. Let $V_b^i$ be the NPV produced by the best heuristic and $V_n^i$ the NPV produced by the heuristic recommended by the neural network for case $i$. If $M$ is the number of misclassified cases in the testing sample, the average loss in dollar value, $L$, is

$$L = \frac{1}{M} \sum_{i=1}^{M} \left( V_b^i - V_n^i \right).$$

To illustrate, for the 4-cluster case, 140 out of 425 testing samples are misclassified, and the average loss in dollar value is $13.08 when the neural network does not choose the best set of heuristics. In the 5-cluster case it is $18.93 (179 misclassified), and in the 6-cluster case it is $19.74 (214 misclassified). This indicates that as the number of clusters increases, the heuristic selection procedure using neural networks becomes less accurate. We further computed the mean absolute error in percentage (MAEP) as

$$\mathrm{MAEP} = \frac{1}{M} \sum_{i=1}^{M} \frac{V_b^i - V_n^i}{V_b^i} \times 100\%.$$

The MAEP for the 4-cluster case is 1.14%; for the 5-cluster case it is 1.82%; and for the 6-cluster case it is 1.78%.
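The two error measures can be computed as in this sketch (illustrative Python, matching the formulas reconstructed above):

```python
import numpy as np

def loss_measures(best_npv, chosen_npv):
    """best_npv / chosen_npv: NPV of the best heuristic and of the
    network-recommended heuristic for each misclassified test case."""
    diff = best_npv - chosen_npv
    avg_dollar_loss = diff.mean()
    maep = (diff / best_npv).mean() * 100.0   # mean absolute error in percent
    return avg_dollar_loss, maep
```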
These results indicate that the loss incurred through the application of neural networks is not significant and will decrease further as the prediction ability of the neural network system improves. In real-world applications, even experienced managers are not able to choose the optimal heuristic in all instances, so neural network models are good consultants when an expert is not available or the best heuristic is not obvious. A project manager on site may also want to know what percentage improvement in NPV he or she can expect to gain by using the neural network over an arbitrary selection. We therefore compared the neural network's choice against a random selection. When selecting one set of good heuristics among four clusters, the neural network achieved about 67% prediction accuracy based on the first-hit rate, while random selection achieves only 25%. Similarly, for the 5-cluster and 6-cluster cases, the neural network achieved about 57% and 45% success respectively, against 20% and 16.7% for random selection. In the worst case, where selection is among all sixteen heuristics, the neural network achieves 22% prediction accuracy, in contrast to 6.4% for a random selection rule. For projects involving cash flows in thousands of dollars, these results indicate significant gains of the neural network models over a random selection. The neural network system can thus provide useful support in making important scheduling decisions.

5. Conclusion

In this paper, we investigate the application of artificial neural networks to the selection of an appropriate heuristic, or category of heuristics, for the resource-constrained project scheduling problem with cash flows. Neural network methodology can be used both to extract information about project conditions, as illustrated by the feature extraction network, and to provide predictions for novel cases. We conducted extensive experiments on different preprocessing schemes and encodings of the inputs and outputs, and experimented with both a single generalization network and a multi-level generalization network system. The results show that the greater the number of inputs, the better the performance, for all the models. However, the results also indicate that both principal component analysis and feature detection by neural network extract enough information to enable the neural network to generalize. What makes a significant contribution to the performance of the neural network is the preprocessing of outputs. For the single generalization network, the fewer the clusters, the better the performance; for the multiple generalization network system, by contrast, performance improves as the number of clusters increases. The advantage of the single generalization network is that it is simple and identifies a good category of heuristics from among a few categories, but it cannot predict which heuristic in that category to use. In contrast, the multiple generalization network system allows a specific heuristic to be chosen, and its performance is generally good; but it is more complex and requires a classification network at the top level to filter the information. A further limitation of the classification network is that there is only about a 70% chance that an instance is routed to the right generalization network.
The results from multivariate regression and discriminant analysis are consistent with the neural network results. In particular, the neural network models perform no worse than the multivariate regression or discriminant analysis models.

Successful development of neural network applications is not straightforward; it requires considerable expertise in two areas. First, understanding the application domain: a good deal of familiarity with the application is required to achieve good generalization performance, since important steps such as the choice of predictive variables, preprocessing, and normalization can be performed successfully only with thorough knowledge of the domain. Second, achieving good network performance involves extensive experimentation with network design and fine-tuning of control parameters. Neural network learning procedures are inherently statistical techniques, and statistical techniques not only offer insights into the properties of the data but also provide the means to preprocess it.

In conclusion, the methodologies analyzed in this study have the potential to be useful in many project management applications. They should also generalize to other difficult combinatorial optimization problems, such as the traveling salesman, vehicle routing, and job shop scheduling problems, where the difficulty of solving problems to optimality has led to the development of a number of heuristics.

Acknowledgment

This work is supported in part by the National Science Foundation under grant #9108386.

Appendix A - Summary Statistics for Independent Variables

Variable         Mean        Standard Dev.
NNODE            156.000     62.022
NARC             198.615     74.830
NDUMMY           121.610     43.947
XDUR             6.31673     0.24369
XDENSITY         0.612639    0.125182
COMPLEXITY       1.28629     0.05535
CPL              87.6153     19.4848
NSLACK           187.258     73.916
PCTSLACK         2.44423     0.12655
XFREESLK         27.3353     9.5306
TSLACK_R         23.7824     11.0314
XSLACK_R         0.312661    0.089286
PDENSITY         0.202842    0.057052
TFREESLK         201.826     92.526
NFREESLK         12367.0     8472.8
PCTFREESLK       146.433     67.640
XFREESLK         2.64899     0.70235
PDENSITY_F       0.708246    0.055237
UTIL1            1.72605     0.68505
UTIL2            1.80528     0.68505
UTIL3            1.70983     0.68698
XUTIL            1.74704     0.68469
TCON1            0.028574    0.019511
TCON2            0.029874    0.019918
TCON3            0.028610    0.020131
XCON_TM          0.029019    0.019812
PROFIT_MARGIN    1.90140     0.26390
FREQ_P           4.51781     2.43473
INT_RATE         0.003000    0.001000
NSTR             0.506667    0.284039

Appendix B - Cluster Analysis Model Formulation

Let $y_j = 1$ if heuristic $j$ is a cluster median and 0 otherwise, and let $x_{ij} = 1$ if heuristic $i$ is assigned to cluster median heuristic $j$ and 0 otherwise. With $d_{ij}$ denoting the dissimilarity between heuristics $i$ and $j$, the model is:

Minimize $\sum_{i=1}^{16} \sum_{j=1}^{16} d_{ij} x_{ij}$

subject to:

(1) $\sum_{i=1}^{16} x_{ij} \le 16\, y_j$, for $j = 1, \ldots, 16$
(2) $\sum_{i=1}^{16} x_{ij} \ge y_j$, for $j = 1, \ldots, 16$
(3) $y_i + \sum_{j=1}^{16} x_{ij} = 1$, for $i = 1, \ldots, 16$
(4) $\sum_{j=1}^{16} y_j = N$
(5) $x_{ij}, y_j \in \{0, 1\}$

The constraints ensure that: (1) no heuristic can be assigned to heuristic $j$ if $j$ is not a cluster median; (2) at least one heuristic must be assigned to heuristic $j$ if $j$ is a cluster median; (3) each heuristic $i$ is either a cluster median or is assigned to one; (4) there are exactly $N$ clusters ($N$ is a predefined constant, ranging from 4 to 6 in this study); and (5) all variables are binary.
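This is a p-median model and can be set up with any mixed-integer programming solver. The sketch below is a minimal illustration only, not the study's implementation: it assumes the open-source PuLP library, and d is a hypothetical 16 x 16 dissimilarity matrix between heuristic performance profiles.

```python
import pulp

def cluster_heuristics(d, N):
    """p-median clustering of 16 heuristics into N clusters.

    d -- hypothetical 16x16 dissimilarity matrix between heuristics
    N -- desired number of clusters (4 to 6 in this study)
    """
    H = range(16)
    prob = pulp.LpProblem("heuristic_clustering", pulp.LpMinimize)
    y = pulp.LpVariable.dicts("median", H, cat="Binary")       # y[j]: j is a cluster median
    x = pulp.LpVariable.dicts("assign", (H, H), cat="Binary")  # x[i][j]: i assigned to median j

    # Objective: total within-cluster dissimilarity.
    prob += pulp.lpSum(d[i][j] * x[i][j] for i in H for j in H)

    for j in H:
        prob += pulp.lpSum(x[i][j] for i in H) <= 16 * y[j]    # (1) assign only to medians
        prob += pulp.lpSum(x[i][j] for i in H) >= y[j]         # (2) each median has a member
    for i in H:
        prob += y[i] + pulp.lpSum(x[i][j] for j in H) == 1     # (3) median or assigned, not both
    prob += pulp.lpSum(y[j] for j in H) == N                   # (4) exactly N clusters

    prob.solve()
    medians = [j for j in H if pulp.value(y[j]) > 0.5]
    assignment = {i: j for i in H for j in H if pulp.value(x[i][j]) > 0.5}
    return medians, assignment
```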
References

1. S. Ahmad and G. Tesauro, 1988. Scaling and Generalization in Neural Networks: A Case Study, Connectionist Models Summer School, Carnegie Mellon University.
2. S. Baroum and J. H. Patterson, 1989. A Heuristic Algorithm for Maximizing the Net Present Value of Cash Flows in Resource-Constrained Project Schedules, Working Paper, Indiana University.
3. R. Bey, R. H. Doersch and J. H. Patterson, 1981. The Net Present Value Criterion: Its Impact on Project Scheduling, Project Management Quarterly, 12:2, 35-45.
4. A. Brooke, D. Kendrick and A. Meeraus, 1988. GAMS, A User's Guide, The Scientific Press, CA.
5. E. W. Davis, 1973. Project Scheduling under Resource Constraints -- Historical Review and Categorization of Procedures, AIIE Transactions, 5:4, 297-313.
6. E. W. Davis and J. H. Patterson, 1975. A Comparison of Heuristic and Optimum Solutions in Resource-Constrained Project Scheduling, Management Science, 21:8, 944-955.
7. S. E. Fahlman, 1988. An Empirical Study of Learning Speed in Back-Propagation Networks, Technical Report CMU-CS-88-162, Carnegie Mellon University.
8. M. R. Garey and D. S. Johnson, 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Co., NY.
9. K. Hornik, M. Stinchcombe and H. White, 1989. Multilayer Feedforward Networks are Universal Approximators, Neural Networks, 2, 359-366.
10. T. J. R. Johnson, 1967. An Algorithm for the Resource-Constrained Project Scheduling Problem, Unpublished Ph.D. Thesis, Massachusetts Institute of Technology.
11. R. A. Johnson and D. W. Wichern, 1988. Applied Multivariate Statistical Analysis, Prentice-Hall, Inc., NJ.
12. N. Kadaba, 1990. XROUTE: A Knowledge-Based Routing System Using Neural Networks and Genetic Algorithms, Unpublished Ph.D. Thesis, North Dakota State University.
13. J. E. Kelley, 1961. Critical Path Planning and Scheduling: Mathematical Basis, Operations Research, 9, 296-320.
14. I. S. Kurtulus and E. W. Davis, 1982. Multi-Project Scheduling: Categorization of Heuristic Rules Performance, Management Science, 28:2, 161-172.
15. T. Kohonen, 1989. Self-Organization and Associative Memory, 3rd edition, Springer-Verlag, Berlin-Heidelberg-New York.
16. J. L. McClelland and D. E. Rumelhart (eds.), 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vols. 1 and 2, MIT Press, Cambridge, MA.
17. R. Miikkulainen and M. Dyer, 1988. Encoding Input/Output Representations in Connectionist Cognitive Systems, Connectionist Models Summer School, Carnegie Mellon University.
18. K. E. Nygard, P. Juell and N. Kadaba, 1990. Neural Networks for Selecting Vehicle Routing Heuristics, ORSA Journal on Computing, 2:4, 353-364.
19. E. Oja, 1982. A Simplified Neuron Model as a Principal Component Analyzer, Journal of Mathematical Biology, 16, 267-273.
20. R. Padman, 1993. Choosing Solvers in Decision Support Systems: A Neural Network Application in Resource-Constrained Project Scheduling, in Recent Developments in Decision Support Systems, Springer-Verlag, Berlin, 559-574.
21. R. Padman, D. E. Smith-Daniels and V. L. Smith-Daniels, 1990. Heuristic Scheduling of Resource-Constrained Projects with Cash Flows: An Optimization-Based Approach, Working Paper 90-6, The Heinz School of Public Policy and Management, Carnegie Mellon University.
22. T. L. Pascoe, 1965. An Experimental Comparison of Heuristic Methods for Allocating Resources, Unpublished Ph.D. Thesis, Cambridge University.
23. J. H. Patterson, 1976. Project Scheduling: The Effects of Problem Structure on Heuristic Performance, Naval Research Logistics Quarterly, 23:1, 95-122.
24. C. R. Rao, 1964. The Use and Interpretation of Principal Component Analysis in Applied Research, Sankhya A, 26, 329-358.
25. A. H. Russell, 1970. Cash Flows in Networks, Management Science, 16:5, 357-373.
26. R. A. Russell, 1986. A Comparison of Heuristics for Scheduling Projects with Cash Flows and Resource Restrictions, Management Science, 32:10, 1291-1300.
27. A. W. Shogan, 1988. Management Science, Prentice-Hall, Inc., Englewood Cliffs, NJ.
28. K. Y. Tam and M. Y. Kiang, 1992. Managerial Applications of Neural Networks: The Case of Bank Failure Predictions, Management Science, 38:7, 926-947.
29. E. Wacholder, J. Han and R. C. Mann, 1988. A Neural Network Algorithm for the Multiple Traveling Salesman Problem, Proceedings of the IEEE Annual International Conference on Neural Networks, San Diego, 305-324.
30. H. White, 1989. Learning in Artificial Neural Networks: A Statistical Perspective, Neural Computation, 1:4, 425-464.
31. D. Zhu and R. Padman, 1993. Heuristic Selection in Resource-Constrained Project Scheduling: Experiments with Neural Networks, Working Paper 93-43, The Heinz School of Public Policy and Management, Carnegie Mellon University.