Abstract

This paper investigates the application of neural networks to aid the selection of the most appropriate heuristic procedure for the resource-constrained project scheduling problem with cash flows (RCPSP). This is a difficult combinatorial optimization problem with wide application. Many heuristic procedures have been developed for this problem, with differing performance characteristics in different problem environments. This paper deals with the complex task of choosing the most relevant heuristic or heuristic category for a given instance of the problem. We apply neural networks to induce the relationship between project parameters and heuristic performance to guide the selection under different project environments. We also compare the results of the neural network approach with those from traditional statistical procedures. An innovative feature of our approach is the integration of statistical and optimization methods with neural networks to address data preprocessing, thereby improving the performance of the neural network. We demonstrate that neural network methodology can be employed both to extract information about project conditions and to provide predictions for novel cases. Extensive experimentation with network topologies and learning parameters indicates that this approach has significant promise in identifying categories of heuristics that are appropriate for a given instance of the problem, rather than selecting a single best heuristic. The methods investigated in this article have the potential to be useful in many project management applications, as well as in other hard combinatorial optimization problems where the difficulty of solving problems to optimality has resulted in the development of a number of heuristics.

The resource-constrained project scheduling problem (RCPSP) is concerned with the scheduling of a collection of precedence-related activities subject to resource constraints. When significant levels of cash flows are present over the duration of the project in the form of expenses and payments, determining a schedule that maximizes the Net Present Value (NPV) of the cash flows is an appropriate objective [3]. This is a hard combinatorial optimization problem [8] for which many heuristic procedures have been developed to obtain good solutions [2, 6, 21, 26]. However, it is not clear which procedures produce the best schedules and highest NPV under different scheduling environments. Hence, project managers have difficulty deciding which scheduling rule to use. Once a rule is selected, it will generally be used throughout the project, so the selection of an inappropriate heuristic may result in large monetary losses. Little work has been done on investigating the dependence of the various heuristic rules on project parameters for the RCPSP with cash flows. A major difficulty has been the lack of success in achieving a meaningful set of summary measures which characterize the problem. Furthermore, different heuristics perform well under different conditions, i.e., no single heuristic dominates in all problem settings, and the project parameters which may contribute to heuristic performance are not readily apparent. Currently, the only way to choose a heuristic that exhibits good performance is to apply each heuristic to the problem and observe the outcome [5]; such ad hoc methods consume significant time and effort.
This paper integrates several methodologies to address the issue of selecting heuristics in order to accomplish intelligent scheduling in the project management domain. The study aims to: (1) construct a set of summary measures that characterize relevant information about projects; (2) design a robust neural-network-based procedure to support the process of heuristic selection; and (3) compare the generalization performance of neural networks with that of some traditional statistical procedures. Neural networks are a promising vehicle for exploring the performance of these heuristics with respect to their dependence on the project parameters. They have been generally successful in pattern recognition and classification [30], thus providing a useful tool in the classification and selection process [12, 18, 28]. Previous studies have shown encouraging results on the application of neural networks to combinatorial optimization problems such as the traveling salesman and vehicle routing problems [12, 18, 29]. Padman [20] conducted an exploratory study, with encouraging results, on predicting project scheduling heuristic performance using neural networks. In this paper, we elaborate on that study with significant emphasis on problem representation, preprocessing of input and output information, and extensive experimentation with neural network architectures. We have designed and conducted a series of experiments with a neural network system using a collection of problems from [21]. A systematic preprocessing approach consisting of various statistical analyses, optimization techniques, and feature detection networks has been used to enhance system performance. We experimented with a single neural network system as well as a modular system to train and test the data. We also compared the results from applying neural networks with those from multivariate regression and discriminant analysis. Empirical results show that the neural network is successful in selecting categories of appropriate heuristics for an instance of the RCPSP, rather than selecting a single heuristic. The methodology is generalizable and adaptable to other difficult combinatorial optimization problems as well.

The paper is organized as follows. In Section 1, we describe the resource-constrained project scheduling problem. In Section 2, we present the various data preprocessing techniques and data representation schemes employed in this study. Section 3 describes the experimental design and our computational experience with determining the parameters of the neural network. In Section 4, we discuss the experiments and results. Section 5 summarizes the conclusions.

1. Resource-Constrained Project Scheduling Problem with Cash Flows

1.1 Problem Statement

Resource-constrained project scheduling with cash flows finds wide application in environments such as research and development, manufacturing, and construction. As described in [26], a mathematical formulation of the problem with exponential discounting is as follows:

$$\max \sum_{i=1}^{n} c_i e^{-\alpha t_i}$$

subject to

$$t_j - t_i \ge d_{ij} \quad \text{for every activity } (i, j),$$
$$R(S_t) \le K, \quad t = 1, \ldots, T.$$

In the above formulation, $n$ is the number of nodes in the network, $c_i$ ($i = 1, \ldots, n$) represents the cash flow associated with the event (node) $i$, and $\alpha$ is the discount rate. The variable $t_i$ represents the scheduled time of each event $i$. The activities in the project network are the arcs $(i, j)$, each with duration $d_{ij}$. $R(S_t)$ represents the resource requirements for the set $S_t$ of activities in progress at time $t$ ($t = 1, \ldots, T$), where $T$ denotes the duration of the project. $K$ is the resource limit vector.
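To make the discounted objective concrete, the following minimal sketch (Python; the function and example values are illustrative, not code or data from the study) evaluates the NPV of a candidate schedule:

```python
import math

def npv(cash_flows, event_times, alpha):
    """NPV of a schedule: sum over events of c_i * exp(-alpha * t_i)."""
    return sum(c * math.exp(-alpha * t)
               for c, t in zip(cash_flows, event_times))

# Toy example: two outflows followed by a final payment, with a
# continuously compounded discount rate of 10% per period.
print(npv(cash_flows=[-100.0, -50.0, 600.0],
          event_times=[0.0, 2.0, 5.0],
          alpha=0.10))
```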
The objective is to obtain the optimal time for scheduling each task such that the Net Present Value (NPV) of the project is maximized. Figure 1 depicts an example of a project network with durations and resource requirements represented on the arcs (each activity requires three types of resources) and cash inflows (positive numbers) and outflows (negative numbers) at the nodes. Given a resource limit vector of [12, 12, 12] and assuming a discount rate of 10%, the maximum NPV for this example is $547, with the scheduled time for each event given by the value of $t_i$ at the event nodes in the figure. As the project size increases, determining this optimal schedule becomes impractical. Hence, considerable effort has been devoted to the development of efficient heuristic procedures to solve the RCPSP with cash flows [2, 6, 21, 26].

FIGURE 1. Example Project Network

Recognizing the need to compare these heuristic procedures, previous research has utilized regression models to infer the relationship between heuristic performance and project parameters. For example, for the RCPSP with the minimum duration objective, multiple regression models provide general guidance for heuristic selection [5, 23]. Davis [5] employed multiple regression models to predict the percentage increase in critical path duration for a particular sequencing rule (the minimum late finish time heuristic) in the RCPSP. The results demonstrate that a statistical relationship exists between project network and resource parameters and the heuristic rule selected for the problem. Patterson [23] made an extensive study of project scheduling summary measures and investigated eight heuristic rules along with three objective functions, also using multiple regression models to predict heuristic performance. While both studies demonstrate the possibility of predicting relative heuristic performance from project summary measures, they did not take cash flow features into account. Russell [26] used simple linear regression analysis to test and compare six heuristic scheduling rules for the RCPSP with cash flows. Based on an analysis of 80 problems, he concluded that the level of resource constrainedness has a significant impact on heuristic performance. In this paper, we examine all the parameters that have been identified in the literature for the resource-constrained project scheduling problem. To analyze the effect of these problem parameters on heuristic performance, we utilize sixteen heuristic rules from Padman [21] that were developed for the RCPSP with cash flows. In the following, we describe these parameters and heuristic rules.

1.2 Project Descriptors

In order to classify and predict scheduling rules, it is necessary to identify project summary measures as predictors that characterize the problem. Pascoe [22] first introduced the concept of summary measures for the categorization of heuristics, which was later utilized by Johnson [10], Davis [5] and Patterson [23]. Overall, we identified more than 40 project scheduling characteristics from the literature which may be relevant to heuristic performance. We eliminated several of those variables due to multi-collinearity effects, finally selecting a set of 30 variables. We thus assume that project networks can be characterized by a vector of these 30 variables, which measure the complexity of a given project scheduling problem along several dimensions. These summary measures are used in subsequent neural network training.
As shown in Table I, they consist of categories such as project size parameters (SIZE), parameters based on critical path analysis (CPL-BASED), resource-based parameters, cash flow parameters, and project network shape (SHAPE).

TABLE I. Project Characteristics

SIZE
  NNODE          Number of nodes to be scheduled
  NARC           Number of arcs (activities) to be scheduled
  NDUMMY         Number of dummy activities
  XDUR           Average activity duration
  XDENSITY       Average activity density
  COMPLEXITY     Project complexity

CPL-BASED
  CPL            Critical path length
  NSLACK         Total slack of all activities
  PCTSLACK       Percent of activities possessing positive total slack
  XFREESLK       Average total slack per activity
  TSLACK_R       Total slack ratio
  XSLACK_R       Average slack ratio
  PDENSITY       Project density - total
  TFREESLK       Free slack of all activities (Johnson)
  NFREESLK       Number of activities possessing positive free slack
  PCTFREESLK     Percent of activities possessing positive free slack
  XFREESLK       Average free slack per activity
  PDENSITY_F     Project density - free (Pascoe)

RESOURCE
  UTIL1          Utilization of resource 1 (R1)
  UTIL2          Utilization of resource 2 (R2)
  UTIL3          Utilization of resource 3 (R3)
  XUTIL          Average resource utilization
  TCON1          Resource constrainedness over time for R1
  TCON2          Resource constrainedness over time for R2
  TCON3          Resource constrainedness over time for R3
  XCON_TM        Average resource constrainedness over time

CASH FLOW-BASED
  PROFIT_MARGIN  Profit margin
  FREQ_P         Frequency of payment
  INT_RATE       Interest rate

SHAPE
  NSTR           Network structure

1.3 Heuristic Rules

Given the number of heuristics developed in the literature to address the scheduling of resource-constrained projects with cash flows, we focus in this paper on the study by Padman et al. [21], who developed nine optimization-guided heuristic rules designed to exploit different aspects of the project scheduling problem. These rules utilize information from a relaxed optimization model obtained by ignoring the resource constraints in the RCPSP [25]. These priority dispatching rules were embedded in a single-pass greedy algorithm and applied to a set of randomly generated problems describing 144 different project environments, and their performance was compared with that of seven other heuristics from the literature. A brief description of the rules is provided in Table II. These sixteen heuristic rules are based on information such as opportunity cost, dual price and cash flow weight, and can thus be categorized into six distinct groups. Based on the ordering of the heuristics in Table II, heuristics 1, 2, 3, 10, 12, 13, 14, 15 and 16 are the nine optimization-based heuristics proposed in [21]. Within this large group, heuristics 1, 2 and 3 are rules based on opportunity cost; heuristics 10, 12 and 15 are rules based on dual prices; and heuristics 13, 14 and 16 are rules based on cash flow weight. The duration-based rules are heuristics 4 and 5, which employ information derived from the Critical Path Method (CPM) [13]. Heuristic 11 is a cash flow weight rule proposed by Baroum et al. [2] that does not use optimization-based methods. Heuristics 6, 7 and 8 are target scheduling rules proposed by Russell [26]. Heuristic 9 is a random heuristic. This categorization of heuristics is useful for output preprocessing, which is discussed in Section 2.

TABLE II. Heuristic Priority Rules and Descriptions

OPPORTUNITY COST
  1. OCS/LAN   Activities are selected for scheduling in ascending order of the opportunity cost of scheduling. Activities with zero tardiness penalties are scheduled according to lowest activity number (LAN).
  2. OCR/LAN   Activities are scheduled in ascending order of the opportunity cost of resources, with zero-tardiness-penalty activities scheduled according to LAN.
  3. NOC/LAN   Activities are scheduled in increasing order of net opportunity cost, which balances the cost of delaying some activities against scheduling others. Zero-tardiness-penalty activities are scheduled according to LAN.

DURATION-BASED
  4. LFT/LAN   Priority is given to the activities with the minimum latest finish time as determined by critical path analysis. LAN is used to break ties.
  5. MS/LAN    Select the activity with the smallest slack, where the critical path is continuously revised as each activity is scheduled. LAN is used to break ties.

TARGET SCHEDULING
  6. TS/LAN    Select the activity with the maximum difference between its current early finish time and the optimal finish time given by the relaxation model. LAN is used to break ties.
  7. DUAL/TS   Activities with the highest dual price are selected for scheduling. Ties are broken using the TS rule.
  8. TS/DUAL   Select activities with the maximum difference between current early finish time and the optimal finish time given by the relaxation model. Maximum dual price is used to break ties.

RANDOM
  9. RAND-50   Eligible activities are selected at random for scheduling, with the best of 50 replications reported.

DUAL PRICE
  10. MTP/LAN  Activities are selected from the schedule queue in descending order of their tardiness penalties. Activities with zero tardiness penalties are scheduled according to LAN.
  12. LTP/LAN  Activities are scheduled in ascending order of tardiness penalties. Activities with zero tardiness penalties are scheduled according to LAN.
  15. MTP/ET   Same as MTP/LAN, except that activities with zero tardiness penalties are scheduled according to the early time rule.

CASH FLOW WEIGHT
  11. CFW/LAN  Select the activity with the highest cash flow weight for scheduling. LAN is used to break ties.
  13. CFW-CFW  Use the highest cash flow weight activities for scheduling. Activities with zero tardiness penalties are then scheduled using highest cash flow weights.
  14. CFW-OCC  Use cash flow weights for scheduling from the first queue, and the opportunity cost of cash flows for scheduling from the second queue.
  16. TSCFW    Schedule activities based on highest cash flow weight.

1.4 Experimental Data

The data set consists of 1,440 randomly generated project networks with resource constraints and cash flows. These 1,440 problems were generated from six experimental parameters: project size (SIZE), project network structure (SHAPE), resource constrainedness (AUF), interest rate (CC), profit margin (PM), and frequency of progress payments (FP). These six parameters form a significant subset of the project descriptors presented in Table I, and each was varied over several levels in constructing the project networks. Project size is characterized by the number of activities involved: all small project networks have 48 activities, while the medium-size problems have 110 activities. There are three types of resources associated with each activity. The degree of resource constrainedness is characterized by AUF (average utilization factor) [14], which varies from a low level of 1.0 to a medium level of 1.5 and a high level of 2.0.
Network shape varies over three levels: balanced, skewed to the right, and skewed to the left. There are three cash flow parameters. The interest rate varies over two levels, from a low of 10% to a high of 20% annually. The frequency of progress payments is also set at two levels: at the low level, on average every seventh activity yields a positive progress payment upon its completion, while at the high level every third activity does. The profit margin, reflected by the final payment for the completed project, is set at two levels: the low level is approximately equal to the sum of the cash outflows for the project plus 30%, and the high level to the sum of the outflows plus 50%. A full factorial experiment was conducted, resulting in 144 different scheduling environments and 1,440 problems with 10 replicates in each environment. Figure 2 depicts a data flow diagram of the problem that links the data, problem characteristics, and heuristics in the following manner. A set of resource-constrained project scheduling problems with cash flows is generated based on the six parameters using a data generator. For each problem, a summary measure consisting of 30 variables is derived from the project network specifications; this summary measure serves as the vector of teaching input. For each of the 1,440 problems, the sixteen heuristic procedures are applied and sixteen corresponding NPVs are generated; the NPVs serve as the vector of teaching output. Before the teaching inputs and outputs are fed into the neural network for training and testing, a preprocessing module containing various preprocessing techniques is used to extract and refine relevant information about the problem and heuristics.

FIGURE 2. Data Flow Diagram

2. Data Preprocessing and Data Representation

After the data is collected, many techniques can be employed to analyze it. Dumping the whole data set into a neural network without careful inspection and preprocessing will not work for nontrivial tasks; the effectiveness of a neural network depends largely on good data preparation [17]. An innovative feature of this study is the investigation of diverse input and output preprocessing strategies and data representation issues that can enhance neural network performance. This section describes the strategies used to preprocess the inputs and outputs. We also discuss the representation issue, i.e., the encoding of the data fed into the neural network.

2.1 Data Statistics

As discussed in Section 1.2, we derived 30 independent variables to describe the project features which may contribute to heuristic performance. We call this set of input data N30. The table in Appendix A presents summary statistics for the 1,440 cases of the 30 variables.

2.2 Input Preprocessing

It is desirable to reduce the input dimensionality even further from 30 variables, so that neural networks with fewer nodes and connection weights suffice to capture the mappings between inputs and outputs. This helps to eliminate noise and redundant information and also facilitates neural network training and testing [1]. Two different approaches were taken to reduce input dimensionality: the statistical technique of principal component analysis (PCA) and a neural-network-based self-organizing feature extraction network.
2.2.1 Principal Component Analysis (PCA)

Principal component analysis (PCA) is a multivariate statistical procedure concerned with explaining the observed variance-covariance structure, where each principal component is a linear combination of the original variables [24]. One of the main purposes of PCA is to summarize the data for dimensionality reduction while preserving most of the original information. PCA was performed on the 30 variables with normalized values; the normalization procedure is discussed further in Section 2.4. The results show that the first six principal components collectively explain 86% of the total sample variance and thus provide a good summary of the 30 variables. Consequently, a reduction of the data from 1,440 observations on 30 variables to 1,440 observations on six principal components is reasonable. We call this set of reduced input data (with six input variables) PCA6.

2.2.2 Feature Extraction Network (FEN)

Neural networks have a self-organizing ability that can be used to extract the critical features of a problem and thus reduce input dimensionality [15]. Oja [19] showed that a simplified neuron model can serve as a principal component analyzer. A three-layer symmetric fan-in/fan-out network was developed for feature extraction [16, 18]. Figure 3 depicts the architecture of the feature extraction network.

FIGURE 3. Feature Extraction Network

In this network, the same 30 inputs also serve as the teaching outputs. The six hidden nodes in the middle layer serve as feature extractors, and the condensed information is stored in this layer of the network. The activation values of the six hidden units are continuous values in the range (-0.5, 0.5). The number of hidden nodes in the middle layer was derived in part through experimentation; it was also motivated by the fact that six parameters (SIZE, SHAPE, AUF, PM, FP and CC) were used to randomly generate the project networks. We used back-propagation to train the network until the average training error decreased below 0.01. The six units in the hidden layer act as inputs (FEN6) to the generalization network, which is discussed in Section 3.2.
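As an illustration of the two reduction schemes, the sketch below (Python with NumPy and scikit-learn; the random stand-in data, the tanh activation and the training settings are assumptions, not the study's original code) condenses a 1,440 x 30 input matrix to six PCA components and trains a 30-6-30 autoassociative network whose hidden activations play the role of FEN6:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

# Stand-in for the real data: 1,440 problems x 30 summary measures,
# already normalized to [-0.5, 0.5].
rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(1440, 30))

# PCA6: project onto the first six principal components (reported in the
# text to explain about 86% of total sample variance for the real data).
pca = PCA(n_components=6)
PCA6 = pca.fit_transform(X)
print("variance explained:", pca.explained_variance_ratio_.sum())

# FEN6: a 30-6-30 autoassociative network in which the inputs double as
# the teaching outputs; the six hidden activations are the condensed features.
fen = MLPRegressor(hidden_layer_sizes=(6,), activation="tanh", solver="sgd",
                   learning_rate_init=0.01, momentum=0.9,
                   max_iter=2000).fit(X, X)
# Recover the hidden-layer activations with a manual forward pass (tanh here,
# rather than the paper's (-0.5, 0.5)-ranged activation function).
FEN6 = np.tanh(X @ fen.coefs_[0] + fen.intercepts_[0])
```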
2.3 Output Preprocessing

The net present values (NPVs) from all sixteen heuristics were used as the teaching output. One significant observation about the output data was that some sets of heuristics behaved quite similarly, with performance very close to each other, so there were no distinct differences among the sixteen NPVs. This can be traced back to the original design of the heuristics: as indicated in Section 1.3, some of the heuristics were proposed based on the same criterion and share common properties. This similarity could prevent the neural network from learning to differentiate among the sixteen heuristics. However, with an appropriate categorization that groups the heuristics by their similarities in performance, the differences among the categories should be more significant, enabling the neural network to learn better. To facilitate learning, we employed two methods: the first uses expert knowledge to categorize the heuristic procedures; the second derives clusters by formulating and solving an integer programming model for cluster analysis.

2.3.1 Heuristic Categorization by Expert

As discussed previously, the sixteen heuristics can be grouped into six categories such that the heuristics within a category share some common properties. We call this clustering the categorization of heuristics based on expert knowledge.

TABLE III. Expert Grouping of Heuristics

Category   Heuristics
1          1, 2, 3
2          10, 12, 15
3          11, 13, 14, 16
4          4, 5
5          6, 7, 8
6          9

Category 1 is the set of opportunity cost rules, heuristics 1, 2 and 3. Category 2 is based on dual price and includes heuristics 10, 12 and 15. Category 3 consists of the cash flow weight rules, heuristics 11, 13, 14 and 16. The duration-based rules, heuristics 4 and 5, make up category 4. Category 5 is the set of target scheduling rules, heuristics 6, 7 and 8. The last category consists of the random heuristic, rule 9.

2.3.2 Cluster Analysis

Cluster analysis using an integer programming model was applied to the distance matrix of the sixteen NPVs. First, a 16 x 16 correlation matrix was computed from the 1,440 cases on each of the 16 NPVs; the distance matrix was then obtained from the correlation matrix as 1 - correlation. Having computed the distance between all pairs of heuristics, we partitioned the 16 heuristics into N (N = 4, 5, 6) clusters. Each cluster is represented by a median which serves as the central point of the cluster, so that the distance between the heuristics within each cluster is minimized. The objective is to find the optimal selection of cluster medians and assignment of the remaining heuristics to the medians so that the heuristics in a cluster are most similar; in making this selection and assignment, the sum of the total distances is minimized. The integer programming (IP) model for this problem [27] is shown in Appendix B. The model was implemented in GAMS [4] and run on a VAX 11/780.

When N = 4, all sixteen heuristics are grouped into four clusters, shown in the first panel of Table IV. Cluster 1 consists of heuristics 1, 2, 3 and 10; it is interesting to note that heuristic 10 (MTP/LAN) is closer to the opportunity cost rules in NPV performance than to the dual price rules. The second cluster consists of the duration-based rules. The third cluster is dominated by the target scheduling rules plus the random heuristic. The last cluster consists of both dual price rules and cash flow weight rules. When N = 5 (second panel of Table IV), categories 1, 2 and 5 are the same as categories 1, 2 and 4 in the 4-cluster case, while the third category of the 4-cluster result is divided into two sub-categories. This may be indicative of the dissimilarity among the heuristics in category 3, compared to the other categories, when forced to separate into five clusters. When N = 6 (last panel of Table IV), the fifth category is further divided into two sub-categories, one for dual price rules and the other for cash flow weight rules, further illustrating the order and level of dissimilarity among the heuristics in each category.

TABLE IV. Heuristic Categorization by Cluster Analysis

4 clusters:
Group   Median   Rules
1       1        1, 2, 3, 10
2       4        4, 5
3       9        6, 7, 8, 9, 11
4       13       12, 13, 14, 15, 16

5 clusters:
Group   Median   Rules
1       1        1, 2, 3, 10
2       4        4, 5
3       7        7, 11
4       8        6, 8, 9
5       13       12, 13, 14, 15, 16

6 clusters:
Group   Median   Rules
1       1        1, 2, 3, 10
2       4        4, 5
3       6        6, 8, 9
4       11       7, 11
5       12       12, 15
6       13       13, 14, 16
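The clustering step can be sketched as follows (Python; PuLP is assumed here as the solver interface, standing in for the original GAMS implementation, and the constraint set mirrors the model in Appendix B):

```python
import numpy as np
import pulp

def cluster_heuristics(npv_matrix, n_clusters):
    """Partition heuristics into n_clusters with the median-based IP model.
    npv_matrix: (problems x 16) array of heuristic NPVs."""
    dist = 1.0 - np.corrcoef(npv_matrix, rowvar=False)   # 16 x 16, d = 1 - corr
    H = range(dist.shape[0])
    prob = pulp.LpProblem("heuristic_clustering", pulp.LpMinimize)
    y = pulp.LpVariable.dicts("median", H, cat="Binary")
    x = pulp.LpVariable.dicts("assign", (H, H), cat="Binary")
    # Minimize total within-cluster distance to the cluster medians.
    prob += pulp.lpSum(dist[i][j] * x[i][j] for i in H for j in H)
    for i in H:
        for j in H:
            prob += x[i][j] <= y[j]                          # assign only to medians
        prob += y[i] + pulp.lpSum(x[i][j] for j in H) == 1   # median or assigned
    for j in H:
        prob += pulp.lpSum(x[i][j] for i in H) >= y[j]       # medians are non-empty
    prob += pulp.lpSum(y[j] for j in H) == n_clusters        # exactly N clusters
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {j: [i for i in H if i == j or pulp.value(x[i][j]) > 0.5]
            for j in H if pulp.value(y[j]) > 0.5}
```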
An interesting observation is that the cluster analysis results are consistent with the expert categorization of heuristics. Comparing the 6-cluster analysis with the expert categorization, the two are similar except for three differences: (1) heuristic 10, a dual price rule, is clustered with the opportunity cost rules (category 1), whereas it belongs to category 2 in the expert categorization; (2) the random heuristic (heuristic 9) is clustered with the target scheduling rules, whereas it forms a separate category in the expert categorization; and (3) one target scheduling rule (heuristic 7) is clustered with a cash flow weight rule (heuristic 11), whereas the expert categorization places heuristic 11 with the other cash flow weight rules and heuristic 7 with the other target scheduling rules.

After categorizing the heuristics, exactly one value must be chosen from each group as the candidate output value for that group. We considered two approaches: the first takes the mean NPV of all the heuristics in each cluster, and the second takes the NPV of the median of each cluster as the candidate value. The relative merits of each approach are discussed in Section 4.

2.4 Data Representation

The coding scheme and variable scaling employed in constructing an appropriate network are also critical to the success of neural network training. We investigated various strategies for variable encoding and scaling, and examined the effects of different representation schemes on the performance of the neural network by evaluating both continuous and discrete representations of the inputs. The 30 summary measures are all continuous-valued and were normalized to the range between -0.5 and 0.5 for fast convergence [17]. The six inputs from PCA and from the feature extraction network are also continuous-valued. As indicated in Section 1.4, all 1,440 problems in the data set were generated based on six parameters set at two or three levels: interest rate, profit margin, frequency of payment, resource utilization, network topology and problem size. To test the hypothesis that there exists a functional mapping between these six parameters and the performance of the heuristics (as measured by project NPV), we used these six variables as inputs to the neural network, with a discrete representation for the different levels. For example, the parameter measuring resource constrainedness has three levels, and we use -0.5, 0 and +0.5 to represent the low, middle and high levels respectively. [31] describes this representation in greater detail. With regard to the output representation, a continuous-valued vector was taken as the teaching output. For each instance of the problem, each of the 16 heuristics was run to obtain the NPVs [21]. These NPVs were normalized with respect to the best and the worst values: the best performing heuristic was given a value of 1.0 and the worst 0.0, with the remaining NPVs ranging between these limits, each value indicating the distance from the best performing heuristic.
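A minimal sketch of the two scalings just described (illustrative Python; only the stated target ranges come from the text, the helper names are assumptions):

```python
import numpy as np

def scale_inputs(X):
    """Min-max scale each summary measure (column) into [-0.5, 0.5]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo) - 0.5

def scale_outputs(npvs):
    """For each problem instance (row), map the best heuristic's NPV to 1.0
    and the worst to 0.0; the rest measure distance from the best."""
    lo = npvs.min(axis=1, keepdims=True)
    hi = npvs.max(axis=1, keepdims=True)
    return (npvs - lo) / (hi - lo)
```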
Figure 4 gives a general overview of the data preprocessing and representation scheme. In summary, four types of input were fed into the neural network. The primary inputs are the problem characteristics described by the derived summary measures, represented by two encoding schemes: a continuous representation scheme (with thirty variables) and a discrete representation scheme (with six variables). Two different input preprocessing techniques, PCA and FEN, were independently applied to reduce the input dimensionality from thirty to six; the resulting inputs use a continuous-valued representation. The output of the neural network was represented by a vector of normalized NPVs. Two output preprocessing techniques were applied to the outputs, reducing the output dimensionality from sixteen to four, five or six. In the next section, we describe the design of our experiments resulting from the different representation schemes and the different types of preprocessing. Results from training and testing the networks are analyzed and discussed in Section 4.

FIGURE 4. Data Preprocessing and Representation Scheme

3. Experimental Design

The back-propagation learning paradigm, along with its variant called quick-prop, was employed in this study [7]. We conducted extensive experimental analysis with network topology and learning parameters to derive the best results.

3.1 Learning Parameters

Although many neural network applications have been shown to be successful in the literature, the conditions under which good performance can be obtained are not yet clear. We started with experiments on a benchmark network, 30-6-16 (a neural network with 30 input nodes, 6 hidden nodes and 16 output nodes), to investigate the effect of the learning parameters (learning rate, momentum, initial weight range) on performance. It is impractical to conduct exhaustive experimentation over all possible combinations due to the wide ranges of the parameter settings. Therefore, a series of exploratory experiments was first conducted to find the most promising ranges, followed by extensive experimentation within those ranges.

Learning Rate: The learning rate is an important factor in controlling the gradient descent, and finding an optimal value is difficult. A small learning rate slows convergence, so training may take much longer than with a higher rate. A large learning rate may help the network escape from local minima, but the network may go through large oscillations and never converge. In this study, learning rates from 0.001 to 0.25 were explored, with 0.001, 0.01, 0.05, 0.1, 0.15 and 0.25 as the values tested.

Momentum: Another parameter controlling convergence speed is momentum. During the calculation of the weight-change value, a fraction of the previous change (the momentum term) is added; the main idea is to keep the weight changes moving in the same direction. Momentum is usually set in the range between 0.1 and 0.9, and experience indicates that 0.9 is a reasonable choice [7]. In addition, we chose three other levels, 0.0, 0.5 and 0.7, for the experiments.

Initial Weight Range: Weights are generally initialized to small random values, e.g., in the range (-0.1, 0.1). In this study, we tested initial weight ranges of (-0.5, 0.5), (-0.7, 0.7) and (-1.0, 1.0).

Experiments indicated that a learning rate of 0.01, a momentum of 0.9 and an initial weight range of (-0.7, 0.7) generally gave the better results.
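For concreteness, a sketch of training the benchmark 30-6-16 network under these settings (scikit-learn and random stand-in data are assumptions; the study used a quick-prop simulator written in C, so this only mirrors the plain back-propagation variant):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in data: 1,015 training cases, 30 scaled inputs, 16 normalized NPVs.
rng = np.random.default_rng(0)
X_train = rng.uniform(-0.5, 0.5, (1015, 30))
Y_train = rng.uniform(0.0, 1.0, (1015, 16))

# Benchmark 30-6-16 network trained by back-propagation (SGD); note that
# scikit-learn does not expose the initial-weight range directly.
net = MLPRegressor(hidden_layer_sizes=(6,),
                   activation="logistic",
                   solver="sgd",
                   learning_rate_init=0.01,  # best learning rate found
                   momentum=0.9,             # best momentum found
                   max_iter=300)             # convergence took ~200-300 epochs
net.fit(X_train, Y_train)
```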
3.2 Single Generalization Network (SGN)

Once we found a set of reasonably good learning parameters, we experimented with different network configurations. While the learning parameters affect the convergence of the network, the configuration is one of the factors that determines its generalization capability [16]. A network configuration is characterized by the arrangement of processing units and weight connections. In Section 2, we discussed the representation of the inputs and outputs, which determines the size of the input and output layers of the neural network. Unfortunately, no prior estimate can be made of the optimal internal representation, i.e., the features of the hidden layer, for a given task. Previous studies [9] have shown that three layers are sufficient for generalization; adding more hidden layers may only affect the convergence of the network. The single generalization network architecture therefore included the input and output layers and a single hidden layer. We designed a total of 32 experiments, which varied over the number of inputs, the number of outputs (resulting from the input and output preprocessing methods) and the number of hidden nodes.

FIGURE 5. Combinations of Inputs and Outputs

As illustrated in Figure 5, the four types of input representation are the continuous representation derived from the 30 summary measures (N30), the PCA reduction (PCA6), the feature-extraction-network reduction (FEN6) and the binary representation (BIN6). There are eight output representations: all sixteen heuristics (O16); 4, 5 and 6 clusters with the mean NPV as candidate value (I4A, I5A and I6A); 4, 5 and 6 clusters with the median NPV as candidate value (I4M, I5M and I6M); and the six-group categorization by the expert (EXP). For each network, the number of hidden nodes was derived empirically to get the best generalization results. The use of FEN6 as input resulted in a special combined network system consisting of a feature extraction network and a generalization network, as shown in Figure 6. In Section 4.2, we report and analyze our experimental results on the single generalization networks.

FIGURE 6. Combined Feature Extraction and Generalization Networks

3.3 Multiple Generalization Network (MGN)

We also investigated a multi-level neural network system, motivated by the success of modular neural network systems in the literature [12, 18]. In contrast to the single generalization network, which is good at selecting a cluster of heuristics, this multi-level approach enables the selection of a single best heuristic. Two stages are involved: the first chooses the right cluster, and the second selects a specific heuristic from that cluster. Figure 7 illustrates the multi-level generalization network system. The top level is the cluster identification network, and the bottom level consists of multiple generalization networks that select the most appropriate heuristic for the given instance of the problem.

· Cluster Identification Network

This network consists of 30 inputs, twelve hidden nodes (derived through experimentation), and 4 (also 5 and 6) outputs. The four output units, for example, correspond to the four clusters of the sixteen heuristics obtained from cluster analysis, and are used to establish the four most distinct classes. The neural network is trained to classify each observation into one of the pre-selected groups and thereby select exactly one of the generalization networks to be used at the next level. The output representation for the cluster identification network is a vector of four values in binary representation. For example, if an instance of the project scheduling problem falls in the first cluster/category (i.e., choose heuristic 1, 2, 3 or 10), then the desired output vector is of the form (1, 0, 0, 0). After training, the network is tested on the testing set, and the largest of the four output activations is used to select a generalization network at the next level.
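This two-stage routing can be sketched as follows (illustrative Python; `cluster_net` and `generalization_nets` are assumed trained models with a `predict` method, not objects from the study):

```python
import numpy as np

def select_heuristic(x, cluster_net, generalization_nets, cluster_members):
    """Route instance x through the cluster identification network, then
    let that cluster's generalization network pick a single heuristic."""
    cluster = int(np.argmax(cluster_net.predict(x[None, :])[0]))
    scores = generalization_nets[cluster].predict(x[None, :])[0]
    return cluster_members[cluster][int(np.argmax(scores))]

# For the 4-cluster system, the membership map would be:
# cluster_members = {0: [1, 2, 3, 10], 1: [4, 5],
#                    2: [6, 7, 8, 9, 11], 3: [12, 13, 14, 15, 16]}
```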
For example, if the final output vector for an instance is (0.9687, 0.0003, 0.0096, 0.3607) and the desired output vector is (1, 0, 0, 0), then generalization network 1 is chosen. Note that only part of the testing sample is classified correctly; only the correctly classified testing patterns proceed to the next level of the network.

FIGURE 7. Multiple Generalization Network System (4-cluster)

· Generalization Network

The bottom level of the multi-level network system consists of multiple generalization networks. As shown in Figure 7, there are four generalization networks corresponding to the four clusters of heuristics. Each generalization network takes the original 30 summary measures as its input units, has a hidden layer with the number of nodes derived through experimentation, and has one output node per heuristic in its cluster. The output vector of each generalization network serves as the heuristic recommendation vector. Generalization Network I has four outputs representing heuristics 1, 2, 3 and 10; Generalization Network II has two outputs for heuristics 4 and 5; Generalization Network III has five outputs for heuristics 6, 7, 8, 9 and 11; and Generalization Network IV has five outputs representing heuristics 12, 13, 14, 15 and 16. Each of these four generalization networks was trained and tested using the relevant problems from the set of 1,440 cases. We also experimented with the six categories provided by cluster analysis and by expert categorization. The results of these experiments are presented in Section 4.3.

4. Experimental Results and Discussion

4.1 Measures of Network Performance

The degree of convergence (measured by the average training and testing errors) and the ability to generalize are commonly used measures of neural network performance. We report the experimental results based on prediction accuracy on both the training and the testing data. Prediction accuracy is calculated by comparing the index of the highest desired output with the index of the highest actual output. If the two indices are equal for an instance, that is, the best heuristic recommended by the neural network is the same as the best heuristic in the real data, it is considered a first hit. If the heuristic recommended by the neural network is instead the second best in the real data, it is a second hit, and so on. Prediction accuracy is thus the percentage of first hits in the training and testing data sets.
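The hit-rate metric can be computed as in the following sketch (illustrative Python, not the study's code):

```python
import numpy as np

def hit_rates(desired, actual, k=3):
    """Fraction of instances whose network-recommended choice (argmax of
    actual) is the best, second-best, ..., k-th-best choice in desired."""
    recommended = actual.argmax(axis=1)
    # Rank of each output per instance: 0 = best (highest desired value).
    ranks = (-desired).argsort(axis=1).argsort(axis=1)
    hit_rank = ranks[np.arange(len(desired)), recommended]
    return [float((hit_rank == r).mean()) for r in range(k)]

# First-, second- and third-hit rates over a test set:
# first, second, third = hit_rates(Y_test, net.predict(X_test))
```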
All of the neural networks were trained on 1,015 training samples randomly selected from the 1,440 problem instances described in Section 1, and the resulting networks were tested on the remaining 425 samples. Neural network performance was evaluated based on prediction accuracy as well as the sum of the first-hit, second-hit and third-hit percentages on the testing data. To limit the error from overtraining, where the network merely memorizes the training data, training and testing were performed iteratively throughout the experiments. Testing was performed every 10 epochs, so that generalization could be observed during training, along with how the accuracy of the neural network improved with the number of epochs. Nearly all the networks converged in about 200 to 300 epochs, with average training and testing errors between 0.04 and 0.09; once the improvement in error diminished, training was terminated. All of the training and testing were performed on a DEC 5000 using a revised version of the quick-prop simulator (in C) [7].

4.2 Results and Discussion on Single Generalization Networks

A total of 32 experiments were conducted with single generalization networks. The neural network we employed was a multi-layered, fully connected feed-forward network: since there was no prior knowledge to justify omitting specific connections, every node was fully connected to the nodes in its immediately preceding and succeeding layers, with no connections across layers. Experiments were conducted on different network configurations. The input layer contained either six or 30 nodes, the hidden layer between three and 18 nodes, and the output layer four, five, six or 16 nodes. In each case, we measured the prediction accuracy. The detailed results are presented in the tables below. The network topology is given in the leftmost column; TRR and TSS represent the prediction accuracies for the training and testing data respectively, with the TSS columns showing the generalization performance on the testing set.

TABLE V. Single Generalization Network with 16 Outputs (O16)

             1st-hit     2nd-hit     3rd-hit     Best three
Topology     TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-16     26   22     18   19     14   15     58   56
PCA6-4-16     21   17     13   16     12   12     46   45
FEN6-4-16     19   18     16   13     10   11     45   42
BIN6-4-16     23   21     14   15     10   10     47   46

In general, when the selection process includes all sixteen heuristics, as shown in Table V, the network with the 30 mixed-continuous inputs, the 16 normalized NPVs as outputs and a hidden layer with 12 nodes performed relatively better, selecting one of the best three heuristics with about 58% success on the training data and 56% success on the testing data. For the same output structure, the network with the 6 inputs obtained from PCA and 4 hidden nodes selected one of the best three heuristics with 45% success; the network with the 6 inputs obtained from the feature extraction network was successful in only 42% of the problems; and the network with the binary representation of the 6 inputs had 46% success. When the selection process is restricted to categories of heuristics, as shown in Tables VI through IX, with the categories derived through cluster analysis or expert opinion, the results are significantly different. The networks with four and five clusters in the output perform marginally better than the network with six clusters, irrespective of whether the mean or the median value of the heuristics in each cluster is chosen as the candidate output value. The prediction accuracy for choosing one of the best three categories ranges from 95% to 98% for four and five clusters, and from 85% to 92% for six clusters, when the mean value is chosen; the results are worse by about 3% to 5% when the median value is chosen. Comparing the results from cluster analysis and expert categorization on a six-output network, the expert grouping produces marginally better prediction accuracies.

TABLE VI. Single Generalization Network with 4 Clusters (Mean Approach)

              1st-hit     2nd-hit     3rd-hit     Best three
Topology      TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-i4a     67   67     25   26      7    5     99   98
PCA6-4-i4a     62   67     27   23      8    7     97   97
FEN6-4-i4a     63   66     26   23      8    7     97   96
BIN6-4-i4a     61   66     27   22      8    7     96   95
TABLE VII. Single Generalization Network with 5 Clusters (Mean Approach)

              1st-hit     2nd-hit     3rd-hit     Best three
Topology      TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-i5a     53   57     28   29     16   11     97   97
PCA6-4-i5a     49   52     28   28     18   15     95   95
FEN6-4-i5a     50   52     29   28     16   14     95   94
BIN6-4-i5a     52   54     29   29     14   12     95   95

TABLE VIII. Single Generalization Network with 6 Clusters (Mean Approach)

              1st-hit     2nd-hit     3rd-hit     Best three
Topology      TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-i6a     46   45     29   29     17   17     92   91
PCA6-4-i6a     34   36     30   29     19   22     83   87
FEN6-4-i6a     38   40     29   28     18   17     85   85
BIN6-4-i6a     37   40     32   31     18   16     87   87

TABLE IX. Single Generalization Network with Expert Categorization (EXP)

              1st-hit     2nd-hit     3rd-hit     Best three
Topology      TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
N30-12-6       49   48     27   31     15   12     91   91
PCA6-4-6       44   48     28   26     16   11     88   85
FEN6-4-6       43   46     29   30     15   11     87   87
BIN6-4-6       42   45     27   27     20   15     89   87

In general, the network with all 30 inputs performed slightly better than the networks resulting from the other input preprocessing methods; one possible explanation is that the N30 summary measures retain the most information. The results from the remaining three preprocessing methods were very close, and their ability to accurately classify novel instances appears quite comparable. The results are fairly consistent across the experiments, indicating that feature extraction by neural network and information condensation by PCA are both reasonable procedures for dimensionality reduction; overall neural network performance is not affected much by input preprocessing. Output preprocessing has a more significant impact on neural network performance. For example, the network with 30 inputs, twelve hidden nodes and sixteen outputs correctly predicted one of the best three heuristics with 56% accuracy on the testing data. With the 16 outputs clustered into four groups, the network with the same 30 inputs, twelve hidden nodes and four categorical outputs selected one of the best three heuristic categories with 98% success, and with a first-hit rate of about 67%: an increase of about 40 percentage points in prediction accuracy. For output preprocessing, the mean method is generally better than the median method. However, it should be emphasized that the improvement in prediction accuracy is achieved for selecting categories of heuristics, not for selecting a single best heuristic.

4.3 Results and Discussion on Multiple Generalization Networks

As detailed earlier, MGN involves two levels of neural network training. The top level is the cluster identification network. Experiments were conducted on this network with the four and six clusters generated by cluster analysis and the six categories generated by the expert. In the case of the 4-cluster network, for example, training continued until the total error was less than 0.09; the classification accuracy was 73% for the training set and 68% for the testing set. The detailed results of the experiments on the multiple generalization networks are shown in Tables X through XII. At the bottom level, with four generalization networks, the networks FCG2 and FCG3 (Four Cluster Generalization Networks 2 and 3) show better performance in terms of prediction accuracy. The performance of FCG2 is the best, since there are only two heuristics to select from in that network. FCG3 also performs relatively well.
This is probably due to the relative ease of differentiating among the heuristics in that network, which is made up of some target scheduling heuristics, a random heuristic and a cash flow weight heuristic. The performance of FCG4 (Four Cluster Generalization Network 4) is very poor because the heuristics in that category are extremely close to each other in NPV. The performance of the six-network system (Table XI) is quite comparable to that based on expert categorization (Table XII). In comparison with SCG1 (Six Cluster Generalization Network 1), the performance of EG1 (Expert Generalization Network 1) is better on the training set for the first hit, but not on the testing set; its overall performance is slightly better. Notice that there are only three heuristics in EG1, while there are four in SCG1. The performance of the six generalization networks (whether by cluster analysis or by expert categorization) is significantly better than that of the four generalization networks. As the heuristics are spread over a larger number of networks, the number of heuristics in each cluster/network becomes smaller, enabling the neural network to learn to differentiate better. This result is in contrast to that for the single generalization network, where performance improves as the number of clusters decreases.

TABLE X. Results on Four Generalization Networks

                               1st-hit     2nd-hit     3rd-hit     Best three
Net   Configuration  #Tr/#Ts   TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
FCG1  30-12-4        169/52     60    4     20   83     19    8     99   95
FCG2  30-12-2        25/9      100   89      0   11      /    /    100  100
FCG3  30-12-5        39/11      85   55     13   18      2    9    100   82
FCG4  30-12-5        500/217    42   37     13   13     14   19     69   69

TABLE XI. Results on Six Generalization Networks

                               1st-hit     2nd-hit     3rd-hit     Best three
Net   Configuration  #Tr/#Ts   TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
SCG1  30-12-4        94/24      63   33     23   46     14    8    100   87
SCG2  30-12-3        26/1       88  100     12    0      /    /    100  100
SCG3  30-12-2        109/29    100   52      0   48      /    /    100  100
SCG4  30-12-2        21/8      100   88      0   12      /    /    100  100
SCG5  30-12-2        251/100   100   70      0   30      /    /    100  100
SCG6  30-12-3        199/39     70   36     29   36      1   28    100  100

TABLE XII. Results on Expert Categorization with Six Generalization Networks

                               1st-hit     2nd-hit     3rd-hit     Best three
Net   Configuration  #Tr/#Ts   TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
EG1   30-12-3        119/41    100   24      0   39      0   37    100  100
EG2   30-12-2        26/3      100  100      /    /      /    /    100  100
EG3   30-12-3        19/5      100   80      0   20      /    /    100  100
EG4   30-12-3        127/47     81   55     16   36      3    9    100  100
EG5   30-12-4        208/75     48   31     13   12     18   36     79   79
EG6   30-12-1        123/40    100  100      /    /      /    /    100  100

4.4 Multivariate Regression (MR) and Discriminant Analysis (DA)

For comparison, we ran multivariate regression (MR) and discriminant analysis (DA) models [11] on the training data and tested the models on the test data. The multivariate regression model captures the relationship between multiple dependent variables and a set of independent variables. Let $Y_1, \ldots, Y_q$ represent the NPVs of each of the 16 heuristics (or the mean of each cluster of heuristics), and let $X_1, \ldots, X_{30}$ represent the project characteristics. Then the multivariate regression model can be specified as

$$Y_k = \beta_{0k} + \beta_{1k} X_1 + \cdots + \beta_{30,k} X_{30} + \epsilon_k, \qquad k = 1, \ldots, q,$$

where $\epsilon_k$ represents the error term.
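Such a multivariate regression can be fitted by ordinary least squares, as in this sketch (Python/NumPy; illustrative, not the original analysis):

```python
import numpy as np

# X: n x 30 project characteristics (a column of ones adds the intercept);
# Y: n x q normalized NPVs (q = 16, or 4/5/6 after output clustering).
def fit_multivariate_regression(X, Y):
    X1 = np.hstack([np.ones((len(X), 1)), X])
    B, *_ = np.linalg.lstsq(X1, Y, rcond=None)   # (31 x q) coefficient matrix
    return B

def predict(B, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ B
```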
The results of multivariate regression are presented in Table XIII. The leftmost column gives the names of the models; there are five models, each with 30 independent variables, with the number of dependent variables ranging from 4 to 16. MR(O16), for example, is the regression model with 16 dependent variables. The prediction accuracies on both the training set and the testing set are given in the table. As the table shows, the model with the smallest number of dependent variables, MR(I4A), performed the best, while the model with the largest number of dependent variables, MR(O16), performed the worst. The regression model based on expert categorization, MR(EXP), did marginally better than the six-cluster model MR(I6A).

TABLE XIII. Results on Multivariate Regression

            1st-hit     2nd-hit     3rd-hit     Best three
Model       TRR  TSS    TRR  TSS    TRR  TSS    TRR  TSS
MR (O16)     26   22     17   17     15   12     58   49
MR (I4A)     64   64     26   26      8    6     98   96
MR (I5A)     55   56     27   26     14   14     96   96
MR (I6A)     44   46     28   28     16   14     88   88
MR (EXP)     49   52     25   26     17   12     91   90

The other multivariate technique we employed was discriminant analysis (DA). DA is concerned with separating a set of observations and classifying new observations into previously defined groups. Five linear discriminant models were constructed and tested. The results are presented in Table XIV, where only the first-hit rate is reported. Once again, as the number of classes increases, the classification accuracy decreases.

TABLE XIV. Results on Discriminant Analysis

            Classification Rate
Model       TRR    TSS
DA (O16)     25     16
DA (I4A)     54     52
DA (I5A)     47     45
DA (I6A)     44     40
DA (EXP)     49     47

4.5 Discussion

The results in Tables XIII and XIV were compared with those from the single generalization networks shown in the first rows of Tables V through IX in Section 4.2. Interestingly, the first-hit rate for the MR(O16) model is exactly the same as that of the N30-12-16 network. In addition, the neural network's overall performance (sum of first, second and third hits) on the testing set was marginally higher than that of the regression models in all the experiments considered in this study. Figures 8 through 11 highlight the comparison between multivariate regression and the single generalization network on each of the first, second and third hit rates. The performance of discriminant analysis was consistently worse than that of the multivariate regression models, making DA the weakest of the classification approaches in this case.

FIGURE 8. Comparison of Generalization Performance (16 Heuristics)
FIGURE 9. Comparison of Generalization Performance (4 Clusters)
FIGURE 10. Comparison of Generalization Performance (5 Clusters)
FIGURE 11. Comparison of Generalization Performance (6 Clusters)

In evaluating the effectiveness of the neural network system, one measure of particular interest is the distance between the best heuristic's result and the result provided by the neural network when it does not choose the best heuristic. Let $V_b^i$ be the NPV produced by the best heuristic and $V_n^i$ the NPV produced by the heuristic recommended by the neural network for case $i$. If $M$ is the number of misclassified cases in the testing sample, the average loss in dollar value, $L$, is

$$L = \frac{1}{M} \sum_{i=1}^{M} \left( V_b^i - V_n^i \right).$$

To illustrate, for the 4-cluster case, 140 out of 425 testing samples are misclassified, and the average loss in dollar value is $13.08 when the neural network does not choose the best set of heuristics. In the 5-cluster case it is $18.93 (179 misclassified), and in the 6-cluster case it is $19.74 (214 misclassified). This indicates that as the number of clusters increases, the heuristic selection procedure using neural networks becomes less accurate. We further computed the mean absolute error in percentage (MAEP) as

$$\mathrm{MAEP} = \frac{1}{M} \sum_{i=1}^{M} \frac{V_b^i - V_n^i}{V_b^i} \times 100\%.$$

The MAEP for the 4-cluster case is 1.14%; for the 5-cluster case it is 1.82%; and for the 6-cluster case it is 1.78%.
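The two error measures can be computed as in this sketch (illustrative Python, matching the formulas reconstructed above):

```python
import numpy as np

def loss_measures(best_npv, chosen_npv):
    """best_npv / chosen_npv: NPV of the best heuristic and of the
    network-recommended heuristic for each misclassified test case."""
    diff = best_npv - chosen_npv
    avg_dollar_loss = diff.mean()
    maep = (diff / best_npv).mean() * 100.0   # mean absolute error in percent
    return avg_dollar_loss, maep
```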
These results indicate that the loss incurred through the application of neural networks is not significant and will decrease further as the prediction ability of the neural network system improves. In real-world applications, even experienced managers are not able to choose the optimal heuristic in all instances, so neural network models are good consultants when an expert is not available or the best heuristic is not obvious. A project manager on site may also want to know what percentage improvement in NPV he or she can expect to gain by using the neural network over an arbitrary selection. We therefore compared the neural network's choice against a random selection. When selecting one set of good heuristics among four clusters, the neural network achieved about 67% prediction accuracy based on the first-hit rate, while random selection achieves only 25%. Similarly, for the 5-cluster and 6-cluster cases, the neural network achieved about 57% and 45% success respectively, against 20% and 16.7% for random selection. In the worst case, where selection is among all sixteen heuristics, the neural network achieves 22% prediction accuracy, in contrast to 6.4% for a random selection rule. For projects involving cash flows in thousands of dollars, these results indicate significant gains of the neural network models over a random selection. The neural network system can thus provide useful support in making important scheduling decisions.

5. Conclusion

In this paper, we investigate the application of artificial neural networks to the selection of an appropriate heuristic, or category of heuristics, for the resource-constrained project scheduling problem with cash flows. Neural network methodology can be used both to extract information about project conditions, as illustrated by the feature extraction network, and to provide predictions for novel cases. We conducted extensive experiments on different preprocessing schemes and encodings of the inputs and outputs, and experimented with both a single generalization network and a multi-level generalization network system. The results show that the greater the number of inputs, the better the performance, for all the models. However, the results also indicate that both principal component analysis and feature detection by neural network extract enough information to enable the neural network to generalize. What makes a significant contribution to the performance of the neural network is the preprocessing of outputs. For the single generalization network, the fewer the clusters, the better the performance; for the multiple generalization network system, by contrast, performance improves as the number of clusters increases. The advantage of the single generalization network is that it is simple and identifies a good category of heuristics from among a few categories, but it cannot predict which heuristic in that category to use. In contrast, the multiple generalization network system allows a specific heuristic to be chosen, and its performance is generally good; but it is more complex and requires a classification network at the top level to filter the information. A further limitation of the classification network is that there is only about a 70% chance that an instance is routed to the right generalization network.
The results from multivariate regression and discriminant analysis are consistent with the neural network results. In particular, the neural network models perform no worse than the multivariate regression or discriminant analysis models.

Successful development of neural network applications is not straightforward; it requires considerable expertise in two areas. First, understanding the application domain: a good deal of familiarity with the application is required to achieve good generalization performance, since important steps such as the choice of predictive variables, preprocessing, and normalization can be performed successfully only with thorough knowledge of the domain. Second, achieving good network performance involves extensive experimentation with network design and fine-tuning of control parameters. Neural network learning procedures are inherently statistical techniques, and statistical techniques not only offer insights into the properties of the data but also provide the means to preprocess it.

In conclusion, the methodologies analyzed in this study have the potential to be useful in many project management applications. They should also generalize to other difficult combinatorial optimization problems, such as the traveling salesman, vehicle routing, and job shop scheduling problems, where the difficulty of solving problems to optimality has led to the development of a number of heuristics.

Acknowledgment

This work is supported in part by the National Science Foundation under grant #9108386.

Appendix A - Summary Statistics for Independent Variables

Variable         Mean        Standard Dev.
NNODE            156.000     62.022
NARC             198.615     74.830
NDUMMY           121.610     43.947
XDUR             6.31673     0.24369
XDENSITY         0.612639    0.125182
COMPLEXITY       1.28629     0.05535
CPL              87.6153     19.4848
NSLACK           187.258     73.916
PCTSLACK         2.44423     0.12655
XFREESLK         27.3353     9.5306
TSLACK_R         23.7824     11.0314
XSLACK_R         0.312661    0.089286
PDENSITY         0.202842    0.057052
TFREESLK         201.826     92.526
NFREESLK         12367.0     8472.8
PCTFREESLK       146.433     67.640
XFREESLK         2.64899     0.70235
PDENSITY_F       0.708246    0.055237
UTIL1            1.72605     0.68505
UTIL2            1.80528     0.68505
UTIL3            1.70983     0.68698
XUTIL            1.74704     0.68469
TCON1            0.028574    0.019511
TCON2            0.029874    0.019918
TCON3            0.028610    0.020131
XCON_TM          0.029019    0.019812
PROFIT_MARGIN    1.90140     0.26390
FREQ_P           4.51781     2.43473
INT_RATE         0.003000    0.001000
NSTR             0.506667    0.284039

Appendix B - Cluster Analysis Model Formulation

Let $y_j = 1$ if heuristic $j$ is a cluster median and 0 otherwise, and let $x_{ij} = 1$ if heuristic $i$ is assigned to cluster median heuristic $j$ and 0 otherwise. With $d_{ij}$ denoting the dissimilarity between heuristics $i$ and $j$, the model is:

Minimize $\sum_{i=1}^{16} \sum_{j=1}^{16} d_{ij} x_{ij}$

subject to:

(1) $\sum_{i=1}^{16} x_{ij} \le 16\, y_j$, for $j = 1, \ldots, 16$
(2) $\sum_{i=1}^{16} x_{ij} \ge y_j$, for $j = 1, \ldots, 16$
(3) $y_i + \sum_{j=1}^{16} x_{ij} = 1$, for $i = 1, \ldots, 16$
(4) $\sum_{j=1}^{16} y_j = N$
(5) $x_{ij}, y_j \in \{0, 1\}$

The constraints ensure that: (1) no heuristic can be assigned to heuristic $j$ if $j$ is not a cluster median; (2) at least one heuristic must be assigned to heuristic $j$ if $j$ is a cluster median; (3) each heuristic $i$ is either a cluster median or is assigned to one; (4) there are exactly $N$ clusters ($N$ is a predefined constant, ranging from 4 to 6 in this study); and (5) all variables are binary.
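This is a p-median model and can be set up with any mixed-integer programming solver. The sketch below is a minimal illustration only, not the study's implementation: it assumes the open-source PuLP library, and d is a hypothetical 16 x 16 dissimilarity matrix between heuristic performance profiles.

```python
import pulp

def cluster_heuristics(d, N):
    """p-median clustering of 16 heuristics into N clusters.

    d -- hypothetical 16x16 dissimilarity matrix between heuristics
    N -- desired number of clusters (4 to 6 in this study)
    """
    H = range(16)
    prob = pulp.LpProblem("heuristic_clustering", pulp.LpMinimize)
    y = pulp.LpVariable.dicts("median", H, cat="Binary")       # y[j]: j is a cluster median
    x = pulp.LpVariable.dicts("assign", (H, H), cat="Binary")  # x[i][j]: i assigned to median j

    # Objective: total within-cluster dissimilarity.
    prob += pulp.lpSum(d[i][j] * x[i][j] for i in H for j in H)

    for j in H:
        prob += pulp.lpSum(x[i][j] for i in H) <= 16 * y[j]    # (1) assign only to medians
        prob += pulp.lpSum(x[i][j] for i in H) >= y[j]         # (2) each median has a member
    for i in H:
        prob += y[i] + pulp.lpSum(x[i][j] for j in H) == 1     # (3) median or assigned, not both
    prob += pulp.lpSum(y[j] for j in H) == N                   # (4) exactly N clusters

    prob.solve()
    medians = [j for j in H if pulp.value(y[j]) > 0.5]
    assignment = {i: j for i in H for j in H if pulp.value(x[i][j]) > 0.5}
    return medians, assignment
```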
References

1. S. Ahmad and G. Tesauro, 1988. Scaling and Generalization in Neural Networks: A Case Study, Connectionist Models Summer School, Carnegie Mellon University.
2. S. Baroum and J. H. Patterson, 1989. A Heuristic Algorithm for Maximizing the Net Present Value of Cash Flows in Resource-Constrained Project Schedules, Working Paper, Indiana University.
3. R. Bey, R. H. Doersch and J. H. Patterson, 1981. The Net Present Value Criterion: Its Impact on Project Scheduling, Project Management Quarterly, 12:2, 35-45.
4. A. Brooke, D. Kendrick and A. Meeraus, 1988. GAMS, A User's Guide, The Scientific Press, CA.
5. E. W. Davis, 1973. Project Scheduling under Resource Constraints -- Historical Review and Categorization of Procedures, AIIE Transactions, 5:4, 297-313.
6. E. W. Davis and J. H. Patterson, 1975. A Comparison of Heuristic and Optimum Solutions in Resource-Constrained Project Scheduling, Management Science, 21:8, 944-955.
7. S. E. Fahlman, 1988. An Empirical Study of Learning Speed in Back-Propagation Networks, Technical Report CMU-CS-88-162, Carnegie Mellon University.
8. M. R. Garey and D. S. Johnson, 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Co., NY.
9. K. Hornik, M. Stinchcombe and H. White, 1989. Multilayer Feedforward Networks are Universal Approximators, Neural Networks, 2, 359-366.
10. T. J. R. Johnson, 1967. An Algorithm for the Resource-Constrained Project Scheduling Problem, Unpublished Ph.D. Thesis, Massachusetts Institute of Technology.
11. R. A. Johnson and D. W. Wichern, 1988. Applied Multivariate Statistical Analysis, Prentice-Hall, Inc., NJ.
12. N. Kadaba, 1990. XROUTE: A Knowledge-Based Routing System Using Neural Networks and Genetic Algorithms, Unpublished Ph.D. Thesis, North Dakota State University.
13. J. E. Kelley, 1961. Critical Path Planning and Scheduling: Mathematical Basis, Operations Research, 9, 296-320.
14. I. S. Kurtulus and E. W. Davis, 1982. Multi-Project Scheduling: Categorization of Heuristic Rules Performance, Management Science, 28:2, 161-172.
15. T. Kohonen, 1989. Self-Organization and Associative Memory, 3rd edition, Springer-Verlag, Berlin-Heidelberg-New York.
16. J. L. McClelland and D. E. Rumelhart (eds.), 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vols. 1 and 2, MIT Press, Cambridge, MA.
17. R. Miikkulainen and M. Dyer, 1988. Encoding Input/Output Representations in Connectionist Cognitive Systems, Connectionist Models Summer School, Carnegie Mellon University.
18. K. E. Nygard, P. Juell and N. Kadaba, 1990. Neural Networks for Selecting Vehicle Routing Heuristics, ORSA Journal on Computing, 2:4, 353-364.
19. E. Oja, 1982. A Simplified Neuron Model as a Principal Component Analyzer, Journal of Mathematical Biology, 16, 267-273.
20. R. Padman, 1993. Choosing Solvers in Decision Support Systems: A Neural Network Application in Resource-Constrained Project Scheduling, in Recent Developments in Decision Support Systems, Springer-Verlag, Berlin, 559-574.
21. R. Padman, D. E. Smith-Daniels and V. L. Smith-Daniels, 1990. Heuristic Scheduling of Resource-Constrained Projects with Cash Flows: An Optimization-Based Approach, Working Paper 90-6, The Heinz School of Public Policy and Management, Carnegie Mellon University.
22. T. L. Pascoe, 1965. An Experimental Comparison of Heuristic Methods for Allocating Resources, Unpublished Ph.D. Thesis, Cambridge University.
23. J. H. Patterson, 1976. Project Scheduling: The Effects of Problem Structure on Heuristic Performance, Naval Research Logistics Quarterly, 23:1, 95-122.
24. C. R. Rao, 1964. The Use and Interpretation of Principal Component Analysis in Applied Research, Sankhya A, 26, 329-358.
25. A. H. Russell, 1970. Cash Flows in Networks, Management Science, 16:5, 357-373.
26. R. A. Russell, 1986. A Comparison of Heuristics for Scheduling Projects with Cash Flows and Resource Restrictions, Management Science, 32:10, 1291-1300.
27. A. W. Shogan, 1988. Management Science, Prentice-Hall, Inc., Englewood Cliffs, NJ.
28. K. Y. Tam and M. Y. Kiang, 1992. Managerial Applications of Neural Networks: The Case of Bank Failure Predictions, Management Science, 38:7, 926-947.
29. E. Wacholder, J. Han and R. C. Mann, 1988. A Neural Network Algorithm for the Multiple Traveling Salesman Problem, Proceedings of the IEEE Annual International Conference on Neural Networks, San Diego, 305-324.
30. H. White, 1989. Learning in Artificial Neural Networks: A Statistical Perspective, Neural Computation, 1:4, 425-464.
31. D. Zhu and R. Padman, 1993. Heuristic Selection in Resource-Constrained Project Scheduling: Experiments with Neural Networks, Working Paper 93-43, The Heinz School of Public Policy and Management, Carnegie Mellon University.