The partitioning algorithm has been tested and compared with previous circuit partitioning algorithms [4, 8, 2, 3]. The parallel synthesis tool has been compared with another parallel tool that also uses logic partitioning [5]. These comparisons will be detailed in the full paper. Table 1 shows results for 1, 4, and 8 partitions (1P, 4P, 8P) running on 8 Sun SS5 workstations with 32 MB of memory. For each case, we report the number of elementary cells (CLBs) of the circuit mapped onto a Xilinx FPGA, before placement and routing. Overall times are in seconds and include partitioning, parallel application of a standard SIS synthesis script, and merging of the optimized subnetworks. The test circuits are from the ISCAS'89 public benchmark suite [11].
Circuit | 1P CLB | 1P time (s) | 4P CLB | 4P time (s) | 8P CLB | 8P time (s) |
C2670 | 105 | 793 | 133 | 138 | 140 | 111 |
dalu | 192 | 2257 | 255 | 228 | 305 | 225 |
i8 | 232 | 759 | 326 | 143 | 434 | 325 |
C3540 | 229 | 1251 | 241 | 256 | 233 | 105 |
i10 | 435 | 37272 | 448 | 379 | 476 | 257 |
C5315 | 273 | 953 | 292 | 314 | 306 | 167 |
k2 | 253 | 15838 | 385 | 379 | 454 | 248 |
C6288 | 463 | 992 | 537 | 302 | 551 | 231 |
C7552 | 656 | 1295 | 297 | 480 | 321 | 266 |
des | 1070 | 3055 | 701 | 1451 | 693 | 873 |
Total | 3908 | 64465 | 3615 | 4070 | 3913 | 2808 |
Speedup and solution quality are very encouraging on a network of up to 8 workstations: over the whole benchmark set, total runtime drops from 64465 s with one partition to 4070 s with 4 partitions and 2808 s with 8, while the total CLB count stays within 8% of the sequential result. The best results, compared with sequential execution, are obtained on difficult circuits that take a long time on one processor, e.g. C3540, i10, k2, C7552 and des. Splitting avoids the performance degradation observed for these circuits; in sequential execution, the long runtimes come mainly from memory swapping caused by the circuit sizes.
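The overall figures can be checked directly from the "Total" row of Table 1. The following is a minimal sketch rather than part of the tool: the names `totals`, `speedup` and `summary` are illustrative, and "speedup per partition" is computed against the number of partitions, on the assumption that each partition occupies one workstation.

```python
# Totals copied from the "Total" row of Table 1:
# number of partitions -> (CLB count, overall time in seconds).
totals = {
    1: (3908, 64465),
    4: (3615, 4070),
    8: (3913, 2808),
}


def speedup(t_seq: float, t_par: float) -> float:
    """Ratio of sequential overall time to partitioned overall time."""
    return t_seq / t_par


base_clb, base_time = totals[1]
summary = {}
for parts, (clb, time_s) in totals.items():
    s = speedup(base_time, time_s)
    summary[parts] = {
        "speedup": s,
        # Assumption: one workstation per partition; a value above 1
        # means the speedup is superlinear.
        "speedup_per_partition": s / parts,
        # Below 1 means fewer CLBs than the sequential run.
        "clb_ratio": clb / base_clb,
    }
    print(f"{parts}P: speedup {s:.1f}, "
          f"per partition {s / parts:.2f}, "
          f"CLB ratio {clb / base_clb:.3f}")
```

On these totals the speedup is about 15.8 with 4 partitions and about 23.0 with 8. Both exceed the number of partitions, which is consistent with the explanation that the sequential runs are dominated by memory swapping rather than by computation.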
Surprisingly, solution quality is sometimes better in parallel: for C7552 and des,
the partitioned runs use fewer CLBs than the sequential run. This is due to
the automatic switching of some algorithms from heuristic to exact methods,
according to the size of the circuits processed.
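The effect of size-based switching can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the threshold `EXACT_SIZE_LIMIT`, the node counts, and the function names are hypothetical and are not taken from SIS or from the tool described here. It assumes that exact methods are selected for inputs at or below the threshold.

```python
from typing import List

# Hypothetical threshold (in nodes) below which an exact method is chosen.
# This value is illustrative only; it is not a parameter of SIS.
EXACT_SIZE_LIMIT = 200


def pick_method(num_nodes: int, limit: int = EXACT_SIZE_LIMIT) -> str:
    """Choose 'exact' for small inputs and 'heuristic' for large ones."""
    return "exact" if num_nodes <= limit else "heuristic"


def methods_after_split(total_nodes: int, parts: int) -> List[str]:
    """Method chosen for each of `parts` roughly equal subnetworks."""
    base, extra = divmod(total_nodes, parts)
    # Spread the remainder over the first `extra` subnetworks.
    sizes = [base + (1 if i < extra else 0) for i in range(parts)]
    return [pick_method(n) for n in sizes]


# A hypothetical 1000-node circuit, processed whole and in 8 partitions.
whole = methods_after_split(1000, 1)
split = methods_after_split(1000, 8)
print(whole)
print(split)
```

Under these assumptions, the whole circuit falls above the threshold and is handled heuristically, while each of the 8 subnetworks (125 nodes) falls below it and is handled exactly. This is one way partitioning could improve solution quality for some circuits.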
Software will soon be available for public use at
http://ubolib.univ-brest.fr/ lemarch/PPart.