Deep convolutional neural network architecture design as a bi-level optimization problem
Over the last decade, deep neural networks have shown strong performance in many machine learning tasks such as classification and clustering. One of the most successful network types is the Convolutional Neural Network (CNN), which has been applied in many domains such as pattern recognition, medical diagnosis, and signal processing. Despite this performance, CNN architecture design remains a major challenge for researchers and practitioners. Several works in the literature aim to find optimized architectures, such as ResNet and VGGNet. Unfortunately, most of these architectures are either manually defined by experts or automatically designed by greedy induction algorithms. Recent works suggest the use of Evolutionary Algorithms (EAs) thanks to their ability to escape locally optimal architectures. Although EAs have shown interesting performance, researchers in this direction have treated the design task as a single-level optimization problem, which is the main research gap we tackle in this paper. Our main contribution rests on the observation that CNN architecture design has a hierarchical nature and can thus be seen as a Bi-Level Optimization Problem (BLOP) where: (1) the upper level minimizes the network complexity, defined by the number of blocks and the number of nodes per block; and (2) the lower level optimizes the graph topologies of the convolution blocks by maximizing the classification accuracy. Motivated by the originality of this observation with respect to the state of the art, we frame, for the first time, the CNN architecture design problem as a BLOP and then solve it using an adapted version of an existing efficient bi-level EA, obtained by defining the solution encoding, the fitness function, and the variation operators at each level.
The adapted EA is named BLOP-CNN and is assessed on the image classification task using the commonly employed CIFAR-10 and CIFAR-100 benchmark data sets. The analysis of our experimental results shows the merits of the proposed method in providing the user with optimized architectures that outperform many recent and prominent architectures from three different approaches, namely manual design, reinforcement learning-based generation, and evolutionary optimization. Moreover, to show the applicability of our approach, we have conducted a case study on the detection of COVID-19 using a set of benchmark chest X-ray and Computed Tomography (CT) images.
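
The two-level structure described above can be written in the generic form of a bi-level optimization problem. The following is only a sketch under stated assumptions: the symbols x_u, x_l, X_U, X_L, F, and Acc are notation introduced here for illustration and are not taken from the paper. x_u stands for the upper-level decision (the number of blocks and the number of nodes per block), x_l for the lower-level decision (the graph topologies of the convolution blocks), F for the upper-level objective, which according to the abstract measures network complexity, and Acc for the classification accuracy. The exact form of each objective is not given in the abstract and is left unspecified.

```latex
% Generic bi-level formulation (illustrative notation, not the paper's own)
\begin{align*}
\min_{x_u \in X_U} \quad & F\bigl(x_u, x_l^{*}\bigr) \\
\text{s.t.} \quad & x_l^{*} \in \arg\max_{x_l \in X_L(x_u)} \; \mathrm{Acc}\bigl(x_u, x_l\bigr)
\end{align*}
```

Read this way, an upper-level candidate x_u can only be evaluated once the lower-level problem it induces has been (approximately) solved, which is the nesting that distinguishes this formulation from a single-level search.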