Visual clustering is the task of grouping visually similar images together from an unorganized set. It is an essential technology for many real-world applications, for instance automatic photo grouping, image tagging, and data annotation for machine learning systems. Graph neural networks operate on complex graph structures to extract useful semantic information. Graph neural network based visual clustering methods rely on a k-nearest neighbor graph constructed from the visual embeddings of the input images. However, visual clustering based on a single graph neural network is very sensitive to the choice of the k-bandwidth parameter, that is, the number of neighbors k used when constructing the k-nearest neighbor graph. For instance, very different values of k may be needed to make the model work on face image clustering and on clothing image clustering. To this end, this invention proposes a hierarchical graph neural network framework, together with a graph model that jointly performs linkage prediction and density estimation, for visual clustering. The proposed method addresses the sensitivity to the k-bandwidth parameter and bypasses the need to tune it for different applications.
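
The sensitivity to k described above can be illustrated with a minimal sketch. This is not the proposed method: it only builds a k-nearest neighbor graph from a toy set of embeddings and groups images by the connected components of that graph, with no graph neural network involved. The function names (`build_knn_graph`, `connected_components`), the use of cosine similarity, and the six two-dimensional embeddings are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_knn_graph(embeddings, k):
    """Return, for each node, the indices of its k most similar other nodes."""
    n = len(embeddings)
    graph = []
    for i in range(n):
        sims = [
            (cosine_similarity(embeddings[i], embeddings[j]), j)
            for j in range(n)
            if j != i
        ]
        # Highest similarity first; ties broken by node index.
        sims.sort(key=lambda t: (-t[0], t[1]))
        graph.append([j for _, j in sims[:k]])
    return graph


def connected_components(graph):
    """Group nodes by connectivity, treating k-NN edges as undirected."""
    n = len(graph)
    adjacency = [set() for _ in range(n)]
    for i, neighbors in enumerate(graph):
        for j in neighbors:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical toy embeddings: two visually distinct groups of three images.
EMBEDDINGS = [
    (1.00, 0.00), (0.98, 0.10), (0.95, 0.20),  # group A
    (0.00, 1.00), (0.10, 0.98), (0.20, 0.95),  # group B
]

for k in (2, 3):
    clusters = connected_components(build_knn_graph(EMBEDDINGS, k))
    print(f"k={k}: {len(clusters)} cluster(s) -> {clusters}")
```

In this toy set each group holds three images, so k = 2 keeps every edge inside its own group and yields two clusters, whereas k = 3 forces each image to link to one image of the other group and merges everything into a single cluster. A change of one in k is enough to alter the result, which is the kind of sensitivity the invention aims to avoid.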