From detecting potential threats to searching for a desired object such as food, visual search is one of the most essential visual abilities in humans. In recent decades, many models have been developed that accurately predict the most likely fixation locations (saliency maps), although they cannot capture the sequence of eye movements (scanpaths). Today, one of the biggest challenges in the field is to go beyond saliency maps to predict task-specific scanpaths. In particular, for visual search tasks in artificial images, Ideal Bayesian observers have been proposed that model visual search behavior as an active sampling process: during each fixation, observers incorporate new information and update the probability of finding the target at every location.
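The active sampling process described above can be sketched as a simple posterior update over candidate locations, in the spirit of ideal-observer searchers (e.g. Najemnik and Geisler). All names, the 1-D grid, and the Gaussian observation model below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 25                       # candidate target locations (a 1-D "grid")
target = 12                  # true target location, hidden from the searcher
prior = np.full(n, 1.0 / n)  # in a combined model this prior would come from
                             # a saliency map; here it is simply uniform

def fixate(post, fix):
    """One fixation: draw noisy evidence at every location and update
    the posterior probability that the target is there."""
    # Visibility (d') falls off with distance from the current fixation.
    dist = np.abs(np.arange(n) - fix)
    dprime = 3.0 * np.exp(-dist / 4.0)
    # Noisy template responses: mean +d'/2 at the target, -d'/2 elsewhere.
    mu = np.where(np.arange(n) == target, dprime / 2, -dprime / 2)
    obs = rng.normal(mu, 1.0)
    # Bayes' rule; for this Gaussian model the log-likelihood ratio
    # reduces to obs * d' (up to location-independent terms).
    log_post = np.log(post) + obs * dprime
    log_post -= log_post.max()  # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

post = prior
for _ in range(6):
    # Fixate the current maximum-a-posteriori location (one possible policy).
    post = fixate(post, fix=int(np.argmax(post)))
```

After a few fixations the posterior typically concentrates on the true target; swapping the uniform `prior` for a saliency map is what turns this into a combined searcher.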
Here, we propose a combined approach for predicting scanpaths that uses state-of-the-art saliency maps to model prior image information within a Bayesian searcher framework. We collected eye-movement visual search data (N=57) in natural indoor scenes and compared different variants of the model. First, we compared several state-of-the-art saliency maps with human fixations, obtaining AUC values for the first fixations comparable to those reported on other datasets, although performance drops sharply thereafter. Second, we compared different search strategies against human scanpaths. Our model achieves the best overall agreement across metrics and outperforms the other strategies, producing scanpaths almost indistinguishable from human ones.
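The AUC evaluation of saliency maps against human fixations can be sketched as follows: saliency values at fixated pixels are treated as positives and values at randomly sampled pixels as negatives. This is a generic AUC-style sketch under assumed names (`saliency_auc`, random-pixel negatives), not the paper's exact evaluation protocol:

```python
import numpy as np

rng = np.random.default_rng(1)

def saliency_auc(saliency, fixations, n_neg=1000):
    """AUC for a saliency map: how well its values separate fixated
    pixels (positives) from randomly sampled pixels (negatives)."""
    pos = np.array([saliency[y, x] for y, x in fixations])
    h, w = saliency.shape
    neg = saliency[rng.integers(0, h, n_neg), rng.integers(0, w, n_neg)]
    # AUC = probability that a random positive outranks a random
    # negative, counting ties as 0.5.
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Toy check: a map peaked exactly where the fixations fall scores near 1.
sal = np.zeros((64, 64))
sal[30:34, 30:34] = 1.0
fixs = [(31, 31), (32, 32), (31, 32)]
auc = saliency_auc(sal, fixs)
```

In practice the AUC is computed per fixation rank, which is what reveals the sharp drop after the first few fixations.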