Abstract

Aim
Orofacial clefts are the most common congenital anomaly affecting the craniofacial region. Surgical repair is usually performed in infancy; however, there are concerning inequalities in access to and quality of surgical care. Scoring aesthetic results after surgery is crucial when determining the success of a repair. No reliable and accurate scoring system exists that uses large numbers of non-standardised 2-dimensional (2D) photographs of ethnically diverse patients and is inexpensive, widely accepted and easily applicable. Artificial intelligence (AI) has been applied in various surgical specialities with beneficial results; however, its advantages have not yet been harnessed in cleft care. We aimed to evaluate the potential use of routinely collected 2D photographs of patients with an orofacial cleft and to determine whether non-standardised data could be used for machine learning (ML) analysis in cleft research.

Method
A database of over 5 million photographs, collected over 20 years and developed by the international non-governmental organisation Smile Train, was described and analysed using RStudio and Microsoft Excel.

Results
Description and analysis of the dataset demonstrated that it is the largest and most ethnically inclusive and diverse dataset currently in existence. Preliminary AI analysis confirmed that ML could be used to analyse the data.

Conclusion
The quality of routinely collected data presents challenges for its use in research. Addressing these challenges helps ensure that findings are more representative of the global burden of disease and deliver outcomes that are more relevant to a diverse global population. Evidence-based minimum standards to optimise future data collection have been identified.