In recent years, several tools for building energy performance simulation and analysis have been developed to help improve building energy performance, harnessing modern computing capabilities for reliable and accurate energy performance prediction. To perform this analysis, energy tools typically require data on the building's surrounding environment, which is acquired from neighbouring weather stations. However, these stations often experience hardware malfunctions, resulting in erroneous or missing data. Traditionally, these values are rectified through empirical and geostatistical methods, which, while reflecting several decades of practice, may prove inadequate when a purely data-driven approach is considered. To this end, the present study introduces a machine learning methodology that applies regression algorithms to rectify erroneous values in datasets and clusters weather stations by their recorded climatic conditions to enhance the regression models. A shape-based approach to clustering time series of different climatic parameters and weather stations is pursued, using the k-medoids algorithm with dynamic time warping as the similarity measure. Both Artificial Neural Network (ANN) and Support Vector Regression (SVR) models are evaluated as representative regression algorithms, with different sets of predictors. Mean Squared Error is used as the performance metric. A dataset of different climatic parameters from southeastern Brazil is used, with air temperature chosen as the response variable, given its importance in energy consumption. Results indicate that a machine learning approach to the problem is indeed viable, with ANN slightly outperforms SVR in predicting the studied weather variable.
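The clustering step described above can be illustrated with a minimal sketch in plain Python. It is not the study's implementation. The function names (`dtw_distance`, `assign`, `k_medoids_dtw`, `mean_squared_error`), the absolute-difference local cost, the absence of a warping window, the farthest-first initialisation, and the toy temperature values are all assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Absolute difference as local cost and no warping window are
    illustrative choices, not details taken from the study.
    """
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def assign(dist, medoids):
    """Label each series with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids_dtw(series, k, max_iter=100):
    """Alternating k-medoids on a precomputed DTW distance matrix.

    Assumes 1 <= k <= number of distinct series.
    """
    n = len(series)
    dist = [[dtw_distance(s, t) for t in series] for s in series]

    # Deterministic farthest-first initialisation (an assumption of this sketch).
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            updated.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
                if members else old)
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy temperature profiles (made-up values): two warm and two cool stations,
# each pair differing mainly by a one-step time shift.
stations = [
    [20, 22, 25, 23, 21],
    [20, 20, 22, 25, 23],
    [10, 11, 13, 12, 10],
    [10, 10, 11, 13, 12],
]
medoids, labels = k_medoids_dtw(stations, k=2)
```

Because dynamic time warping aligns series before comparing them, the time-shifted profiles in each pair remain close to one another. This is the property that makes it suitable for shape-based grouping of stations. A regression model could then be fitted per cluster and scored with `mean_squared_error`, although the ANN and SVR models themselves are outside the scope of this sketch.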