Synthetic Data Generation (Prototeam 4)

A mobility data methodology for Mobility-as-a-Service (MaaS) personalization

Mobility-as-a-Service (MaaS) has gained popularity because it integrates multiple transportation modes into a single user experience. However, data privacy concerns and a lack of comprehensive datasets make it difficult to acquire reliable real-world data for developing and assessing MaaS solutions. This article proposes generating synthetic data with a deep learning (DL) model trained on an established dataset, in order to make human mobility data (HMD) more practical to use.

Data created artificially to replicate real-world statistical properties and features is called “synthetic data” (SD). Synthetic data is generated from existing datasets or models (Dankar & Ibrahim, 2021). This method is useful when empirical data collection is expensive, time-consuming, or impossible (Nikolenko, 2021). The flexibility of this approach makes it useful for data augmentation (Frid-Adar et al., 2018), missing data imputation (Chen et al., 2019), bias correction to restore fairness (Xu et al., 2018), and data confidentiality (Dash et al., 2019). It also lets researchers create and test algorithms and applications in a controlled environment without disclosing confidential information.
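As a rough illustration of what "replicating statistical properties" means, the sketch below fits a very simple statistical model to a handful of trips and samples new records from it. It is not the deep learning approach this article proposes; it is a deliberately simple stand-in. The function names `fit_simple_model` and `sample_synthetic`, the `(mode, duration)` record layout, and the toy trips are all hypothetical.

```python
import random
import statistics
from collections import Counter

def fit_simple_model(trips):
    """Estimate how often each mode occurs and its duration mean/stdev.

    `trips` is a list of (mode, duration_minutes) tuples (an assumed layout).
    """
    mode_counts = Counter(mode for mode, _ in trips)
    total = sum(mode_counts.values())
    model = {}
    for mode, count in mode_counts.items():
        durations = [d for m, d in trips if m == mode]
        model[mode] = {
            "prob": count / total,
            "mean": statistics.mean(durations),
            # stdev needs at least two points; fall back to 0.0 otherwise
            "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        }
    return model

def sample_synthetic(model, n, seed=0):
    """Draw n synthetic (mode, duration) records from the fitted model."""
    rng = random.Random(seed)
    modes = list(model)
    weights = [model[m]["prob"] for m in modes]
    records = []
    for _ in range(n):
        mode = rng.choices(modes, weights=weights, k=1)[0]
        # Gaussian duration per mode, clipped so it is never below one minute
        duration = max(1.0, rng.gauss(model[mode]["mean"], model[mode]["stdev"]))
        records.append((mode, round(duration, 1)))
    return records

# Toy "real" data, invented for this example only
real_trips = [
    ("walk", 12.0), ("walk", 15.0), ("bus", 30.0),
    ("bus", 26.0), ("bike", 18.0), ("bus", 34.0),
]
model = fit_simple_model(real_trips)
synthetic_trips = sample_synthetic(model, n=1000)
```

None of the sampled records is copied from the input, yet the mode shares and duration statistics stay close to those of the original trips. A model this simple ignores sequence structure, which is one reason a sequence model is a more natural fit for mobility traces.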

Mobile sensing technologies, including smartphones, wearables, and IoT devices, have grown in popularity and made data more accessible. These devices frequently collect data on people’s activities, interactions, and behaviors. The abundance of data can help us understand human mobility patterns, but it also raises privacy, security, and management concerns.

Synthetic data generation (SDG) methodologies open up many opportunities in mobility research, especially for tailored travel suggestions. This emerging field recommends routes that meet user needs in order to improve the user experience. Related domains include point-of-interest (POI) recommendation, trip planning, and trajectory modeling.

Each of these domains faces significant constraints. In POI recommendation systems, the “cold start” problem, in which new users have no interaction history, makes it difficult to predict their preferences, and limited data and contextual gaps lead to poor recommendations. Trip planning often requires compromises among time, distance, and user preferences, and it must also handle unexpected real-time events or changes. In trajectory modeling, noise from incomplete or inaccurate trajectory data reduces prediction accuracy, variability in user behavior makes future predictions uncertain, and the scarcity of trajectories from infrequent travelers further reduces model accuracy. Finally, location data privacy must be carefully addressed.

In this context, synthetic data holds significant value for MaaS platforms. It supports precise, tailored route suggestions that cater to individual requirements. The feasibility of the approach is underscored by the possibility of integrating synthetic data with real-time data, social interactions, and environmental fluctuations. The study examines several essential components. First, it describes the Sussex-Huawei Locomotion (SHL) Dataset, which is publicly accessible and was introduced by Gjoreski et al. (2018). After pre-processing, this dataset proves useful for personalization. The research then demonstrates that a model based on a recurrent neural network (RNN) can produce high-quality synthetic data that faithfully reproduces the statistical attributes and individual mobility metrics of the dataset. To validate the findings, the representativeness, realism, novelty, and coherence of the synthesized data are evaluated using suitable quality criteria. For clarity, the outcomes are visualized on a map.
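The text does not spell out the quality criteria, so the following is only a minimal sketch of how two of them might be quantified. It assumes representativeness is measured as the Jensen-Shannon divergence between the transport-mode distributions of real and synthetic trips, and novelty as the share of synthetic trips that do not appear verbatim in the real data. Both definitions, the helper names, and the toy trips are assumptions made for illustration, not the study's actual metrics.

```python
import math
from collections import Counter

def distribution(values):
    """Relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions: 0.0 means identical, 1.0 means disjoint support."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(dist):
        return sum(v * math.log2(v / mix[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)

def novelty(real, synthetic):
    """Fraction of synthetic records that are not exact copies of a real record."""
    real_set = set(real)
    return sum(1 for record in synthetic if record not in real_set) / len(synthetic)

# Toy trips, each a tuple of transport modes in order (invented for this example)
real_trips = [("walk", "bus"), ("bus", "train"), ("walk", "bus"), ("bike",)]
synthetic_trips = [("walk", "bus"), ("bus", "train"), ("walk", "train"), ("bike",)]

real_modes = distribution(mode for trip in real_trips for mode in trip)
synthetic_modes = distribution(mode for trip in synthetic_trips for mode in trip)

mode_gap = js_divergence(real_modes, synthetic_modes)  # lower = more representative
novelty_score = novelty(real_trips, synthetic_trips)   # higher = fewer copied trips
```

The two scores pull in opposite directions: a generator that simply copies the real trips would achieve a divergence of zero but also a novelty of zero, which is why such criteria are usually read together rather than in isolation.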

To conclude, this research investigates the role of synthetic data generation techniques within personalized trip recommendation algorithms. It also attempts to establish a versatile and adaptable framework for generating travel recommendations by integrating both empirical and artificially generated datasets.

In this prototeam we partnered with ClearboxAI.


  • Dankar, F. K., & Ibrahim, M. (2021). Fake it till you make it: Guidelines for effective synthetic data generation. Applied Sciences, 11(5), 2158.
  • Nikolenko, S. I. (2021). Synthetic data for deep learning (Vol. 174). Springer Nature.
  • Frid-Adar, M., Klang, E., Amitai, M., Goldberger, J., & Greenspan, H. (2018, April). Synthetic data augmentation using GAN for improved liver lesion classification. In 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018) (pp. 289-293). IEEE.
  • Chen, Y., Lv, Y., & Wang, F. Y. (2019). Traffic flow imputation using parallel data and generative adversarial networks. IEEE Transactions on Intelligent Transportation Systems, 21(4), 1624-1630.
  • Xu, D., Yuan, S., Zhang, L., & Wu, X. (2018, December). FairGAN: Fairness-aware generative adversarial networks. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 570-575). IEEE.
  • Dash, S., Dutta, R., Guyon, I., Pavao, A., & Bennett, K. P. (2019, April). Privacy preserving synthetic health data. In ESANN 2019-European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.
  • Gjoreski, H., Ciliberto, M., Wang, L., Morales, F. J. O., Mekki, S., Valentin, S., & Roggen, D. (2018). The University of Sussex-Huawei locomotion and transportation dataset for multimodal analytics with mobile devices. IEEE Access, 6, 42592-42604.
  • Xu, L., & Veeramachaneni, K. (2018). Synthesising tabular data using generative adversarial networks. arXiv preprint arXiv:1811.11264.
  • Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modelling tabular data using conditional GAN. Advances in Neural Information Processing Systems, 32.