#DataMufti I regularly get a lot of questions on my Facebook and LinkedIn asking about data science, NLP, predictions, blockchain, self-development, spirituality, Pakistan, US politics, and what not. Usually I don’t know the answers to those interesting questions, but sometimes (being a student myself) I do, and I can help guide the person in the right direction.
Starting today, I am beginning a Q&A series under the hashtag #DataMufti to answer some of the interesting questions in a way that can help others as well. Due to my hectic reading routine I cannot promise to answer every question I receive, but I will try my best to answer the ones I can within a week or so. Here is the first one:
Q: How do I get a Call-Detail-Records (CDR) dataset to learn from, analyze, and build my models on? Or: how can I do my final-year project (FYP) on CDRs?
You should start by reading the World Bank paper “Politics and Ethics of CDR Analysis”.
The next reading should be “The ABCDE of Big Data: Assessing Biases in Call-Detail Records for Development Estimates”, followed by the CDR analysis of the Republic of Liberia. These will give you a basic understanding of the field and of what you can and should do with it. There is also a good paper by MIT on urban computing using CDRs. Do check out Flowminder’s work on human mobility patterns in West Africa as well.
Here is the Nodobo CDR dump (13,035 call records, 83,542 message records, 5,292,103 presence records). You can download the complete dataset and find more details at this link.
You can also use a random-CDR-generator tool to create your own sample. Here is one such tool, and here is another.
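If you want full control over the sample, rolling your own generator is straightforward. Below is a minimal stdlib-only sketch; the field names (`caller_id`, `cell_tower`, etc.) are illustrative assumptions, not the schema of any particular tool or operator.

```python
# Minimal synthetic CDR generator (illustrative schema, not a real operator's).
import csv
import io
import random
from datetime import datetime, timedelta

def generate_cdrs(n_records=100, n_subscribers=10, n_towers=5, seed=42):
    """Generate n_records synthetic call-detail records as dicts."""
    rng = random.Random(seed)  # seeded for reproducibility
    start = datetime(2024, 1, 1)
    records = []
    for _ in range(n_records):
        # Pick two distinct subscribers so nobody calls themselves.
        caller, callee = rng.sample(range(n_subscribers), 2)
        records.append({
            "caller_id": f"sub{caller:03d}",
            "callee_id": f"sub{callee:03d}",
            # Random timestamp within a 30-day window.
            "timestamp": (start + timedelta(seconds=rng.randrange(30 * 86400))).isoformat(),
            "duration_s": rng.randrange(1, 3600),
            "cell_tower": f"tower{rng.randrange(n_towers):02d}",
        })
    return records

def to_csv(records):
    """Serialize records to CSV, the format most CDR tools expect."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

cdrs = generate_cdrs()
print(len(cdrs))  # 100
```

From here you can dial up the realism (diurnal call patterns, heavy-tailed durations, home towers per subscriber) as your model requires.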
CartoDB is the best commercial tool to analyze CDRs. Another one is Visallo; it was initially released as open source on GitHub (you can still find the forked repos, though). For students, bandicoot by MIT should be the starting point for analyzing CDRs.
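To give a feel for what such toolkits do, here is a stdlib-only sketch of the kind of per-subscriber behavioral indicators a library like bandicoot computes (number of contacts, share of outgoing calls, mean call duration). The field names and indicator names are illustrative assumptions, not bandicoot’s actual API.

```python
# Sketch of simple per-subscriber CDR indicators (illustrative, not bandicoot's API).

def indicators(records, subscriber):
    """Compute basic behavioral indicators for one subscriber."""
    contacts = set()
    out_calls = in_calls = total_duration = 0
    for r in records:
        if r["caller_id"] == subscriber:       # outgoing call
            contacts.add(r["callee_id"])
            out_calls += 1
            total_duration += r["duration_s"]
        elif r["callee_id"] == subscriber:     # incoming call
            contacts.add(r["caller_id"])
            in_calls += 1
            total_duration += r["duration_s"]
    calls = out_calls + in_calls
    return {
        "number_of_contacts": len(contacts),
        "share_outgoing": out_calls / calls if calls else 0.0,
        "mean_duration_s": total_duration / calls if calls else 0.0,
    }

# Tiny hand-made sample: subscriber A makes two calls and receives one.
sample = [
    {"caller_id": "A", "callee_id": "B", "duration_s": 60},
    {"caller_id": "C", "callee_id": "A", "duration_s": 120},
    {"caller_id": "A", "callee_id": "B", "duration_s": 30},
]
print(indicators(sample, "A"))
```

Real toolkits add time-window handling, antenna/location indicators, and careful treatment of missing data, which is exactly why starting from an established library beats reinventing this.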
And no one can beat GeoTime when it comes to features and support. They offer training as well.
Good luck with your project!