
Pakistan General Elections 2018 – How Data Science Can Help?

July 25th, 2018 is almost here, and Pakistan will hold its 13th election since independence (the previous twelve were held in 1954, 1962, 1970, 1977, 1985, 1988, 1990, 1993, 1997, 2002, 2008 and 2013). Polling day falls in the middle of the week (Wednesday), with expected temperatures of 27-33 degrees Celsius and almost no chance of rain anywhere in the country.

We predict a historic voter turnout of 57-61% in this election. The average turnout since 1977 is 45% (the lowest was 35% in 1997, the highest 55% in 1977, and the last election saw 53%). Pakistan ranks 164th out of 169 nations in voter turnout; Australia ranks first with a 94.5% turnout.

Voter participation varies widely across the country: historically, Musakhel and Kohlu yield less than 25%, whereas Layyah and Khanewal yield more than 60%, and everything else falls in between. Punjab has the highest voter turnout and Baluchistan the lowest.

The contest will bring 3,675 candidates for 272 National Assembly seats, an average of about 13 candidates per seat. PTI has fielded 244 candidates (the highest number of any political party). Islamabad will see 76 candidates contesting just three seats, fighting to rule the capital and secure a psychological edge. Here is an excellent guide that explains the first-past-the-post (FPTP) voting system and the reserved and minority seats. 172 is the number of seats a party needs to secure a simple majority.
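
The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration: the vote counts and candidate names are made up, and the house size of 342 is an assumption based on the 272 general seats plus 60 seats reserved for women and 10 for minorities.

```python
def fptp_winner(votes_by_candidate):
    """First-past-the-post: the candidate with the most votes takes the seat."""
    return max(votes_by_candidate, key=votes_by_candidate.get)


def simple_majority(total_seats):
    """Smallest number of seats that is strictly more than half the house."""
    return total_seats // 2 + 1


# Hypothetical vote counts for a single constituency
example_seat = {"Candidate A": 41000, "Candidate B": 39500, "Candidate C": 12000}

# Assumption: 272 general + 60 women + 10 minority seats
HOUSE_SIZE = 272 + 60 + 10

print(fptp_winner(example_seat))
print(simple_majority(HOUSE_SIZE))
print(round(3675 / 272, 1))  # average candidates per general seat
```

Note that under FPTP the winner in this made-up seat takes it with well under half of the votes cast, which is why a crowded field of 13 candidates per seat matters.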

There are quite a few interesting facts about these elections. For example, we will see the highest number of Lotas ever (turncoats: candidates who often change their party affiliation). PTI believes in winning the election no matter what, while the survey pundits predict a PML(N) lead of at least 13% over PTI. Candidates with a military background have a meager chance of winning seats: in the last five general elections (since 1993), 138 candidates with an ex-military profile contested, but only 16 managed to win. Not a single independent candidate with a military background has won a seat since 1993. The ability of independent candidates to remain independent after winning is highly questionable: almost 80% (77 out of 96) have handed their “independence” to a political party within the government’s tenure.

There are 86,436 polling stations (Punjab: 48,667; Sindh: 18,647; FATA and KPK: 14,655; Baluchistan: 4,467). KLI (Karachi, Lahore, and Islamabad) makes up 85% of the social sphere (the PTI die-hard fan club), which is only 9% of the current voter bank (8 million registered voters on social platforms). Nearly 55% of voters are male and 45% are female. Youth (18-25) make up only 19% of the voter bank.

According to one analysis, PML(N) has at least 30% safe seats, which we can call easy wins. 20% of the seats fall in the “Hat-Trick” category: seats won by the same party in the last three elections.

There are a few initiatives to safeguard this election against the usual malpractices. For example, NADRA has developed a system to electronically transmit results from polling stations along with the picture of each voter for accountability. NADRA is also offering its Voter Management and Electoral Rolls system to the world, and we hope to see some of it used on home ground as well.

There are 105.96 million registered voters who can cast their vote, but statistically speaking, we are looking at 45 million at most. While women account for 45% of the voter bank, their participation in general elections is very low. Women voters have been missing from many KP districts in the last few elections, and there are five constituencies in Punjab where women’s participation is less than 5%. NA-152 drew only 1.9% women voters in the last election.
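
To relate registered voters to votes actually cast, here is a minimal sketch of the arithmetic. The turnout rates are the figures quoted earlier in this post (45% historical average, 53% in the last election, 57-61% predicted); the function names are only illustrative.

```python
REGISTERED_VOTERS = 105.96e6  # registered voters, as quoted above


def expected_votes(turnout_rate, registered=REGISTERED_VOTERS):
    """Number of votes cast for a given turnout rate (0-1)."""
    return registered * turnout_rate


def implied_turnout(votes, registered=REGISTERED_VOTERS):
    """Turnout rate implied by a given number of votes cast."""
    return votes / registered


scenarios = [
    ("historical average", 0.45),
    ("last election", 0.53),
    ("predicted, low end", 0.57),
    ("predicted, high end", 0.61),
]
for label, rate in scenarios:
    print(f"{label}: {expected_votes(rate) / 1e6:.1f} million votes")

# Turnout implied by 45 million votes out of 105.96 million registered voters
print(f"{implied_turnout(45e6):.1%}")
```

Running the scenarios side by side makes it easy to see how sensitive the total vote count is to a few percentage points of turnout.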

Quite a few people cannot vote for one reason or another. For example, NADRA’s i-Voting initiative to help 7.9 million overseas Pakistanis vote over the internet has been shelved for no apparent reason. There are 350,000 soldiers providing security who therefore cannot vote, and 635,000 polling staff members who will be performing their duties and will not be able to vote either. Thirteen transgender candidates are contesting the elections, but 10,000 members of their community cannot even vote. ECP puts the total number of staff on polling day at 1.6 million.

The history of elections goes hand in hand with charges of corruption, voter fraud, ghost votes, interference by the deep state, and violence. Here is a good guide on how to steal a mandate. There is (almost) no country in the world whose elections are free of the fear or accusation of such incidents. Examples include Russia’s meddling in US elections, the alleged role of Cambridge Analytica in swaying voters one way or the other (blockchain is a promising solution to these problems; here is how one can develop it), 92 people killed in Kenya’s election, 31 in Honduras, 80 candidates in Mexico, 11 in Assam, India, and even 74 in Pakistan’s last elections.

The deadly cycle of violence has already begun for this election. Haroon Bilour of the Awami National Party (ANP) was killed along with 20 others, and 65 were wounded, in a suicide bombing in Peshawar. 149 people were killed and 186 injured in a suicide bombing that targeted BAP leader Siraj Raisani. Four people died and 10 were injured in an explosion near the rally of JUI-F’s Akram Durrani in Bannu. The total tally so far comes to 174 dead and 261 injured, making this one of the deadliest elections in Pakistan. The Mastung blast, with 149 dead, is the second deadliest suicide bombing attack in the history of Pakistan; for comparison, 139 people died in the 2007 attack on Benazir Bhutto in Karachi, and 150 were killed in the APS attack in Peshawar in 2014. A complete dataset of violence and assassinations of politicians in Pakistan is available here.

Data science can help us answer a few questions and predict the election results. The complete dataset of past election results is available here and here is a good write-up on prediction.

The margin of victory between winner and runner-up is explained in this Kaggle Kernel. One can see how confident each party can be of winning a particular constituency. Here is the heat map of the total number of votes secured by each party in each constituency. Moreover, here is a complete map of hat-trick seats (the same constituency won by one party in the last three elections). We may consider these seats quick wins for the respective political party, but there are a few constituencies, like NA-247, that defy this logic. This Kaggle kernel of exploratory data analysis of election data has mapped the strength and numbers of each political party across all constituencies in previous elections. This kernel has extended the work to visualize voter turnout and the number of votes in each constituency.
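
For readers who want to reproduce the two ideas above without opening the kernels, here is a minimal sketch of how the margin of victory and the hat-trick seats could be computed. It is not the code from the kernels: the records, party names and seat labels are hypothetical, and it assumes that a constituency label refers to the same area in all three elections, which may not hold when boundaries are redrawn.

```python
from collections import defaultdict

# Hypothetical records: (year, constituency, party, votes)
RESULTS = [
    (2002, "NA-1", "Party A", 30000), (2002, "NA-1", "Party B", 25000),
    (2008, "NA-1", "Party A", 42000), (2008, "NA-1", "Party B", 20000),
    (2013, "NA-1", "Party A", 50000), (2013, "NA-1", "Party B", 45000),
    (2002, "NA-2", "Party B", 28000), (2002, "NA-2", "Party A", 27000),
    (2008, "NA-2", "Party A", 33000), (2008, "NA-2", "Party B", 31000),
    (2013, "NA-2", "Party B", 40000), (2013, "NA-2", "Party A", 22000),
]


def winners_and_margins(results):
    """Map (year, seat) -> (winning party, margin as a share of votes cast)."""
    contests = defaultdict(list)
    for year, seat, party, votes in results:
        contests[(year, seat)].append((votes, party))
    summary = {}
    for key, tallies in contests.items():
        tallies.sort(reverse=True)  # highest vote count first
        total = sum(votes for votes, _ in tallies)
        runner_up = tallies[1][0] if len(tallies) > 1 else 0
        summary[key] = (tallies[0][1], (tallies[0][0] - runner_up) / total)
    return summary


def hat_trick_seats(summary, years=(2002, 2008, 2013)):
    """Seats won by the same party in every one of the given elections."""
    seats = {seat for _, seat in summary}
    hat_tricks = {}
    for seat in sorted(seats):
        if not all((year, seat) in summary for year in years):
            continue  # seat was not contested in all three elections
        parties = {summary[(year, seat)][0] for year in years}
        if len(parties) == 1:
            hat_tricks[seat] = parties.pop()
    return hat_tricks


summary = winners_and_margins(RESULTS)
print(summary[(2013, "NA-1")])
print(hat_trick_seats(summary))
```

In this made-up data NA-1 is a hat-trick seat, yet its latest margin is thin, which is the kind of seat where treating a hat-trick as a quick win can mislead.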

Here is a basic formula for predicting the results of Pakistan’s General Election 2018. What we need to do is calculate the combined probability of a party/candidate winning a particular seat. I would use the following parameters and approximate weights to pick the winner for each seat:

1.      Winning Party from Last Elections: We can give close to 40% weight to the party that won the seat in the last election. If you look deeper, a few parties have seats they reliably hold (as explained in the hat-trick kernels above).

2.      Winning Candidate from Last Election(s): Say another 20% goes to the winning candidate. If he/she is still with the same party, it increases that party’s chances of winning; if the candidate has changed loyalty, the weight should go to the new party.

3.      Vote Margin and Voter Turnout: Another 5-10% of the weight should go here. If you won with a margin of 30% or more, you are quite safe to lead this time too, but if you won with a thin margin, the seat can swing. It also depends on voter turnout: if the margin was only 5% and 20% more voters come out to vote this time, your lead may increase or disappear depending on the choices of the new voters. You can also assume that new voters would vote proportionally the same way (or otherwise).

4.      Polls: I would give only 3-5% of the weight to poll results like Gallup.

5.      GPIs (Geo-Political Indicators): This is the most important set of variables for your analysis. It contains several factors that are decisive for swing seats and for the overall election results. It can include the sentiment of the constituency (code a Python script to fetch the top 20 Google search results for the constituency and automatically classify each one as +ve or –ve using NLP). More +ve results give you a good score, while –ve ones give you a zero or even a –ve score. That would be an indicator of the incumbent’s performance in the last tenure. Another variable is the candidate’s family, education and political background: whether there are any corruption cases against him/her, and whether he/she was named in any significant scandal (Panama leaks, etc.).

6.      Rigging: This would be the heart of your analysis. You should consider all three forms of rigging (pre-poll, polling-day and post-poll) and estimate the chances of each happening in the seat. Skimming through media headlines and talking to local folks would give you a good idea to start with.
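
The six factors above can be combined into a per-seat score along the following lines. Treat this as a sketch under stated assumptions rather than the actual model: the weights for factors 1-4 are picked from the ranges given in the list, the weights for GPIs and rigging are placeholders (no numbers are given for them), every feature is assumed to be scaled to 0-1, and the sentiment step uses a tiny keyword list as a stand-in for a real NLP classifier. All party names and headlines are made up.

```python
import re

# Weights for factors 1-4 follow the ranges suggested in the list.
# The GPI and rigging weights are placeholders so the total is 1.0.
WEIGHTS = {
    "last_party": 0.40,
    "last_candidate": 0.20,
    "margin_turnout": 0.10,
    "polls": 0.05,
    "gpi": 0.15,      # assumption
    "rigging": 0.10,  # assumption
}

# Tiny illustrative keyword lists; a real classifier would replace these.
POSITIVE = {"inaugurated", "completed", "improved", "praised"}
NEGATIVE = {"corruption", "scandal", "protest", "shortage"}


def gpi_sentiment(snippets):
    """Net share of positive snippets, floored at zero."""
    if not snippets:
        return 0.0
    balance = 0
    for text in snippets:
        words = set(re.findall(r"[a-z]+", text.lower()))
        balance += bool(words & POSITIVE) - bool(words & NEGATIVE)
    return max(balance / len(snippets), 0.0)


def seat_scores(features_by_party, weights=WEIGHTS):
    """Weighted score per party, rescaled so a seat's scores sum to 1."""
    raw = {
        party: sum(w * feats.get(name, 0.0) for name, w in weights.items())
        for party, feats in features_by_party.items()
    }
    total = sum(raw.values())
    if total <= 0:
        return {party: 1.0 / len(raw) for party in raw}
    return {party: score / total for party, score in raw.items()}


# Hypothetical seat: Party A won last time, but its winning candidate has
# since moved to Party B, so the candidate weight follows the new party.
headlines_a = ["Road project completed ahead of schedule", "Corruption inquiry opened"]
headlines_b = ["New hospital inaugurated", "Water supply improved in the city"]
seat = {
    "Party A": {"last_party": 1.0, "margin_turnout": 0.6, "polls": 0.45,
                "gpi": gpi_sentiment(headlines_a), "rigging": 0.2},
    "Party B": {"last_candidate": 1.0, "margin_turnout": 0.4, "polls": 0.35,
                "gpi": gpi_sentiment(headlines_b), "rigging": 0.5},
}
scores = seat_scores(seat)
predicted = max(scores, key=scores.get)
print(predicted, {party: round(s, 3) for party, s in scores.items()})
```

Rescaling the scores so they sum to 1 is one simple way to read them as rough win probabilities; it is a convenience, not a calibrated probability.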

One can use feature engineering, time-series or regression analysis to find the correlation of data fields with election results and to estimate the right weights for these or new variables. I have calculated the weights for my model and am working on the results, but I would like to see your analysis. Do contribute your work as a Kaggle Kernel or share the results.
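
As one possible way to estimate weights from past results, here is a minimal logistic regression fitted with plain batch gradient descent and no regularisation. The eight training rows are invented purely for illustration (feature 1: did the party hold the seat, feature 2: its previous margin), so the fitted numbers mean nothing beyond showing the mechanics.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def win_probability(x, weights, bias):
    """Predicted probability of winning the seat for feature vector x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical history: [held_seat_last_time, previous_margin] -> won (1) or lost (0)
ROWS = [[1, 0.30], [1, 0.05], [1, 0.20], [1, 0.02],
        [0, 0.10], [0, 0.25], [0, 0.03], [0, 0.15]]
WON = [1, 1, 1, 0, 0, 0, 1, 0]

weights, bias = fit_logistic(ROWS, WON)
print([round(w, 2) for w in weights], round(bias, 2))
print(round(win_probability([1, 0.30], weights, bias), 2))
```

With real data, the fitted coefficients (after putting the features on comparable scales) give a data-driven starting point for the weights, which can then be compared against the hand-picked ones above.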

Let’s hope we can produce a New Pakistan with old voters, old tactics, old constituencies, old candidates, old ballot, old mandates and the old way of doing what we do best.

Happy Elections!

Published by
admin
