How to Import Data from Wikipedia? by Dhafer Malouche
Wikipedia contains a huge number of tables, and it can be useful to write R code that extracts whichever of them you would like to collect.
In this tutorial I show how you can collect such data, using two examples: one on the ISO codes of countries and another on Tunisian elections.
I’m interested here in the table that gives the ISO alpha-2 code of every country. When you visit https://en.wikipedia.org/wiki/ISO_3166-1, you can see that it is the second table on the page.
For this purpose we need to install the htmltab R package in our R environment.
> library(htmltab)
> u <- "https://en.wikipedia.org/wiki/ISO_3166-1"
> doc <- htmltab(u, 2)  # 2 = the second table on the page
> class(doc)
## [1] "data.frame"
> dim(doc)
## [1] 249 6
> colnames(doc)
## [1] "English short name (using title case)"
## [2] "Alpha-2 code"
## [3] "Alpha-3 code"
## [4] "Numeric code"
## [5] "Link to ISO 3166-2 subdivision codes"
## [6] "Independent"
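The column names above are long and contain spaces, so they are awkward to type after `$`. A minimal sketch of renaming them follows. It uses a small hand-typed stand-in (`doc_demo`, three rows) rather than the downloaded table, and the short names are my own choice for this example, not something htmltab produces.

```r
# Hand-typed stand-in for the imported table (three rows only).
doc_demo <- data.frame(
  "English short name (using title case)" = c("France", "Tunisia", "Greenland"),
  "Alpha-2 code" = c("FR", "TN", "GL"),
  "Alpha-3 code" = c("FRA", "TUN", "GRL"),
  "Numeric code" = c("250", "788", "304"),
  "Link to ISO 3166-2 subdivision codes" = c("ISO 3166-2:FR", "ISO 3166-2:TN", "ISO 3166-2:GL"),
  "Independent" = c("Yes", "Yes", "No"),
  check.names = FALSE,       # keep the names exactly as written
  stringsAsFactors = FALSE
)

# Short names (illustrative), in the same order as the columns above.
colnames(doc_demo) <- c("country", "alpha2", "alpha3", "numeric",
                        "subdivision_link", "independent")

# Columns can now be reached with `$` without backticks.
doc_demo$alpha2[doc_demo$country == "Tunisia"]  # "TN"
```

The same `colnames(doc) <- ...` assignment should work on the real table as long as the vector has one name per column.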
Let’s now use the DT package to display the table for independent countries only.
> library(DT)
> i <- which(doc$Independent == "Yes")
> doc <- doc[i, 1:5]
> datatable(doc, filter = 'top', rownames = FALSE,
+           extensions = 'Buttons',
+           options = list(dom = 'Bfrtip', pageLength = 25,
+                          autoWidth = TRUE,
+                          buttons = c('copy', 'csv', 'excel', 'pdf', 'print', I('colvis'))
+           ))
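The row filter above goes through `which()` rather than indexing with the logical vector directly. A small sketch of why that matters when a column contains missing values: the data frame `codes` is a toy example, and the code "XX" is a placeholder for a row with an unknown status, not a real entry of the ISO table.

```r
# Toy data: one row has a missing "independent" value.
codes <- data.frame(
  alpha2 = c("FR", "GL", "XX", "TN"),
  independent = c("Yes", "No", NA, "Yes"),
  stringsAsFactors = FALSE
)

# Indexing with the logical vector keeps an all-NA row for the missing value.
by_logical <- codes[codes$independent == "Yes", ]

# which() returns only the positions that are TRUE, so the NA is dropped.
by_which <- codes[which(codes$independent == "Yes"), ]

nrow(by_logical)  # 3
nrow(by_which)    # 2
```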
We’re now interested in the Wikipedia pages dealing with Tunisian elections since 2011. We can find four pages. In each call below, the second argument is the position of the results table on the page:
> dt_muni <- htmltab("https://fr.wikipedia.org/wiki/%C3%89lections_municipales_tunisiennes_de_2018", 4)
> dt_pres <- htmltab("https://en.wikipedia.org/wiki/2014_Tunisian_presidential_election", 8)
> dt_par <- htmltab("https://en.wikipedia.org/wiki/2014_Tunisian_parliamentary_election", 7)
> dt_anc <- htmltab("https://en.wikipedia.org/wiki/2011_Tunisian_Constituent_Assembly_election", 8)
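The table positions used above (4, 8, 7, 8) depend on the current layout of each page, so a call can fail if a page is edited. One way to keep a batch of imports going is to wrap each call in `tryCatch()`. This is only a sketch: `fetch_tables()` and `fake_fetch()` are hypothetical helpers written for this example, and the stand-in fetcher is there so the block runs without a network connection.

```r
# Hypothetical helper: try each (url, which) pair, return NULL on failure.
fetch_tables <- function(specs, fetch = htmltab::htmltab) {
  lapply(specs, function(s) {
    tryCatch(
      fetch(s$url, s$which),
      error = function(e) {
        message("Could not read table ", s$which, " from ", s$url, ": ",
                conditionMessage(e))
        NULL
      }
    )
  })
}

specs <- list(
  pres = list(url = "https://en.wikipedia.org/wiki/2014_Tunisian_presidential_election", which = 8),
  par  = list(url = "https://en.wikipedia.org/wiki/2014_Tunisian_parliamentary_election", which = 7)
)

# Offline stand-in for htmltab(): pretends that table 8 no longer exists.
fake_fetch <- function(url, which) {
  if (which == 8) stop("table not found")
  data.frame(url = url, which = which, stringsAsFactors = FALSE)
}

tables <- fetch_tables(specs, fetch = fake_fetch)
is.null(tables$pres)       # TRUE: the failed import is NULL
is.data.frame(tables$par)  # TRUE
```

With htmltab installed and a network connection, calling `fetch_tables(specs)` without the `fetch` argument would use `htmltab::htmltab` instead of the stand-in.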
> head(dt_anc)
##             Parties >> Valid votes >> Total
## 2                          Ennahda Movement
## 3                 Congress for the Republic
## 4                          Popular Petition
## 5 Democratic Forum for Labour and Liberties
## 6              Progressive Democratic Party
## 7                            The Initiative
##   Votes >> 4,053,148 >> 4,308,888 % >> 94.06 >> 100.00 NA
## 2                       1,501,320                37.04 89
## 3                         353,041                 8.71 29
## 4                         273,362                 6.74 26
## 5                         284,989                 7.03 20
## 6                         159,826                 3.94 16
## 7                         129,120                 3.19  5
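In the output above, the column names carry extra pieces joined by ">>" (they appear to come from the stacked header and total rows of the Wikipedia table), and the vote counts are character strings with commas. A possible clean-up is sketched below. `first_header()` and `parse_count()` are hypothetical helpers written for this example, applied to values copied from the output rather than to the downloaded table.

```r
# Keep only the part of a column name before the first ">>".
first_header <- function(x) trimws(sub(">>.*$", "", x))

# Turn strings such as "1,501,320" into numbers.
parse_count <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

anc_names <- c("Parties >> Valid votes >> Total",
               "Votes >> 4,053,148 >> 4,308,888",
               "% >> 94.06 >> 100.00")

first_header(anc_names)                 # "Parties" "Votes" "%"
parse_count(c("1,501,320", "353,041"))  # 1501320 353041
```

On the real table this would amount to something like `colnames(dt_anc) <- first_header(colnames(dt_anc))`, after checking that the shortened names are still unique.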
> head(dt_par)
##   Party, coalition and independent lists     Votes % Votes Seats % Seats
## 3                           Nidaa Tounes 1,279,941  37.56%    86  39.63%
## 4                       Ennahda Movement   947,014  27.79%    69  31.79%
## 5                   Free Patriotic Union   140,873   4.13%    16   7.37%
## 6                          Popular Front   124,046   3.64%    15   6.91%
## 7                            Afek Tounes   102,915   3.02%     8   3.68%
## 8              Congress for the Republic    69,794   2.05%     4   1.84%
##   Swing
## 3   N/A
## 4   −20
## 5   +15
## 6   +11
## 7    +5
## 8   −25
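The percentage and swing columns above are also text. The swing values appear to use the typographic minus sign (U+2212) rather than the ASCII hyphen, which `as.numeric()` does not understand. Here is a sketch of one way to convert them. `parse_percent()` and `parse_signed()` are hypothetical helpers, and the inputs are values copied from the output.

```r
# "37.56%" -> 37.56
parse_percent <- function(x) as.numeric(sub("%", "", x, fixed = TRUE))

# "N/A" -> NA, typographic minus -> "-", then convert to numeric.
parse_signed <- function(x) {
  x <- gsub("\u2212", "-", x, fixed = TRUE)
  x[x == "N/A"] <- NA
  as.numeric(x)
}

swing <- c("N/A", paste0("\u2212", "20"), "+15")

parse_percent(c("37.56%", "27.79%"))  # 37.56 27.79
parse_signed(swing)                   # NA -20 15
```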
> head(dt_muni)
##   Parti, coalition ou liste    Voix     %   Conseillers   %.1    Maires
## 2                  Ennahdha 517 234 28,64 2 135 / 7 212 29,68 131 / 350
## 3              Nidaa Tounes 377 121 20,85 1 600 / 7 212 22,17  76 / 350
## 4         Courant démocrate  75 619  4,19   205 / 7 212  2,85   3 / 350
## 5           Front populaire  71 551  3,95   261 / 7 212  3,60   8 / 350
## 6              Union civile  31 883  1,77    66 / 7 212  0,92   2 / 350
## 7           Machrouu Tounes  26 013  1,44   124 / 7 212  1,72   0 / 350
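The French page uses French number formatting: a space as the thousands separator and a comma as the decimal mark. The seat columns are also given as "won / total". A sketch of converting such strings is below. `parse_fr_number()` and `seats_won()` are hypothetical helpers, and I am assuming the separator may be an ordinary space, a no-break space, or a narrow no-break space, since I have not checked which one the page actually uses.

```r
# "517 234" -> 517234 ; "28,64" -> 28.64
parse_fr_number <- function(x) {
  # Remove the possible thousands separators one by one (assumed set).
  for (sp in c(" ", "\u00a0", "\u202f")) {
    x <- gsub(sp, "", x, fixed = TRUE)
  }
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

# "2 135 / 7 212" -> 2135 (the part before the slash)
seats_won <- function(x) parse_fr_number(sub("/.*$", "", x))

parse_fr_number(c("517 234", "28,64"))       # 517234 28.64
seats_won(c("2 135 / 7 212", "205 / 7 212")) # 2135 205
```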
> head(dt_pres)
##   Candidates >> Total               Parties >> Total
## 3   Beji Caid Essebsi                   Nidaa Tounes
## 4     Moncef Marzouki      Congress for the Republic
## 5       Hamma Hammami                  Popular Front
## 6        Hechmi Hamdi                Current of Love
## 7          Slim Riahi           Free Patriotic Union
## 8       Kamel Morjane National Destourian Initiative
##   First round >> Votes >> 3,267,569 First round >> % >> 100%
## 3                         1,289,384                   39.46%
## 4                         1,092,418                   33.43%
## 5                           255,529                    7.82%
## 6                           187,923                    5.75%
## 7                           181,407                    5.55%
## 8                            41,614                    1.27%