Objective

This notebook presents the development of a statistical model for building damage due to typhoons in the Philippines, using data from past 12 typhoons during the last 2 decades. The damage model is constructed to predict the proportion of damaged houses per municipality, countrywide, using a set of hazard related metrics, such as wind speed, and socio-economic indicators, such as poverty incidence or wall constructions quality, as predictors.

Two damage models are built: the first one to predict the proportion of damaged houses in a municipality, and the second one for the proportion of completely damaged houses in a municipality. Both models are chosen to be multivariate logistic for interpretation purposes (the estimated coefficients relate to log-odds) and because the a probit model gives similar performance.

chosen_link <- "logit"
if(chosen_link == "logit"){link_fn <- logit}
if(chosen_link == "probit"){link_fn <- qnorm}

include_transform <- TRUE # Set = TRUE if we want to compute cross-validation errors accounting for variable transformation, scaling, VIF and stepAIC.

The database

The damage model is built based on past observations of building damages due to typhoons in the last decades (510 Global). The dataset has 1,638 observations corresponding to 1,034 countrywide municipalities and impacts from 12 typhoons. It has 34 columns corresponding to data surveyed in the aftermath of the typhoons and other site and socio-economic metrics corresponding to each municipality.

# Read CSV with complete dataset into R
raw_data <- read.csv(file="data//All.csv", header=TRUE, sep=",")
summary(raw_data)
##  disaster_type   disaster_name         pcode      Completely.damaged..abs..
##  Typhoon:1638   Haima   :301   PH012815000:   6   Min.   :    1.0          
##                 Rammasun:294   PH025012000:   6   1st Qu.:    1.0          
##                 Koppu   :203   PH143206000:   6   Median :   16.0          
##                 Bopha   :175   PH143209000:   6   Mean   :  582.9          
##                 Haiyan  :162   PH148102000:   6   3rd Qu.:  269.5          
##                 Melor   :105   PH012804000:   5   Max.   :14132.0          
##                 (Other) :398   (Other)    :1603                            
##  Partly.damaged..abs.. Total.damaged.houses..abs..
##  Min.   :    1.0       Min.   :    2              
##  1st Qu.:   14.0       1st Qu.:   19              
##  Median :  179.5       Median :  225              
##  Mean   : 1139.0       Mean   : 1722              
##  3rd Qu.: 1411.5       3rd Qu.: 1962              
##  Max.   :46553.0       Max.   :58823              
##                                                   
##  total_damage_houses_0p25weight Partly.damaged..rel.. Completely.damaged..rel..
##  Min.   :    1.25               Min.   :  0.00097     Min.   : 0.00117         
##  1st Qu.:    7.00               1st Qu.:  0.19783     1st Qu.: 0.02581         
##  Median :   75.50               Median :  2.58792     Median : 0.22321         
##  Mean   :  867.68               Mean   : 11.86016     Mean   : 6.10332         
##  3rd Qu.:  694.56               3rd Qu.: 18.69875     3rd Qu.: 3.02683         
##  Max.   :23908.25               Max.   :115.34310     Max.   :95.60731         
##                                                                                
##  Total.damaged.houses..rel.. total_damage_houses_0p25weight_perc
##  Min.   :  0.00265           Min.   : 0.00165                   
##  1st Qu.:  0.27720           1st Qu.: 0.10543                   
##  Median :  3.25947           Median : 1.10478                   
##  Mean   : 17.96349           Mean   : 9.06837                   
##  3rd Qu.: 27.26852           3rd Qu.: 9.26261                   
##  Max.   :129.97414           Max.   :98.54451                   
##                                                                 
##  ratio_comp_part    Total...of.houses    Wind.speed     Distance.to.typhoon
##  Min.   : 0.01125   Min.   :   324.2   Min.   : 40.00   Min.   :  0.1225   
##  1st Qu.: 0.24230   1st Qu.:  4700.8   1st Qu.: 60.00   1st Qu.: 24.3993   
##  Median : 0.57941   Median :  8017.6   Median : 80.00   Median : 50.3040   
##  Mean   : 0.97434   Mean   : 11767.5   Mean   : 84.68   Mean   : 74.8998   
##  3rd Qu.: 0.97201   3rd Qu.: 13664.4   3rd Qu.:102.69   3rd Qu.:104.9571   
##  Max.   :88.20134   Max.   :194096.5   Max.   :298.46   Max.   :347.2922   
##                                                                            
##     rainfall      distance_first_impact     Slope           Elevation       
##  Min.   :  45.0   Min.   :   4.358      Min.   : 0.4569   Min.   :   2.516  
##  1st Qu.: 244.9   1st Qu.: 110.800      1st Qu.: 3.9357   1st Qu.:  59.706  
##  Median : 303.5   Median : 175.940      Median : 7.1648   Median : 138.018  
##  Mean   : 323.9   Mean   : 197.212      Mean   : 8.1863   Mean   : 258.570  
##  3rd Qu.: 376.5   3rd Qu.: 263.415      3rd Qu.:11.3319   3rd Qu.: 305.599  
##  Max.   :1964.0   Max.   :1223.384      Max.   :25.2295   Max.   :1899.717  
##                                                                             
##  ruggedness_stdev   Ruggedness       slope_stdev     
##  Min.   : 1.629   Min.   :  3.497   Min.   : 0.2861  
##  1st Qu.:18.188   1st Qu.: 21.022   1st Qu.: 3.9357  
##  Median :29.284   Median : 36.834   Median : 6.3751  
##  Mean   :28.602   Mean   : 41.833   Mean   : 6.0110  
##  3rd Qu.:39.950   3rd Qu.: 57.244   3rd Qu.: 8.4818  
##  Max.   :86.704   Max.   :128.965   Max.   :13.0245  
##                                                      
##  Predicted.damage.class..1.5.   Population       land_area       
##  Mode:logical                 Min.   :  1291   Min.   :   1.857  
##  NA's:1638                    1st Qu.: 18807   1st Qu.:  80.481  
##                               Median : 32108   Median : 146.901  
##                               Mean   : 47066   Mean   : 215.934  
##                               3rd Qu.: 54659   3rd Qu.: 264.231  
##                               Max.   :776382   Max.   :2428.147  
##                                                                  
##  Population.density Poverty.incidence X..strong.roof.type X..strong.wall.type
##  Min.   :    4.5    Min.   :0.0085    Min.   :0.2362      Min.   :0.05375    
##  1st Qu.:  115.5    1st Qu.:0.1363    1st Qu.:0.7404      1st Qu.:0.43032    
##  Median :  229.1    Median :0.2250    Median :0.9155      Median :0.59095    
##  Mean   :  452.2    Mean   :0.2579    Mean   :0.8345      Mean   :0.58940    
##  3rd Qu.:  434.3    3rd Qu.:0.3585    3rd Qu.:0.9630      3rd Qu.:0.77660    
##  Max.   :34384.3    Max.   :0.8476    Max.   :1.0000      Max.   :0.95228    
##                                                                              
##  X..skilled.Agriculture.Forestry.Fishermen         Region            prov     
##  Min.   :0.001049                          central    :533   0         :1597  
##  1st Qu.:0.068022                          metro luzon:263   albay     :  25  
##  Median :0.103011                          minandao   :140   catanduane:  16  
##  Mean   :0.107656                          north luzon:702                    
##  3rd Qu.:0.138443                                                             
##  Max.   :0.340328                                                             
##  NA's   :21                                                                   
##    Experience     Bicol.region   
##  Min.   :1.000   Min.   :0.0000  
##  1st Qu.:1.278   1st Qu.:0.0000  
##  Median :2.034   Median :0.0000  
##  Mean   :2.100   Mean   :0.1172  
##  3rd Qu.:2.800   3rd Qu.:0.0000  
##  Max.   :4.280   Max.   :1.0000  
##  NA's   :1

The database is preprocessed to remove rows (observations) with empty values, and remove certain columns that are not going to be used in the model training.

# Remove rows with NA in skilled.Agriculture.Forestry.Fishermen (21 rows):
raw_data <- raw_data[!is.na(raw_data$X..skilled.Agriculture.Forestry.Fishermen), ]

# Remove row with NA in experience.
raw_data <- raw_data[!is.na(raw_data$Experience), ]

# Remove unnecessary columns
raw_data <- raw_data[, -which(colnames(raw_data) %in% c("Completely.damaged..abs..", "Partly.damaged..abs..", "Total.damaged.houses..abs..", "total_damage_houses_0p25weight", "Partly.damaged..rel..", "total_damage_houses_0p25weight_perc", "ratio_comp_part", "Total...of.houses", "Predicted.damage.class..1.5.", "Bicol.region", "Population", "land_area", "Ruggedness","ruggedness_stdev"))]

Caution: The variables ‘Ruggedness’ and ‘ruggedness_stdev’ that describe the variation rate of slope in the topography, were ruled out in favour of the ‘slope’ variable. They both have a ver high correlation and thus carry similar information regarding to damage prediction. This decision was taken due to the relative easier interpretation of the slope variable.

Next, we check for and remove outliers.

raw_data %>%
  keep(is.numeric) %>% 
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Remove outlier:
raw_data <- raw_data[-which(raw_data$Wind.speed == max(raw_data$Wind.speed)),]
sum(raw_data$Total.damaged.houses..abs.. > raw_data$Total...of.houses)
## [1] 0
max(raw_data$Total.damaged.houses..rel..[raw_data$Total.damaged.houses..rel..<100])
## [1] 99.92221

Since there are 13 municipalities where the number of total damaged houses exceed the number of houses on the record likely due to different data sources, we set an upper bound of 99.99% on the proportion of damaged houses. This is higher than the maximum observed value which is less than 100%.

# Replace >100 Total.damaged.houses..rel.. with 99.99
raw_data$Total.damaged.houses..rel..[raw_data$Total.damaged.houses..rel..>100] <- 99.99

Next, we convert the building damage proportions to a scale of 0-1 for modelling.

# Create proportion of houses which are damaged:
raw_data$Total.damaged.houses..rel.. <- raw_data$Total.damaged.houses..rel../100

# Create proportion of houses which are completely damaged:
raw_data$Completely.damaged..rel.. <- raw_data$Completely.damaged..rel../100

Data preparation

Variable transformation

Each of the potential predictors of damage is tested using the Box-Cox transformation to determine if a transformation of power-law transformation of the variable is necessary to normalise the variables relative to the link-transformed damaged proportions.

# Link-transformed damage proportions 
link_fn_td <- link_fn(raw_data$Total.damaged.houses..rel..)
link_fn_pc <- link_fn(raw_data$Completely.damaged..rel..)

# Round lambda est to the nearest half integer
power_est <- function(bC){l.est <- bC$x[bC$y == max(bC$y)]
                          return(round(l.est/0.5)*0.5)}

# Set up dataframe with first row for td and second row for pc
var_names <- colnames(raw_data)
var_names <- var_names[!(var_names %in% c("disaster_type", "disaster_name", "pcode", "Completely.damaged..rel..", "Total.damaged.houses..rel..", "Region", "prov"))]
power_df <- matrix(NA, nrow = 2, ncol = length(var_names))
colnames(power_df) <- var_names

for (i in 1:length(var_names)){
  
  var_i <- var_names[i]
  power_df[1, var_i] <- power_est(boxCox(link_fn_td - min(link_fn_td)*1.01 ~ raw_data[, var_i],
                                         plotit = FALSE))
  power_df[2, var_i] <- power_est(boxCox(link_fn_pc - min(link_fn_pc)*1.01 ~ raw_data[, var_i],
                                         plotit = FALSE))
  
}

power_df
##      Wind.speed Distance.to.typhoon rainfall distance_first_impact Slope
## [1,]        1.0                 0.5      0.5                   0.5   0.5
## [2,]        0.5                 0.5      0.5                   0.5   0.5
##      Elevation slope_stdev Population.density Poverty.incidence
## [1,]       0.5         0.5                0.5               0.5
## [2,]       0.5         0.5                0.5               0.5
##      X..strong.roof.type X..strong.wall.type
## [1,]                 0.5                 0.5
## [2,]                 0.5                 0.5
##      X..skilled.Agriculture.Forestry.Fishermen Experience
## [1,]                                       0.5        0.5
## [2,]                                       0.5        0.5

Apply the obtained transformations:

td_data <- raw_data
pc_data <- raw_data

for (i in 1:length(var_names)){
      
    var_i <- var_names[i]
     if(power_df[1, var_i]==0){ 
       td_data[, var_i] <- log(td_data[, var_i])
     }else{td_data[, var_i] <- td_data[, var_i]^power_df[1, var_i]}
  
     if(power_df[2, var_i]==0){ 
       pc_data[, var_i] <- log(pc_data[, var_i])
     }else{pc_data[, var_i] <- pc_data[, var_i]^power_df[2, var_i]}
}

Data scaling

All numeric columns are standardize to have mean 0 and standard deviation 1.

# Standardise data:

# Save to apply to Goni 2020 dataset later

cols.mean.td <- colMeans(td_data[, var_names] ) 
cols.sd.td <- apply(td_data[, var_names], 2, sd )
scaled.data.td <- lapply(td_data[, var_names], scale)

cols.mean.pc <- colMeans(td_data[, var_names] ) 
cols.sd.pc <- apply(td_data[, var_names], 2, sd )
scaled.data.pc <- lapply(pc_data[, var_names], scale)

# Switch out scaled numerical columns (apart from response variables)
td_data[, var_names] <- as.data.frame(scaled.data.td)
pc_data[, var_names] <- as.data.frame(scaled.data.pc)

Exploratory data analysis

In this section, we explore the data to identify possible relations between the predictors and the building damage proportions.

For the proportion of houses damaged:

td_data %>%
  keep(is.numeric) %>% 
  gather(-Total.damaged.houses..rel.., -Completely.damaged..rel.., key = "var", value = "value") %>% 
  ggplot(aes(x = value, y = link_fn(Total.damaged.houses..rel..))) +
    geom_point() +
    facet_wrap(~ var, scales = "free") +
    theme_bw()

For the proportion of houses completely damaged:

pc_data %>%
  keep(is.numeric) %>% 
  gather(-Total.damaged.houses..rel.., -Completely.damaged..rel.., key = "var", value = "value") %>% 
  ggplot(aes(x = value, y = link_fn(Completely.damaged..rel..))) +
    geom_point() +
    facet_wrap(~ var, scales = "free") +
    theme_bw()

For both models, a distinctive behavior can be seen for the Wind Speed predictor depending on its magnitude. While for lower wind speeds a rather scattered and significant positive slope can be observed, this increment of damage seems to saturate for higher wind speed (e.g. above average). This non-linearity was also explored in the model although without satisfactory results, so no changes were made. Similar trends can be seen for other variables, such as elevation and rainfall, but since wind speed is a major predictor variable, the analysis was constrained to it.

Datasets

The dataset for the 1st damage model (Proportion of damage houses):

# Dataset for Proportion Damaged:
# Omit Population, Land.Area, Region and prov.
omit_var_td <- which(colnames(raw_data) %in% c("disaster_type", "disaster_name", "pcode", "Completely.damaged..rel..", "Region", "prov"))
td_data <- td_data[, -omit_var_td]
head(td_data)
##   Total.damaged.houses..rel.. Wind.speed Distance.to.typhoon   rainfall
## 1                0.0002225499 -1.2571410           1.6699987 -2.8985630
## 2                0.0018975332 -0.6908614           0.3872697 -1.2770629
## 3                0.0101243535 -0.9740012           1.0062739  0.3332327
## 4                0.0005569479 -0.6908614           0.5339093 -1.4256505
## 5                0.0006712911 -0.9740012           0.7843221 -0.8008816
## 6                0.0014214182 -0.9740012           1.1039695  0.3698862
##   distance_first_impact      Slope  Elevation slope_stdev Population.density
## 1              5.129665 -1.0718403 -1.0334861 -0.51418359       1.0187587782
## 2              1.349430 -0.6883624 -0.5532139 -0.69190964      -0.0002248347
## 3              1.435343 -0.4932481 -0.4260822 -0.66493207      -0.2569326702
## 4              1.251112 -0.0203109  0.1860411 -0.05896435      -0.2237533693
## 5              1.176279  0.4651641  0.7762894  0.38980071      -0.1999884083
## 6              1.464872 -0.5844099 -0.7985256 -0.45485839      -0.0838508353
##   Poverty.incidence X..strong.roof.type X..strong.wall.type
## 1        -0.9156615           0.7800073          0.92881818
## 2        -0.5707727           0.4770982          0.09587523
## 3         1.2687909          -0.2126670         -0.69068122
## 4         0.4591817           0.7754063          0.36482433
## 5         0.3988127           0.4541748          0.28660244
## 6         1.1709577          -0.5112979         -0.75516651
##   X..skilled.Agriculture.Forestry.Fishermen Experience
## 1                                -0.4856129   1.245906
## 2                                -0.8788238  -1.401257
## 3                                 0.9973094  -1.401257
## 4                                 0.4261285  -1.401257
## 5                                 0.6513256  -1.401257
## 6                                 0.7471066  -1.401257

The dataset for the 2nd damage model (Proportion of completely damage houses):

# Dataset for Proportion of Completely Damaged:
omit_var_pc <- which(colnames(raw_data) %in% c("disaster_type", "disaster_name", "pcode", "Total.damaged.houses..rel..", "Region", "prov"))
pc_data <- pc_data[, -omit_var_pc]
head(pc_data)
##   Completely.damaged..rel.. Wind.speed Distance.to.typhoon   rainfall
## 1              0.0001112749 -1.4437362           1.6699987 -2.8985630
## 2              0.0003795066 -0.6762612           0.3872697 -1.2770629
## 3              0.0014672976 -1.0406652           1.0062739  0.3332327
## 4              0.0002784740 -0.6762612           0.5339093 -1.4256505
## 5              0.0002237637 -1.0406652           0.7843221 -0.8008816
## 6              0.0012921983 -1.0406652           1.1039695  0.3698862
##   distance_first_impact      Slope  Elevation slope_stdev Population.density
## 1              5.129665 -1.0718403 -1.0334861 -0.51418359       1.0187587782
## 2              1.349430 -0.6883624 -0.5532139 -0.69190964      -0.0002248347
## 3              1.435343 -0.4932481 -0.4260822 -0.66493207      -0.2569326702
## 4              1.251112 -0.0203109  0.1860411 -0.05896435      -0.2237533693
## 5              1.176279  0.4651641  0.7762894  0.38980071      -0.1999884083
## 6              1.464872 -0.5844099 -0.7985256 -0.45485839      -0.0838508353
##   Poverty.incidence X..strong.roof.type X..strong.wall.type
## 1        -0.9156615           0.7800073          0.92881818
## 2        -0.5707727           0.4770982          0.09587523
## 3         1.2687909          -0.2126670         -0.69068122
## 4         0.4591817           0.7754063          0.36482433
## 5         0.3988127           0.4541748          0.28660244
## 6         1.1709577          -0.5112979         -0.75516651
##   X..skilled.Agriculture.Forestry.Fishermen Experience
## 1                                -0.4856129   1.245906
## 2                                -0.8788238  -1.401257
## 3                                 0.9973094  -1.401257
## 4                                 0.4261285  -1.401257
## 5                                 0.6513256  -1.401257
## 6                                 0.7471066  -1.401257

Model training

Check for multicollinearity

Variance Inflation Factors (VIF) are computed to check for multicollinearity. Variables with VIF>10 are removed from consideration.

# Fit Multi-Logistic Model for td
td_full <- glm(Total.damaged.houses..rel.. ~ ., data= td_data, family = binomial(link=chosen_link), trace = FALSE)

## Check on variable collinearity
temp_model <- td_full
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
temp_var <- "dummy"
temp_data <- td_data

while(max(temp_vif)>10){
  
  var_remove <- which(temp_vif == max(temp_vif))
  temp_var <- append(temp_var, names(var_remove))
  temp_data <- temp_data[, -which(colnames(temp_data) == names(var_remove))]
  temp_model <- (glm(Total.damaged.houses..rel.. ~ ., data= temp_data, family = binomial(link=chosen_link), trace = FALSE))
  temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
}

td_col_to_use <- colnames(temp_data)
temp_vif
##                                Wind.speed 
##                                  2.852955 
##                       Distance.to.typhoon 
##                                  2.435281 
##                                  rainfall 
##                                  1.419057 
##                     distance_first_impact 
##                                  1.505244 
##                                     Slope 
##                                  9.059720 
##                                 Elevation 
##                                  4.198276 
##                               slope_stdev 
##                                  5.967166 
##                        Population.density 
##                                  2.670852 
##                         Poverty.incidence 
##                                  3.453024 
##                       X..strong.roof.type 
##                                  2.201079 
##                       X..strong.wall.type 
##                                  3.058554 
## X..skilled.Agriculture.Forestry.Fishermen 
##                                  2.347577 
##                                Experience 
##                                  1.837720
temp_var
## [1] "dummy"
# Fit Multi-Logistic Model for pc
pc_full <- glm(Completely.damaged..rel.. ~ ., data= pc_data, family = binomial(link=chosen_link), trace = FALSE)

## Check on variable collinearity
temp_model <- pc_full
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
temp_var <- "dummy"
temp_data <- pc_data

while(max(temp_vif)>10){
  
  var_remove <- which(temp_vif == max(temp_vif))
  temp_var <- append(temp_var, names(var_remove))
  temp_data <- temp_data[, -which(colnames(temp_data) == names(var_remove))]
  temp_model <- (glm(Completely.damaged..rel.. ~ ., data= temp_data, family = binomial(link=chosen_link), trace = FALSE))
  temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
}

pc_col_to_use <- colnames(temp_data)
temp_vif
##                                Wind.speed 
##                                  3.320326 
##                       Distance.to.typhoon 
##                                  2.202872 
##                                  rainfall 
##                                  1.755294 
##                     distance_first_impact 
##                                  1.708302 
##                                     Slope 
##                                  9.612807 
##                                 Elevation 
##                                  4.386630 
##                               slope_stdev 
##                                  6.468108 
##                        Population.density 
##                                  2.757635 
##                         Poverty.incidence 
##                                  2.831486 
##                       X..strong.roof.type 
##                                  2.072141 
##                       X..strong.wall.type 
##                                  2.747146 
## X..skilled.Agriculture.Forestry.Fishermen 
##                                  2.376611 
##                                Experience 
##                                  2.161044
temp_var
## [1] "dummy"
# Subset variables based on VIF:
td_data <- td_data[, colnames(td_data) %in% td_col_to_use]
pc_data <- pc_data[, colnames(pc_data) %in% pc_col_to_use]

Backwards stepwise variable selection

The stepwise variable selection uses AIC to compare between models.

# Fit Multi-Logistic Model
td_full <- glm(Total.damaged.houses..rel.. ~ ., data= td_data, family = binomial(link=chosen_link), trace = FALSE)
td_step <- stepAIC(td_full)
# Check the variable significance
summary(td_step)
## 
## Call:
## glm(formula = Total.damaged.houses..rel.. ~ Wind.speed + distance_first_impact + 
##     Slope + Elevation + Population.density + X..strong.roof.type + 
##     Experience, family = binomial(link = chosen_link), data = td_data, 
##     trace = FALSE)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.22294  -0.30991  -0.15599   0.08836   2.67136  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -2.16986    0.10147 -21.384  < 2e-16 ***
## Wind.speed             1.18801    0.08946  13.280  < 2e-16 ***
## distance_first_impact -0.35929    0.08475  -4.239 2.24e-05 ***
## Slope                  0.45756    0.18188   2.516 0.011876 *  
## Elevation             -0.73948    0.19996  -3.698 0.000217 ***
## Population.density    -0.33519    0.15741  -2.129 0.033225 *  
## X..strong.roof.type   -0.23936    0.07781  -3.076 0.002097 ** 
## Experience            -0.16023    0.09002  -1.780 0.075091 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 721.58  on 1614  degrees of freedom
## Residual deviance: 285.80  on 1607  degrees of freedom
## AIC: 752.7
## 
## Number of Fisher Scoring iterations: 6

The correlation with the predictors seem as expected. Higher windspeeds give raise to higher damage, the same as municipalities with higher average slope where wind effects can be more severe. On the other hand, the distance to the typhoon’s first impact in land, the percentage of houses with strong roof and municipalities with more experience from past typhoons yield a lesser damage. The negative correlation with population density might be discriminating between municipalities with more dense urban land use and more rural ones, where it is expected to find a lower proportion of damage (although a larger absolute amount of damage). The reason for the sign of the correlation with elevation is not yet clear.

# Fit Multi-Logistic Model
pc_full <- glm(Completely.damaged..rel.. ~ ., data= pc_data, family = binomial(link=chosen_link), trace = FALSE)
pc_step <- stepAIC(pc_full)
# Check resulting model
pc_step
## 
## Call:  glm(formula = Completely.damaged..rel.. ~ Wind.speed + distance_first_impact + 
##     Elevation + Population.density + X..strong.roof.type + Experience, 
##     family = binomial(link = chosen_link), data = pc_data, trace = FALSE)
## 
## Coefficients:
##           (Intercept)             Wind.speed  distance_first_impact  
##               -4.1679                 1.5099                -0.3467  
##             Elevation     Population.density    X..strong.roof.type  
##               -0.3597                -0.5586                -0.2420  
##            Experience  
##               -0.3302  
## 
## Degrees of Freedom: 1614 Total (i.e. Null);  1608 Residual
## Null Deviance:       346.5 
## Residual Deviance: 96.17     AIC: 276.4
# Check the variable significance
summary(pc_step)
## 
## Call:
## glm(formula = Completely.damaged..rel.. ~ Wind.speed + distance_first_impact + 
##     Elevation + Population.density + X..strong.roof.type + Experience, 
##     family = binomial(link = chosen_link), data = pc_data, trace = FALSE)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.73483  -0.15798  -0.06738  -0.01244   1.78359  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -4.1679     0.2372 -17.573  < 2e-16 ***
## Wind.speed              1.5099     0.1547   9.757  < 2e-16 ***
## distance_first_impact  -0.3467     0.1154  -3.006  0.00265 ** 
## Elevation              -0.3597     0.2072  -1.736  0.08257 .  
## Population.density     -0.5586     0.2700  -2.069  0.03857 *  
## X..strong.roof.type    -0.2420     0.1230  -1.967  0.04921 *  
## Experience             -0.3302     0.1637  -2.017  0.04371 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 346.490  on 1614  degrees of freedom
## Residual deviance:  96.169  on 1608  degrees of freedom
## AIC: 276.4
## 
## Number of Fisher Scoring iterations: 7

The variables selected are the same than in the previous model (with less significance in most of the predictors), with the exception that ‘slope’ does not appear. This might as well be a statistical artifact.

Model validation

Cross-validation errors

To evaluate our fitted models, we perform cross-validation on the model structure with data from each typhoon as test sets.

typhoons <- unique(raw_data$disaster_name)
table(raw_data$disaster_name)
## 
##    Bopha     Goni  Hagupit    Haima   Haiyan Kalmaegi    Koppu    Melor 
##      173       32       77      299      159       92      202      102 
## Nock-Ten Rammasun   Sarika     Utor 
##       80      289       57       53
cv_pc <- data.frame("Split" = 1:length(typhoons))
cv_pc$RMSE <- NA
cv_pc$MAPE <- NA

cv_td <- data.frame("Split" = 1:length(typhoons))
cv_td$RMSE <- NA
cv_td$MAPE <- NA

for (i in 1:length(typhoons)){
  
 train.i <- which(raw_data$disaster_name != typhoons[i])
 
 if(include_transform){
   
   # Link-transformed damage proportions 
   link_fn_td.i <- link_fn(raw_data[train.i, ]$Total.damaged.houses..rel..)
   link_fn_pc.i <- link_fn(raw_data[train.i, ]$Completely.damaged..rel..)
   
   # Set up dataframe with first row for td and second row for pc
   power_df.i <- matrix(NA, nrow = 2, ncol = length(var_names))
   colnames(power_df.i) <- var_names
   
   td_data.i <- raw_data
   pc_data.i <- raw_data
   
   td_data.i <- td_data.i[, -omit_var_td]
   pc_data.i <- pc_data.i[, -omit_var_pc]   
   
   for (j in 1:length(var_names)){
     
     var_i <- var_names[j]
     power_df.i[1, var_i] <- power_est(boxCox(link_fn_td.i - min(link_fn_td.i)*1.01 ~ raw_data[train.i, ][, var_i], plotit = FALSE))
     power_df.i[2, var_i] <- power_est(boxCox(link_fn_pc.i - min(link_fn_pc.i)*1.01 ~ raw_data[train.i, ][, var_i], plotit = FALSE))
     
     if(power_df.i[1, var_i]==0){ 
       td_data.i[, var_i] <- log(td_data.i[, var_i])
     }else{td_data.i[, var_i] <- td_data.i[, var_i]^power_df.i[1, var_i]}
     
     
     cols.mean.td.i <- mean(td_data.i[train.i, var_i]) 
     cols.sd.td.i <- sd(td_data.i[train.i, var_i])
     td_data.i[, var_i] <- (td_data.i[, var_i] - cols.mean.td.i)/cols.sd.td.i
     
     if(power_df.i[2, var_i]==0){
       pc_data.i[, var_i] <- log(pc_data.i[, var_i])
     }else{pc_data.i[, var_i] <- pc_data.i[, var_i]^power_df.i[2, var_i]}
     
     cols.mean.pc.i <- mean(pc_data.i[train.i, var_i]) 
     cols.sd.pc.i <- sd(pc_data.i[train.i, var_i])
     pc_data.i[, var_i] <- (pc_data.i[, var_i] - cols.mean.pc.i)/cols.sd.pc.i
     
   }
   
   # Fit Multi-Logistic Model
    td_full.i <- glm(Total.damaged.houses..rel.. ~ ., data= td_data.i, family = binomial(link=chosen_link), trace = FALSE)

    ## Check on variable collinearity
    temp_model <- td_full.i
    temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
    temp_var <- "dummy"
    temp_data <- td_data.i

    while(max(temp_vif)>10){
  
     var_remove <- which(temp_vif == max(temp_vif))
     temp_var <- append(temp_var, names(var_remove))
     temp_data <- temp_data[, -which(colnames(temp_data) == names(var_remove))]
     temp_model <- (glm(Total.damaged.houses..rel.. ~ ., data= temp_data, family = binomial(link=chosen_link), trace = FALSE))
     temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
    }

   td_col_to_use.i <- colnames(temp_data)

   # Fit Multi-Logistic Model
   pc_full.i <- glm(Completely.damaged..rel.. ~ ., data= pc_data.i, family = binomial(link=chosen_link), trace = FALSE)
   
   ## Check on variable collinearity
   temp_model <- pc_full.i
   temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
   temp_var <- "dummy"
   temp_data <- pc_data.i
   
   while(max(temp_vif)>10){
     
     var_remove <- which(temp_vif == max(temp_vif))
     temp_var <- append(temp_var, names(var_remove))
     temp_data <- temp_data[, -which(colnames(temp_data) == names(var_remove))]
     temp_model <- (glm(Completely.damaged..rel.. ~ ., data= temp_data, family = binomial(link=chosen_link), trace = FALSE))
     temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
   }
   
   pc_col_to_use.i <- colnames(temp_data)

   td_data.i <- td_data.i[, colnames(td_data.i) %in% td_col_to_use.i]
   pc_data.i <- pc_data.i[, colnames(pc_data.i) %in% pc_col_to_use.i]
   
   td.training.data.i <- td_data.i[train.i, ]
   td.test.data.i <- td_data.i[-train.i, ]
   
   pc.training.data.i <- pc_data.i[train.i, ]
   pc.test.data.i <- pc_data.i[-train.i, ]
   
   pc_full.i <- glm(Completely.damaged..rel.. ~ ., data= pc.training.data.i, family = binomial(link=chosen_link), trace = FALSE)

   pc_model.i <-  stepAIC(pc_full.i, trace = 0)

   td_full.i <- glm(Total.damaged.houses..rel.. ~ ., data= td.training.data.i, family = binomial(link=chosen_link), trace = FALSE)

   td_model.i <-  stepAIC(td_full.i, trace = 0)
   
   }else{

   pc.training.data.i <- pc_data[train.i, ]
   pc.test.data.i <- pc_data[-train.i, ]
     
   pc_model <- pc_step$formula
   pc_model.i <- glm(pc_model, data= pc.training.data.i, family = binomial(link=chosen_link), trace = FALSE)
  
   td.training.data.i <- td_data[train.i, ]
   td.test.data.i <- td_data[-train.i, ]
   
   td_model <- td_step$formula
   td_model.i <- glm(td_model, data= td.training.data.i, family = binomial(link=chosen_link), trace = FALSE)
 
 }
 
 pc.test.pred.i <- predict(pc_model.i, newdata = pc.test.data.i, type = "response")
 pc.test.res.i <- pc.test.pred.i - pc.test.data.i$Completely.damaged..rel..
 pc.test.rmse.i <- sqrt(mean(pc.test.res.i^2))
 pc.test.mape.i <- mean(abs(pc.test.res.i/pc.test.data.i$Completely.damaged..rel..))
 cv_pc[i, c("RMSE", "MAPE")] <- c(pc.test.rmse.i, pc.test.mape.i)

 td.test.pred.i <- predict(td_model.i, newdata = td.test.data.i, type = "response")
  
 # If the estimate of houses damaged is less than the estimate of houses completely damaged, set it to the latter.
 td.test.pred.i[td.test.pred.i<pc.test.pred.i] <- pc.test.pred.i
 td.test.res.i <- td.test.pred.i - td.test.data.i$Total.damaged.houses..rel..
 td.test.rmse.i <- sqrt(mean(td.test.res.i^2))
 td.test.mape.i <- mean(abs(td.test.res.i/td.test.data.i$Total.damaged.houses..rel..))
 cv_td[i, c("RMSE", "MAPE")] <- c(td.test.rmse.i, td.test.mape.i)

}

colMeans(cv_pc[, c("RMSE", "MAPE")])
##        RMSE        MAPE 
##  0.07390853 38.76647185
cv_pc
##    Split       RMSE        MAPE
## 1      1 0.13166103  97.2087230
## 2      2 0.01922374   5.3899594
## 3      3 0.03078913  15.4966904
## 4      4 0.04749458  56.6814891
## 5      5 0.20752031   0.9356902
## 6      6 0.02426268  53.5214649
## 7      7 0.02930979  39.2189082
## 8      8 0.17069310  12.0584787
## 9      9 0.09489228  26.0775407
## 10    10 0.05420394   7.5692131
## 11    11 0.01772350  31.9201821
## 12    12 0.05912826 119.1193224
colMeans(cv_td[, c("RMSE", "MAPE")])
##       RMSE       MAPE 
##  0.1668861 53.3935719
cv_td
##    Split       RMSE       MAPE
## 1      1 0.19376474  38.410240
## 2      2 0.13677559  13.626954
## 3      3 0.12920052  17.831413
## 4      4 0.13590789  85.054196
## 5      5 0.18251489   0.504323
## 6      6 0.09491613  50.191799
## 7      7 0.13318883  32.099336
## 8      8 0.26227135  24.188147
## 9      9 0.25684684  24.080305
## 10    10 0.16329686   6.365589
## 11    11 0.10466218  44.615731
## 12    12 0.20928757 303.754830

The total damage model has a significantly higher cross-validation error. In both cases, a large portion of the error seems to be driven by a couple of events that lead to errors which are much larger than the average. Note that by treating each typhoon as a test set, we interpret the Experience variable as a static, long-run frequency of experiencing typhoons instead of the way it is actually calculated: as a dynamic variable which accumulates for each municipality over the timeframe of the data (2012-2016). An alternative cross-validation method would be to only use past typhoons to predict for later ones.

Residual analysis

Next, we check if the model residuals are symmetrically distributed about zero and if the model has any bias in terms of over or under-prediction.

pc.pred <- predict(pc_step, type = "response")
pc.res <- pc.pred - pc_data$Completely.damaged..rel..
 
hist(pc.res)

plot(link_fn(pc_data$Completely.damaged..rel..), link_fn(pc.pred), asp = 1)
abline(a = 0, b = 1, add = TRUE)
## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): "add" is
## not a graphical parameter

td.pred <- predict(td_step, type = "response")
# If the estimate of houses damaged is less than the estimate of houses completely damaged, set it to the latter.
td.pred[td.pred<pc.pred] <- pc.pred
 
td.res <- td.pred - td_data$Total.damaged.houses..rel..

hist(td.res)

plot(link_fn(td_data$Total.damaged.houses..rel..), link_fn(td.pred), asp = 1)
abline(a = 0, b = 1, add = TRUE)
## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): "add" is
## not a graphical parameter

In both cases, the models seems to overpredict damage for low damages. This is important to take into account when evaluating the predictions done with this model. Further analysis to correct this bias can be explored, such us non-linearity in some of the predictors (e.g. wind speed), and eventually update the model.

Prediction for Goni Typhoon

A new dataset is prepared for the new Goni Typhoon. The same columns from the original dataset are used, but only one row per municipality is left. The columns corresponding to Wind speed, Distance to typhoon and Distance to first impact are completed based on recently acquired information (see other reports from DASL).

# Read CSV with new dataset
new_data <- read.csv(file="data//All_Goni_pred.csv", header=TRUE, sep=",")

# Choose which wind speed data to use
new_data$Wind.speed <- new_data$Wind.speed.max
# new_data$Wind.speed <- new_data$Wind.speed.mean

# Set a minimum distance from typhoon and distance from first impact
min_dist <- 0.01*min(new_data$distance_first_impact[new_data$distance_first_impact>0], new_data$Distance.to.typhoon[new_data$Distance.to.typhoon>0])
new_data$distance_first_impact[new_data$distance_first_impact==0] <- min_dist
new_data$Distance.to.typhoon[new_data$Distance.to.typhoon==0] <- min_dist

head(new_data)
##   X disaster_type disaster_name       pcode Completely.damaged..abs..
## 1 0       Typhoon          Goni PH012801000                0.05559265
## 2 1       Typhoon          Goni PH012802000                1.05953427
## 3 2       Typhoon          Goni PH012803000                1.06816249
## 4 3       Typhoon          Goni PH012804000                0.55749366
## 5 4       Typhoon          Goni PH012805000                1.82812867
## 6 5       Typhoon          Goni PH012806000                0.48772622
##   Total.damaged.houses..abs.. Completely.damaged..rel..
## 1                    1.547008              0.0001240907
## 2                  106.291491              0.0001315579
## 3                   65.854439              0.0001351420
## 4                   67.084419              0.0001519885
## 5                   93.473126              0.0001324707
## 6                   35.297867              0.0001995402
##   Total.damaged.houses..rel.. ratio_comp_part Total...of.houses Wind.speed.mean
## 1                 0.003453144       0.1589327            448.00              10
## 2                 0.013197764       1.0122643           8053.75              31
## 3                 0.008331786       0.1273040           7904.00              17
## 4                 0.018289100       0.1184767           3668.00              27
## 5                 0.006773292       0.3971472          13800.25              17
## 6                 0.014441185       0.6618326           2444.25              45
##   Wind.speed.max Distance.to.typhoon rainfall distance_first_impact     Slope
## 1             19            442.1175   493.20              650.4285 16.827038
## 2             60            412.2540   528.00              653.2009  2.127614
## 3             36            370.8016   256.20              629.5410  7.368456
## 4             60            440.0760   204.96              664.1004 11.938009
## 5             36            387.0933   217.60              635.4122  5.234450
## 6             60            435.9028   202.40              669.5462  4.794050
##   Elevation ruggedness_stdev Ruggedness slope_stdev
## 1 558.75002         39.76238   84.40795    8.536798
## 2  23.79743         12.14331   12.05175    2.491546
## 3  97.69650         39.58078   37.36870    8.268917
## 4 220.46549         41.11407   59.74080    8.831068
## 5  91.41104         23.80632   27.48998    5.180507
## 6 168.20983         17.87840   24.79369    3.935199
##   Predicted.damage.class..1.5. Population land_area Population.density
## 1                           NA       1808  127.0823            14.2270
## 2                           NA      32496   63.0110           515.7043
## 3                           NA      31893   91.8767           347.1266
## 4                           NA      14802  131.2743           112.7567
## 5                           NA      55635  180.1741           308.7803
## 6                           NA       9777  156.2251            62.5827
##   Poverty.incidence X..strong.roof.type X..strong.wall.type
## 1            0.1791           0.9732360           0.6107056
## 2            0.1141           0.9942983           0.9113645
## 3            0.1404           0.9164221           0.8213836
## 4            0.1197           0.9612337           0.8902094
## 5            0.1100           0.9774862           0.8666991
## 6            0.1378           0.8979592           0.8078231
##   X..skilled.Agriculture.Forestry.Fishermen      Region prov Experience
## 1                                0.15569196 north luzon    0   3.262295
## 2                                0.10277821 north luzon    0   3.262295
## 3                                0.13199013 north luzon    0   3.262295
## 4                                0.09385224 north luzon    0   3.262295
## 5                                0.09592217 north luzon    0   3.262295
## 6                                0.13020354 north luzon    0   3.262295
##   Bicol.region Wind.speed Total.damaged.houses..rel..se
## 1            0         19                   0.001366342
## 2            0         60                   0.004624503
## 3            0         36                   0.002811769
## 4            0         60                   0.006337689
## 5            0         36                   0.002338853
## 6            0         60                   0.005598435
##   Total.damaged.houses..abs..se Completely.damaged..rel..se
## 1                     0.6121211                8.993182e-05
## 2                    37.2445947                9.808454e-05
## 3                    22.2242194                9.346595e-05
## 4                    23.2466444                1.066735e-04
## 5                    32.2767539                9.414416e-05
## 6                    13.6839749                1.412399e-04
##   Completely.damaged..abs..se
## 1                  0.04028946
## 2                  0.78994834
## 3                  0.73875483
## 4                  0.39127852
## 5                  1.29921298
## 6                  0.34522561

Data transformation and scaling

The variables are power-transformed to be able to be used as predictors in the models calibrated previously. Then, the data is scaled using the mean and standard deviation for each covariate from the training set from the model calibration.

new_td_data <- new_data
new_pc_data <- new_data

for (i in 1:length(var_names)){

  if(power_df[1, var_names[i]]==0){
    new_td_data[, var_names[i]] <- (log(new_td_data[, var_names[i]]) - cols.mean.td[i])/cols.sd.td[i]
  }else{
    new_td_data[, var_names[i]] <- ((new_td_data[, var_names[i]]^power_df[1, var_names[i]]) - cols.mean.td[i])/cols.sd.td[i]
  }
  
  if(power_df[2, var_names[i]]==0){
    new_pc_data[, var_names[i]] <- (log(new_pc_data[, var_names[i]]) - cols.mean.pc[i])/cols.sd.pc[i]
  }else{
    new_pc_data[, var_names[i]] <- (new_pc_data[, var_names[i]]^power_df[2, var_names[i]] - cols.mean.pc[i])/cols.sd.pc[i]
  }
  
}

Computed predictions for the 1st model: Total proportion of damaged houses

The model calibrated makes use of wind speed, distance to first impact, elevation, slope, strong roof type type and experience as predictors of damage.

td_step
## 
## Call:  glm(formula = Total.damaged.houses..rel.. ~ Wind.speed + distance_first_impact + 
##     Slope + Elevation + Population.density + X..strong.roof.type + 
##     Experience, family = binomial(link = chosen_link), data = td_data, 
##     trace = FALSE)
## 
## Coefficients:
##           (Intercept)             Wind.speed  distance_first_impact  
##               -2.1699                 1.1880                -0.3593  
##                 Slope              Elevation     Population.density  
##                0.4576                -0.7395                -0.3352  
##   X..strong.roof.type             Experience  
##               -0.2394                -0.1602  
## 
## Degrees of Freedom: 1614 Total (i.e. Null);  1607 Residual
## Null Deviance:       721.6 
## Residual Deviance: 285.8     AIC: 752.7
# Predict proportion of all damaged houses and standard errors
pred.td_prop_bldgs = predict(td_step, newdata = new_td_data, type = "response", se.fit = TRUE)
# Predict number of all damaged houses
pred.td_num_bldgs = pred.td_prop_bldgs$fit * new_data$Total...of.houses
# Add predicted values and standard errors to data set
new_data$Total.damaged.houses..rel.. = pred.td_prop_bldgs$fit
new_data$Total.damaged.houses..abs.. = pred.td_num_bldgs
new_data$Total.damaged.houses..rel..se = pred.td_prop_bldgs$se.fit
new_data$Total.damaged.houses..abs..se = pred.td_prop_bldgs$se.fit * new_data$Total...of.houses

Computed predictions for the 2nd model: Proportion of completely damaged houses

The model calibrated makes use of wind speed, distance to first impact, elevation, strong roof type type and experience as predictors of damage.

pc_step
## 
## Call:  glm(formula = Completely.damaged..rel.. ~ Wind.speed + distance_first_impact + 
##     Elevation + Population.density + X..strong.roof.type + Experience, 
##     family = binomial(link = chosen_link), data = pc_data, trace = FALSE)
## 
## Coefficients:
##           (Intercept)             Wind.speed  distance_first_impact  
##               -4.1679                 1.5099                -0.3467  
##             Elevation     Population.density    X..strong.roof.type  
##               -0.3597                -0.5586                -0.2420  
##            Experience  
##               -0.3302  
## 
## Degrees of Freedom: 1614 Total (i.e. Null);  1608 Residual
## Null Deviance:       346.5 
## Residual Deviance: 96.17     AIC: 276.4
# Predict proportion of all damaged houses
pred.pc_prop_bldgs = predict(pc_step, newdata =new_pc_data, type = "response", se.fit = TRUE)
# Predict number of all damaged houses
pred.pc_num_bldgs = pred.pc_prop_bldgs$fit * new_data$Total...of.houses
# Add predicted values and standard errors to data set
new_data$Completely.damaged..rel.. = pred.pc_prop_bldgs$fit
new_data$Completely.damaged..abs.. = pred.pc_num_bldgs
new_data$Completely.damaged..rel..se = pred.pc_prop_bldgs$se.fit
new_data$Completely.damaged..abs..se = pred.pc_prop_bldgs$se.fit * new_data$Total...of.houses


head(new_data)
##   X disaster_type disaster_name       pcode Completely.damaged..abs..
## 1 0       Typhoon          Goni PH012801000                0.05559265
## 2 1       Typhoon          Goni PH012802000                1.05953427
## 3 2       Typhoon          Goni PH012803000                1.06816249
## 4 3       Typhoon          Goni PH012804000                0.55749366
## 5 4       Typhoon          Goni PH012805000                1.82812867
## 6 5       Typhoon          Goni PH012806000                0.48772622
##   Total.damaged.houses..abs.. Completely.damaged..rel..
## 1                    1.547008              0.0001240907
## 2                  106.291491              0.0001315579
## 3                   65.854439              0.0001351420
## 4                   67.084419              0.0001519885
## 5                   93.473126              0.0001324707
## 6                   35.297867              0.0001995402
##   Total.damaged.houses..rel.. ratio_comp_part Total...of.houses Wind.speed.mean
## 1                 0.003453144       0.1589327            448.00              10
## 2                 0.013197764       1.0122643           8053.75              31
## 3                 0.008331786       0.1273040           7904.00              17
## 4                 0.018289100       0.1184767           3668.00              27
## 5                 0.006773292       0.3971472          13800.25              17
## 6                 0.014441185       0.6618326           2444.25              45
##   Wind.speed.max Distance.to.typhoon rainfall distance_first_impact     Slope
## 1             19            442.1175   493.20              650.4285 16.827038
## 2             60            412.2540   528.00              653.2009  2.127614
## 3             36            370.8016   256.20              629.5410  7.368456
## 4             60            440.0760   204.96              664.1004 11.938009
## 5             36            387.0933   217.60              635.4122  5.234450
## 6             60            435.9028   202.40              669.5462  4.794050
##   Elevation ruggedness_stdev Ruggedness slope_stdev
## 1 558.75002         39.76238   84.40795    8.536798
## 2  23.79743         12.14331   12.05175    2.491546
## 3  97.69650         39.58078   37.36870    8.268917
## 4 220.46549         41.11407   59.74080    8.831068
## 5  91.41104         23.80632   27.48998    5.180507
## 6 168.20983         17.87840   24.79369    3.935199
##   Predicted.damage.class..1.5. Population land_area Population.density
## 1                           NA       1808  127.0823            14.2270
## 2                           NA      32496   63.0110           515.7043
## 3                           NA      31893   91.8767           347.1266
## 4                           NA      14802  131.2743           112.7567
## 5                           NA      55635  180.1741           308.7803
## 6                           NA       9777  156.2251            62.5827
##   Poverty.incidence X..strong.roof.type X..strong.wall.type
## 1            0.1791           0.9732360           0.6107056
## 2            0.1141           0.9942983           0.9113645
## 3            0.1404           0.9164221           0.8213836
## 4            0.1197           0.9612337           0.8902094
## 5            0.1100           0.9774862           0.8666991
## 6            0.1378           0.8979592           0.8078231
##   X..skilled.Agriculture.Forestry.Fishermen      Region prov Experience
## 1                                0.15569196 north luzon    0   3.262295
## 2                                0.10277821 north luzon    0   3.262295
## 3                                0.13199013 north luzon    0   3.262295
## 4                                0.09385224 north luzon    0   3.262295
## 5                                0.09592217 north luzon    0   3.262295
## 6                                0.13020354 north luzon    0   3.262295
##   Bicol.region Wind.speed Total.damaged.houses..rel..se
## 1            0         19                   0.001366342
## 2            0         60                   0.004624503
## 3            0         36                   0.002811769
## 4            0         60                   0.006337689
## 5            0         36                   0.002338853
## 6            0         60                   0.005598435
##   Total.damaged.houses..abs..se Completely.damaged..rel..se
## 1                     0.6121211                8.993182e-05
## 2                    37.2445947                9.808454e-05
## 3                    22.2242194                9.346595e-05
## 4                    23.2466444                1.066735e-04
## 5                    32.2767539                9.414416e-05
## 6                    13.6839749                1.412399e-04
##   Completely.damaged..abs..se
## 1                  0.04028946
## 2                  0.78994834
## 3                  0.73875483
## 4                  0.39127852
## 5                  1.29921298
## 6                  0.34522561

The predicted values are saved back in the csv.

# Write new dataset in csv
write.csv(new_data, 'data\\All_Goni_pred.csv', row.names=F)

Read and plot shapefile

library(rgdal)
## Loading required package: sp
## rgdal: version: 1.4-8, (SVN revision 845)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
##  Path to GDAL shared files: C:/Users/michele.nguyen/Documents/R/win-library/3.6/rgdal/gdal
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: C:/Users/michele.nguyen/Documents/R/win-library/3.6/rgdal/proj
##  Linking to sp version: 1.3-2
library(sf)
## Warning: package 'sf' was built under R version 3.6.3
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(ggplot2)
dsn <- system.file("vectors", package = "rgdal")
# Read shapefiles
ph_mun = st_read('data\\Shapefiles\\PH_municipality.shp')
## Reading layer `PH_municipality' from data source `C:\Users\michele.nguyen\Documents\GitHub\storm-goni\GoniDamageModel\data\Shapefiles\PH_municipality.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1647 features and 13 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 114.2815 ymin: 4.587399 xmax: 126.605 ymax: 21.12176
## geographic CRS: WGS 84
goni_trace = st_read('data\\Shapefiles\\goni-path.shp')
## Reading layer `goni-path' from data source `C:\Users\michele.nguyen\Documents\GitHub\storm-goni\GoniDamageModel\data\Shapefiles\goni-path.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 2 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 114.2 ymin: 13.5 xmax: 140.1 ymax: 16.8
## geographic CRS: WGS 84
# Create new shapefile
ph_mun_new = ph_mun

tdprop.list = replicate(length(ph_mun$Mun_Code),NA)
tdnum.list = replicate(length(ph_mun$Mun_Code),NA)
pcprop.list = replicate(length(ph_mun$Mun_Code),NA)
pcnum.list = replicate(length(ph_mun$Mun_Code),NA)
tdprop.se.list = replicate(length(ph_mun$Mun_Code),NA)
tdnum.se.list = replicate(length(ph_mun$Mun_Code),NA)
pcprop.se.list = replicate(length(ph_mun$Mun_Code),NA)
pcnum.se.list = replicate(length(ph_mun$Mun_Code),NA)
windspeed.list = replicate(length(ph_mun$Mun_Code),NA)
dist_first_impact.list = replicate(length(ph_mun$Mun_Code),NA)
Slope.list  = replicate(length(ph_mun$Mun_Code),NA)
Elevation.list  = replicate(length(ph_mun$Mun_Code),NA)
Population.density.list  = replicate(length(ph_mun$Mun_Code),NA)
X..strong.roof.type.list = replicate(length(ph_mun$Mun_Code),NA)
Experience.list  = replicate(length(ph_mun$Mun_Code),NA)

for (i in 1:length(new_data$pcode)){
  pcode = new_data$pcode[i]
  windspeed = new_data$Wind.speed[i]
  dist_first_impact = new_data$distance_first_impact[i]
  tdprop = new_data$Total.damaged.houses..rel..[i]
  tdnum = new_data$Total.damaged.houses..abs..[i]
  pcprop = new_data$Completely.damaged..rel..[i]
  pcnum = new_data$Completely.damaged..abs..[i]
  tdprop.se = new_data$Total.damaged.houses..rel..se[i]
  tdnum.se = new_data$Total.damaged.houses..abs..se[i]
  pcprop.se = new_data$Completely.damaged..rel..se[i]
  pcnum.se = new_data$Completely.damaged..abs..se[i]
  Slope  = new_data$Slope[i]
  Elevation  = new_data$Elevation[i]
  Population.density  = new_data$Population.density[i]
  X..strong.roof.type = new_data$X..strong.roof.type[i]
  Experience  = new_data$Experience[i]

  
  # Look for pcode in shapefile attributes
  index = which(ph_mun$Mun_Code == as.character(pcode))
  tdprop.list[index] = tdprop
  tdnum.list[index] = tdnum
  pcprop.list[index] = pcprop
  pcnum.list[index] = pcnum
  windspeed.list[index] = windspeed
  dist_first_impact.list[index] = dist_first_impact
  tdprop.se.list[index] = tdprop.se
  tdnum.se.list[index] = tdnum.se
  pcprop.se.list[index] = pcprop.se
  pcnum.se.list[index] = pcnum.se
  Slope.list[index]  = Slope
  Elevation.list[index]  = Elevation
  Population.density.list[index]  = Population.density
  X..strong.roof.type.list[index] = X..strong.roof.type
  Experience.list[index]  = Experience
}
# Save new fields in output shapefile
ph_mun_new$Wind_Speed = windspeed.list
ph_mun_new$dist_first_impact = dist_first_impact.list
ph_mun_new$tdprop = tdprop.list
ph_mun_new$tdnum = tdnum.list
ph_mun_new$pcprop = pcprop.list
ph_mun_new$pcnum = pcnum.list
ph_mun_new$tdprop.se = tdprop.se.list
ph_mun_new$tdnum.se = tdnum.se.list
ph_mun_new$pcprop.se = pcprop.se.list
ph_mun_new$pcnum.se = pcnum.se.list
ph_mun_new$Slope = Slope.list 
ph_mun_new$Elevation = Elevation.list 
ph_mun_new$Population.density = Population.density.list 
ph_mun_new$X..strong.roof.type = X..strong.roof.type.list 
ph_mun_new$Experience = Experience.list 

# st_write(ph_mun_new, 'data\\Shapefiles\\PH_municipality_results.shp')
cap.df <- data.frame(x = 126.8, y = 15.75, text = "Developed by the Disaster Analytics for Society Lab (DASL) at the \n Earth Observatory of Singapore and Nanyang Technological University.")

# Plot tdprop
ggplot() + theme(
  panel.background = element_rect(fill = "lightblue",
                                colour = "lightblue",
                                size = 0.5, linetype = "solid"),
  panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
                                colour = "white")
  ) +
  geom_sf(data = ph_mun_new, aes(fill=tdprop), size = 0.1, color = "black") + 
  scale_fill_fermenter(breaks = c(0,0.2,0.4,0.6,0.8,1), type = "seq", palette = 7, direction=1,
                       na.value='grey50') + 
  geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") + 
  coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "Damage rate", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
  theme(legend.position =  c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))

ggsave("graphics/goni-damagerate.png", dpi = 300, height = 6, width = 10, units = "in")
# Plot tdprop standard errors:

ggplot() + theme(
  panel.background = element_rect(fill = "lightblue",
                                colour = "lightblue",
                                size = 0.5, linetype = "solid"),
  panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
                                colour = "white")
  ) +
  geom_sf(data = ph_mun_new, aes(fill=tdprop.se), size = 0.1, color = "black") + 
  scale_fill_gradient(low="white", high="red") + 
  geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") + 
  coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "Damage rate \n standard error", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
  theme(legend.position =  c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))

# Plot tdnum
ggplot() + theme(
  panel.background = element_rect(fill = "lightblue",
                                colour = "lightblue",
                                size = 0.5, linetype = "solid"),
  panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
                                colour = "white")
  ) +
  geom_sf(data = ph_mun_new, aes(fill=tdnum), size = 0.1, color = "black") + 
  scale_fill_fermenter(breaks = c(0,0.2,0.4,0.6,0.8)*10000, type = "seq", palette = 7, direction=1,
                       na.value='grey50') + 
  geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") + 
  coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "# Damaged houses", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
  theme(legend.position =  c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))

ggsave("graphics/goni-damagedhouses.png", dpi = 300, height = 6, width = 10, units = "in")
# Plot pcprop
ggplot() + theme(
  panel.background = element_rect(fill = "lightblue",
                                colour = "lightblue",
                                size = 0.5, linetype = "solid"),
  panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
                                colour = "white")
  ) +
  geom_sf(data = ph_mun_new, aes(fill=pcprop), size = 0.1, color = "black") + 
   scale_fill_gradient(low="white", high="red")  +
  geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") + 
  coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "Complete damage rate", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
  theme(legend.position =  c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))

# Plot pcprop standard errors
ggplot() + theme(
  panel.background = element_rect(fill = "lightblue",
                                colour = "lightblue",
                                size = 0.5, linetype = "solid"),
  panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
                                colour = "white")
  ) +
  geom_sf(data = ph_mun_new, aes(fill=pcprop.se), size = 0.1, color = "black") + 
  scale_fill_gradient(low="white", high="red") + 
  geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") + 
  coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "Complete damage rate \n standard error", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
  theme(legend.position =  c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))

The top 10 municipalities hit, in terms of damage rate, are:

top10.dr <- sort(ph_mun_new$tdprop, decreasing = TRUE)[1:10]

top10.dr.df <- data.frame(ph_mun_new[ph_mun_new$tdprop %in% top10.dr, c("Mun_Name", "tdprop")])

top10.dr.df <- top10.dr.df[order(top10.dr.df$tdprop, decreasing = TRUE), ]

top10.dr.df[, c("Mun_Name", "tdprop")]
##                  Mun_Name    tdprop
## 567                  VIGA 0.9181192
## 558             BAGAMANOC 0.9137369
## 566            SAN MIGUEL 0.8992065
## 564     PANGANIBAN (PAYO) 0.8951749
## 563                PANDAN 0.8875699
## 562               GIGMOTO 0.8857977
## 561             CARAMORAN 0.8770626
## 531              CARAMOAN 0.8446999
## 565 SAN ANDRES (CALOLBON) 0.7725639
## 534          GARCHITORENA 0.7692863

The top 10 municipalities hit, in terms of number of damaged houses, are:

top10.db <- sort(ph_mun_new$tdnum, decreasing = TRUE)[1:10]

top10.db.df <- data.frame(ph_mun_new[ph_mun_new$tdnum %in% top10.db, c("Mun_Name", "tdnum")])

top10.db.df <- top10.db.df[order(top10.db.df$tdnum, decreasing = TRUE), ]

top10.db.df[, c("Mun_Name", "tdnum")]
##                  Mun_Name     tdnum
## 557              TINAMBAC 10074.036
## 531              CARAMOAN 10052.985
## 568       VIRAC (Capital)  9998.508
## 537               LAGONOY  8727.877
## 566            SAN MIGUEL  8319.909
## 528             CALABANGA  8244.199
## 538              LIBMANAN  7347.871
## 565 SAN ANDRES (CALOLBON)  7103.532
## 535                   GOA  6600.385
## 561             CARAMORAN  6590.249

Finally, we provide a KMZ file of the map to enable visualisation via Google Earth:

spdf <- as(ph_mun_new, "Spatial")
spdf@data <- spdf@data[, c("Mun_Name", "tdprop", "tdprop.se", "tdnum", "tdnum.se", "pcprop", "pcprop.se", "pcnum", "pcnum.se", "Wind_Speed", "dist_first_impact", "Slope", "Elevation", "Population.density", "X..strong.roof.type", "Experience")]
colnames(spdf@data) <- c("Municipality", "Damage_rate", "Damage_rate_standard_error", "No_of_damaged_houses", "No_of_damaged_houses_standard_error", "Complete_damage_rate", "Complete_damage_rate_standard_error", "No_of_completely_damaged_houses", "No_of_completely_damaged_houses_standard_error", "Wind_speed", "Distance_to_first_impact", "Slope", "Elevation", "Population_density", "Proportion_of_strong_roof_type", "Experience")
spdf@data$Municipality <- as.character(spdf@data$Municipality)
# Replace pattern in string:
spdf@data$Municipality[spdf@data$Municipality == "GABALDON (BITULOK & SABANI)"] <- "GABALDON (BITULOK AND SABANI)"

shape = "http://maps.google.com/mapfiles/kml/pal2/icon18.png"

# Round variables

# To 2 decimal places
spdf@data[, which(!(colnames(spdf@data)%in% c("Municipality", "Damage_rate", "Complete_damage_rate", "No_of_damaged_houses", "No_of_completely_damaged_houses")))] <-  round(spdf@data[, which(!(colnames(spdf@data)%in% c("Municipality", "Damage_rate", "Complete_damage_rate", "No_of_damaged_houses", "No_of_completely_damaged_houses")))], digits = 2)

# To the nearest 5%
spdf@data[, "Damage_rate"] <- round(spdf@data[, "Damage_rate"]/0.05)*0.05
spdf@data[, "Complete_damage_rate"] <- round(spdf@data[, "Complete_damage_rate"]/0.05)*0.05

# To the nearest 100
spdf@data[, "No_of_damaged_houses"] <- round(spdf@data[, "No_of_damaged_houses"]/100)*100
spdf@data[, "No_of_completely_damaged_houses"] <- round(spdf@data[, "No_of_completely_damaged_houses"]/100)*100

head(spdf@data)
##    Municipality Damage_rate Damage_rate_standard_error No_of_damaged_houses
## 1         ADAMS           0                       0.00                    0
## 2       BACARRA           0                       0.00                  100
## 3         BADOC           0                       0.00                  100
## 4        BANGUI           0                       0.01                  100
## 5 CITY OF BATAC           0                       0.00                  100
## 6        BURGOS           0                       0.01                    0
##   No_of_damaged_houses_standard_error Complete_damage_rate
## 1                                0.61                    0
## 2                               37.24                    0
## 3                               22.22                    0
## 4                               23.25                    0
## 5                               32.28                    0
## 6                               13.68                    0
##   Complete_damage_rate_standard_error No_of_completely_damaged_houses
## 1                                   0                               0
## 2                                   0                               0
## 3                                   0                               0
## 4                                   0                               0
## 5                                   0                               0
## 6                                   0                               0
##   No_of_completely_damaged_houses_standard_error Wind_speed
## 1                                           0.04         19
## 2                                           0.79         60
## 3                                           0.74         36
## 4                                           0.39         60
## 5                                           1.30         36
## 6                                           0.35         60
##   Distance_to_first_impact Slope Elevation Population_density
## 1                   650.43 16.83    558.75              14.23
## 2                   653.20  2.13     23.80             515.70
## 3                   629.54  7.37     97.70             347.13
## 4                   664.10 11.94    220.47             112.76
## 5                   635.41  5.23     91.41             308.78
## 6                   669.55  4.79    168.21              62.58
##   Proportion_of_strong_roof_type Experience
## 1                           0.97       3.26
## 2                           0.99       3.26
## 3                           0.92       3.26
## 4                           0.96       3.26
## 5                           0.98       3.26
## 6                           0.90       3.26
plotKML(spdf, 'graphics/goni_damage', colour = "Damage_rate", plot.labpt = TRUE, shape = shape, alpha = 0.75, size = 0.2, balloon = TRUE)
## KML file opened for writing...
## Writing to KML...
## Closing  graphics/goni_damage.kml
## Warning in kml_View(file.name): No MIME type detected for .kml file extension.

Conclusions

Models were trained from past typhoon damage data both for proportion of damaged houses and proportion of completely damaged houses. In both cases, a similar set of covariates were found to be important for predicting damage: Wind speed (arguely the most relevant), distance to first impact of typhoon in land, proportion of strong roof houses, average slope, experience in past typhoons and mean elevation of the municipality.

Damage was predicted for the latest Goni Storm based on remotely obtained wind speed data (see other DASL report) and the observed typhoon trace. Proportion and total number of damaged houses per municipality was predicted using the calibrated models. The province of Catanduane seems to be the worst hitted by the typhoon in coincidence with the largest observed wind speeds.

Caution in the use and evaluation of predictions and thorough review of the model’s development and hypothesis is recommended before analyzing the results.