This notebook presents the development of a statistical model for building damage due to typhoons in the Philippines, using data from past 12 typhoons during the last 2 decades. The damage model is constructed to predict the proportion of damaged houses per municipality, countrywide, using a set of hazard related metrics, such as wind speed, and socio-economic indicators, such as poverty incidence or wall constructions quality, as predictors.
Two damage models are built: the first one to predict the proportion of damaged houses in a municipality, and the second one for the proportion of completely damaged houses in a municipality. Both models are chosen to be multivariate logistic for interpretation purposes (the estimated coefficients relate to log-odds) and because the a probit model gives similar performance.
chosen_link <- "logit"
if(chosen_link == "logit"){link_fn <- logit}
if(chosen_link == "probit"){link_fn <- qnorm}
include_transform <- TRUE # Set = TRUE if we want to compute cross-validation errors accounting for variable transformation, scaling, VIF and stepAIC.
The damage model is built based on past observations of building damages due to typhoons in the last decades (510 Global). The dataset has 1,638 observations corresponding to 1,034 countrywide municipalities and impacts from 12 typhoons. It has 34 columns corresponding to data surveyed in the aftermath of the typhoons and other site and socio-economic metrics corresponding to each municipality.
# Read CSV with complete dataset into R
raw_data <- read.csv(file="data//All.csv", header=TRUE, sep=",")
summary(raw_data)
## disaster_type disaster_name pcode Completely.damaged..abs..
## Typhoon:1638 Haima :301 PH012815000: 6 Min. : 1.0
## Rammasun:294 PH025012000: 6 1st Qu.: 1.0
## Koppu :203 PH143206000: 6 Median : 16.0
## Bopha :175 PH143209000: 6 Mean : 582.9
## Haiyan :162 PH148102000: 6 3rd Qu.: 269.5
## Melor :105 PH012804000: 5 Max. :14132.0
## (Other) :398 (Other) :1603
## Partly.damaged..abs.. Total.damaged.houses..abs..
## Min. : 1.0 Min. : 2
## 1st Qu.: 14.0 1st Qu.: 19
## Median : 179.5 Median : 225
## Mean : 1139.0 Mean : 1722
## 3rd Qu.: 1411.5 3rd Qu.: 1962
## Max. :46553.0 Max. :58823
##
## total_damage_houses_0p25weight Partly.damaged..rel.. Completely.damaged..rel..
## Min. : 1.25 Min. : 0.00097 Min. : 0.00117
## 1st Qu.: 7.00 1st Qu.: 0.19783 1st Qu.: 0.02581
## Median : 75.50 Median : 2.58792 Median : 0.22321
## Mean : 867.68 Mean : 11.86016 Mean : 6.10332
## 3rd Qu.: 694.56 3rd Qu.: 18.69875 3rd Qu.: 3.02683
## Max. :23908.25 Max. :115.34310 Max. :95.60731
##
## Total.damaged.houses..rel.. total_damage_houses_0p25weight_perc
## Min. : 0.00265 Min. : 0.00165
## 1st Qu.: 0.27720 1st Qu.: 0.10543
## Median : 3.25947 Median : 1.10478
## Mean : 17.96349 Mean : 9.06837
## 3rd Qu.: 27.26852 3rd Qu.: 9.26261
## Max. :129.97414 Max. :98.54451
##
## ratio_comp_part Total...of.houses Wind.speed Distance.to.typhoon
## Min. : 0.01125 Min. : 324.2 Min. : 40.00 Min. : 0.1225
## 1st Qu.: 0.24230 1st Qu.: 4700.8 1st Qu.: 60.00 1st Qu.: 24.3993
## Median : 0.57941 Median : 8017.6 Median : 80.00 Median : 50.3040
## Mean : 0.97434 Mean : 11767.5 Mean : 84.68 Mean : 74.8998
## 3rd Qu.: 0.97201 3rd Qu.: 13664.4 3rd Qu.:102.69 3rd Qu.:104.9571
## Max. :88.20134 Max. :194096.5 Max. :298.46 Max. :347.2922
##
## rainfall distance_first_impact Slope Elevation
## Min. : 45.0 Min. : 4.358 Min. : 0.4569 Min. : 2.516
## 1st Qu.: 244.9 1st Qu.: 110.800 1st Qu.: 3.9357 1st Qu.: 59.706
## Median : 303.5 Median : 175.940 Median : 7.1648 Median : 138.018
## Mean : 323.9 Mean : 197.212 Mean : 8.1863 Mean : 258.570
## 3rd Qu.: 376.5 3rd Qu.: 263.415 3rd Qu.:11.3319 3rd Qu.: 305.599
## Max. :1964.0 Max. :1223.384 Max. :25.2295 Max. :1899.717
##
## ruggedness_stdev Ruggedness slope_stdev
## Min. : 1.629 Min. : 3.497 Min. : 0.2861
## 1st Qu.:18.188 1st Qu.: 21.022 1st Qu.: 3.9357
## Median :29.284 Median : 36.834 Median : 6.3751
## Mean :28.602 Mean : 41.833 Mean : 6.0110
## 3rd Qu.:39.950 3rd Qu.: 57.244 3rd Qu.: 8.4818
## Max. :86.704 Max. :128.965 Max. :13.0245
##
## Predicted.damage.class..1.5. Population land_area
## Mode:logical Min. : 1291 Min. : 1.857
## NA's:1638 1st Qu.: 18807 1st Qu.: 80.481
## Median : 32108 Median : 146.901
## Mean : 47066 Mean : 215.934
## 3rd Qu.: 54659 3rd Qu.: 264.231
## Max. :776382 Max. :2428.147
##
## Population.density Poverty.incidence X..strong.roof.type X..strong.wall.type
## Min. : 4.5 Min. :0.0085 Min. :0.2362 Min. :0.05375
## 1st Qu.: 115.5 1st Qu.:0.1363 1st Qu.:0.7404 1st Qu.:0.43032
## Median : 229.1 Median :0.2250 Median :0.9155 Median :0.59095
## Mean : 452.2 Mean :0.2579 Mean :0.8345 Mean :0.58940
## 3rd Qu.: 434.3 3rd Qu.:0.3585 3rd Qu.:0.9630 3rd Qu.:0.77660
## Max. :34384.3 Max. :0.8476 Max. :1.0000 Max. :0.95228
##
## X..skilled.Agriculture.Forestry.Fishermen Region prov
## Min. :0.001049 central :533 0 :1597
## 1st Qu.:0.068022 metro luzon:263 albay : 25
## Median :0.103011 minandao :140 catanduane: 16
## Mean :0.107656 north luzon:702
## 3rd Qu.:0.138443
## Max. :0.340328
## NA's :21
## Experience Bicol.region
## Min. :1.000 Min. :0.0000
## 1st Qu.:1.278 1st Qu.:0.0000
## Median :2.034 Median :0.0000
## Mean :2.100 Mean :0.1172
## 3rd Qu.:2.800 3rd Qu.:0.0000
## Max. :4.280 Max. :1.0000
## NA's :1
The database is preprocessed to remove rows (observations) with empty values, and remove certain columns that are not going to be used in the model training.
# Remove rows with NA in skilled.Agriculture.Forestry.Fishermen (21 rows):
raw_data <- raw_data[!is.na(raw_data$X..skilled.Agriculture.Forestry.Fishermen), ]
# Remove row with NA in experience.
raw_data <- raw_data[!is.na(raw_data$Experience), ]
# Remove unnecessary columns
raw_data <- raw_data[, -which(colnames(raw_data) %in% c("Completely.damaged..abs..", "Partly.damaged..abs..", "Total.damaged.houses..abs..", "total_damage_houses_0p25weight", "Partly.damaged..rel..", "total_damage_houses_0p25weight_perc", "ratio_comp_part", "Total...of.houses", "Predicted.damage.class..1.5.", "Bicol.region", "Population", "land_area", "Ruggedness","ruggedness_stdev"))]
Caution: The variables ‘Ruggedness’ and ‘ruggedness_stdev’ that describe the variation rate of slope in the topography, were ruled out in favour of the ‘slope’ variable. They both have a ver high correlation and thus carry similar information regarding to damage prediction. This decision was taken due to the relative easier interpretation of the slope variable.
Next, we check for and remove outliers.
raw_data %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Remove outlier:
raw_data <- raw_data[-which(raw_data$Wind.speed == max(raw_data$Wind.speed)),]
sum(raw_data$Total.damaged.houses..abs.. > raw_data$Total...of.houses)
## [1] 0
max(raw_data$Total.damaged.houses..rel..[raw_data$Total.damaged.houses..rel..<100])
## [1] 99.92221
Since there are 13 municipalities where the number of total damaged houses exceed the number of houses on the record likely due to different data sources, we set an upper bound of 99.99% on the proportion of damaged houses. This is higher than the maximum observed value which is less than 100%.
# Replace >100 Total.damaged.houses..rel.. with 99.99
raw_data$Total.damaged.houses..rel..[raw_data$Total.damaged.houses..rel..>100] <- 99.99
Next, we convert the building damage proportions to a scale of 0-1 for modelling.
# Create proportion of houses which are damaged:
raw_data$Total.damaged.houses..rel.. <- raw_data$Total.damaged.houses..rel../100
# Create proportion of houses which are completely damaged:
raw_data$Completely.damaged..rel.. <- raw_data$Completely.damaged..rel../100
Each of the potential predictors of damage is tested using the Box-Cox transformation to determine if a transformation of power-law transformation of the variable is necessary to normalise the variables relative to the link-transformed damaged proportions.
# Link-transformed damage proportions
link_fn_td <- link_fn(raw_data$Total.damaged.houses..rel..)
link_fn_pc <- link_fn(raw_data$Completely.damaged..rel..)
# Round lambda est to the nearest half integer
power_est <- function(bC){l.est <- bC$x[bC$y == max(bC$y)]
return(round(l.est/0.5)*0.5)}
# Set up dataframe with first row for td and second row for pc
var_names <- colnames(raw_data)
var_names <- var_names[!(var_names %in% c("disaster_type", "disaster_name", "pcode", "Completely.damaged..rel..", "Total.damaged.houses..rel..", "Region", "prov"))]
power_df <- matrix(NA, nrow = 2, ncol = length(var_names))
colnames(power_df) <- var_names
for (i in 1:length(var_names)){
var_i <- var_names[i]
power_df[1, var_i] <- power_est(boxCox(link_fn_td - min(link_fn_td)*1.01 ~ raw_data[, var_i],
plotit = FALSE))
power_df[2, var_i] <- power_est(boxCox(link_fn_pc - min(link_fn_pc)*1.01 ~ raw_data[, var_i],
plotit = FALSE))
}
power_df
## Wind.speed Distance.to.typhoon rainfall distance_first_impact Slope
## [1,] 1.0 0.5 0.5 0.5 0.5
## [2,] 0.5 0.5 0.5 0.5 0.5
## Elevation slope_stdev Population.density Poverty.incidence
## [1,] 0.5 0.5 0.5 0.5
## [2,] 0.5 0.5 0.5 0.5
## X..strong.roof.type X..strong.wall.type
## [1,] 0.5 0.5
## [2,] 0.5 0.5
## X..skilled.Agriculture.Forestry.Fishermen Experience
## [1,] 0.5 0.5
## [2,] 0.5 0.5
Apply the obtained transformations:
td_data <- raw_data
pc_data <- raw_data
for (i in 1:length(var_names)){
var_i <- var_names[i]
if(power_df[1, var_i]==0){
td_data[, var_i] <- log(td_data[, var_i])
}else{td_data[, var_i] <- td_data[, var_i]^power_df[1, var_i]}
if(power_df[2, var_i]==0){
pc_data[, var_i] <- log(pc_data[, var_i])
}else{pc_data[, var_i] <- pc_data[, var_i]^power_df[2, var_i]}
}
All numeric columns are standardize to have mean 0 and standard deviation 1.
# Standardise data:
# Save to apply to Goni 2020 dataset later
cols.mean.td <- colMeans(td_data[, var_names] )
cols.sd.td <- apply(td_data[, var_names], 2, sd )
scaled.data.td <- lapply(td_data[, var_names], scale)
cols.mean.pc <- colMeans(td_data[, var_names] )
cols.sd.pc <- apply(td_data[, var_names], 2, sd )
scaled.data.pc <- lapply(pc_data[, var_names], scale)
# Switch out scaled numerical columns (apart from response variables)
td_data[, var_names] <- as.data.frame(scaled.data.td)
pc_data[, var_names] <- as.data.frame(scaled.data.pc)
In this section, we explore the data to identify possible relations between the predictors and the building damage proportions.
For the proportion of houses damaged:
td_data %>%
keep(is.numeric) %>%
gather(-Total.damaged.houses..rel.., -Completely.damaged..rel.., key = "var", value = "value") %>%
ggplot(aes(x = value, y = link_fn(Total.damaged.houses..rel..))) +
geom_point() +
facet_wrap(~ var, scales = "free") +
theme_bw()
For the proportion of houses completely damaged:
pc_data %>%
keep(is.numeric) %>%
gather(-Total.damaged.houses..rel.., -Completely.damaged..rel.., key = "var", value = "value") %>%
ggplot(aes(x = value, y = link_fn(Completely.damaged..rel..))) +
geom_point() +
facet_wrap(~ var, scales = "free") +
theme_bw()
For both models, a distinctive behavior can be seen for the Wind Speed predictor depending on its magnitude. While for lower wind speeds a rather scattered and significant positive slope can be observed, this increment of damage seems to saturate for higher wind speed (e.g. above average). This non-linearity was also explored in the model although without satisfactory results, so no changes were made. Similar trends can be seen for other variables, such as elevation and rainfall, but since wind speed is a major predictor variable, the analysis was constrained to it.
The dataset for the 1st damage model (Proportion of damage houses):
# Dataset for Proportion Damaged:
# Omit Population, Land.Area, Region and prov.
omit_var_td <- which(colnames(raw_data) %in% c("disaster_type", "disaster_name", "pcode", "Completely.damaged..rel..", "Region", "prov"))
td_data <- td_data[, -omit_var_td]
head(td_data)
## Total.damaged.houses..rel.. Wind.speed Distance.to.typhoon rainfall
## 1 0.0002225499 -1.2571410 1.6699987 -2.8985630
## 2 0.0018975332 -0.6908614 0.3872697 -1.2770629
## 3 0.0101243535 -0.9740012 1.0062739 0.3332327
## 4 0.0005569479 -0.6908614 0.5339093 -1.4256505
## 5 0.0006712911 -0.9740012 0.7843221 -0.8008816
## 6 0.0014214182 -0.9740012 1.1039695 0.3698862
## distance_first_impact Slope Elevation slope_stdev Population.density
## 1 5.129665 -1.0718403 -1.0334861 -0.51418359 1.0187587782
## 2 1.349430 -0.6883624 -0.5532139 -0.69190964 -0.0002248347
## 3 1.435343 -0.4932481 -0.4260822 -0.66493207 -0.2569326702
## 4 1.251112 -0.0203109 0.1860411 -0.05896435 -0.2237533693
## 5 1.176279 0.4651641 0.7762894 0.38980071 -0.1999884083
## 6 1.464872 -0.5844099 -0.7985256 -0.45485839 -0.0838508353
## Poverty.incidence X..strong.roof.type X..strong.wall.type
## 1 -0.9156615 0.7800073 0.92881818
## 2 -0.5707727 0.4770982 0.09587523
## 3 1.2687909 -0.2126670 -0.69068122
## 4 0.4591817 0.7754063 0.36482433
## 5 0.3988127 0.4541748 0.28660244
## 6 1.1709577 -0.5112979 -0.75516651
## X..skilled.Agriculture.Forestry.Fishermen Experience
## 1 -0.4856129 1.245906
## 2 -0.8788238 -1.401257
## 3 0.9973094 -1.401257
## 4 0.4261285 -1.401257
## 5 0.6513256 -1.401257
## 6 0.7471066 -1.401257
The dataset for the 2nd damage model (Proportion of completely damage houses):
# Dataset for Proportion of Completely Damaged:
omit_var_pc <- which(colnames(raw_data) %in% c("disaster_type", "disaster_name", "pcode", "Total.damaged.houses..rel..", "Region", "prov"))
pc_data <- pc_data[, -omit_var_pc]
head(pc_data)
## Completely.damaged..rel.. Wind.speed Distance.to.typhoon rainfall
## 1 0.0001112749 -1.4437362 1.6699987 -2.8985630
## 2 0.0003795066 -0.6762612 0.3872697 -1.2770629
## 3 0.0014672976 -1.0406652 1.0062739 0.3332327
## 4 0.0002784740 -0.6762612 0.5339093 -1.4256505
## 5 0.0002237637 -1.0406652 0.7843221 -0.8008816
## 6 0.0012921983 -1.0406652 1.1039695 0.3698862
## distance_first_impact Slope Elevation slope_stdev Population.density
## 1 5.129665 -1.0718403 -1.0334861 -0.51418359 1.0187587782
## 2 1.349430 -0.6883624 -0.5532139 -0.69190964 -0.0002248347
## 3 1.435343 -0.4932481 -0.4260822 -0.66493207 -0.2569326702
## 4 1.251112 -0.0203109 0.1860411 -0.05896435 -0.2237533693
## 5 1.176279 0.4651641 0.7762894 0.38980071 -0.1999884083
## 6 1.464872 -0.5844099 -0.7985256 -0.45485839 -0.0838508353
## Poverty.incidence X..strong.roof.type X..strong.wall.type
## 1 -0.9156615 0.7800073 0.92881818
## 2 -0.5707727 0.4770982 0.09587523
## 3 1.2687909 -0.2126670 -0.69068122
## 4 0.4591817 0.7754063 0.36482433
## 5 0.3988127 0.4541748 0.28660244
## 6 1.1709577 -0.5112979 -0.75516651
## X..skilled.Agriculture.Forestry.Fishermen Experience
## 1 -0.4856129 1.245906
## 2 -0.8788238 -1.401257
## 3 0.9973094 -1.401257
## 4 0.4261285 -1.401257
## 5 0.6513256 -1.401257
## 6 0.7471066 -1.401257
Variance Inflation Factors (VIF) are computed to check for multicollinearity. Variables with VIF>10 are removed from consideration.
# Fit Multi-Logistic Model for td
td_full <- glm(Total.damaged.houses..rel.. ~ ., data= td_data, family = binomial(link=chosen_link), trace = FALSE)
## Check on variable collinearity
temp_model <- td_full
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
temp_var <- "dummy"
temp_data <- td_data
while(max(temp_vif)>10){
var_remove <- which(temp_vif == max(temp_vif))
temp_var <- append(temp_var, names(var_remove))
temp_data <- temp_data[, -which(colnames(temp_data) == names(var_remove))]
temp_model <- (glm(Total.damaged.houses..rel.. ~ ., data= temp_data, family = binomial(link=chosen_link), trace = FALSE))
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
}
td_col_to_use <- colnames(temp_data)
temp_vif
## Wind.speed
## 2.852955
## Distance.to.typhoon
## 2.435281
## rainfall
## 1.419057
## distance_first_impact
## 1.505244
## Slope
## 9.059720
## Elevation
## 4.198276
## slope_stdev
## 5.967166
## Population.density
## 2.670852
## Poverty.incidence
## 3.453024
## X..strong.roof.type
## 2.201079
## X..strong.wall.type
## 3.058554
## X..skilled.Agriculture.Forestry.Fishermen
## 2.347577
## Experience
## 1.837720
temp_var
## [1] "dummy"
# Fit Multi-Logistic Model for pc
pc_full <- glm(Completely.damaged..rel.. ~ ., data= pc_data, family = binomial(link=chosen_link), trace = FALSE)
## Check on variable collinearity
temp_model <- pc_full
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
temp_var <- "dummy"
temp_data <- pc_data
while(max(temp_vif)>10){
var_remove <- which(temp_vif == max(temp_vif))
temp_var <- append(temp_var, names(var_remove))
temp_data <- temp_data[, -which(colnames(temp_data) == names(var_remove))]
temp_model <- (glm(Completely.damaged..rel.. ~ ., data= temp_data, family = binomial(link=chosen_link), trace = FALSE))
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
}
pc_col_to_use <- colnames(temp_data)
temp_vif
## Wind.speed
## 3.320326
## Distance.to.typhoon
## 2.202872
## rainfall
## 1.755294
## distance_first_impact
## 1.708302
## Slope
## 9.612807
## Elevation
## 4.386630
## slope_stdev
## 6.468108
## Population.density
## 2.757635
## Poverty.incidence
## 2.831486
## X..strong.roof.type
## 2.072141
## X..strong.wall.type
## 2.747146
## X..skilled.Agriculture.Forestry.Fishermen
## 2.376611
## Experience
## 2.161044
temp_var
## [1] "dummy"
# Subset variables based on VIF:
td_data <- td_data[, colnames(td_data) %in% td_col_to_use]
pc_data <- pc_data[, colnames(pc_data) %in% pc_col_to_use]
The stepwise variable selection uses AIC to compare between models.
# Fit Multi-Logistic Model
td_full <- glm(Total.damaged.houses..rel.. ~ ., data= td_data, family = binomial(link=chosen_link), trace = FALSE)
td_step <- stepAIC(td_full)
# Check the variable significance
summary(td_step)
##
## Call:
## glm(formula = Total.damaged.houses..rel.. ~ Wind.speed + distance_first_impact +
## Slope + Elevation + Population.density + X..strong.roof.type +
## Experience, family = binomial(link = chosen_link), data = td_data,
## trace = FALSE)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.22294 -0.30991 -0.15599 0.08836 2.67136
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.16986 0.10147 -21.384 < 2e-16 ***
## Wind.speed 1.18801 0.08946 13.280 < 2e-16 ***
## distance_first_impact -0.35929 0.08475 -4.239 2.24e-05 ***
## Slope 0.45756 0.18188 2.516 0.011876 *
## Elevation -0.73948 0.19996 -3.698 0.000217 ***
## Population.density -0.33519 0.15741 -2.129 0.033225 *
## X..strong.roof.type -0.23936 0.07781 -3.076 0.002097 **
## Experience -0.16023 0.09002 -1.780 0.075091 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 721.58 on 1614 degrees of freedom
## Residual deviance: 285.80 on 1607 degrees of freedom
## AIC: 752.7
##
## Number of Fisher Scoring iterations: 6
The correlation with the predictors seem as expected. Higher windspeeds give raise to higher damage, the same as municipalities with higher average slope where wind effects can be more severe. On the other hand, the distance to the typhoon’s first impact in land, the percentage of houses with strong roof and municipalities with more experience from past typhoons yield a lesser damage. The negative correlation with population density might be discriminating between municipalities with more dense urban land use and more rural ones, where it is expected to find a lower proportion of damage (although a larger absolute amount of damage). The reason for the sign of the correlation with elevation is not yet clear.
# Fit Multi-Logistic Model
pc_full <- glm(Completely.damaged..rel.. ~ ., data= pc_data, family = binomial(link=chosen_link), trace = FALSE)
pc_step <- stepAIC(pc_full)
# Check resulting model
pc_step
##
## Call: glm(formula = Completely.damaged..rel.. ~ Wind.speed + distance_first_impact +
## Elevation + Population.density + X..strong.roof.type + Experience,
## family = binomial(link = chosen_link), data = pc_data, trace = FALSE)
##
## Coefficients:
## (Intercept) Wind.speed distance_first_impact
## -4.1679 1.5099 -0.3467
## Elevation Population.density X..strong.roof.type
## -0.3597 -0.5586 -0.2420
## Experience
## -0.3302
##
## Degrees of Freedom: 1614 Total (i.e. Null); 1608 Residual
## Null Deviance: 346.5
## Residual Deviance: 96.17 AIC: 276.4
# Check the variable significance
summary(pc_step)
##
## Call:
## glm(formula = Completely.damaged..rel.. ~ Wind.speed + distance_first_impact +
## Elevation + Population.density + X..strong.roof.type + Experience,
## family = binomial(link = chosen_link), data = pc_data, trace = FALSE)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.73483 -0.15798 -0.06738 -0.01244 1.78359
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.1679 0.2372 -17.573 < 2e-16 ***
## Wind.speed 1.5099 0.1547 9.757 < 2e-16 ***
## distance_first_impact -0.3467 0.1154 -3.006 0.00265 **
## Elevation -0.3597 0.2072 -1.736 0.08257 .
## Population.density -0.5586 0.2700 -2.069 0.03857 *
## X..strong.roof.type -0.2420 0.1230 -1.967 0.04921 *
## Experience -0.3302 0.1637 -2.017 0.04371 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 346.490 on 1614 degrees of freedom
## Residual deviance: 96.169 on 1608 degrees of freedom
## AIC: 276.4
##
## Number of Fisher Scoring iterations: 7
The variables selected are the same than in the previous model (with less significance in most of the predictors), with the exception that ‘slope’ does not appear. This might as well be a statistical artifact.
To evaluate our fitted models, we perform cross-validation on the model structure with data from each typhoon as test sets.
typhoons <- unique(raw_data$disaster_name)
table(raw_data$disaster_name)
##
## Bopha Goni Hagupit Haima Haiyan Kalmaegi Koppu Melor
## 173 32 77 299 159 92 202 102
## Nock-Ten Rammasun Sarika Utor
## 80 289 57 53
cv_pc <- data.frame("Split" = 1:length(typhoons))
cv_pc$RMSE <- NA
cv_pc$MAPE <- NA
cv_td <- data.frame("Split" = 1:length(typhoons))
cv_td$RMSE <- NA
cv_td$MAPE <- NA
for (i in 1:length(typhoons)){
train.i <- which(raw_data$disaster_name != typhoons[i])
if(include_transform){
# Link-transformed damage proportions
link_fn_td.i <- link_fn(raw_data[train.i, ]$Total.damaged.houses..rel..)
link_fn_pc.i <- link_fn(raw_data[train.i, ]$Completely.damaged..rel..)
# Set up dataframe with first row for td and second row for pc
power_df.i <- matrix(NA, nrow = 2, ncol = length(var_names))
colnames(power_df.i) <- var_names
td_data.i <- raw_data
pc_data.i <- raw_data
td_data.i <- td_data.i[, -omit_var_td]
pc_data.i <- pc_data.i[, -omit_var_pc]
for (j in 1:length(var_names)){
var_i <- var_names[j]
power_df.i[1, var_i] <- power_est(boxCox(link_fn_td.i - min(link_fn_td.i)*1.01 ~ raw_data[train.i, ][, var_i], plotit = FALSE))
power_df.i[2, var_i] <- power_est(boxCox(link_fn_pc.i - min(link_fn_pc.i)*1.01 ~ raw_data[train.i, ][, var_i], plotit = FALSE))
if(power_df.i[1, var_i]==0){
td_data.i[, var_i] <- log(td_data.i[, var_i])
}else{td_data.i[, var_i] <- td_data.i[, var_i]^power_df.i[1, var_i]}
cols.mean.td.i <- mean(td_data.i[train.i, var_i])
cols.sd.td.i <- sd(td_data.i[train.i, var_i])
td_data.i[, var_i] <- (td_data.i[, var_i] - cols.mean.td.i)/cols.sd.td.i
if(power_df.i[2, var_i]==0){
pc_data.i[, var_i] <- log(pc_data.i[, var_i])
}else{pc_data.i[, var_i] <- pc_data.i[, var_i]^power_df.i[2, var_i]}
cols.mean.pc.i <- mean(pc_data.i[train.i, var_i])
cols.sd.pc.i <- sd(pc_data.i[train.i, var_i])
pc_data.i[, var_i] <- (pc_data.i[, var_i] - cols.mean.pc.i)/cols.sd.pc.i
}
# Fit Multi-Logistic Model
td_full.i <- glm(Total.damaged.houses..rel.. ~ ., data= td_data.i, family = binomial(link=chosen_link), trace = FALSE)
## Check on variable collinearity
temp_model <- td_full.i
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
temp_var <- "dummy"
temp_data <- td_data.i
while(max(temp_vif)>10){
var_remove <- which(temp_vif == max(temp_vif))
temp_var <- append(temp_var, names(var_remove))
temp_data <- temp_data[, -which(colnames(temp_data) == names(var_remove))]
temp_model <- (glm(Total.damaged.houses..rel.. ~ ., data= temp_data, family = binomial(link=chosen_link), trace = FALSE))
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
}
td_col_to_use.i <- colnames(temp_data)
# Fit Multi-Logistic Model
pc_full.i <- glm(Completely.damaged..rel.. ~ ., data= pc_data.i, family = binomial(link=chosen_link), trace = FALSE)
## Check on variable collinearity
temp_model <- pc_full.i
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
temp_var <- "dummy"
temp_data <- pc_data.i
while(max(temp_vif)>10){
var_remove <- which(temp_vif == max(temp_vif))
temp_var <- append(temp_var, names(var_remove))
temp_data <- temp_data[, -which(colnames(temp_data) == names(var_remove))]
temp_model <- (glm(Completely.damaged..rel.. ~ ., data= temp_data, family = binomial(link=chosen_link), trace = FALSE))
temp_vif <- vif(temp_model) # Use a threshold of 10 to remove variables.
}
pc_col_to_use.i <- colnames(temp_data)
td_data.i <- td_data.i[, colnames(td_data.i) %in% td_col_to_use.i]
pc_data.i <- pc_data.i[, colnames(pc_data.i) %in% pc_col_to_use.i]
td.training.data.i <- td_data.i[train.i, ]
td.test.data.i <- td_data.i[-train.i, ]
pc.training.data.i <- pc_data.i[train.i, ]
pc.test.data.i <- pc_data.i[-train.i, ]
pc_full.i <- glm(Completely.damaged..rel.. ~ ., data= pc.training.data.i, family = binomial(link=chosen_link), trace = FALSE)
pc_model.i <- stepAIC(pc_full.i, trace = 0)
td_full.i <- glm(Total.damaged.houses..rel.. ~ ., data= td.training.data.i, family = binomial(link=chosen_link), trace = FALSE)
td_model.i <- stepAIC(td_full.i, trace = 0)
}else{
pc.training.data.i <- pc_data[train.i, ]
pc.test.data.i <- pc_data[-train.i, ]
pc_model <- pc_step$formula
pc_model.i <- glm(pc_model, data= pc.training.data.i, family = binomial(link=chosen_link), trace = FALSE)
td.training.data.i <- td_data[train.i, ]
td.test.data.i <- td_data[-train.i, ]
td_model <- td_step$formula
td_model.i <- glm(td_model, data= td.training.data.i, family = binomial(link=chosen_link), trace = FALSE)
}
pc.test.pred.i <- predict(pc_model.i, newdata = pc.test.data.i, type = "response")
pc.test.res.i <- pc.test.pred.i - pc.test.data.i$Completely.damaged..rel..
pc.test.rmse.i <- sqrt(mean(pc.test.res.i^2))
pc.test.mape.i <- mean(abs(pc.test.res.i/pc.test.data.i$Completely.damaged..rel..))
cv_pc[i, c("RMSE", "MAPE")] <- c(pc.test.rmse.i, pc.test.mape.i)
td.test.pred.i <- predict(td_model.i, newdata = td.test.data.i, type = "response")
# If the estimate of houses damaged is less than the estimate of houses completely damaged, set it to the latter.
td.test.pred.i[td.test.pred.i<pc.test.pred.i] <- pc.test.pred.i
td.test.res.i <- td.test.pred.i - td.test.data.i$Total.damaged.houses..rel..
td.test.rmse.i <- sqrt(mean(td.test.res.i^2))
td.test.mape.i <- mean(abs(td.test.res.i/td.test.data.i$Total.damaged.houses..rel..))
cv_td[i, c("RMSE", "MAPE")] <- c(td.test.rmse.i, td.test.mape.i)
}
colMeans(cv_pc[, c("RMSE", "MAPE")])
## RMSE MAPE
## 0.07390853 38.76647185
cv_pc
## Split RMSE MAPE
## 1 1 0.13166103 97.2087230
## 2 2 0.01922374 5.3899594
## 3 3 0.03078913 15.4966904
## 4 4 0.04749458 56.6814891
## 5 5 0.20752031 0.9356902
## 6 6 0.02426268 53.5214649
## 7 7 0.02930979 39.2189082
## 8 8 0.17069310 12.0584787
## 9 9 0.09489228 26.0775407
## 10 10 0.05420394 7.5692131
## 11 11 0.01772350 31.9201821
## 12 12 0.05912826 119.1193224
colMeans(cv_td[, c("RMSE", "MAPE")])
## RMSE MAPE
## 0.1668861 53.3935719
cv_td
## Split RMSE MAPE
## 1 1 0.19376474 38.410240
## 2 2 0.13677559 13.626954
## 3 3 0.12920052 17.831413
## 4 4 0.13590789 85.054196
## 5 5 0.18251489 0.504323
## 6 6 0.09491613 50.191799
## 7 7 0.13318883 32.099336
## 8 8 0.26227135 24.188147
## 9 9 0.25684684 24.080305
## 10 10 0.16329686 6.365589
## 11 11 0.10466218 44.615731
## 12 12 0.20928757 303.754830
The total damage model has a significantly higher cross-validation error. In both cases, a large portion of the error seems to be driven by a couple of events that lead to errors which are much larger than the average. Note that by treating each typhoon as a test set, we interpret the Experience variable as a static, long-run frequency of experiencing typhoons instead of the way it is actually calculated: as a dynamic variable which accumulates for each municipality over the timeframe of the data (2012-2016). An alternative cross-validation method would be to only use past typhoons to predict for later ones.
Next, we check if the model residuals are symmetrically distributed about zero and if the model has any bias in terms of over or under-prediction.
pc.pred <- predict(pc_step, type = "response")
pc.res <- pc.pred - pc_data$Completely.damaged..rel..
hist(pc.res)
plot(link_fn(pc_data$Completely.damaged..rel..), link_fn(pc.pred), asp = 1)
abline(a = 0, b = 1, add = TRUE)
## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): "add" is
## not a graphical parameter
td.pred <- predict(td_step, type = "response")
# If the estimate of houses damaged is less than the estimate of houses completely damaged, set it to the latter.
td.pred[td.pred<pc.pred] <- pc.pred
td.res <- td.pred - td_data$Total.damaged.houses..rel..
hist(td.res)
plot(link_fn(td_data$Total.damaged.houses..rel..), link_fn(td.pred), asp = 1)
abline(a = 0, b = 1, add = TRUE)
## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): "add" is
## not a graphical parameter
In both cases, the models seems to overpredict damage for low damages. This is important to take into account when evaluating the predictions done with this model. Further analysis to correct this bias can be explored, such us non-linearity in some of the predictors (e.g. wind speed), and eventually update the model.
A new dataset is prepared for the new Goni Typhoon. The same columns from the original dataset are used, but only one row per municipality is left. The columns corresponding to Wind speed, Distance to typhoon and Distance to first impact are completed based on recently acquired information (see other reports from DASL).
# Read CSV with new dataset
new_data <- read.csv(file="data//All_Goni_pred.csv", header=TRUE, sep=",")
# Choose which wind speed data to use
new_data$Wind.speed <- new_data$Wind.speed.max
# new_data$Wind.speed <- new_data$Wind.speed.mean
# Set a minimum distance from typhoon and distance from first impact
min_dist <- 0.01*min(new_data$distance_first_impact[new_data$distance_first_impact>0], new_data$Distance.to.typhoon[new_data$Distance.to.typhoon>0])
new_data$distance_first_impact[new_data$distance_first_impact==0] <- min_dist
new_data$Distance.to.typhoon[new_data$Distance.to.typhoon==0] <- min_dist
head(new_data)
## X disaster_type disaster_name pcode Completely.damaged..abs..
## 1 0 Typhoon Goni PH012801000 0.05559265
## 2 1 Typhoon Goni PH012802000 1.05953427
## 3 2 Typhoon Goni PH012803000 1.06816249
## 4 3 Typhoon Goni PH012804000 0.55749366
## 5 4 Typhoon Goni PH012805000 1.82812867
## 6 5 Typhoon Goni PH012806000 0.48772622
## Total.damaged.houses..abs.. Completely.damaged..rel..
## 1 1.547008 0.0001240907
## 2 106.291491 0.0001315579
## 3 65.854439 0.0001351420
## 4 67.084419 0.0001519885
## 5 93.473126 0.0001324707
## 6 35.297867 0.0001995402
## Total.damaged.houses..rel.. ratio_comp_part Total...of.houses Wind.speed.mean
## 1 0.003453144 0.1589327 448.00 10
## 2 0.013197764 1.0122643 8053.75 31
## 3 0.008331786 0.1273040 7904.00 17
## 4 0.018289100 0.1184767 3668.00 27
## 5 0.006773292 0.3971472 13800.25 17
## 6 0.014441185 0.6618326 2444.25 45
## Wind.speed.max Distance.to.typhoon rainfall distance_first_impact Slope
## 1 19 442.1175 493.20 650.4285 16.827038
## 2 60 412.2540 528.00 653.2009 2.127614
## 3 36 370.8016 256.20 629.5410 7.368456
## 4 60 440.0760 204.96 664.1004 11.938009
## 5 36 387.0933 217.60 635.4122 5.234450
## 6 60 435.9028 202.40 669.5462 4.794050
## Elevation ruggedness_stdev Ruggedness slope_stdev
## 1 558.75002 39.76238 84.40795 8.536798
## 2 23.79743 12.14331 12.05175 2.491546
## 3 97.69650 39.58078 37.36870 8.268917
## 4 220.46549 41.11407 59.74080 8.831068
## 5 91.41104 23.80632 27.48998 5.180507
## 6 168.20983 17.87840 24.79369 3.935199
## Predicted.damage.class..1.5. Population land_area Population.density
## 1 NA 1808 127.0823 14.2270
## 2 NA 32496 63.0110 515.7043
## 3 NA 31893 91.8767 347.1266
## 4 NA 14802 131.2743 112.7567
## 5 NA 55635 180.1741 308.7803
## 6 NA 9777 156.2251 62.5827
## Poverty.incidence X..strong.roof.type X..strong.wall.type
## 1 0.1791 0.9732360 0.6107056
## 2 0.1141 0.9942983 0.9113645
## 3 0.1404 0.9164221 0.8213836
## 4 0.1197 0.9612337 0.8902094
## 5 0.1100 0.9774862 0.8666991
## 6 0.1378 0.8979592 0.8078231
## X..skilled.Agriculture.Forestry.Fishermen Region prov Experience
## 1 0.15569196 north luzon 0 3.262295
## 2 0.10277821 north luzon 0 3.262295
## 3 0.13199013 north luzon 0 3.262295
## 4 0.09385224 north luzon 0 3.262295
## 5 0.09592217 north luzon 0 3.262295
## 6 0.13020354 north luzon 0 3.262295
## Bicol.region Wind.speed Total.damaged.houses..rel..se
## 1 0 19 0.001366342
## 2 0 60 0.004624503
## 3 0 36 0.002811769
## 4 0 60 0.006337689
## 5 0 36 0.002338853
## 6 0 60 0.005598435
## Total.damaged.houses..abs..se Completely.damaged..rel..se
## 1 0.6121211 8.993182e-05
## 2 37.2445947 9.808454e-05
## 3 22.2242194 9.346595e-05
## 4 23.2466444 1.066735e-04
## 5 32.2767539 9.414416e-05
## 6 13.6839749 1.412399e-04
## Completely.damaged..abs..se
## 1 0.04028946
## 2 0.78994834
## 3 0.73875483
## 4 0.39127852
## 5 1.29921298
## 6 0.34522561
The variables are power-transformed to be able to be used as predictors in the models calibrated previously. Then, the data is scaled using the mean and standard deviation for each covariate from the training set from the model calibration.
new_td_data <- new_data
new_pc_data <- new_data
for (i in 1:length(var_names)){
if(power_df[1, var_names[i]]==0){
new_td_data[, var_names[i]] <- (log(new_td_data[, var_names[i]]) - cols.mean.td[i])/cols.sd.td[i]
}else{
new_td_data[, var_names[i]] <- ((new_td_data[, var_names[i]]^power_df[1, var_names[i]]) - cols.mean.td[i])/cols.sd.td[i]
}
if(power_df[2, var_names[i]]==0){
new_pc_data[, var_names[i]] <- (log(new_pc_data[, var_names[i]]) - cols.mean.pc[i])/cols.sd.pc[i]
}else{
new_pc_data[, var_names[i]] <- (new_pc_data[, var_names[i]]^power_df[2, var_names[i]] - cols.mean.pc[i])/cols.sd.pc[i]
}
}
The model calibrated makes use of wind speed, distance to first impact, elevation, slope, strong roof type type and experience as predictors of damage.
td_step
##
## Call: glm(formula = Total.damaged.houses..rel.. ~ Wind.speed + distance_first_impact +
## Slope + Elevation + Population.density + X..strong.roof.type +
## Experience, family = binomial(link = chosen_link), data = td_data,
## trace = FALSE)
##
## Coefficients:
## (Intercept) Wind.speed distance_first_impact
## -2.1699 1.1880 -0.3593
## Slope Elevation Population.density
## 0.4576 -0.7395 -0.3352
## X..strong.roof.type Experience
## -0.2394 -0.1602
##
## Degrees of Freedom: 1614 Total (i.e. Null); 1607 Residual
## Null Deviance: 721.6
## Residual Deviance: 285.8 AIC: 752.7
# Predict proportion of all damaged houses and standard errors
pred.td_prop_bldgs = predict(td_step, newdata = new_td_data, type = "response", se.fit = TRUE)
# Predict number of all damaged houses
pred.td_num_bldgs = pred.td_prop_bldgs$fit * new_data$Total...of.houses
# Add predicted values and standard errors to data set
new_data$Total.damaged.houses..rel.. = pred.td_prop_bldgs$fit
new_data$Total.damaged.houses..abs.. = pred.td_num_bldgs
new_data$Total.damaged.houses..rel..se = pred.td_prop_bldgs$se.fit
new_data$Total.damaged.houses..abs..se = pred.td_prop_bldgs$se.fit * new_data$Total...of.houses
The model calibrated makes use of wind speed, distance to first impact, elevation, strong roof type type and experience as predictors of damage.
pc_step
##
## Call: glm(formula = Completely.damaged..rel.. ~ Wind.speed + distance_first_impact +
## Elevation + Population.density + X..strong.roof.type + Experience,
## family = binomial(link = chosen_link), data = pc_data, trace = FALSE)
##
## Coefficients:
## (Intercept) Wind.speed distance_first_impact
## -4.1679 1.5099 -0.3467
## Elevation Population.density X..strong.roof.type
## -0.3597 -0.5586 -0.2420
## Experience
## -0.3302
##
## Degrees of Freedom: 1614 Total (i.e. Null); 1608 Residual
## Null Deviance: 346.5
## Residual Deviance: 96.17 AIC: 276.4
# Predict proportion of all damaged houses
pred.pc_prop_bldgs = predict(pc_step, newdata =new_pc_data, type = "response", se.fit = TRUE)
# Predict number of all damaged houses
pred.pc_num_bldgs = pred.pc_prop_bldgs$fit * new_data$Total...of.houses
# Add predicted values and standard errors to data set
new_data$Completely.damaged..rel.. = pred.pc_prop_bldgs$fit
new_data$Completely.damaged..abs.. = pred.pc_num_bldgs
new_data$Completely.damaged..rel..se = pred.pc_prop_bldgs$se.fit
new_data$Completely.damaged..abs..se = pred.pc_prop_bldgs$se.fit * new_data$Total...of.houses
head(new_data)
## X disaster_type disaster_name pcode Completely.damaged..abs..
## 1 0 Typhoon Goni PH012801000 0.05559265
## 2 1 Typhoon Goni PH012802000 1.05953427
## 3 2 Typhoon Goni PH012803000 1.06816249
## 4 3 Typhoon Goni PH012804000 0.55749366
## 5 4 Typhoon Goni PH012805000 1.82812867
## 6 5 Typhoon Goni PH012806000 0.48772622
## Total.damaged.houses..abs.. Completely.damaged..rel..
## 1 1.547008 0.0001240907
## 2 106.291491 0.0001315579
## 3 65.854439 0.0001351420
## 4 67.084419 0.0001519885
## 5 93.473126 0.0001324707
## 6 35.297867 0.0001995402
## Total.damaged.houses..rel.. ratio_comp_part Total...of.houses Wind.speed.mean
## 1 0.003453144 0.1589327 448.00 10
## 2 0.013197764 1.0122643 8053.75 31
## 3 0.008331786 0.1273040 7904.00 17
## 4 0.018289100 0.1184767 3668.00 27
## 5 0.006773292 0.3971472 13800.25 17
## 6 0.014441185 0.6618326 2444.25 45
## Wind.speed.max Distance.to.typhoon rainfall distance_first_impact Slope
## 1 19 442.1175 493.20 650.4285 16.827038
## 2 60 412.2540 528.00 653.2009 2.127614
## 3 36 370.8016 256.20 629.5410 7.368456
## 4 60 440.0760 204.96 664.1004 11.938009
## 5 36 387.0933 217.60 635.4122 5.234450
## 6 60 435.9028 202.40 669.5462 4.794050
## Elevation ruggedness_stdev Ruggedness slope_stdev
## 1 558.75002 39.76238 84.40795 8.536798
## 2 23.79743 12.14331 12.05175 2.491546
## 3 97.69650 39.58078 37.36870 8.268917
## 4 220.46549 41.11407 59.74080 8.831068
## 5 91.41104 23.80632 27.48998 5.180507
## 6 168.20983 17.87840 24.79369 3.935199
## Predicted.damage.class..1.5. Population land_area Population.density
## 1 NA 1808 127.0823 14.2270
## 2 NA 32496 63.0110 515.7043
## 3 NA 31893 91.8767 347.1266
## 4 NA 14802 131.2743 112.7567
## 5 NA 55635 180.1741 308.7803
## 6 NA 9777 156.2251 62.5827
## Poverty.incidence X..strong.roof.type X..strong.wall.type
## 1 0.1791 0.9732360 0.6107056
## 2 0.1141 0.9942983 0.9113645
## 3 0.1404 0.9164221 0.8213836
## 4 0.1197 0.9612337 0.8902094
## 5 0.1100 0.9774862 0.8666991
## 6 0.1378 0.8979592 0.8078231
## X..skilled.Agriculture.Forestry.Fishermen Region prov Experience
## 1 0.15569196 north luzon 0 3.262295
## 2 0.10277821 north luzon 0 3.262295
## 3 0.13199013 north luzon 0 3.262295
## 4 0.09385224 north luzon 0 3.262295
## 5 0.09592217 north luzon 0 3.262295
## 6 0.13020354 north luzon 0 3.262295
## Bicol.region Wind.speed Total.damaged.houses..rel..se
## 1 0 19 0.001366342
## 2 0 60 0.004624503
## 3 0 36 0.002811769
## 4 0 60 0.006337689
## 5 0 36 0.002338853
## 6 0 60 0.005598435
## Total.damaged.houses..abs..se Completely.damaged..rel..se
## 1 0.6121211 8.993182e-05
## 2 37.2445947 9.808454e-05
## 3 22.2242194 9.346595e-05
## 4 23.2466444 1.066735e-04
## 5 32.2767539 9.414416e-05
## 6 13.6839749 1.412399e-04
## Completely.damaged..abs..se
## 1 0.04028946
## 2 0.78994834
## 3 0.73875483
## 4 0.39127852
## 5 1.29921298
## 6 0.34522561
The predicted values are saved back in the csv.
# Write new dataset in csv
write.csv(new_data, 'data\\All_Goni_pred.csv', row.names=F)
library(rgdal)
## Loading required package: sp
## rgdal: version: 1.4-8, (SVN revision 845)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
## Path to GDAL shared files: C:/Users/michele.nguyen/Documents/R/win-library/3.6/rgdal/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: C:/Users/michele.nguyen/Documents/R/win-library/3.6/rgdal/proj
## Linking to sp version: 1.3-2
library(sf)
## Warning: package 'sf' was built under R version 3.6.3
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(ggplot2)
dsn <- system.file("vectors", package = "rgdal")
# Read shapefiles
ph_mun = st_read('data\\Shapefiles\\PH_municipality.shp')
## Reading layer `PH_municipality' from data source `C:\Users\michele.nguyen\Documents\GitHub\storm-goni\GoniDamageModel\data\Shapefiles\PH_municipality.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1647 features and 13 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 114.2815 ymin: 4.587399 xmax: 126.605 ymax: 21.12176
## geographic CRS: WGS 84
goni_trace = st_read('data\\Shapefiles\\goni-path.shp')
## Reading layer `goni-path' from data source `C:\Users\michele.nguyen\Documents\GitHub\storm-goni\GoniDamageModel\data\Shapefiles\goni-path.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 2 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 114.2 ymin: 13.5 xmax: 140.1 ymax: 16.8
## geographic CRS: WGS 84
# Create new shapefile
ph_mun_new = ph_mun
tdprop.list = replicate(length(ph_mun$Mun_Code),NA)
tdnum.list = replicate(length(ph_mun$Mun_Code),NA)
pcprop.list = replicate(length(ph_mun$Mun_Code),NA)
pcnum.list = replicate(length(ph_mun$Mun_Code),NA)
tdprop.se.list = replicate(length(ph_mun$Mun_Code),NA)
tdnum.se.list = replicate(length(ph_mun$Mun_Code),NA)
pcprop.se.list = replicate(length(ph_mun$Mun_Code),NA)
pcnum.se.list = replicate(length(ph_mun$Mun_Code),NA)
windspeed.list = replicate(length(ph_mun$Mun_Code),NA)
dist_first_impact.list = replicate(length(ph_mun$Mun_Code),NA)
Slope.list = replicate(length(ph_mun$Mun_Code),NA)
Elevation.list = replicate(length(ph_mun$Mun_Code),NA)
Population.density.list = replicate(length(ph_mun$Mun_Code),NA)
X..strong.roof.type.list = replicate(length(ph_mun$Mun_Code),NA)
Experience.list = replicate(length(ph_mun$Mun_Code),NA)
for (i in 1:length(new_data$pcode)){
pcode = new_data$pcode[i]
windspeed = new_data$Wind.speed[i]
dist_first_impact = new_data$distance_first_impact[i]
tdprop = new_data$Total.damaged.houses..rel..[i]
tdnum = new_data$Total.damaged.houses..abs..[i]
pcprop = new_data$Completely.damaged..rel..[i]
pcnum = new_data$Completely.damaged..abs..[i]
tdprop.se = new_data$Total.damaged.houses..rel..se[i]
tdnum.se = new_data$Total.damaged.houses..abs..se[i]
pcprop.se = new_data$Completely.damaged..rel..se[i]
pcnum.se = new_data$Completely.damaged..abs..se[i]
Slope = new_data$Slope[i]
Elevation = new_data$Elevation[i]
Population.density = new_data$Population.density[i]
X..strong.roof.type = new_data$X..strong.roof.type[i]
Experience = new_data$Experience[i]
# Look for pcode in shapefile attributes
index = which(ph_mun$Mun_Code == as.character(pcode))
tdprop.list[index] = tdprop
tdnum.list[index] = tdnum
pcprop.list[index] = pcprop
pcnum.list[index] = pcnum
windspeed.list[index] = windspeed
dist_first_impact.list[index] = dist_first_impact
tdprop.se.list[index] = tdprop.se
tdnum.se.list[index] = tdnum.se
pcprop.se.list[index] = pcprop.se
pcnum.se.list[index] = pcnum.se
Slope.list[index] = Slope
Elevation.list[index] = Elevation
Population.density.list[index] = Population.density
X..strong.roof.type.list[index] = X..strong.roof.type
Experience.list[index] = Experience
}
# Save new fields in output shapefile
ph_mun_new$Wind_Speed = windspeed.list
ph_mun_new$dist_first_impact = dist_first_impact.list
ph_mun_new$tdprop = tdprop.list
ph_mun_new$tdnum = tdnum.list
ph_mun_new$pcprop = pcprop.list
ph_mun_new$pcnum = pcnum.list
ph_mun_new$tdprop.se = tdprop.se.list
ph_mun_new$tdnum.se = tdnum.se.list
ph_mun_new$pcprop.se = pcprop.se.list
ph_mun_new$pcnum.se = pcnum.se.list
ph_mun_new$Slope = Slope.list
ph_mun_new$Elevation = Elevation.list
ph_mun_new$Population.density = Population.density.list
ph_mun_new$X..strong.roof.type = X..strong.roof.type.list
ph_mun_new$Experience = Experience.list
# st_write(ph_mun_new, 'data\\Shapefiles\\PH_municipality_results.shp')
cap.df <- data.frame(x = 126.8, y = 15.75, text = "Developed by the Disaster Analytics for Society Lab (DASL) at the \n Earth Observatory of Singapore and Nanyang Technological University.")
# Plot tdprop
ggplot() + theme(
panel.background = element_rect(fill = "lightblue",
colour = "lightblue",
size = 0.5, linetype = "solid"),
panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
colour = "white")
) +
geom_sf(data = ph_mun_new, aes(fill=tdprop), size = 0.1, color = "black") +
scale_fill_fermenter(breaks = c(0,0.2,0.4,0.6,0.8,1), type = "seq", palette = 7, direction=1,
na.value='grey50') +
geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") +
coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "Damage rate", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
theme(legend.position = c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))
ggsave("graphics/goni-damagerate.png", dpi = 300, height = 6, width = 10, units = "in")
# Plot tdprop standard errors:
ggplot() + theme(
panel.background = element_rect(fill = "lightblue",
colour = "lightblue",
size = 0.5, linetype = "solid"),
panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
colour = "white")
) +
geom_sf(data = ph_mun_new, aes(fill=tdprop.se), size = 0.1, color = "black") +
scale_fill_gradient(low="white", high="red") +
geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") +
coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "Damage rate \n standard error", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
theme(legend.position = c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))
# Plot tdnum
ggplot() + theme(
panel.background = element_rect(fill = "lightblue",
colour = "lightblue",
size = 0.5, linetype = "solid"),
panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
colour = "white")
) +
geom_sf(data = ph_mun_new, aes(fill=tdnum), size = 0.1, color = "black") +
scale_fill_fermenter(breaks = c(0,0.2,0.4,0.6,0.8)*10000, type = "seq", palette = 7, direction=1,
na.value='grey50') +
geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") +
coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "# Damaged houses", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
theme(legend.position = c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))
ggsave("graphics/goni-damagedhouses.png", dpi = 300, height = 6, width = 10, units = "in")
# Plot pcprop
ggplot() + theme(
panel.background = element_rect(fill = "lightblue",
colour = "lightblue",
size = 0.5, linetype = "solid"),
panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
colour = "white")
) +
geom_sf(data = ph_mun_new, aes(fill=pcprop), size = 0.1, color = "black") +
scale_fill_gradient(low="white", high="red") +
geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") +
coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "Complete damage rate", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
theme(legend.position = c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))
# Plot pcprop standard errors
ggplot() + theme(
panel.background = element_rect(fill = "lightblue",
colour = "lightblue",
size = 0.5, linetype = "solid"),
panel.grid.major = element_line(size = 0.2, linetype = 'dashed',
colour = "white")
) +
geom_sf(data = ph_mun_new, aes(fill=pcprop.se), size = 0.1, color = "black") +
scale_fill_gradient(low="white", high="red") +
geom_sf(data = goni_trace, size = 0.2, color = "red", fill = "cyan1") +
coord_sf(xlim=c(118,127), ylim=c(11,16), expand=F) + labs(fill = "Complete damage rate \n standard error", x = "", y = "") + geom_text(data = cap.df, aes(x = x, y = y, label = text), size = 2.5, hjust = 1) +
theme(legend.position = c(0.025, 0.15), legend.justification = c(0, 0), legend.title = element_text(size = 7.5), legend.text = element_text(size = 7.5))
The top 10 municipalities hit, in terms of damage rate, are:
top10.dr <- sort(ph_mun_new$tdprop, decreasing = TRUE)[1:10]
top10.dr.df <- data.frame(ph_mun_new[ph_mun_new$tdprop %in% top10.dr, c("Mun_Name", "tdprop")])
top10.dr.df <- top10.dr.df[order(top10.dr.df$tdprop, decreasing = TRUE), ]
top10.dr.df[, c("Mun_Name", "tdprop")]
## Mun_Name tdprop
## 567 VIGA 0.9181192
## 558 BAGAMANOC 0.9137369
## 566 SAN MIGUEL 0.8992065
## 564 PANGANIBAN (PAYO) 0.8951749
## 563 PANDAN 0.8875699
## 562 GIGMOTO 0.8857977
## 561 CARAMORAN 0.8770626
## 531 CARAMOAN 0.8446999
## 565 SAN ANDRES (CALOLBON) 0.7725639
## 534 GARCHITORENA 0.7692863
The top 10 municipalities hit, in terms of number of damaged houses, are:
top10.db <- sort(ph_mun_new$tdnum, decreasing = TRUE)[1:10]
top10.db.df <- data.frame(ph_mun_new[ph_mun_new$tdnum %in% top10.db, c("Mun_Name", "tdnum")])
top10.db.df <- top10.db.df[order(top10.db.df$tdnum, decreasing = TRUE), ]
top10.db.df[, c("Mun_Name", "tdnum")]
## Mun_Name tdnum
## 557 TINAMBAC 10074.036
## 531 CARAMOAN 10052.985
## 568 VIRAC (Capital) 9998.508
## 537 LAGONOY 8727.877
## 566 SAN MIGUEL 8319.909
## 528 CALABANGA 8244.199
## 538 LIBMANAN 7347.871
## 565 SAN ANDRES (CALOLBON) 7103.532
## 535 GOA 6600.385
## 561 CARAMORAN 6590.249
Finally, we provide a KMZ file of the map to enable visualisation via Google Earth:
spdf <- as(ph_mun_new, "Spatial")
spdf@data <- spdf@data[, c("Mun_Name", "tdprop", "tdprop.se", "tdnum", "tdnum.se", "pcprop", "pcprop.se", "pcnum", "pcnum.se", "Wind_Speed", "dist_first_impact", "Slope", "Elevation", "Population.density", "X..strong.roof.type", "Experience")]
colnames(spdf@data) <- c("Municipality", "Damage_rate", "Damage_rate_standard_error", "No_of_damaged_houses", "No_of_damaged_houses_standard_error", "Complete_damage_rate", "Complete_damage_rate_standard_error", "No_of_completely_damaged_houses", "No_of_completely_damaged_houses_standard_error", "Wind_speed", "Distance_to_first_impact", "Slope", "Elevation", "Population_density", "Proportion_of_strong_roof_type", "Experience")
spdf@data$Municipality <- as.character(spdf@data$Municipality)
# Replace pattern in string:
spdf@data$Municipality[spdf@data$Municipality == "GABALDON (BITULOK & SABANI)"] <- "GABALDON (BITULOK AND SABANI)"
shape = "http://maps.google.com/mapfiles/kml/pal2/icon18.png"
# Round variables
# To 2 decimal places
spdf@data[, which(!(colnames(spdf@data)%in% c("Municipality", "Damage_rate", "Complete_damage_rate", "No_of_damaged_houses", "No_of_completely_damaged_houses")))] <- round(spdf@data[, which(!(colnames(spdf@data)%in% c("Municipality", "Damage_rate", "Complete_damage_rate", "No_of_damaged_houses", "No_of_completely_damaged_houses")))], digits = 2)
# To the nearest 5%
spdf@data[, "Damage_rate"] <- round(spdf@data[, "Damage_rate"]/0.05)*0.05
spdf@data[, "Complete_damage_rate"] <- round(spdf@data[, "Complete_damage_rate"]/0.05)*0.05
# To the nearest 100
spdf@data[, "No_of_damaged_houses"] <- round(spdf@data[, "No_of_damaged_houses"]/100)*100
spdf@data[, "No_of_completely_damaged_houses"] <- round(spdf@data[, "No_of_completely_damaged_houses"]/100)*100
head(spdf@data)
## Municipality Damage_rate Damage_rate_standard_error No_of_damaged_houses
## 1 ADAMS 0 0.00 0
## 2 BACARRA 0 0.00 100
## 3 BADOC 0 0.00 100
## 4 BANGUI 0 0.01 100
## 5 CITY OF BATAC 0 0.00 100
## 6 BURGOS 0 0.01 0
## No_of_damaged_houses_standard_error Complete_damage_rate
## 1 0.61 0
## 2 37.24 0
## 3 22.22 0
## 4 23.25 0
## 5 32.28 0
## 6 13.68 0
## Complete_damage_rate_standard_error No_of_completely_damaged_houses
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## No_of_completely_damaged_houses_standard_error Wind_speed
## 1 0.04 19
## 2 0.79 60
## 3 0.74 36
## 4 0.39 60
## 5 1.30 36
## 6 0.35 60
## Distance_to_first_impact Slope Elevation Population_density
## 1 650.43 16.83 558.75 14.23
## 2 653.20 2.13 23.80 515.70
## 3 629.54 7.37 97.70 347.13
## 4 664.10 11.94 220.47 112.76
## 5 635.41 5.23 91.41 308.78
## 6 669.55 4.79 168.21 62.58
## Proportion_of_strong_roof_type Experience
## 1 0.97 3.26
## 2 0.99 3.26
## 3 0.92 3.26
## 4 0.96 3.26
## 5 0.98 3.26
## 6 0.90 3.26
plotKML(spdf, 'graphics/goni_damage', colour = "Damage_rate", plot.labpt = TRUE, shape = shape, alpha = 0.75, size = 0.2, balloon = TRUE)
## KML file opened for writing...
## Writing to KML...
## Closing graphics/goni_damage.kml
## Warning in kml_View(file.name): No MIME type detected for .kml file extension.
Models were trained from past typhoon damage data both for proportion of damaged houses and proportion of completely damaged houses. In both cases, a similar set of covariates were found to be important for predicting damage: Wind speed (arguely the most relevant), distance to first impact of typhoon in land, proportion of strong roof houses, average slope, experience in past typhoons and mean elevation of the municipality.
Damage was predicted for the latest Goni Storm based on remotely obtained wind speed data (see other DASL report) and the observed typhoon trace. Proportion and total number of damaged houses per municipality was predicted using the calibrated models. The province of Catanduane seems to be the worst hitted by the typhoon in coincidence with the largest observed wind speeds.
Caution in the use and evaluation of predictions and thorough review of the model’s development and hypothesis is recommended before analyzing the results.