Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

ggplot2 - How can I study the relationship between these two variables in R?

I am trying to analyse the relationship between unemployment rate and crime rate across 9 regions in England.

Here's what the faceted graph for each region looks like (yes, scaling that unemployment rate line is a pain):

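[Image: faceted plot, one panel per region, showing crime occurrences by crime type with the rescaled unemployment rate overlaid]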

How can I analyse this relationship beyond a purely visual inspection? For example, the crime rate for Theft appears to decrease as the unemployment rate increases, while the opposite appears true for Anti-social behaviour.

This is more of a data analysis question than a programming one, but all my plots and analyses have to be carried out in R, which is why I am posting here.

Any suggestions would be highly appreciated!

Plot code:

  ggplot(mapping=aes(Date)) +
  geom_line(aes(y = Crime_occurrencies, colour = Crime),
            size = 1, data = crime_data) +
  geom_line(mapping = aes(Date, y = rescale(Unemployment.rate, to = out_range, from = in_range),
                          linetype = "Unemployment rate"),
            col = "black", size = 1, data = unemployment_data) +
  labs(linetype = "Unemployment") +
  facet_wrap(~Region,
             scales = "free_y") +
  scale_x_date(breaks = seq(as.Date("2019-01-01"), as.Date("2020-10-01"), by="1 month"),
               date_labels = '%m %Y') +
  scale_y_continuous(sec.axis = 
                       sec_axis(~ rescale(.x, to = in_range, from = out_range), 
                                name = "Unemployment Rate (%)")) +
  theme(axis.text.x=element_text(angle =- 90, vjust = 0.5))

EDIT: here is a sample of the data I used

df1: crime data (crime_data in the plot code)

structure(list(
Region = c("London", "South West", "North East", 
  "South West", "West Midlands", "Yorkshire and The Humber", "London", 
  "South East", "Yorkshire and The Humber", "London", "East of England", 
  "London", "West Midlands", "East of England", "East Midlands", 
  "East of England", "London", "South West", "East Midlands", "North West"
  ), 
Date = structure(c(18078, 18262, 18475, 18078, 17897, 17897, 
  18444, 18231, 17928, 18506, 18201, 18293, 18475, 18201, 18262, 
  18536, 18353, 18414, 18109, 18383), class = "Date"), 
Crime = c("Robbery", 
  "Theft", "Robbery", "Violence and sexual offences", 
  "Violence and sexual offences", "Anti-social behaviour", "Burglary", 
  "Robbery", "Burglary", "Robbery", 
  "Anti-social behaviour", "Theft", "Theft", "Violence and sexual offences", 
  "Robbery", "Violence and sexual offences", "Robbery", "Burglary", 
  "Robbery", "Violence and sexual offences"), 
Crime_occurrencies = c(3330L, 
  5508L, 95L, 8427L, 14350L, 15072L, 4942L, 565L, 4569L, 2605L, 
  8375L, 30039L, 7057L, 12141L, 174L, 12854L, 1101L, 987L, 175L, 
  13325L)), class = "data.frame", row.names = c(NA, -20L))

df2: unemployment data (unemployment_data in the plot code)

structure(list(
Date = structure(c(18170, 18293, 18170, 18201, 
  18475, 17956, 17956, 18078, 18201, 18078, 18140, 18170, 18322, 
  18201, 18383, 18109, 18383, 18048, 17897, 18536), class = "Date"), 
Region = structure(c(8L, 8L, 9L, 3L, 3L, 6L, 9L, 9L, 10L, 
  10L, 8L, 4L, 2L, 9L, 10L, 3L, 6L, 4L, 6L, 2L), 
.Label = c("England", 
    "South East", "South West", "London", "East of England", 
    "East Midlands", "West Midlands", "Yorkshire and The Humber", 
    "North East", "North West"), class = "factor"), 
Unemployment.rate = c(4.08974888999112, 
    4.71840892982655, 6.11361138828401, 2.8428439676314, 4.13354432440967, 
    4.02517515965457, 5.41295614949722, 4.97071907922267, 4.1730633389162, 
    4.29820710838942, 3.90122545742185, 4.50615604436695, 2.90903701310954, 
    6.21086689536757, 3.79490967669574, 2.38897367671231, 3.97182605242641, 
    4.54049887070026, 4.68247349426148, 3.86912441545878)), 
class = "data.frame", row.names = c(NA, 
-20L))

geom_smooth() outputs

When run before geom_line():

[Image: output with geom_smooth() added before geom_line()]

When run on its own:

[Image: output with geom_smooth() run on its own]

ggplot(mapping=aes(Date)) +
  geom_smooth(method = 'lm',se=F,
              aes(x=Date,y = Crime_occurrencies,color='Trend', group=1),
              data=crime_count)


1 Reply

Try playing with geom_smooth(). Here I have used it to add a linear trend for crime, so that you can compare it with the unemployment rate and look for a relationship:

library(ggplot2)
library(scales)  # provides rescale(), used for the secondary axis
#Values
out_range <- range(crime_data$Crime_occurrencies)
in_range <- range(unemployment_data$Unemployment.rate)
#Plot
ggplot(mapping=aes(Date)) +
  geom_line(aes(y = Crime_occurrencies, colour = Crime),
            size = 1, data = crime_data) +
  geom_line(mapping = aes(Date, y = rescale(Unemployment.rate, to = out_range,
                                            from = in_range),
                          linetype = "Unemployment rate"),
            col = "black", size = 1, data = unemployment_data) +
  geom_smooth(method = 'lm',se=F,
              aes(x=Date,y = Crime_occurrencies,color='Trend'),
              data=crime_data)+
  labs(linetype = "Unemployment") +
  facet_wrap(~Region,
             scales = "free_y") +
  scale_x_date(breaks = seq(as.Date("2019-01-01"), as.Date("2020-10-01"), by="1 month"),
               date_labels = '%m %Y') +
  scale_y_continuous(sec.axis = 
                       sec_axis(~ rescale(.x, to = in_range, from = out_range), 
                                name = "Unemployment Rate (%)")) +
  theme(axis.text.x=element_text(angle =- 90, vjust = 0.5))

Output:

[Image: faceted plot with the added linear trend line for crime]

With your full data you should get the complete trend. If necessary, you can add group = 1 inside aes() in geom_smooth() so that a single trend line is fitted per panel.
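
To go beyond the visual comparison, one option is to join the two data frames and compute a correlation per region and crime type. The following is only a minimal sketch, not something taken from the question: the toy values are made up, and it assumes both data frames share the Region and Date columns used in the plot code. Spearman correlation is my choice here because it only assumes a monotonic relationship.

```r
# Toy data with the same column names as in the question (values are invented).
dates <- as.Date(c("2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01"))

crime_data <- data.frame(
  Region = rep(c("London", "North East"), each = 4),
  Date = rep(dates, times = 2),
  Crime = "Theft",
  Crime_occurrencies = c(30000, 29000, 27500, 27000, 900, 950, 1000, 1100)
)

unemployment_data <- data.frame(
  Region = rep(c("London", "North East"), each = 4),
  Date = rep(dates, times = 2),
  Unemployment.rate = c(4.2, 4.4, 4.7, 4.9, 5.8, 6.0, 6.1, 6.3)
)

# Join the two tables so each crime count sits next to the unemployment
# rate for the same region and month.
merged <- merge(crime_data, unemployment_data, by = c("Region", "Date"))

# Split into one piece per Region x Crime combination and compute
# a Spearman correlation within each piece.
groups <- split(merged, list(merged$Region, merged$Crime), drop = TRUE)

cor_by_group <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    Region = d$Region[1],
    Crime = d$Crime[1],
    n = nrow(d),
    spearman = cor(d$Unemployment.rate, d$Crime_occurrencies,
                   method = "spearman")
  )
}))

print(cor_by_group)
```

A negative value suggests that crime counts fall as unemployment rises in that region, a positive value the opposite. Keep in mind that two monthly series can correlate simply because both drift over time, so treat these numbers as a starting point rather than evidence of a causal link.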

