Reproducible research and open tools for future-proof transport planning

Mobile Tartu 2024

Robin Lovelace

University of Leeds, Active Travel England

June 14, 2024

Introduction

About me and my work

  • Associate Professor of Transport Data Science
  • Work with government
  • Focus on impact
  • Software developer and data scientist
  • New methods for more reproducible, data-driven and participatory transport planning

Contents

  • Definitions: what are reproducible research and future-proof transport planning?
  • Reproducible research and open tools
  • Future-proof transport planning

Definitions

“a tool is a … piece of software or online service; a model … is method or process that is expounded in theoretical terms; software is … instructions that underlies digital tools” (Lovelace 2021)

Reproducible research: Other people can re-generate your results

Open source software: Software that is free to use and modify

Open access tools: Web applications for transport planning that are based on open source software, that anyone can use

Open access data: Data that is freely available to use and share

Future-proof: Work that is likely to be useful in the medium-term future

(Ir)reproducibility

Reproducibility is a continuous variable (Peng 2011)

Reproducible research

Why make your research (more) reproducible?

Source: Raff (2023)

  • Scientific rigour
  • Benefits to your future self
  • Benefits to others
  • Huge increase in potential for impact

Why not make your research reproducible?

  • Time

  • Know-how

  • Lack of permission

  • Software is not open

  • Data is not open access

  • Someone might use it in unethical ways

  • Someone might “steal” the work

Example of fully reproducible research

Lovelace, Tennekes, and Carlino (2022)

Reproducibility and generalisability

Illustration of the ClockBoard zoning system used to visualize a geographically dependent phenomenon: air quality, measured as the mass of PM10 particles in micrograms per cubic meter, from the London Atmospheric Emissions Inventory (LAEI). The facets show the data on the spatial grid available from the LAEI (A), aggregated to London boroughs (B), aggregated to ClockBoard zones covering all the input data (C), and aggregated to ClockBoard zones clipped by the administrative boundary of Greater London (D).

Application: road traffic casualties

International comparisons

Premise: A key reason for reproducibility is generalisability.

Open source software and open access tools

Case study: mobile telephone data in Spain

Don’t reinvent the wheel

Before

options(timeout = 600) # 10 minutes
u1 = "https://movilidad-opendata.mitma.es/estudios_basicos/por-distritos/viajes/ficheros-diarios/2024-03/20240301_Viajes_distritos.csv.gz"
f1 = basename(u1)
if (!file.exists(f1)) {
  download.file(u1, f1)
}
drv = duckdb::duckdb("daily.duckdb")
con = DBI::dbConnect(drv)
od1 = duckdb::tbl_file(con, f1)

Credit: Egor Kotov

remotes::install_github("Robinlovelace/spanishoddata")
od_multi_list = get_od(date_regex = "2024030[1-7]")
# ...
n_per_hour |>
  ggplot(aes(x = Time, y = Trips)) +
  geom_line(aes(colour = Day)) +
  labs(title = "Number of trips per hour over 7 days")

Re-using existing tools

# Process the data
od_large = od_database |>
  group_by(origen, destino) |>
  summarise(Trips = sum(viajes), .groups = "drop") |>
  filter(Trips > 500) |>
  collect() |>
  arrange(desc(Trips))
# ℹ 37,013 more rows
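# Keep only flows between different zones
# (assumed definition of od_large_interzonal, which is not shown on the slide)
od_large_interzonal = od_large |>
  filter(origen != destino)
# Convert to geo with {od} package: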
od_large_interzonal_sf = od::od_to_sf(
  od_large_interzonal,
  z = distritos_wgs84
)
od_large_interzonal_sf |>
  ggplot() +
  geom_sf(aes(size = Trips), colour = "red") +
  theme_void()

Zooming in

distritos = get_zones(type = "distritos")
distritos_wgs84 = sf::st_transform(distritos, 4326)
salamanca_zones = zonebuilder::zb_zone("Salamanca")
distritos_salamanca = distritos_wgs84[salamanca_zones, ]
plot(distritos_salamanca)

Subsetting from the database

od_salamanca = od_database |>
  filter(origen %in% ids_salamanca) |>
  filter(destino %in% ids_salamanca) |>
  collect() |>
  group_by(origen, destino) |>
  summarise(Trips = sum(viajes), .groups = "drop") |>
  arrange(Trips)
od_salamanca_sf = od::od_to_sf(
  od_salamanca,
  z = distritos_salamanca
)
od_salamanca_sf |>
  filter(origen != destino) |>
  ggplot() +
  geom_sf(aes(colour = Trips), size = 1) +
  scale_colour_viridis_c() +
  theme_void()

Spatial disaggregation

od_jittered = odjitter::jitter(
  od_salamanca_sf,
  zones = distritos_salamanca,
  subpoints = drive_net,
  disaggregation_threshold = 1000,
  disaggregation_key = "Trips"
)
od_jittered |>
  arrange(Trips) |>
  ggplot() +
  geom_sf(aes(colour = Trips), size = 1) +
  scale_colour_viridis_c() +
  geom_sf(data = drive_net_major, colour = "black") +
  theme_void()

How does spatial disaggregation (jittering) work?

Source: (Lovelace, Félix, and Carlino 2022)

Origin and end point randomisation + disaggregation

Source: (Lovelace, Félix, and Carlino 2022)
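The idea can be illustrated with a toy example in base R. This is a minimal sketch, not the odjitter implementation: the function name `jitter_sketch`, the two zones and the candidate points are all hypothetical. It assumes that an OD pair with more trips than the threshold is split into `ceiling(trips / threshold)` sub-flows of equal size, each starting and ending at a point sampled at random from within its zone.

```r
# Toy illustration of jittering with base R only (not the odjitter implementation)
set.seed(2024) # fixed seed so the sampled points are reproducible

# Hypothetical candidate points (e.g. road network vertices) in two zones
subpoints = data.frame(
  zone = rep(c("A", "B"), each = 5),
  x = c(runif(5, 0, 1), runif(5, 2, 3)),
  y = runif(10, 0, 1)
)

# One aggregate OD pair: 2500 trips from zone A to zone B
od = data.frame(origin = "A", destination = "B", trips = 2500)

jitter_sketch = function(od, subpoints, threshold) {
  rows = lapply(seq_len(nrow(od)), function(i) {
    # Number of sub-flows needed so that none exceeds the threshold
    n = max(1, ceiling(od$trips[i] / threshold))
    pts_o = subpoints[subpoints$zone == od$origin[i], ]
    pts_d = subpoints[subpoints$zone == od$destination[i], ]
    # Sample start and end points (with replacement) for each sub-flow
    o = pts_o[sample(nrow(pts_o), n, replace = TRUE), ]
    d = pts_d[sample(nrow(pts_d), n, replace = TRUE), ]
    data.frame(
      origin = od$origin[i],
      destination = od$destination[i],
      trips = od$trips[i] / n, # split evenly, so the total is preserved
      ox = o$x, oy = o$y, dx = d$x, dy = d$y
    )
  })
  do.call(rbind, rows)
}

od_jittered_sketch = jitter_sketch(od, subpoints, threshold = 1000)
od_jittered_sketch
```

With a threshold of 1000, the single 2500-trip flow becomes three desire lines with different start and end points, which is what spreads the resulting routes across more of the network.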

Cross-language collaboration

Source: https://github.com/dabreegster/odjitter

From open source to open access

“In essence ‘open access’ goes beyond ‘open source’ in that users are not only given the option of viewing (potentially indecipherable) source code, but are encouraged to do so, with measures taken in the software itself, and the community that builds it, to make it more user-friendly.”

Source: (Lovelace, Parkin, and Cohen 2020)

From methods/software development to impact

Source: screenshot from development version of open source and open access Network Planning Tools for Scotland: https://nptscot.github.io/#/rnet/#9.29/55.9882/-3.4379

Rapid feedback loops and interactivity

Illustration of od2net client-side network generator (source: od2net.org)

Enabling participation

Source: acteng.github.io. Credit: Dustin Carlino (Alan Turing Institute and Active Travel England) and colleagues in ATE.

Future-proof transport planning

Source: situational-awareness.ai

Drivers of demand for transport planning

Transport planning software was originally designed in the late 1950s and onwards to plan for

“increased use of cars [for personal travel], and trucks for deliveries and goods movement” (Boyce and Williams 2015)

Thankfully that is no longer a priority:

Policy drivers have changed dramatically since then: climate change mitigation, air quality improvement and public health are prioritised in the emergent ‘sustainable mobility paradigm’ (Lovelace 2021)

How could/should/will demand shift in the future?

Stages of open and reproducible science

  1. Open access to the publications

  2. Open access to sample (synthetic if sensitive) data

  3. Open access to the code

  4. Fully reproducible paper published with documentation

  5. Project deployed in tool for non-specialist use
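Stage 2 can be sketched in a few lines of base R. This is a hypothetical example, not a method from the talk: the zone IDs and trip counts are made up, and redrawing counts from a Poisson distribution is just one simple way to create a shareable stand-in. It is not a formal privacy guarantee.

```r
# Hypothetical example: publish a synthetic stand-in for a sensitive OD table
set.seed(42) # fixed seed so the synthetic data can be re-generated

# Made-up "sensitive" data that cannot be shared directly
od_real = data.frame(
  origin = c("z1", "z1", "z2", "z3"),
  destination = c("z2", "z3", "z3", "z1"),
  trips = c(120, 15, 60, 8)
)

# Same structure, with counts redrawn from a Poisson distribution
# whose mean is the observed count (an assumption for illustration)
od_synthetic = od_real
od_synthetic$trips = rpois(nrow(od_real), lambda = od_real$trips)

# Save the synthetic sample so that others can run the code end to end
path_synthetic = file.path(tempdir(), "od_synthetic.csv")
write.csv(od_synthetic, path_synthetic, row.names = FALSE)
```

Because the synthetic file has the same columns as the real data, analysis code written against it should run unchanged on the real data by anyone who has permission to access it.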

Future-proofing workflows

Avoid stranded assets (tech debt)

Source: Semieniuk et al. (2022)

Conclusions

  • Reproducible research is a key part of future-proofing transport planning, for your work, and for the discipline as a whole

  • Open source software and open access tools are key to this, especially if you want to have humans in the loop

  • AI is not a panacea, and has its own environmental costs

  • In this context, there are some key desirable features of future-proof transport models and associated software and tools:

    • Open source
    • Open access
    • Reproducible
    • Human-in-the-loop
    • Easily adaptable to new data sources, methods, and demands

References

Boyce, David E., and Huw C. W. L. Williams. 2015. Forecasting Urban Travel: Past, Present and Future. Edward Elgar Publishing.
Lovelace, Robin. 2021. “Open Source Tools for Geographic Analysis in Transport Planning.” Journal of Geographical Systems, January. https://doi.org/10.1007/s10109-020-00342-2.
Lovelace, Robin, Rosa Félix, and Dustin Carlino. 2022. “Jittering: A Computationally Efficient Method for Generating Realistic Route Networks from Origin-Destination Data.” Findings, April, 33873. https://doi.org/10.32866/001c.33873.
Lovelace, Robin, John Parkin, and Tom Cohen. 2020. “Open Access Transport Models: A Leverage Point in Sustainable Transport Planning.” Transport Policy 97 (October): 47–54. https://doi.org/10.1016/j.tranpol.2020.06.015.
Lovelace, Robin, Martijn Tennekes, and Dustin Carlino. 2022. “ClockBoard: A Zoning System for Urban Analysis.” Journal of Spatial Information Science, no. 24 (June): 63–85. https://doi.org/10.5311/JOSIS.2022.24.172.
Peng, Roger D. 2011. “Reproducible Research in Computational Science.” Science 334 (6060): 1226–27. https://doi.org/10.1126/science.1213847.
Raff, Edward. 2023. “Does the Market of Citations Reward Reproducible Work?” In Proceedings of the 2023 ACM Conference on Reproducibility and Replicability, 89–96. ACM REP ’23. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3589806.3600041.
Semieniuk, Gregor, Philip B. Holden, Jean-Francois Mercure, Pablo Salas, Hector Pollitt, Katharine Jobson, Pim Vercoulen, Unnada Chewpreecha, Neil R. Edwards, and Jorge E. Viñuales. 2022. “Stranded Fossil-Fuel Assets Translate to Major Losses for Investors in Advanced Economies.” Nature Climate Change 12 (6): 532–38. https://doi.org/10.1038/s41558-022-01356-y.

Appendix: What if the machines do take over?

  • Would you want the AIs to be trained on your work?
    • Initial thought: no way, that’s my data!
    • Second thought: if the AIs are going to take over, they might as well be well-informed, by good information and good intentions!
  • Would you want to be able to understand how the AIs work?

AI takeover?

Where are we headed with AI?

Source: https://situational-awareness.ai

The limits of AI

Source: Nezhurina (n.d.)

Environmental costs of ‘AI’

IT sector was already poised to become a decarbonisation bottleneck

Source: theregister.com and Gupta et al. (2021)

Factoring-in build-out of AI data centres