Mobility Workbench
Working prototype for a mobility-data management interface and pipeline.
It is intentionally small but executable. The current implementation lets you:
- register data sources;
- download/copy source files into a local cache;
- import GTFS static timetable feeds;
- import raw OSM PBF extracts by deriving transport GeoJSON;
- import OSM-derived transport GeoJSON;
- persist raw datasets and normalized route/stop records;
- run automatic GTFS-route ↔ OSM-route matching;
- persist manual accept/reject rules from the UI;
- expose GeoJSON layers for a zoomable map;
- use a management web UI with separate GTFS Harmonization and Mapping Data modules, plus source runs, stats, matches, and map inspection.
The default database is SQLite so the prototype runs immediately. The schema is kept simple enough to migrate to PostGIS when the pipeline needs European scale, vector tiles, and spatial indexes.
Quick start
cd mobility-workbench
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m app.cli load-sample
uvicorn app.main:app --reload
Open:
http://127.0.0.1:8000
The sample project loads a small Berlin-like GTFS feed plus an OSM-like GeoJSON network. It imports routes/stops, runs the matcher, and shows matched and missing coverage on the map.
PostgreSQL/PostGIS
SQLite remains the default. For Germany-scale imports, point DATABASE_URL at PostgreSQL:
export DATABASE_URL=postgresql://USER:PASSWORD@localhost:5432/meubility
python -m app.cli init-db
uvicorn app.main:app --reload
PostgreSQL mode automatically creates postgis and pg_trgm, stores GTFS stop_times and OSM features in main tables, and uses GiST/trigram indexes for map bbox queries, route-layer stop linking, and search filters. To keep using legacy sidecars with PostgreSQL, set:
export POSTGRES_USE_SIDECARS=true
To migrate the existing SQLite project into a fresh PostgreSQL database:
python scripts/migrate_sqlite_to_postgres.py \
--sqlite-path data/workbench.sqlite \
--postgres-url postgresql://USER:PASSWORD@localhost:5432/meubility \
--reset
The migration copies normal tables first, imports legacy GTFS/OSM sidecars into PostgreSQL main tables, rewrites dataset storage metadata to main, refreshes PostGIS geometry columns, and rebuilds runtime indexes.
Docker start
docker compose up --build
Then open:
http://127.0.0.1:8000
CLI commands
python -m app.cli init-db
python -m app.cli reset-db
python -m app.cli load-sample
python -m app.cli stats
python -m app.cli add-source --name "My GTFS" --kind gtfs --url ./data/feed.zip --country DE
python -m app.cli add-source --name "VBB Online GTFS" --kind gtfs --url https://unternehmen.vbb.de/fileadmin/user_upload/VBB/Dokumente/API-Datensaetze/gtfs-mastscharf/GTFS.zip --country DE --license "CC BY 4.0"
python -m app.cli add-source --name "DB Long-distance Rail GTFS.DE" --kind gtfs --url https://download.gtfs.de/germany/fv_free/latest.zip --country DE --license "Creative Commons 4.0"
python -m app.cli add-source --name "Germany Regional Rail GTFS.DE" --kind gtfs --url https://download.gtfs.de/germany/rv_free/latest.zip --country DE --license "Creative Commons 4.0"
python -m app.cli add-source --name "Berlin OSM" --kind osm_pbf --url https://download.geofabrik.de/europe/germany/berlin-latest.osm.pbf --country DE --license ODbL
python -m app.cli run-source 1
python -m app.cli run-match
python -m app.cli prune-cache --dry-run
python -m app.cli prune-cache
HTTP API
Core endpoints:
GET /api/sources
POST /api/sources
POST /api/sources/{source_id}/run
POST /api/sample/reset
POST /api/match/run
GET /api/stats
GET /api/matches
POST /api/matches/{match_id}/accept
POST /api/matches/{match_id}/reject
GET /api/rules
POST /api/rules
Map layers:
GET /api/map/osm_routes.geojson
GET /api/map/osm_stops.geojson
GET /api/map/gtfs_routes.geojson
GET /api/map/gtfs_stops.geojson
GET /api/map/matched_gtfs_routes.geojson
GET /api/map/matched_gtfs_routes.geojson?status=missing
Map endpoints accept viewport and layer filters:
bbox=min_lon,min_lat,max_lon,max_lat
zoom=13
kind=route,infra,stop,station,terminal
mode=bus,tram,train,subway,light_rail,ferry
geometry=point,line,polygon,nonpoint
source_id=4
dataset_id=5
limit=5000
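As a rough illustration of how these filters combine, the sketch below builds a map-layer URL for the local server from the Quick start. The helper name `map_layer_url` is hypothetical and not part of the app; it only assembles the query string and does not send a request. It also assumes the server accepts unescaped commas in query values.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"  # local dev server from the Quick start


def map_layer_url(layer, bbox=None, **filters):
    """Build a map-layer URL; list values become comma-separated strings.

    Hypothetical helper for illustration only.
    """
    params = {}
    if bbox is not None:
        # bbox order: min_lon, min_lat, max_lon, max_lat
        params["bbox"] = ",".join(str(v) for v in bbox)
    for key, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[key] = value
    # Keep commas readable instead of percent-encoding them.
    return f"{BASE_URL}/api/map/{layer}.geojson?{urlencode(params, safe=',')}"


url = map_layer_url(
    "osm_routes",
    bbox=(13.30, 52.45, 13.50, 52.56),
    zoom=13,
    mode=["tram", "subway"],
    limit=5000,
)
print(url)
```

The resulting URL can be passed to any HTTP client once the server is running.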
Source types implemented
gtfs
Expected input: GTFS static zip.
Imported files:
agency.txt
stops.txt
routes.txt
trips.txt
stop_times.txt
shapes.txt, if available
The importer stores agencies, stops, routes, trips, limited stop-times, and representative route geometries. Route geometry comes from shapes.txt where available; otherwise it falls back to stop sequences from a representative trip.
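The geometry fallback described above can be sketched roughly as follows. This is a minimal illustration, not the importer's actual code: the function name, argument shapes, and the "at least two shape points" rule are assumptions.

```python
def route_geometry(shape_points, trip_stop_ids, stops):
    """Return (coords, source) with coords as [(lon, lat), ...].

    shape_points:  (sequence, lat, lon) rows from shapes.txt, possibly empty
    trip_stop_ids: stop_ids of a representative trip, in stop_sequence order
    stops:         dict mapping stop_id -> (lat, lon)
    """
    if len(shape_points) >= 2:
        # Prefer shapes.txt: order by shape_pt_sequence, emit lon/lat pairs.
        ordered = sorted(shape_points, key=lambda point: point[0])
        return [(lon, lat) for _, lat, lon in ordered], "shapes"
    # Fallback: connect the stops of the representative trip in order,
    # skipping stop_ids that are missing from stops.txt.
    coords = [(stops[s][1], stops[s][0]) for s in trip_stop_ids if s in stops]
    return coords, "stop_sequence"


stops = {"a": (52.52, 13.37), "b": (52.51, 13.45)}
print(route_geometry([(2, 52.51, 13.45), (1, 52.52, 13.37)], ["a", "b"], stops))
print(route_geometry([], ["a", "b"], stops))
```

Returning the source label alongside the coordinates makes it easy to see during QA which routes were drawn from stop sequences rather than real shapes.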
Multiple GTFS sources can be active at once. Map endpoints and layer controls keep sources separate with source_id filters, so VBB, DB long-distance rail, DB/regional rail, and local sample feeds can be rendered independently.
The journey UI routes against the active harmonized transit snapshot instead of exposing a raw GTFS source selector. Feed-level filters remain available for map layers, QA, and source diagnostics.
osm_pbf
Expected input: an OSM .osm.pbf extract, for example a Geofabrik regional extract.
The importer records the downloaded/copied file once as an immutable raw dataset with kind osm_pbf_raw. For .osm.pbf inputs it then runs scripts/osmium_transport_filter.sh and stores one transport-only extract as osm_pbf_transport. The Python extractor reads that filtered extract, writes transport.geojson, and imports it through the osm_geojson importer.
The raw and filtered datasets are inactive storage stages; the derived osm_geojson dataset is the active visual layer. Re-running an unchanged source reuses the existing raw, filtered, and derived datasets instead of duplicating the extract.
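One way to detect an unchanged source is to compare content hashes before registering a new dataset. The sketch below assumes SHA-256 and uses a hypothetical in-memory registry keyed by source and hash; the app's real storage and hash choice may differ.

```python
import hashlib
import io


def stream_sha256(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large extracts need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def find_or_register(registry, source_id, content_hash):
    """Return (dataset_id, reused) for a source file.

    registry is a hypothetical dict mapping (source_id, hash) -> dataset_id.
    """
    key = (source_id, content_hash)
    if key in registry:
        return registry[key], True  # unchanged file: reuse existing dataset
    dataset_id = len(registry) + 1
    registry[key] = dataset_id
    return dataset_id, False


registry = {}
content_hash = stream_sha256(io.BytesIO(b"example extract bytes"))
print(find_or_register(registry, 1, content_hash))  # first run registers
print(find_or_register(registry, 1, content_hash))  # re-run reuses
```

For a cached file on disk, pass an object opened with `open(path, "rb")` instead of the `io.BytesIO` stand-in.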
The extractor emits:
route relations as LineString/MultiLineString features built from member ways
rail/tram/subway/ferry/aerialway infrastructure ways
stations, stops, platforms, bus stations, and ferry terminals
Route display uses OSM route relation member ways, not stop-to-stop straight-line interpolation.
osm_geojson
Expected input: GeoJSON FeatureCollection containing OSM-derived route/station/stop/terminal features.
Minimum useful properties for route features:
{
"osm_type": "relation",
"osm_id": "12345",
"type": "route",
"route": "train",
"ref": "RE1",
"name": "RE1 Example Line",
"operator": "Example Operator",
"network": "Example Network"
}
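To check an input file before import, a small pre-flight script can report which of the properties listed above are absent. This is a sketch only; the helper name is hypothetical and the importer may well accept features with fewer properties.

```python
# Property names taken from the example above.
ROUTE_PROPERTIES = (
    "osm_type", "osm_id", "type", "route",
    "ref", "name", "operator", "network",
)


def missing_route_properties(feature):
    """Return the listed property names that are absent or empty on a feature."""
    props = feature.get("properties") or {}
    return [name for name in ROUTE_PROPERTIES if not props.get(name)]


feature = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[13.37, 52.52], [13.45, 52.51]],  # lon, lat order
    },
    "properties": {
        "osm_type": "relation",
        "osm_id": "12345",
        "type": "route",
        "route": "train",
        "ref": "RE1",
        "name": "RE1 Example Line",
    },
}
collection = {"type": "FeatureCollection", "features": [feature]}

for item in collection["features"]:
    print(missing_route_properties(item))  # ['operator', 'network']
```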
Supported route modes include:
train, light_rail, subway, tram, bus, trolleybus, coach,
ferry, monorail, funicular, aerialway
Matching logic
The current automatic matcher scores each GTFS route against OSM route features using:
mode compatibility
route ref similarity
route name similarity
operator/network similarity
bbox overlap or proximity, used as a major disambiguator for common refs
GTFS/OSM geometry proximity, where both geometries are available
same normalized route key
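A simplified sketch of this kind of additive scoring is shown below. The weights, the record layout, and the use of `difflib` for string similarity are all assumptions made for illustration; geometry proximity and the normalized route key are omitted, and the real matcher's weighting is not documented here.

```python
from difflib import SequenceMatcher

# Hypothetical weights, chosen only so that a perfect match sums to 100.
WEIGHTS = {"mode": 25, "ref": 30, "name": 20, "operator": 10, "bbox": 15}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]; empty values score 0."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def bbox_overlaps(a, b):
    """Boxes are (min_lon, min_lat, max_lon, max_lat)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def score(gtfs, osm):
    """Weighted sum of per-signal scores, each in [0, 1]."""
    parts = {
        "mode": 1.0 if gtfs["mode"] == osm["mode"] else 0.0,
        "ref": similarity(gtfs["ref"], osm["ref"]),
        "name": similarity(gtfs["name"], osm["name"]),
        "operator": similarity(gtfs["operator"], osm["operator"]),
        "bbox": 1.0 if bbox_overlaps(gtfs["bbox"], osm["bbox"]) else 0.0,
    }
    return round(sum(WEIGHTS[key] * value for key, value in parts.items()), 1)


gtfs_route = {"mode": "train", "ref": "RE1", "name": "RE1 Example Line",
              "operator": "Example Operator", "bbox": (13.0, 52.3, 13.8, 52.7)}
osm_route = dict(gtfs_route)                 # identical candidate
osm_bus = dict(gtfs_route, mode="bus")       # same ref, wrong mode

print(score(gtfs_route, osm_route))  # 100.0
print(score(gtfs_route, osm_bus))    # 75.0
```

Keeping each signal as a separate named part is what makes a scorer like this transparent: the per-signal values can be stored next to the total and shown in the review UI.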
Each match also stores a scope classification:
in_osm_scope
near_osm_scope
outside_osm_scope
unknown_scope
Overall coverage and in-scope coverage are intentionally separate. A GTFS route outside the loaded OSM extract should not be interpreted as a failed route match.
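The difference between the two coverage figures can be illustrated with a short sketch. It assumes each match is a dict with `status` and `scope` keys and that only `matched` counts as covered; both are assumptions, and the app's stats may count `probable` differently.

```python
def coverage(matches):
    """Return overall and in-scope coverage as fractions (None if no rows)."""

    def share(rows):
        if not rows:
            return None
        covered = sum(1 for row in rows if row["status"] == "matched")
        return covered / len(rows)

    in_scope = [m for m in matches if m["scope"] == "in_osm_scope"]
    return {"overall": share(matches), "in_scope": share(in_scope)}


matches = [
    {"status": "matched", "scope": "in_osm_scope"},
    {"status": "matched", "scope": "in_osm_scope"},
    {"status": "missing", "scope": "in_osm_scope"},
    {"status": "missing", "scope": "outside_osm_scope"},
]
result = coverage(matches)
print(result["overall"])   # 0.5
print(result["in_scope"])  # two of three in-scope routes matched
```

Here the route outside the loaded extract lowers overall coverage but leaves in-scope coverage untouched, which is why the two numbers are reported separately.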
Status thresholds:
>= 85 matched
65–84 probable
40–64 weak
< 40 missing
Manual accept/reject actions are stored as match_rules. The current prototype records the rule; the next implementation step is applying those rules automatically before/after every matching run.
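The status thresholds and a possible way of applying manual rules on top of them can be sketched as below. The threshold mapping follows the table above; the rule override is an assumption about the planned next step, including the choice to map a rejected pair to `missing`.

```python
def status_for_score(score):
    """Map a numeric match score to a status using the documented thresholds."""
    if score >= 85:
        return "matched"
    if score >= 65:
        return "probable"
    if score >= 40:
        return "weak"
    return "missing"


def final_status(score, rule=None):
    """Apply a manual rule, if any, over the automatic status.

    rule is "accept", "reject", or None. The override behaviour is a
    hypothetical sketch, not the prototype's current behaviour.
    """
    if rule == "accept":
        return "matched"
    if rule == "reject":
        return "missing"
    return status_for_score(score)


print(status_for_score(85))          # matched
print(status_for_score(64))          # weak
print(final_status(50, "accept"))    # matched
print(final_status(92, "reject"))    # missing
```

Fractional scores such as 84.5 fall into the lower band here, since each band is defined by its lower bound.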
The route layer treats OSM route geometry as the visual authority when a suitable match exists. Multiple GTFS timetable shapes or trips, including opposite directions, can link to the same OSM-backed RoutePattern; each GTFS shape link keeps its own match and direction evidence. When no OSM route matches, the builder creates a gtfs_proposed visual pattern from GTFS geometry for review.
Data flow
source registration
→ local source cache
→ dataset record with hash
→ raw OSM commit, if source is osm_pbf
→ filtered transport extract, if source is osm_pbf and prefiltering is enabled
→ derived transport GeoJSON extraction, if source is osm_pbf
→ normalized GTFS / OSM tables
→ route matching
→ canonical stops and OSM-authoritative route layer
→ manual review rules
→ GeoJSON map layers
→ downstream routing/coverage/tile generation
Current limitations
- PostgreSQL/PostGIS is supported for large local imports; vector tiles are still the next step for country/Europe-scale browsing.
- OSM PBF snapshot extraction is implemented; applying replication .osc.gz diffs onto prior raw snapshots is still a next step.
- GTFS-RT, SIRI, NeTEx, TransXChange, OSDM, fares, and booking APIs are not yet implemented.
- The matcher is deliberately transparent rather than sophisticated.
- The frontend requests viewport-bounded GeoJSON by layer; vector tiles are still the next step for country/Europe scale.
OSM extraction helper
A starter Osmium shell filter script is included:
scripts/osmium_transport_filter.sh europe-latest.osm.pbf transport.osm.pbf
The script calls Osmium through scripts/host_tool.sh, which also works from a Flatpak/containerized terminal when flatpak-spawn --host is available. The app has a Python Osmium-based osm_pbf importer for repeatable prototype runs. For the next stage, add OSM replication diff application, move large-region imports to PostGIS, and serve generalized vector tiles where network editing requires broad viewport rendering.
Tests
pytest -q