# Mobility Workbench

Working prototype for a mobility-data management interface and pipeline. It is intentionally small but executable.

The current implementation lets you:

- register data sources;
- download/copy source files into a local cache;
- import GTFS static timetable feeds;
- import raw OSM PBF extracts by deriving transport GeoJSON;
- import OSM-derived transport GeoJSON;
- persist raw datasets and normalized route/stop records;
- run automatic GTFS-route ↔ OSM-route matching;
- persist manual accept/reject rules from the UI;
- expose GeoJSON layers for a zoomable map;
- use a management web UI with separate GTFS Harmonization and Mapping Data modules, plus source runs, stats, matches, and map inspection.

The default database is SQLite so the prototype runs immediately. The schema is kept simple enough to migrate to PostGIS when the pipeline needs European scale, vector tiles, and spatial indexes.

## Quick start

```bash
cd mobility-workbench
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m app.cli load-sample
uvicorn app.main:app --reload
```

Open:

```text
http://127.0.0.1:8000
```

The sample project loads a small Berlin-like GTFS feed plus an OSM-like GeoJSON network. It imports routes/stops, runs the matcher, and shows matched and missing coverage on the map.

## PostgreSQL/PostGIS

SQLite remains the default. For Germany-scale imports, point `DATABASE_URL` at PostgreSQL:

```bash
export DATABASE_URL=postgresql://USER:PASSWORD@localhost:5432/meubility
python -m app.cli init-db
uvicorn app.main:app --reload
```

PostgreSQL mode automatically creates `postgis` and `pg_trgm`, stores GTFS `stop_times` and OSM features in main tables, and uses GiST/trigram indexes for map bbox queries, route-layer stop linking, and search filters.
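As a rough illustration of the backend switch described in this section, the sketch below reads `DATABASE_URL` and reports which backend it points at. The function name `database_backend` and the fallback URL `sqlite:///data/workbench.sqlite` are assumptions made for this example; the app's actual configuration code may look different.

```python
import os
from typing import Optional
from urllib.parse import urlsplit

# Assumed fallback for illustration; the app's real default path may differ.
ASSUMED_DEFAULT_URL = "sqlite:///data/workbench.sqlite"


def database_backend(url: Optional[str] = None) -> str:
    """Return 'sqlite' or 'postgresql' for a database URL (hypothetical helper)."""
    url = url or os.environ.get("DATABASE_URL") or ASSUMED_DEFAULT_URL
    # Drop an optional driver suffix such as "+psycopg2" from the scheme.
    scheme = urlsplit(url).scheme.split("+", 1)[0]
    if scheme in ("postgresql", "postgres"):
        return "postgresql"
    if scheme == "sqlite":
        return "sqlite"
    raise ValueError(f"unsupported database URL scheme: {scheme!r}")
```

Calling `database_backend()` without arguments falls back to the environment variable and then to the assumed SQLite default, which mirrors the "SQLite unless told otherwise" behavior described here.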
To keep using legacy sidecars with PostgreSQL, set:

```bash
export POSTGRES_USE_SIDECARS=true
```

To migrate the existing SQLite project into a fresh PostgreSQL database:

```bash
python scripts/migrate_sqlite_to_postgres.py \
  --sqlite-path data/workbench.sqlite \
  --postgres-url postgresql://USER:PASSWORD@localhost:5432/meubility \
  --reset
```

The migration copies normal tables first, imports legacy GTFS/OSM sidecars into PostgreSQL main tables, rewrites dataset storage metadata to `main`, refreshes PostGIS geometry columns, and rebuilds runtime indexes.

## Docker start

```bash
docker compose up --build
```

Then open:

```text
http://127.0.0.1:8000
```

## CLI commands

```bash
python -m app.cli init-db
python -m app.cli reset-db
python -m app.cli load-sample
python -m app.cli stats
python -m app.cli add-source --name "My GTFS" --kind gtfs --url ./data/feed.zip --country DE
python -m app.cli add-source --name "VBB Online GTFS" --kind gtfs --url https://unternehmen.vbb.de/fileadmin/user_upload/VBB/Dokumente/API-Datensaetze/gtfs-mastscharf/GTFS.zip --country DE --license "CC BY 4.0"
python -m app.cli add-source --name "DB Long-distance Rail GTFS.DE" --kind gtfs --url https://download.gtfs.de/germany/fv_free/latest.zip --country DE --license "Creative Commons 4.0"
python -m app.cli add-source --name "Germany Regional Rail GTFS.DE" --kind gtfs --url https://download.gtfs.de/germany/rv_free/latest.zip --country DE --license "Creative Commons 4.0"
python -m app.cli add-source --name "Berlin OSM" --kind osm_pbf --url https://download.geofabrik.de/europe/germany/berlin-latest.osm.pbf --country DE --license ODbL
python -m app.cli run-source 1
python -m app.cli run-match
python -m app.cli prune-cache --dry-run
python -m app.cli prune-cache
```

## HTTP API

Core endpoints:

```text
GET  /api/sources
POST /api/sources
POST /api/sources/{source_id}/run
POST /api/sample/reset
POST /api/match/run
GET  /api/stats
GET  /api/matches
POST /api/matches/{match_id}/accept
POST /api/matches/{match_id}/reject
GET  /api/rules
POST /api/rules
```

Map layers:

```text
GET /api/map/osm_routes.geojson
GET /api/map/osm_stops.geojson
GET /api/map/gtfs_routes.geojson
GET /api/map/gtfs_stops.geojson
GET /api/map/matched_gtfs_routes.geojson
GET /api/map/matched_gtfs_routes.geojson?status=missing
```

Map endpoints accept viewport and layer filters:

```text
bbox=min_lon,min_lat,max_lon,max_lat
zoom=13
kind=route,infra,stop,station,terminal
mode=bus,tram,train,subway,light_rail,ferry
geometry=point,line,polygon,nonpoint
source_id=4
dataset_id=5
limit=5000
```

## Source types implemented

### `gtfs`

Expected input: GTFS static zip.

Imported files:

```text
agency.txt
stops.txt
routes.txt
trips.txt
stop_times.txt
shapes.txt, if available
```

The importer stores agencies, stops, routes, trips, limited stop-times, and representative route geometries. Route geometry comes from `shapes.txt` where available; otherwise it falls back to stop sequences from a representative trip.

Multiple GTFS sources can be active at once. Map endpoints and layer controls keep sources separate with `source_id` filters, so VBB, DB long-distance rail, DB/regional rail, and local sample feeds can be rendered independently.

The journey UI routes against the active harmonized transit snapshot instead of exposing a raw GTFS source selector. Feed-level filters remain available for map layers, QA, and source diagnostics.

### `osm_pbf`

Expected input: an OSM `.osm.pbf` extract, for example a Geofabrik regional extract.

The importer records the downloaded/copied file once as an immutable raw dataset with kind `osm_pbf_raw`. For `.osm.pbf` inputs it then runs `scripts/osmium_transport_filter.sh` and stores one transport-only extract as `osm_pbf_transport`. The Python extractor reads that filtered extract, writes `transport.geojson`, and imports it through the `osm_geojson` importer.
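One way to record a file "once" is to key the raw dataset on a content hash, which the data flow in this README also mentions ("dataset record with hash"). The sketch below is only an illustration under assumptions: SHA-256 as the hash, and a plain in-memory dictionary named `registry` standing in for the dataset table. Neither is confirmed by the project.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PBF extracts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_raw_dataset(registry: Dict[str, int], path: Path) -> Tuple[int, bool]:
    """Return (dataset_id, created).

    `registry` is a hypothetical stand-in for the dataset table: it maps a
    content hash to a dataset id. An unchanged file maps to the existing id.
    """
    content_hash = file_sha256(path)
    if content_hash in registry:
        return registry[content_hash], False
    dataset_id = len(registry) + 1
    registry[content_hash] = dataset_id
    return dataset_id, True
```

Registering the same unchanged file twice returns the same id with `created` set to `False`, so later stages can decide to reuse their outputs rather than rebuild them.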
The raw and filtered datasets are inactive storage stages; the derived `osm_geojson` dataset is the active visual layer. Re-running an unchanged source reuses the existing raw, filtered, and derived datasets instead of duplicating the extract.

The extractor emits:

```text
route relations as LineString/MultiLineString features built from member ways
rail/tram/subway/ferry/aerialway infrastructure ways
stations, stops, platforms, bus stations, and ferry terminals
```

Route display uses OSM route relation member ways, not stop-to-stop straight-line interpolation.

### `osm_geojson`

Expected input: GeoJSON `FeatureCollection` containing OSM-derived route/station/stop/terminal features.

Minimum useful properties for route features:

```json
{
  "osm_type": "relation",
  "osm_id": "12345",
  "type": "route",
  "route": "train",
  "ref": "RE1",
  "name": "RE1 Example Line",
  "operator": "Example Operator",
  "network": "Example Network"
}
```

Supported route modes include:

```text
train, light_rail, subway, tram, bus, trolleybus, coach, ferry, monorail, funicular, aerialway
```

## Matching logic

The current automatic matcher scores each GTFS route against OSM route features using:

```text
mode compatibility
route ref similarity
route name similarity
operator/network similarity
bbox overlap or proximity, used as a major disambiguator for common refs
GTFS/OSM geometry proximity, where both geometries are available
same normalized route key
```

Each match also stores a scope classification:

```text
in_osm_scope
near_osm_scope
outside_osm_scope
unknown_scope
```

Overall coverage and in-scope coverage are intentionally separate. A GTFS route outside the loaded OSM extract should not be interpreted as a failed route match.

Status thresholds:

```text
>= 85   matched
65–84   probable
40–64   weak
< 40    missing
```

Manual accept/reject actions are stored as `match_rules`.
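The status thresholds listed in this section can be sketched as a small function. The name `status_for_score` is hypothetical, and treating fractional scores with `>=` comparisons (so 84.5 counts as `probable`) is an assumption, since the table only lists integer ranges.

```python
def status_for_score(score: float) -> str:
    """Map a match score to a status label using the documented thresholds.

    Hypothetical helper: the boundaries come from the README table, the
    handling of fractional scores is an assumption.
    """
    if score >= 85:
        return "matched"
    if score >= 65:
        return "probable"
    if score >= 40:
        return "weak"
    return "missing"
```

Keeping the mapping in one place like this makes it easy to review or adjust the cut-offs without touching the scoring itself.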
The current prototype records the rule; the next implementation step is applying those rules automatically before/after every matching run.

The route layer treats OSM route geometry as the visual authority when a suitable match exists. Multiple GTFS timetable shapes or trips, including opposite directions, can link to the same OSM-backed `RoutePattern`; each GTFS shape link keeps its own match and direction evidence. When no OSM route matches, the builder creates a `gtfs_proposed` visual pattern from GTFS geometry for review.

## Data flow

```text
source registration
  → local source cache
  → dataset record with hash
  → raw OSM commit, if source is osm_pbf
  → filtered transport extract, if source is osm_pbf and prefiltering is enabled
  → derived transport GeoJSON extraction, if source is osm_pbf
  → normalized GTFS / OSM tables
  → route matching
  → canonical stops and OSM-authoritative route layer
  → manual review rules
  → GeoJSON map layers
  → downstream routing/coverage/tile generation
```

## Current limitations

- PostgreSQL/PostGIS is supported for large local imports; vector tiles are still the next step for country/Europe-scale browsing.
- OSM PBF snapshot extraction is implemented; applying replication `.osc.gz` diffs onto prior raw snapshots is still a next step.
- GTFS-RT, SIRI, NeTEx, TransXChange, OSDM, fares, and booking APIs are not yet implemented.
- The matcher is deliberately transparent rather than sophisticated.
- The frontend requests viewport-bounded GeoJSON by layer; vector tiles are still the next step for country/Europe scale.

## OSM extraction helper

A starter Osmium shell filter script is included:

```bash
scripts/osmium_transport_filter.sh europe-latest.osm.pbf transport.osm.pbf
```

The script calls Osmium through `scripts/host_tool.sh`, which also works from a Flatpak/containerized terminal when `flatpak-spawn --host` is available.

The app has a Python Osmium-based `osm_pbf` importer for repeatable prototype runs.
For the next stage, add OSM replication diff application, move large-region imports to PostGIS, and serve generalized vector tiles where network editing requires broad viewport rendering.

## Tests

```bash
pytest -q
```