MVP roadmap

Last updated: 2026-07-01

See also docs/backlog.md for the prioritized engineering backlog, caveats, and open optimization list.

Objective

Build an internal management workbench that turns public mobility data into a normalized, auditable, coverage-scored dataset for a future traveller-facing web/native app.

The workbench stays distinct from the public app. Its users are data engineers, analysts, and operations staff who need to ingest, inspect, link, correct, route against, and publish mobility data.

Current prototype: implemented

The repository has moved beyond the original SQLite/Berlin prototype. The current development path is Germany-scale and PostGIS-first, while SQLite remains useful as a legacy/test fallback.

Implemented:

source registry and source catalog
local source cache
job queue with job events and worker process
PostgreSQL/PostGIS runtime support with SQLite fallback
GTFS static importer for large national feeds
OSM PBF import path for Germany-scale extracts
OSM address index and address-aware journey endpoints
canonical stop/station linking from GTFS and OSM
automatic GTFS <-> OSM route matching
manual route and canonical-stop rule persistence
visual route-layer builder from OSM routes and GTFS shapes
walk/drive routing layer from OSM-derived routing graph
progressive journey-search API and UI polling
map right-click "from here" / "to here"
management UI with map, sources, stats, jobs, matches, search, and journeys
separate GTFS Harmonization and Mapping Data source modules in the UI
generic job-details overlay with phase timeline, event log, and queue snapshot
QA dashboard skeleton for source/import/link/route/publication health
GTFS harmonization concept and service-boundary decision
CLI commands
tests and syntax checks for changed modules

Recent fixes:

PostgreSQL startup avoids unnecessary DDL when PostGIS columns/indexes already exist.
A queued route-layer rebuild can be claimed by a live worker instead of staying queued behind a stale worker PID.
Timetable routing no longer requires visual route-pattern trip links.
Walk-leg route geometry has a short-lived in-process cache.
Address search takes the current bounding box into account without restricting results to it.
Job rows expose a details overlay that polls job events only while open.
Journey routing consumes the active harmonized GTFS snapshot instead of a raw feed picker.
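
A short-lived in-process cache like the one mentioned for walk-leg geometry can be sketched as below. This is a minimal illustration under stated assumptions, not the workbench's actual code: `TTLCache`, `walk_leg_key`, the 30-second lifetime in the usage note, and the coordinate rounding are all hypothetical.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller recomputes the geometry.
            del self._entries[key]
            return None
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def walk_leg_key(from_lonlat, to_lonlat, precision: int = 5):
    """Round coordinates so near-identical requests share one cache entry."""
    return tuple(round(c, precision) for c in (*from_lonlat, *to_lonlat))
```

A caller would look up `cache.get(walk_leg_key(origin, destination))` first and only route on a miss, storing the result with `cache.put(...)`. Because the cache lives in one process, it is lost on restart and not shared between workers, which matches the process-local limit listed under known limits.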

Current prototype: known limits

The app can import and inspect Germany-scale OSM and GTFS data, but the routing and route-layer rebuild paths are still prototype-grade.

Important limits:

journey search is not yet based on RAPTOR or the Connection Scan Algorithm (CSA)
address endpoints can multiply transit searches through several nearby access/egress stops
progressive transfer stages still recompute too much
route-layer rebuild is coarse-grained and rewrites derived link tables
visual route-pattern links are not yet incrementally updated
canonical stop extraction is CPU/memory heavy on national feeds
route geometry cannot yet classify temporary GTFS detours as separate variants
local-transport-only routing is not a first-class query mode
route-search caches are process-local and not persisted
Alembic migrations are still missing

MVP 1: stable Germany data workbench

Backend

  • Add proper Alembic migrations for PostgreSQL and keep SQLite test support.
  • Add source-run history and dataset-version comparison.
  • Make route-layer rebuild incremental: update only affected matches/patterns/stops.
  • Keep old route-layer tables readable while a rebuild prepares replacement rows.
  • Add source health checks: download success, hash change, feed freshness, calendar validity.
  • Expand the QA dashboard into drill-down review queues for source health, GTFS validation, canonical stop conflicts, route conflicts, and publication blockers.
  • Add GTFS validation summary reports: service dates, route direction coverage, stop coordinate outliers, bad stop_times, missing shapes.
  • Add database maintenance jobs: analyze, vacuum, stale job recovery, orphan cleanup.
  • Add durable cache tables for journey stages, nearest stops, address access candidates, and common station-to-station searches.
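
The source health checks listed above (hash change, calendar validity) could look roughly like the following. This is a sketch under assumptions: `SourceHealth`, `check_source`, and the idea of passing the feed's service date range as arguments are illustrative, not existing workbench names.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SourceHealth:
    hash_changed: bool       # download differs from the previous run
    calendar_valid: bool     # today falls inside the feed's service range
    days_until_expiry: int   # negative once the feed has expired


def sha256_hex(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def check_source(payload: bytes, previous_hash: Optional[str],
                 service_start: date, service_end: date,
                 today: date) -> SourceHealth:
    """Compare a download against the last run and check calendar validity."""
    current_hash = sha256_hex(payload)
    return SourceHealth(
        # A first run (previous_hash is None) counts as changed.
        hash_changed=current_hash != previous_hash,
        calendar_valid=service_start <= today <= service_end,
        days_until_expiry=(service_end - today).days,
    )
```

Storing one `SourceHealth` record per source run would also provide the raw material for the source-run history item in the same list.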

Routing

  • Replace the demo round-expansion router with a GTFS-appropriate algorithm such as RAPTOR or CSA.
  • Precompute transfer graph edges: station-internal transfers, nearby walking transfers, and access/egress stop candidates.
  • Add routing profiles:
      • fastest public transport
      • fewest transfers
      • local transport only / Deutschlandticket-like
      • walk only
      • drive
      • car comparison
  • Treat access/egress walking as access legs, not as public-transport transfers.
  • Add bounded hub-aware long-distance routing for city-to-city requests: local access to likely hubs, long-distance/regional trunk, local egress.
  • Add arrive-by search and better termination criteria for "good enough" results.
  • Add route diagnostics that explain why a route was found or pruned.
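
To make the CSA option above concrete, here is a minimal earliest-arrival Connection Scan sketch. It is an illustration only: the `Connection` record and `earliest_arrival` function are hypothetical names, and footpaths, transfer buffers, service calendars, and access/egress legs are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

INF = float("inf")


@dataclass(frozen=True)
class Connection:
    trip_id: str
    dep_stop: str
    arr_stop: str
    dep_time: int  # seconds after midnight
    arr_time: int


def earliest_arrival(connections: List[Connection], source: str,
                     depart_at: int) -> Dict[str, float]:
    """Earliest arrival time per reachable stop.

    Connections are scanned once in departure-time order. A connection is
    usable if the traveller already rides its trip or can reach its
    departure stop no later than the departure time.
    """
    arrival: Dict[str, float] = {source: depart_at}
    boarded: Set[str] = set()  # trips the traveller is already riding
    for c in sorted(connections, key=lambda c: c.dep_time):
        can_board = arrival.get(c.dep_stop, INF) <= c.dep_time
        if c.trip_id in boarded or can_board:
            boarded.add(c.trip_id)
            if c.arr_time < arrival.get(c.arr_stop, INF):
                arrival[c.arr_stop] = c.arr_time
    return arrival
```

The single sorted scan is what makes CSA attractive for timetable data: once the connection array is built from a GTFS snapshot, each query is a linear pass with no priority queue. Precomputed transfer edges and access legs from the list above would be layered on top of this core loop.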

Frontend

  • Add source detail page.
  • Add dataset detail page.
  • Add match-review queue with filters by mode, operator, country, confidence, and source scope.
  • Add route detail inspection: GTFS geometry, OSM geometry, candidate matches, stops, evidence, and route-pattern provenance.
  • Add canonical stop/station detail overlay.
  • Add persistent rule editor.
  • Add routing controls for profile, transfer buffer, avoid/prefer modes, arrive-by, via, and local-only.
  • Show partial/progressive route results with clear stage labels.

Data outputs

  • GeoJSON exports for small regions.
  • GeoParquet exports for analysis.
  • PMTiles/vector-tile export for map display.
  • Coverage CSV/API for downstream services.
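
As an illustration of the small-region GeoJSON export, the sketch below serializes route geometries into a `FeatureCollection`. The input shape (dicts with `route_id`, `status`, and `coords`) and the function name are assumptions made for this example; coordinates follow the GeoJSON longitude, latitude order.

```python
import json
from typing import Dict, Iterable


def routes_to_geojson(routes: Iterable[Dict]) -> str:
    """Serialize route line geometries as a GeoJSON FeatureCollection."""
    features = []
    for route in routes:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                # GeoJSON positions are [longitude, latitude].
                "coordinates": [list(point) for point in route["coords"]],
            },
            "properties": {
                "route_id": route["route_id"],
                "coverage_status": route["status"],
            },
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

This approach only suits small regions, since the whole document is built in memory; the GeoParquet and PMTiles outputs in the list cover larger extracts.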

MVP 2: Europe-scale coverage map

  • Use Geofabrik country/Europe extracts and reproducible OSM PBF jobs.
  • Store OSM transport features, addresses, and routing graph in PostGIS.
  • Generate ranked/generalized transport route layers by zoom level.
  • Serve tiles with Martin or export PMTiles.
  • Add coverage statuses:
      • existing_in_osm
      • static_timetable_covered
      • live_data_covered
      • fare_data_covered
      • booking_covered
      • missing_static
      • stale_feed
      • restricted_license
      • low_confidence_match
      • detour_or_temporary_variant
  • Add coverage metrics:
      • operator coverage
      • route coverage
      • route-km coverage
      • stop coverage
      • live-data coverage
      • feed freshness
      • license confidence
      • booking coverage
      • route-layer provenance coverage
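
One way to relate the statuses and metrics above is sketched here for route-km coverage. The enum values are the statuses from the list; the assumption that a route can carry several statuses at once, and the function name `route_km_coverage`, are illustrative only.

```python
from enum import Enum
from typing import Iterable, Set, Tuple


class CoverageStatus(str, Enum):
    EXISTING_IN_OSM = "existing_in_osm"
    STATIC_TIMETABLE_COVERED = "static_timetable_covered"
    LIVE_DATA_COVERED = "live_data_covered"
    FARE_DATA_COVERED = "fare_data_covered"
    BOOKING_COVERED = "booking_covered"
    MISSING_STATIC = "missing_static"
    STALE_FEED = "stale_feed"
    RESTRICTED_LICENSE = "restricted_license"
    LOW_CONFIDENCE_MATCH = "low_confidence_match"
    DETOUR_OR_TEMPORARY_VARIANT = "detour_or_temporary_variant"


def route_km_coverage(routes: Iterable[Tuple[float, Set[CoverageStatus]]],
                      status: CoverageStatus) -> float:
    """Share of total route length (km) whose routes carry the given status."""
    total_km = 0.0
    covered_km = 0.0
    for length_km, statuses in routes:
        total_km += length_km
        if status in statuses:
            covered_km += length_km
    # An empty network has no coverage rather than an undefined ratio.
    return covered_km / total_km if total_km else 0.0
```

Weighting by length rather than by route count keeps a few long regional lines from being outvoted by many short urban ones, which is the usual reason to report route-km coverage alongside plain route coverage.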

MVP 3: more source formats

Add importers:

NeTEx
TransXChange
SIRI discovery/live endpoints
GTFS-Realtime
GBFS for shared mobility, optional
operator CSV/API adapters

Target data model:

canonical operators
canonical stops/stations/terminals
canonical routes
route variants
trip patterns
calendar/service validity
transfers
access/egress legs
coverage observations
source evidence
manual rules
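
Two entities from the target data model, canonical stops and source evidence, might be shaped as follows. This is a hedged sketch: the class names, fields, and the `confidence` score are assumptions for illustration, not the schema the workbench uses.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class SourceEvidence:
    source_id: str     # hypothetical key into the source registry
    native_id: str     # identifier inside that source, e.g. a GTFS stop_id
    confidence: float  # 0.0 to 1.0, how strongly this record supports the link


@dataclass
class CanonicalStop:
    canonical_id: str
    name: str
    lon: float
    lat: float
    evidence: List[SourceEvidence] = field(default_factory=list)

    def sources(self) -> Set[str]:
        """Distinct sources that contributed to this canonical stop."""
        return {e.source_id for e in self.evidence}

    def weakest_link(self) -> float:
        """Lowest evidence confidence, or 0.0 if nothing backs the stop."""
        return min((e.confidence for e in self.evidence), default=0.0)
```

Keeping evidence as separate records rather than merged attributes means every canonical entity can be traced back to the feeds and extracts that produced it, which is what the auditability objective requires.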

MVP 4: production journey-planning dataset

  • Build a canonical stop/station graph with transfer rules and transfer-time profiles.
  • Generate timetable-routing input for RAPTOR/CSA.
  • Add first/last-mile routing from OSM walk/drive graph.
  • Add emissions factors per mode/operator/country.
  • Add fare/ticket placeholders and booking/deep-link metadata.
  • Add confidence and provenance to every derived route/journey.

MVP 5: booking-readiness layer

  • Track booking availability separately from timetable coverage.
  • Add deep-link metadata per operator/route.
  • Add partner API adapters later.
  • Distinguish clearly:
      • travel-plausible itinerary
      • bookable itinerary
      • single-interface multi-booking
      • protected through-ticket

Next priorities

  1. Finish route-layer rebuild resilience: incremental updates, shadow tables, and detour/provenance classification.
  2. Replace or heavily optimize journey routing: precomputed transfers, hub-aware long-distance routing, local-only profile, and bounded search.
  3. Add durable PostgreSQL-backed journey caches for address access, stop pairs, and repeated stage searches.
  4. Add Alembic migrations and remove runtime DDL from normal request/worker startup.
  5. Add route/journey diagnostics so slow or failed requests explain what was searched and pruned.
  6. Add vector-tile output for route layers and large map rendering.
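
The shadow-table idea in the first item can be sketched with the SQLite fallback, since `sqlite3` ships with Python. The table names `route_links` and `route_links_shadow`, their columns, and the function name are hypothetical, and the connection is assumed to be opened with `isolation_level=None` so that the explicit `BEGIN`/`COMMIT` controls the transaction.

```python
import sqlite3
from typing import Iterable, Tuple


def swap_in_shadow(conn: sqlite3.Connection,
                   rows: Iterable[Tuple[str, int]]) -> None:
    """Build replacement rows in a shadow table, then swap it in atomically.

    Readers keep seeing the old route_links table until the final commit.
    Assumes conn was opened with isolation_level=None (autocommit mode).
    """
    conn.execute("DROP TABLE IF EXISTS route_links_shadow")
    conn.execute(
        "CREATE TABLE route_links_shadow "
        "(gtfs_route_id TEXT, osm_relation_id INTEGER)"
    )
    conn.executemany("INSERT INTO route_links_shadow VALUES (?, ?)", rows)

    # The swap itself is one transaction: DDL is transactional in SQLite.
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS route_links")
        conn.execute("ALTER TABLE route_links_shadow RENAME TO route_links")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

PostgreSQL also supports transactional DDL, so the same drop-and-rename pattern carries over to the PostGIS path, although indexes and dependent views on the real tables would need to be recreated or renamed as part of the swap.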