Product and Engineering Backlog

Last updated: 2026-07-01

This backlog reflects the current Germany-scale PostGIS prototype. The target remains a Europe-scale mobility data workbench that builds canonical stops, stations, routes, route geometry, timetable links, transfer rules, routing graph data, address search, and coverage evidence from many public sources.

OSM-derived geometry is the preferred visual authority. GTFS, NeTEx, realtime, and official APIs are timetable, validation, routing, and gap-detection inputs. GTFS shapes are still valuable evidence, especially for missing OSM relations and temporary detours.

Current State

  • PostgreSQL/PostGIS is the active development database path; SQLite remains a legacy/test fallback.
  • Germany OSM and Germany GTFS/DELFI-scale imports are supported.
  • OSM address indexing is available and address search is bbox-aware without being bbox-limited.
  • Jobs and job events exist for imports, route matching, route-layer rebuilds, address indexing, relabeling, deletes, and maintenance.
  • Job rows expose a generic details overlay with planned/current/done phases, event log, metadata, and a compact queue snapshot.
  • A first QA dashboard skeleton exists for source discovery, import health, GTFS validation, canonical stop/link coverage, route matching, and publication readiness.
  • The GTFS harmonization target architecture is documented in docs/gtfs_harmonization.md.
  • GTFS source management is presented as a separate GTFS Harmonization UI module; OSM/map inputs are presented as a separate Mapping Data module.
  • Journey search consumes the active harmonized transit snapshot instead of exposing a raw GTFS source selector.
  • Route-layer rebuild runs through the queue, but it is still coarse-grained and can take minutes on national datasets.
  • The route-layer builder links canonical GTFS stops, OSM stops, OSM route relations, GTFS route patterns, and trip-pattern links.
  • Journey search is progressive and can publish intermediate results, but the underlying routing algorithm is still a prototype.
  • Walk and drive routing use the OSM-derived routing layer when available.

Current Caveats

  • Journey search is not yet a full RAPTOR/CSA-style router.
  • Address endpoints can multiply the search space: current behavior can use up to 4 access stops and 4 egress stops, creating up to 16 transit stop-pair searches per transfer stage.
  • Progressive stages still recompute too much. Searching up to 2 transfers repeats direct and one-transfer work before deeper expansion.
  • Walking access/egress legs are represented separately in journey output, but the search engine still needs a cleaner transfer budget model where access/egress walking never consumes public-transport transfer count.
  • Route-search caches are in-process only. They do not survive server restart, do not deduplicate identical searches already running in another thread/process, and only help once a stage/search has completed.
  • Route-layer rebuild currently clears/rebuilds derived tables. Until the rebuild completes, visual route-pattern link tables can be incomplete.
  • Timetable reachability should not depend on visual route-pattern links. The code has been patched in this checkout, but a running server must reload before using that fix.
  • Canonical stop extraction on national feeds is CPU/memory heavy and does too much Python-side grouping.
  • OSM stop-linking and OSM route-candidate indexing are still large spatial/batch operations.
  • GTFS detours are not classified as first-class route variants yet.
  • Local-transport-only routing is not a first-class profile yet.
  • Proper Alembic migrations are still missing; runtime schema maintenance should be reduced to an explicit migration/maintenance path.
  • The source and job database tables are still shared between harmonization, mapping, and routing; the current split is a product/UI boundary, not a separate service or database boundary yet.
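
The cache caveat above notes that identical searches already running in another thread are not deduplicated. A minimal in-process sketch of that coalescing is shown here; `SingleFlight` and `normalize_request` are hypothetical names, not existing workbench code, and coalescing across processes would still need a shared store such as the PostgreSQL cache proposed under P0.

```python
import threading
from concurrent.futures import Future


def normalize_request(origin, destination, depart_at, max_transfers):
    """Build a hashable key so trivially different spellings share one search."""
    return (origin.strip().casefold(), destination.strip().casefold(),
            depart_at, max_transfers)


class SingleFlight:
    """Coalesce identical in-flight calls so only one thread does the work."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # request key -> Future shared by all callers

    def run(self, key, compute):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
        if not leader:
            # Followers block until the leader publishes a result or an error.
            return future.result()
        try:
            future.set_result(compute())
        except Exception as exc:
            future.set_exception(exc)
        finally:
            with self._lock:
                self._in_flight.pop(key, None)
        return future.result()
```

This only removes duplicate work while a search is running; it does not keep results afterwards, so it complements a result cache rather than replacing it.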

P0: Routing Performance and Correctness

These items directly address slow or failed searches such as "Berlin, Alexanderplatz" to "Heidelberg, Blumenstrasse 36".

  • Replace the demo round-expansion router with a timetable-native algorithm. Preferred direction: RAPTOR or CSA over preloaded arrays/tables, with rounds representing public-transport boardings rather than ad hoc SQL expansion.
  • Precompute a transfer graph. Store station-internal transfers, nearby walking transfers, platform/stop-place links, and allowed transfer times by mode/source/station.
  • Separate access/egress from transfer count. Walking from an address to the first stop, and from the last stop to an address, should never count as a vehicle transfer.
  • Add a durable journey cache. Cache normalized requests, address-to-stop candidates, stop-to-stop stage results, common station-pair results, and in-flight request deduplication in PostgreSQL.
  • Add hub-aware long-distance routing. For long-distance OD pairs, search local access to likely hubs, trunk rail/regional candidates, then local egress. Candidate hubs can be ranked by station importance, service frequency, route scope, distance, and direction.
  • Add a local-transport-only profile. Implement a Deutschlandticket-like profile that excludes long-distance route scopes and still supports regional rail, S-Bahn, subway, tram, bus, ferry, and walking transfers.
  • Add admissible pruning. Bound exploration by best known arrival, remaining distance, direction/off-course penalty, transfer budget, service frequency, and maximum tolerated detour.
  • Add journey diagnostics. Return searched stages, candidate counts, pruned reasons, access/egress stops, service date, source feeds, transfer stops, and whether no-route means no timetable path or a search limit was hit.
  • Add arrive-by search. This is important for route quality and for comparing against operator/DB route planners.
  • Add route profile controls in the UI. Expose controls for fastest, earliest arrival, fewest transfers, local only, walk, drive, arrive by, via, avoid, and transfer buffer.
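
To make the round model concrete, here is a small RAPTOR-style sketch in which each round is one public-transport boarding and access/egress walking never consumes a round. It is an illustration under simplifying assumptions, not the planned implementation: times are plain seconds, changing at the same stop takes zero time, every trip is scanned in every round (real RAPTOR scans per route and picks the earliest catchable trip), and all names are hypothetical.

```python
from math import inf


def earliest_arrival(trips, access, egress, depart_at, max_transfers, footpaths=None):
    """Return (arrival_time, boardings) at the destination, or None.

    trips:     list of trips, each a list of (stop, arrival, departure) in order.
    access:    {stop: walking_seconds} from the origin address.
    egress:    {stop: walking_seconds} to the destination address.
    footpaths: {stop: [(other_stop, walking_seconds), ...]} transfer graph.
    """
    footpaths = footpaths or {}
    # Round 0: access walking only. It consumes no boarding and no transfer.
    marked = {stop: depart_at + walk for stop, walk in access.items()}
    best = dict(marked)  # best known arrival per stop over all rounds
    answer = None
    for boardings in range(1, max_transfers + 2):
        improved = {}
        for trip in trips:
            on_board = False
            for stop, arrival, departure in trip:
                if on_board and arrival < best.get(stop, inf):
                    best[stop] = improved[stop] = arrival
                # Board at the first stop reached in the previous round in time.
                if not on_board and marked.get(stop, inf) <= departure:
                    on_board = True
        # One hop of foot transfers from stops improved by riding in this round.
        for stop, arrival in list(improved.items()):
            for other, walk in footpaths.get(stop, ()):
                if arrival + walk < best.get(other, inf):
                    best[other] = improved[other] = arrival + walk
        # Egress walking is added on top and does not count as a transfer.
        for stop, walk in egress.items():
            if stop in improved and (answer is None or improved[stop] + walk < answer[0]):
                answer = (improved[stop] + walk, boardings)
        if not improved:
            break
        marked = improved
    return answer
```

The number of vehicle transfers in a result is `boardings - 1`, so a request with `max_transfers=0` can still walk to the first stop and from the last one.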

P0: Queue and Rebuild Robustness

  • Move runtime schema maintenance out of normal app startup. The current checkout avoids redundant PostgreSQL DDL, but explicit migrations are still needed.
  • Add Alembic migrations. Use migrations for PostGIS columns, indexes, route-layer tables, routing tables, and cache tables.
  • Make route-layer rebuild use shadow tables or versioned rows. Build replacement rows without deleting the readable active layer first; atomically promote the new version when complete.
  • Make route-layer rebuild incremental. Rebuild only affected route patterns after new matches, stop-link decisions, source updates, or OSM diffs.
  • Add stale worker and stale pid reconciliation. Worker status should never report a pid as running unless the current server can verify it.
  • Improve cancellation. Long PostgreSQL statements need cancellable phases and visible progress rather than only a queued/running state.
  • Improve progress granularity and timings. The UI can display job events now, but long PostgreSQL statements still need finer checkpoints, elapsed times, estimated remaining work, and cancellable sub-phases.
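
The shadow-table idea above can be sketched as follows. The example uses the standard-library `sqlite3` module so it runs anywhere; the table names are hypothetical, and a PostgreSQL version would rely on the same property (DDL inside a transaction) but would need its own attention to locking during the swap.

```python
import sqlite3


def build_shadow(conn, rows):
    """Build the replacement rows next to the active table, leaving it readable."""
    conn.execute("DROP TABLE IF EXISTS route_pattern_links_next")
    conn.execute(
        "CREATE TABLE route_pattern_links_next ("
        "gtfs_pattern_id TEXT NOT NULL, osm_relation_id INTEGER NOT NULL)"
    )
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO route_pattern_links_next VALUES (?, ?)", rows)
    conn.execute("COMMIT")


def promote_shadow(conn):
    """Swap the shadow table in; on any error the old active table survives."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("DROP TABLE IF EXISTS route_pattern_links")
        conn.execute(
            "ALTER TABLE route_pattern_links_next RENAME TO route_pattern_links"
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the explicit
# BEGIN/COMMIT statements above fully control the transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
build_shadow(conn, [("pattern-1", 101)])
promote_shadow(conn)
```

Versioned rows (an `active_version` column plus a pointer that is flipped on completion) are an alternative that avoids renaming tables; which one fits better depends on how readers query the layer.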

P1: Route Layer, Detours, and Geometry Provenance

  • Classify GTFS route variants. Group trips by route, direction, shape, stop sequence, service date span, and trip frequency. Mark rare/temporary shapes as detours or temporary variants rather than replacing the canonical visual route.
  • Add stop-by-stop OSM path fallback. When an OSM route relation is missing or a GTFS shape is a detour, assemble geometry between matched consecutive stops using mode-constrained OSM paths.
  • Cache stop-to-stop route geometry. Key by mode, from canonical stop, to canonical stop, direction constraints, and graph version.
  • Store geometry provenance per route pattern. Examples: osm_route_relation, gtfs_shape, stop_to_stop_osm_path, manual_override, detour_variant.
  • Respect directionality. Bus/car paths need oneway handling; tram/rail paths need topology and direction evidence; reverse links must not be assumed valid.
  • Add route-pattern detail inspection. Show OSM geometry, GTFS shapes, linked trips, linked stops, direction evidence, confidence, and variant/detour status.
  • Add generalized route geometries. Store high-detail inspection geometry and simplified map geometry.
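
As a rough illustration of the variant classification item, the sketch below groups trips by route, direction, and stop sequence, then labels rarely used sequences as detour candidates. It deliberately ignores shape and service date span, and the 5% threshold and label names are placeholder assumptions rather than agreed rules.

```python
from collections import Counter, defaultdict


def classify_variants(trips, rare_share=0.05):
    """Label each (route_id, direction_id, stop sequence) variant.

    trips: iterable of dicts with route_id, direction_id and stop_ids (in order).
    Returns {variant_key: "canonical" | "regular_variant" | "detour_candidate"}.
    """
    counts = Counter(
        (t["route_id"], t["direction_id"], tuple(t["stop_ids"])) for t in trips
    )
    totals = defaultdict(int)  # trips per (route, direction)
    top = {}                   # most frequent variant per (route, direction)
    for key, n in counts.items():
        group = key[:2]
        totals[group] += n
        if group not in top or n > counts[top[group]]:
            top[group] = key
    labels = {}
    for key, n in counts.items():
        group = key[:2]
        if key == top[group]:
            labels[key] = "canonical"
        elif n / totals[group] < rare_share:
            labels[key] = "detour_candidate"
        else:
            labels[key] = "regular_variant"
    return labels
```

A detour candidate found this way would still need review or date-span evidence before it is stored with `detour_variant` provenance.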

P1: Canonical Stops, Stations, and Addresses

  • Optimize canonical stop extraction. Push more grouping/linking into SQL, avoid loading all scheduled stops into Python, batch inserts, and keep stable canonical IDs when possible.
  • Build a canonical stop alias table. Persist normalized names, multilingual names, station codes, IBNR/EVA/UIC/IFOPT, stop_area IDs, OSM IDs, and source-specific aliases.
  • Improve station-complex modeling. Separate public stop place, station complex, platforms/tracks, entrances, bus bays, and nearby stop groups.
  • Add canonical stop detail overlay. Show linked GTFS stops, linked OSM stops/stations, source names, confidence, distances, and manual overrides.
  • Add manual canonical stop link/unlink decisions. Persist stop matching decisions like route matching decisions, so source updates do not overwrite reviewed links.
  • Improve address result folding. Prefer street-level suggestions for dense house-number ranges, but preserve exact address selection when a full address is typed.
  • Precompute address access candidates. Store nearest useful public-transport stops per address/street point, with mode/source/radius metadata.

P1: More GTFS Sources and Deduplication

  • Import more GTFS feeds where they improve authority or coverage. DB long-distance/regional feeds, state feeds, and neighboring-country feeds touching Germany are useful test cases.
  • Add source priority and authority ranking. Decide which source is more authoritative for stops, operators, routes, calendars, and geometry evidence.
  • Deduplicate operators/agencies. Merge agency/operator records with provenance and aliases instead of treating each GTFS agency.txt row as a separate operator.
  • Turn QA summary counters into review queues. Drill down from each bad/warn metric into concrete sources, stops, routes, links, and conflicts.
  • Add GTFS feed QA reports. Report calendar coverage, stale feeds, missing shapes, impossible stop times, duplicate routes, route direction coverage, and stop coordinate outliers.
  • Add conflict dashboards and reusable resolution workflows. Show canonical stops/routes with competing source claims, weak matches, missing visual geometry, authority-rule conflicts, and license blockers.
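
One of the feed QA checks listed above, impossible stop times, can be sketched as below. The function works on rows shaped like GTFS `stop_times.txt` and assumes every row has both times filled in, which GTFS does not require for non-timepoint stops; the issue labels are hypothetical. Times past `24:00:00` are valid in GTFS and are handled by converting to seconds.

```python
from collections import defaultdict


def gtfs_seconds(value):
    """Convert a GTFS HH:MM:SS string (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def impossible_stop_times(stop_times):
    """Return (trip_id, stop_sequence, issue) tuples for time-travel rows.

    stop_times: iterable of dicts with trip_id, stop_sequence,
    arrival_time and departure_time.
    """
    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)
    issues = []
    for trip_id, rows in by_trip.items():
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        previous_departure = None
        for row in rows:
            sequence = int(row["stop_sequence"])
            arrival = gtfs_seconds(row["arrival_time"])
            departure = gtfs_seconds(row["departure_time"])
            if departure < arrival:
                issues.append((trip_id, sequence, "departure_before_arrival"))
            if previous_departure is not None and arrival < previous_departure:
                issues.append((trip_id, sequence, "arrival_before_previous_departure"))
            previous_departure = departure
    return issues
```

Each returned tuple maps naturally onto a review-queue entry for the affected source and trip.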

P1: Scalable OSM and Map Outputs

  • Keep OSM PBF import chunked and resumable. Keep previous active visual datasets available while the next import builds.
  • Add vector tile or PMTiles export. Needed for Germany/Europe route layers and dense editing views.
  • Add route-scope and mode-specific map generalization. Different zooms should use different detail levels and route classes.
  • Improve OSM route candidate indexing. Use stronger SQL/PostGIS filtering before loading route geometry into Python.
  • Add OSM diffs later. Minutely/hourly/daily diffs can update route and address layers without full country rebuilds.

P2: Data Platform Hardening

  • Add explicit read/write transaction boundaries for all long requests and jobs.
  • Add API pagination for large result sets.
  • Add import logs and source-run history.
  • Add database maintenance commands: analyze, vacuum, reindex, orphan cleanup.
  • Add test fixtures that do not mutate the live development database.
  • Add observability: query timings, job timings, row counts, cache hit rates, and per-stage routing metrics.
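
For the observability item, per-stage routing metrics can start as something as small as the sketch below: a context manager that accumulates wall-clock time and call counts per named stage. `StageTimings` and the stage names are hypothetical; persisting the numbers or exporting them to a metrics system is left open.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimings:
    """Accumulate elapsed seconds and call counts per named stage."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even when the stage raises, so failures stay visible.
            self.seconds[name] += time.perf_counter() - start
            self.calls[name] += 1

    def summary(self):
        return {name: {"calls": self.calls[name], "seconds": self.seconds[name]}
                for name in self.calls}


timings = StageTimings()
with timings.stage("address_access"):
    sum(range(1000))  # stand-in for the real stage work
with timings.stage("direct"):
    sum(range(1000))
```

The `summary()` dictionary is the kind of payload that could be attached to journey diagnostics or job events.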

P2: Better Map and Editing Workflows

  • Add canonical stop and route detail side panels.
  • Add candidate map preview for stop matching, not only route matching.
  • Add unmatched/matched/weak/proposed visual layers with source filters.
  • Keep calculated journey geometry and stop markers always on top.
  • Add editable match queues for stops, station complexes, routes, and operators.
  • Add route-layer diff view after rebuilds.

P3: Additional Formats and Live Data

  • Add NeTEx import.
  • Add GTFS-Realtime ingestion for service alerts and trip updates.
  • Add SIRI profile support where national APIs expose it.
  • Add GBFS/shared mobility only after core public transport data is stable.
  • Model temporary closures and disruptions as validity-windowed events, not modifications to base route geometry.
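
The last item above, validity-windowed events, can be illustrated with a small data model: the base route geometry stays untouched and disruptions are overlaid only while their window is active. The field names and `kind` values are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DisruptionEvent:
    route_pattern_id: str
    kind: str                      # e.g. "closure" or "detour" (hypothetical values)
    valid_from: datetime
    valid_to: Optional[datetime]   # None means open-ended

    def active_at(self, moment):
        """True while the event applies; the end of the window is exclusive."""
        return self.valid_from <= moment and (
            self.valid_to is None or moment < self.valid_to
        )


def active_events(events, route_pattern_id, moment):
    """Events to overlay on a route pattern at a given time."""
    return [
        event for event in events
        if event.route_pattern_id == route_pattern_id and event.active_at(moment)
    ]
```

Because events are filtered at query time, expiring a closure needs no geometry rebuild; the base pattern simply becomes visible again.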

Open Optimization List

Not yet implemented, or only partially implemented:

  • RAPTOR/CSA routing core.
  • Precomputed public-transport transfer graph.
  • Durable PostgreSQL route-search cache.
  • In-flight identical search coalescing.
  • Hub-aware long-distance routing.
  • Local-transport-only routing profile.
  • Access/egress legs excluded from transfer budget at the search-state level.
  • Better pruning for off-course exploration and dominated labels.
  • SQL/array-based canonical stop extraction.
  • Incremental route-layer rebuild.
  • Route-layer shadow tables/versioned activation.
  • Stop-to-stop OSM route fallback for missing routes and detours.
  • Detour/temporary variant classification.
  • PostGIS-first OSM route candidate filtering.
  • Vector tiles or PMTiles for large route layers.
  • Alembic migrations.
  • Persistent query/stage timing diagnostics.

Next Steps

  1. Finish the route-layer rebuild currently in progress and verify route-pattern/trip-pattern link counts.
  2. Restart/reload the server so it picks up the current checkout fixes.
  3. Add route-search diagnostics and timing instrumentation around address access, direct, one-transfer, and round-search stages.
  4. Implement transfer graph precomputation and exclude access/egress walking from transfer count.
  5. Add a hub-aware city-to-city search path for long-distance requests.
  6. Add a local-only routing profile using route scopes.
  7. Convert route-layer rebuild to shadow/versioned tables or incremental updates.
  8. Add Alembic migrations and stop doing routine schema checks during normal app/worker startup.