Product and Engineering Backlog
Last updated: 2026-07-01
This backlog reflects the current Germany-scale PostGIS prototype. The target remains a Europe-scale mobility data workbench that builds canonical stops, stations, routes, route geometry, timetable links, transfer rules, routing graph data, address search, and coverage evidence from many public sources.
OSM-derived geometry is the preferred visual authority. GTFS, NeTEx, realtime, and official APIs are timetable, validation, routing, and gap-detection inputs. GTFS shapes are still valuable evidence, especially for missing OSM relations and temporary detours.
Current State
- PostgreSQL/PostGIS is the active development database path; SQLite remains a legacy/test fallback.
- Germany OSM and Germany GTFS/DELFI-scale imports are supported.
- OSM address indexing is available and address search is bbox-aware without being bbox-limited.
- Jobs and job events exist for imports, route matching, route-layer rebuilds, address indexing, relabeling, deletes, and maintenance.
- Job rows expose a generic details overlay with planned/current/done phases, event log, metadata, and a compact queue snapshot.
- A first QA dashboard skeleton exists for source discovery, import health, GTFS validation, canonical stop/link coverage, route matching, and publication readiness.
- The GTFS harmonization target architecture is documented in docs/gtfs_harmonization.md.
- GTFS source management is presented as a separate GTFS Harmonization UI module; OSM/map inputs are presented as a separate Mapping Data module.
- Journey search consumes the active harmonized transit snapshot instead of exposing a raw GTFS source selector.
- Route-layer rebuild runs through the queue, but it is still coarse-grained and can take minutes on national datasets.
- The route-layer builder links canonical GTFS stops, OSM stops, OSM route relations, GTFS route patterns, and trip-pattern links.
- Journey search is progressive and can publish intermediate results, but the underlying routing algorithm is still a prototype.
- Walk and drive routing use the OSM-derived routing layer when available.
Current Caveats
- Journey search is not yet a full RAPTOR/CSA-style router.
- Address endpoints can multiply the search space: current behavior can use up to 4 access stops and 4 egress stops, creating up to 16 transit stop-pair searches per transfer stage.
- Progressive stages still recompute too much. Searching up to 2 transfers repeats direct and one-transfer work before deeper expansion.
- Walking access/egress legs are represented separately in journey output, but the search engine still needs a cleaner transfer budget model where access/egress walking never consumes public-transport transfer count.
- Route-search caches are in-process only. They do not survive server restart, do not deduplicate identical searches already running in another thread/process, and only help once a stage/search has completed.
- Route-layer rebuild currently clears/rebuilds derived tables. Until the rebuild completes, visual route-pattern link tables can be incomplete.
- Timetable reachability should not depend on visual route-pattern links. The code has been patched in this checkout, but a running server must reload before using that fix.
- Canonical stop extraction on national feeds is CPU/memory heavy and does too much Python-side grouping.
- OSM stop-linking and OSM route-candidate indexing are still large spatial/batch operations.
- GTFS detours are not classified as first-class route variants yet.
- Local-transport-only routing is not a first-class profile yet.
- Proper Alembic migrations are still missing; runtime schema maintenance should be reduced to an explicit migration/maintenance path.
- The source and job database tables are still shared between harmonization, mapping, and routing; the current split is a product/UI boundary, not a separate service or database boundary yet.
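The stop-pair multiplication noted in the caveats above can be made concrete with a small counting sketch. The function name is hypothetical, and the total assumes the worst case in which every progressive stage searches every access/egress pair from scratch.

```python
from itertools import product


def count_stop_pair_searches(access_stops, egress_stops, max_transfers):
    """Return (pairs per stage, worst-case total across progressive stages)."""
    pairs = list(product(access_stops, egress_stops))
    stages = max_transfers + 1  # direct, one-transfer, two-transfer, ...
    return len(pairs), len(pairs) * stages


pairs_per_stage, worst_case_total = count_stop_pair_searches(
    ["a1", "a2", "a3", "a4"], ["e1", "e2", "e3", "e4"], max_transfers=2
)
print(pairs_per_stage, worst_case_total)  # 16 48
```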
P0: Routing Performance and Correctness
These items directly address slow or failed searches such as Berlin, Alexanderplatz to Heidelberg, Blumenstrasse 36.
- Replace the demo round-expansion router with a timetable-native algorithm. Preferred direction: RAPTOR or CSA over preloaded arrays/tables, with rounds representing public-transport boardings rather than ad hoc SQL expansion.
- Precompute a transfer graph. Store station-internal transfers, nearby walking transfers, platform/stop-place links, and allowed transfer times by mode/source/station.
- Separate access/egress from transfer count. Walking from an address to the first stop, and from the last stop to an address, should never count as a vehicle transfer.
- Add a durable journey cache. Cache normalized requests, address-to-stop candidates, stop-to-stop stage results, and common station-pair results in PostgreSQL, and deduplicate identical in-flight requests there as well.
- Add hub-aware long-distance routing. For long-distance OD pairs, search local access to likely hubs, trunk rail/regional candidates, then local egress. Candidate hubs can be ranked by station importance, service frequency, route scope, distance, and direction.
- Add a local-transport-only profile. Implement a Deutschlandticket-like profile that excludes long-distance route scopes and still supports regional rail, S-Bahn, subway, tram, bus, ferry, and walking transfers.
- Add admissible pruning. Bound exploration by best known arrival, remaining distance, direction/off-course penalty, transfer budget, service frequency, and maximum tolerated detour.
- Add journey diagnostics. Return searched stages, candidate counts, pruned reasons, access/egress stops, service date, source feeds, transfer stops, and whether no-route means no timetable path or a search limit was hit.
- Add arrive-by search. This is important for route quality and for comparing against operator/DB route planners.
- Add route profile controls in the UI. fastest, earliest arrival, fewest transfers, local only, walk, drive, arrive by, via, avoid, and transfer buffer controls.
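As a rough illustration of rounds that represent public-transport boardings, with access/egress walking kept out of the transfer count, here is a minimal round-based connection scan. It is a sketch under simplifying assumptions, not the planned router: the Connection fields, the function name, and the sample data are hypothetical, there are no footpaths between stops, and no minimum change time is applied.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    dep_stop: str
    arr_stop: str
    dep_time: int  # seconds since service-day start
    arr_time: int
    trip_id: str


def scan_rounds(connections, access, egress, depart_at, max_transfers):
    """Return [(transfers, arrival_time)] options, each strictly earlier than the last.

    access/egress map stop id -> walking seconds; walking never uses a round.
    """
    inf = float("inf")
    ordered = sorted(connections, key=lambda c: c.dep_time)
    # Labels reachable with zero boardings: walking from the origin only.
    prev = {stop: depart_at + walk for stop, walk in access.items()}
    options, best_arrival = [], inf
    for boardings in range(1, max_transfers + 2):
        cur = dict(prev)  # labels with fewer boardings stay valid
        boarded = set()   # trips boarded during this round
        for c in ordered:
            # Board only from a label of the previous round, or stay on a trip
            # that was already boarded in this round.
            if c.trip_id in boarded or prev.get(c.dep_stop, inf) <= c.dep_time:
                boarded.add(c.trip_id)
                if c.arr_time < cur.get(c.arr_stop, inf):
                    cur[c.arr_stop] = c.arr_time
        arrival = min(
            (cur[s] + walk for s, walk in egress.items() if s in cur), default=inf
        )
        if arrival < best_arrival:
            best_arrival = arrival
            options.append((boardings - 1, arrival))
        prev = cur
    return options


timetable = [
    Connection("A", "B", 100, 200, "T1"),
    Connection("B", "C", 250, 400, "T2"),
]
print(scan_rounds(timetable, {"A": 50}, {"C": 30}, depart_at=0, max_transfers=1))
# [(1, 430)]
```

Each round allows exactly one more boarding than the previous one, so the transfer count is the round number minus one regardless of how far the rider walks at either end.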
P0: Queue and Rebuild Robustness
- Move runtime schema maintenance out of normal app startup. The current checkout avoids redundant PostgreSQL DDL, but explicit migrations are still needed.
- Add Alembic migrations. Use migrations for PostGIS columns, indexes, route-layer tables, routing tables, and cache tables.
- Make route-layer rebuild use shadow tables or versioned rows. Build replacement rows without deleting the readable active layer first; atomically promote the new version when complete.
- Make route-layer rebuild incremental. Rebuild only affected route patterns after new matches, stop-link decisions, source updates, or OSM diffs.
- Add stale worker and stale pid reconciliation. Worker status should never report a pid as running unless the current server can verify it.
- Improve cancellation. Long PostgreSQL statements need cancellable phases and visible progress rather than only a queued/running state.
- Improve progress granularity and timings. The UI can display job events now, but long PostgreSQL statements still need finer checkpoints, elapsed times, estimated remaining work, and cancellable sub-phases.
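The versioned-rows idea for route-layer rebuilds can be sketched as follows. SQLite is used only to keep the example self-contained; the table and column names are hypothetical, and a PostgreSQL version would differ in detail. New rows are written under an inactive version, and a single UPDATE inside one transaction promotes them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE route_layer_version (
    id INTEGER PRIMARY KEY,
    active INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE route_pattern_link (
    version_id INTEGER NOT NULL REFERENCES route_layer_version(id),
    route_pattern_id TEXT NOT NULL,
    trip_pattern_id TEXT NOT NULL
);
"""


def build_version(conn, links):
    """Write replacement rows under a new, still inactive version."""
    with conn:  # commits on success, rolls back on error
        version_id = conn.execute(
            "INSERT INTO route_layer_version (active) VALUES (0)"
        ).lastrowid
        conn.executemany(
            "INSERT INTO route_pattern_link VALUES (?, ?, ?)",
            [(version_id, route, trip) for route, trip in links],
        )
    return version_id


def promote(conn, version_id):
    """Flip the active flag for all versions in a single statement."""
    with conn:
        conn.execute(
            "UPDATE route_layer_version SET active = (id = ?)", (version_id,)
        )


def active_links(conn):
    """Read only rows that belong to the currently active version."""
    return conn.execute(
        "SELECT l.route_pattern_id, l.trip_pattern_id "
        "FROM route_pattern_link AS l "
        "JOIN route_layer_version AS v ON v.id = l.version_id "
        "WHERE v.active = 1 "
        "ORDER BY l.route_pattern_id, l.trip_pattern_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
promote(conn, build_version(conn, [("rp1", "tp1")]))
pending = build_version(conn, [("rp1", "tp2"), ("rp2", "tp3")])
print(active_links(conn))  # readers still see the old layer
promote(conn, pending)
print(active_links(conn))  # readers now see the new layer
```

Old versions remain in the table after promotion, so a separate cleanup step would be needed; that step is left out here.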
P1: Route Layer, Detours, and Geometry Provenance
- Classify GTFS route variants. Group trips by route, direction, shape, stop sequence, service date span, and trip frequency. Mark rare/temporary shapes as detours or temporary variants rather than replacing the canonical visual route.
- Add stop-by-stop OSM path fallback. When an OSM route relation is missing or a GTFS shape is a detour, assemble geometry between matched consecutive stops using mode-constrained OSM paths.
- Cache stop-to-stop route geometry. Key by mode, from canonical stop, to canonical stop, direction constraints, and graph version.
- Store geometry provenance per route pattern. Examples: osm_route_relation, gtfs_shape, stop_to_stop_osm_path, manual_override, detour_variant.
- Respect directionality. Bus/car paths need oneway handling; tram/rail paths need topology and direction evidence; reverse links must not be assumed valid.
- Add route-pattern detail inspection. Show OSM geometry, GTFS shapes, linked trips, linked stops, direction evidence, confidence, and variant/detour status.
- Add generalized route geometries. Store high-detail inspection geometry and simplified map geometry.
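One possible shape for the variant classification above is sketched below. It groups trips by route, direction, shape, and stop sequence, and flags variants that carry only a small share of trips. The input field names, the labels, and the 5% threshold are assumptions for illustration; service date span, which the backlog item also lists, is not considered here.

```python
from collections import Counter


def classify_variants(trips, detour_share=0.05):
    """Map each variant key to 'regular' or 'detour_candidate' by trip share."""
    counts = Counter(
        (t["route_id"], t["direction_id"], t["shape_id"], tuple(t["stop_ids"]))
        for t in trips
    )
    # Total trips per (route, direction), used as the denominator for shares.
    totals = Counter()
    for key, n in counts.items():
        totals[key[:2]] += n
    return {
        key: "detour_candidate" if n / totals[key[:2]] < detour_share else "regular"
        for key, n in counts.items()
    }


regular = {"route_id": "M4", "direction_id": 0, "shape_id": "s1",
           "stop_ids": ["a", "b", "c"]}
detour = {"route_id": "M4", "direction_id": 0, "shape_id": "s2",
          "stop_ids": ["a", "x", "c"]}
labels = classify_variants([regular] * 39 + [detour])
for key, label in labels.items():
    print(key[2], label)
```

With 39 regular trips and one detour trip, the detour variant carries 2.5% of the trips and falls below the assumed threshold.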
P1: Canonical Stops, Stations, and Addresses
- Optimize canonical stop extraction. Push more grouping/linking into SQL, avoid loading all scheduled stops into Python, batch inserts, and keep stable canonical IDs when possible.
- Build a canonical stop alias table. Persist normalized names, multilingual names, station codes, IBNR/EVA/UIC/IFOPT, stop_area IDs, OSM IDs, and source-specific aliases.
- Improve station-complex modeling. Separate public stop place, station complex, platforms/tracks, entrances, bus bays, and nearby stop groups.
- Add canonical stop detail overlay. Show linked GTFS stops, linked OSM stops/stations, source names, confidence, distances, and manual overrides.
- Add manual canonical stop link/unlink decisions. Persist stop matching decisions like route matching decisions, so source updates do not overwrite reviewed links.
- Improve address result folding. Prefer street-level suggestions for dense house-number ranges, but preserve exact address selection when a full address is typed.
- Precompute address access candidates. Store nearest useful public-transport stops per address/street point, with mode/source/radius metadata.
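Precomputing address access candidates could look roughly like the sketch below: rank stops by great-circle distance from an address point, keep those inside a radius, and cap the list. The limit of 4 mirrors the current access-stop cap mentioned in the caveats; the 500 m radius, the record layout, and the function names are assumptions. A real implementation would more likely use a PostGIS nearest-neighbour query and walking distance rather than straight-line distance.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def access_candidates(address, stops, radius_m=500.0, limit=4):
    """Return up to `limit` stops within `radius_m`, nearest first."""
    ranked = []
    for stop in stops:
        distance = haversine_m(address["lat"], address["lon"], stop["lat"], stop["lon"])
        if distance <= radius_m:
            ranked.append(
                {"stop_id": stop["stop_id"], "mode": stop["mode"],
                 "distance_m": round(distance, 1)}
            )
    ranked.sort(key=lambda c: c["distance_m"])
    return ranked[:limit]


address = {"lat": 52.5219, "lon": 13.4132}
stops = [
    {"stop_id": "far", "mode": "bus", "lat": 52.5419, "lon": 13.4132},
    {"stop_id": "mid", "mode": "tram", "lat": 52.5249, "lon": 13.4132},
    {"stop_id": "near", "mode": "subway", "lat": 52.5229, "lon": 13.4132},
]
print([c["stop_id"] for c in access_candidates(address, stops)])  # ['near', 'mid']
```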
P1: More GTFS Sources and Deduplication
- Import more GTFS feeds where they improve authority or coverage. DB long-distance/regional feeds, state feeds, and neighboring-country feeds touching Germany are useful test cases.
- Add source priority and authority ranking. Decide which source is more authoritative for stops, operators, routes, calendars, and geometry evidence.
- Deduplicate operators/agencies. Merge agency/operator records with provenance and aliases instead of treating each GTFS agency.txt row as a separate operator.
- Turn QA summary counters into review queues. Drill down from each bad/warn metric into concrete sources, stops, routes, links, and conflicts.
- Add GTFS feed QA reports. Calendar coverage, stale feeds, missing shapes, impossible stop times, duplicate routes, route direction coverage, stop coordinate outliers.
- Add conflict dashboards and reusable resolution workflows. Show canonical stops/routes with competing source claims, weak matches, missing visual geometry, authority-rule conflicts, and license blockers.
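For the feed QA reports above, a check for impossible stop times might be sketched like this. It assumes one trip's stop_times.txt rows as dictionaries with stop_sequence, arrival_time, and departure_time all filled in; rows with empty times would need extra handling. GTFS times may exceed 24:00:00 for trips that run past midnight, so they are parsed as seconds rather than as clock times. The function names and problem labels are hypothetical.

```python
def parse_gtfs_time(value):
    """Convert 'HH:MM:SS' (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_impossible_stop_times(stop_times):
    """Return (stop_sequence, problem) tuples for one trip's stop times."""
    problems = []
    previous_departure = None
    for row in sorted(stop_times, key=lambda r: r["stop_sequence"]):
        arrival = parse_gtfs_time(row["arrival_time"])
        departure = parse_gtfs_time(row["departure_time"])
        if departure < arrival:
            problems.append((row["stop_sequence"], "departure before arrival"))
        if previous_departure is not None and arrival < previous_departure:
            problems.append((row["stop_sequence"], "arrival before previous departure"))
        previous_departure = departure
    return problems


trip = [
    {"stop_sequence": 1, "arrival_time": "23:50:00", "departure_time": "23:51:00"},
    {"stop_sequence": 2, "arrival_time": "24:05:00", "departure_time": "24:04:00"},
    {"stop_sequence": 3, "arrival_time": "23:59:00", "departure_time": "24:10:00"},
]
print(find_impossible_stop_times(trip))
# [(2, 'departure before arrival'), (3, 'arrival before previous departure')]
```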
P1: Scalable OSM and Map Outputs
- Keep OSM PBF import chunked and resumable. Keep previous active visual datasets available while the next import builds.
- Add vector tile or PMTiles export. Needed for Germany/Europe route layers and dense editing views.
- Add route-scope and mode-specific map generalization. Different zooms should use different detail levels and route classes.
- Improve OSM route candidate indexing. Use stronger SQL/PostGIS filtering before loading route geometry into Python.
- Add OSM diffs later. Minutely/hourly/daily diffs can update route and address layers without full country rebuilds.
P2: Data Platform Hardening
- Add explicit read/write transaction boundaries for all long requests and jobs.
- Add API pagination for large result sets.
- Add import logs and source-run history.
- Add database maintenance commands: analyze, vacuum, reindex, orphan cleanup.
- Add test fixtures that do not mutate the live development database.
- Add observability: query timings, job timings, row counts, cache hit rates, and per-stage routing metrics.
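Per-stage routing metrics could start from something as small as the timing helper sketched here. The class name and stage names are hypothetical, and the results live in memory only; persisting them is a separate concern.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimings:
    """Accumulate wall-clock time and call counts per named stage."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def stage(self, name):
        started = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even when the stage raises, so failures are still timed.
            self.seconds[name] += time.perf_counter() - started
            self.calls[name] += 1

    def summary(self):
        return {
            name: {"calls": self.calls[name], "seconds": self.seconds[name]}
            for name in self.seconds
        }


timings = StageTimings()
with timings.stage("address_access"):
    sum(range(1000))  # placeholder for real work
for _ in range(2):
    with timings.stage("direct"):
        sum(range(1000))
print({name: entry["calls"] for name, entry in timings.summary().items()})
# {'address_access': 1, 'direct': 2}
```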
P2: Better Map and Editing Workflows
- Add canonical stop and route detail side panels.
- Add candidate map preview for stop matching, not only route matching.
- Add unmatched/matched/weak/proposed visual layers with source filters.
- Keep calculated journey geometry and stop markers always on top.
- Add editable match queues for stops, station complexes, routes, and operators.
- Add route-layer diff view after rebuilds.
P3: Additional Formats and Live Data
- Add NeTEx import.
- Add GTFS-Realtime ingestion for service alerts and trip updates.
- Add SIRI profile support where national APIs expose it.
- Add GBFS/shared mobility only after core public transport data is stable.
- Model temporary closures and disruptions as validity-windowed events, not modifications to base route geometry.
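A validity-windowed disruption event, as opposed to an edit of the base route geometry, might be modeled along these lines. The class, its fields, and the half-open window convention are assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DisruptionEvent:
    route_pattern_id: str
    kind: str             # e.g. "closure" or "detour"
    valid_from: datetime  # inclusive
    valid_to: datetime    # exclusive


def active_events(events, route_pattern_id, at):
    """Return events for one route pattern whose window contains `at`."""
    return [
        e for e in events
        if e.route_pattern_id == route_pattern_id and e.valid_from <= at < e.valid_to
    ]


events = [
    DisruptionEvent("rp1", "closure", datetime(2026, 7, 1), datetime(2026, 7, 15)),
    DisruptionEvent("rp1", "detour", datetime(2026, 8, 1), datetime(2026, 8, 3)),
]
print([e.kind for e in active_events(events, "rp1", datetime(2026, 7, 10))])
# ['closure']
```

The base geometry stays untouched; a consumer overlays whichever events are active for the requested service date.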
Open Optimization List
Not yet implemented, or only partially implemented:
- RAPTOR/CSA routing core.
- Precomputed public-transport transfer graph.
- Durable PostgreSQL route-search cache.
- In-flight identical search coalescing.
- Hub-aware long-distance routing.
- Local-transport-only routing profile.
- Access/egress legs excluded from transfer budget at the search-state level.
- Better pruning for off-course exploration and dominated labels.
- SQL/array-based canonical stop extraction.
- Incremental route-layer rebuild.
- Route-layer shadow tables/versioned activation.
- Stop-to-stop OSM route fallback for missing routes and detours.
- Detour/temporary variant classification.
- PostGIS-first OSM route candidate filtering.
- Vector tiles or PMTiles for large route layers.
- Alembic migrations.
- Persistent query/stage timing diagnostics.
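For the durable route-search cache and in-flight coalescing items, both need a stable key for "the same search". A minimal sketch of such a key follows. The request field names, the 5-decimal coordinate rounding, and the one-minute departure bucket are assumptions; the key could serve as the lookup column of a PostgreSQL cache table.

```python
import hashlib
import json


def normalized_request_key(request):
    """Hash a journey request after removing irrelevant variation."""
    normalized = {
        # About 1 m precision; absorbs floating-point noise from the client.
        "origin": [round(value, 5) for value in request["origin"]],
        "destination": [round(value, 5) for value in request["destination"]],
        # ISO 8601 string truncated to "YYYY-MM-DDTHH:MM".
        "depart_minute": request["depart_at"][:16],
        "profile": request["profile"],
        "max_transfers": int(request["max_transfers"]),
        "avoid": sorted(request["avoid"]),
    }
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = {
    "origin": (52.5219, 13.4132), "destination": (49.4094, 8.6942),
    "depart_at": "2026-07-01T08:15:00", "profile": "fastest",
    "max_transfers": 2, "avoid": ["ferry", "bus"],
}
second = dict(first, origin=(52.5219001, 13.4132002),
              depart_at="2026-07-01T08:15:42", avoid=["bus", "ferry"])
print(normalized_request_key(first) == normalized_request_key(second))  # True
```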
Recommended Next Sprint
- Finish the route-layer rebuild currently in progress and verify route-pattern/trip-pattern link counts.
- Restart/reload the server so it picks up the current checkout fixes.
- Add route-search diagnostics and timing instrumentation around address access, direct, one-transfer, and round-search stages.
- Implement transfer graph precomputation and exclude access/egress walking from transfer count.
- Add a hub-aware city-to-city search path for long-distance requests.
- Add a local-only routing profile using route scopes.
- Convert route-layer rebuild to shadow/versioned tables or incremental updates.
- Add Alembic migrations and stop doing routine schema checks during normal app/worker startup.