# Product and Engineering Backlog

Last updated: 2026-07-01

This backlog reflects the current Germany-scale PostGIS prototype. The target remains a Europe-scale mobility data workbench that builds canonical stops, stations, routes, route geometry, timetable links, transfer rules, routing graph data, address search, and coverage evidence from many public sources.

OSM-derived geometry is the preferred visual authority. GTFS, NeTEx, realtime, and official APIs are timetable, validation, routing, and gap-detection inputs. GTFS shapes are still valuable evidence, especially for missing OSM relations and temporary detours.

## Current State

- PostgreSQL/PostGIS is the active development database path; SQLite remains a legacy/test fallback.
- Germany OSM and Germany GTFS/DELFI-scale imports are supported.
- OSM address indexing is available, and address search is bbox-aware without being bbox-limited.
- Jobs and job events exist for imports, route matching, route-layer rebuilds, address indexing, relabeling, deletes, and maintenance.
- Job rows expose a generic details overlay with planned/current/done phases, event log, metadata, and a compact queue snapshot.
- A first QA dashboard skeleton exists for source discovery, import health, GTFS validation, canonical stop/link coverage, route matching, and publication readiness.
- The GTFS harmonization target architecture is documented in `docs/gtfs_harmonization.md`.
- GTFS source management is presented as a separate `GTFS Harmonization` UI module; OSM/map inputs are presented as a separate `Mapping Data` module.
- Journey search consumes the active harmonized transit snapshot instead of exposing a raw GTFS source selector.
- Route-layer rebuild runs through the queue, but it is still coarse-grained and can take minutes on national datasets.
- The route-layer builder links canonical GTFS stops, OSM stops, OSM route relations, GTFS route patterns, and trip-pattern links.
- Journey search is progressive and can publish intermediate results, but the underlying routing algorithm is still a prototype.
- Walk and drive routing use the OSM-derived routing layer when available.

## Current Caveats

- Journey search is not yet a full RAPTOR/CSA-style router.
- Address endpoints can multiply the search space: current behavior can use up to 4 access stops and 4 egress stops, creating up to 16 transit stop-pair searches per transfer stage.
- Progressive stages still recompute too much. Searching `up to 2 transfers` repeats the direct and one-transfer work before expanding deeper.
- Walking access/egress legs are represented separately in journey output, but the search engine still needs a cleaner transfer-budget model in which access/egress walking never consumes the public-transport transfer count.
- Route-search caches are in-process only. They do not survive a server restart, do not deduplicate identical searches already running in another thread or process, and only help once a stage or search has completed.
- Route-layer rebuild currently clears and rebuilds the derived tables. Until the rebuild completes, the visual route-pattern link tables can be incomplete.
- Timetable reachability should not depend on visual route-pattern links. The code has been patched in this checkout, but a running server must be reloaded before it uses that fix.
- Canonical stop extraction on national feeds is CPU- and memory-heavy and does too much grouping on the Python side.
- OSM stop linking and OSM route-candidate indexing are still large spatial/batch operations.
- GTFS detours are not yet classified as first-class route variants.
- Local-transport-only routing is not yet a first-class profile.
- Proper Alembic migrations are still missing; runtime schema maintenance should be reduced to an explicit migration/maintenance path.
- The source and job database tables are still shared between harmonization, mapping, and routing; the current split is a product/UI boundary, not yet a separate service or database boundary.

## P0: Routing Performance and Correctness

These items directly address slow or failed searches such as `Berlin, Alexanderplatz` to `Heidelberg, Blumenstrasse 36`.

- Replace the demo round-expansion router with a timetable-native algorithm. Preferred direction: RAPTOR or CSA over preloaded arrays/tables, with rounds representing public-transport boardings rather than ad hoc SQL expansion.
- Precompute a transfer graph. Store station-internal transfers, nearby walking transfers, platform/stop-place links, and allowed transfer times by mode, source, and station.
- Separate access/egress from the transfer count. Walking from an address to the first stop, and from the last stop to an address, should never count as a vehicle transfer.
- Add a durable journey cache. Store normalized requests, address-to-stop candidates, stop-to-stop stage results, and common station-pair results in PostgreSQL, and use it to deduplicate identical in-flight requests.
- Add hub-aware long-distance routing. For long-distance origin-destination (OD) pairs, search local access to likely hubs, then trunk rail/regional candidates, then local egress. Candidate hubs can be ranked by station importance, service frequency, route scope, distance, and direction.
- Add a local-transport-only profile. Implement a Deutschlandticket-like profile that excludes long-distance route scopes while still supporting regional rail, S-Bahn, subway, tram, bus, ferry, and walking transfers.
- Add admissible pruning. Bound exploration by best known arrival, remaining distance, direction/off-course penalty, transfer budget, service frequency, and maximum tolerated detour.
- Add journey diagnostics. Return searched stages, candidate counts, pruning reasons, access/egress stops, service date, source feeds, transfer stops, and whether a no-route result means that no timetable path exists or that a search limit was hit.
- Add arrive-by search. This is important for route quality and for comparing against operator/DB route planners.
- Add route profile controls in the UI: `fastest`, `earliest arrival`, `fewest transfers`, `local only`, `walk`, `drive`, `arrive by`, `via`, `avoid`, and transfer buffer controls.

## P0: Queue and Rebuild Robustness

- Move runtime schema maintenance out of normal app startup. The current checkout avoids redundant PostgreSQL DDL, but explicit migrations are still needed.
- Add Alembic migrations. Use migrations for PostGIS columns, indexes, route-layer tables, routing tables, and cache tables.
- Make route-layer rebuild use shadow tables or versioned rows. Build the replacement rows while the active layer remains readable, then atomically promote the new version when it is complete.
- Make route-layer rebuild incremental. Rebuild only the route patterns affected by new matches, stop-link decisions, source updates, or OSM diffs.
- Add stale worker and stale pid reconciliation. Worker status should never report a pid as running unless the current server can verify it.
- Improve cancellation. Long PostgreSQL statements need cancellable phases and visible progress rather than only a queued/running state.
- Improve progress granularity and timings. The UI can already display job events, but long PostgreSQL statements still need finer checkpoints, elapsed times, estimated remaining work, and cancellable sub-phases.

## P1: Route Layer, Detours, and Geometry Provenance

- Classify GTFS route variants. Group trips by route, direction, shape, stop sequence, service date span, and trip frequency. Mark rare or temporary shapes as detours or temporary variants rather than letting them replace the canonical visual route.
- Add stop-by-stop OSM path fallback. When an OSM route relation is missing or a GTFS shape is a detour, assemble geometry between matched consecutive stops using mode-constrained OSM paths.
- Cache stop-to-stop route geometry. Key it by mode, origin canonical stop, destination canonical stop, direction constraints, and graph version.
- Store geometry provenance per route pattern. Examples: `osm_route_relation`, `gtfs_shape`, `stop_to_stop_osm_path`, `manual_override`, `detour_variant`.
- Respect directionality. Bus/car paths need oneway handling; tram/rail paths need topology and direction evidence; reverse links must not be assumed valid.
- Add route-pattern detail inspection. Show OSM geometry, GTFS shapes, linked trips, linked stops, direction evidence, confidence, and variant/detour status.
- Add generalized route geometries. Store both high-detail inspection geometry and simplified map geometry.

## P1: Canonical Stops, Stations, and Addresses

- Optimize canonical stop extraction. Push more grouping/linking into SQL, avoid loading all scheduled stops into Python, batch inserts, and keep canonical IDs stable when possible.
- Build a canonical stop alias table. Persist normalized names, multilingual names, station codes, IBNR/EVA/UIC/IFOPT, stop_area IDs, OSM IDs, and source-specific aliases.
- Improve station-complex modeling. Separate public stop place, station complex, platforms/tracks, entrances, bus bays, and nearby stop groups.
- Add a canonical stop detail overlay. Show linked GTFS stops, linked OSM stops/stations, source names, confidence, distances, and manual overrides.
- Add manual canonical stop link/unlink decisions. Persist stop-matching decisions the same way route-matching decisions are persisted, so that source updates do not overwrite reviewed links.
- Improve address result folding. Prefer street-level suggestions for dense house-number ranges, but preserve exact address selection when a full address is typed.
- Precompute address access candidates. Store the nearest useful public-transport stops per address/street point, with mode/source/radius metadata.

## P1: More GTFS Sources and Deduplication

- Import more GTFS feeds where they improve authority or coverage. DB long-distance/regional feeds, state feeds, and neighboring-country feeds touching Germany are useful test cases.
- Add source priority and authority ranking. Decide which source is more authoritative for stops, operators, routes, calendars, and geometry evidence.
- Deduplicate operators/agencies. Merge agency/operator records with provenance and aliases instead of treating each GTFS `agency.txt` row as a separate operator.
- Turn QA summary counters into review queues. Drill down from each bad/warn metric into concrete sources, stops, routes, links, and conflicts.
- Add GTFS feed QA reports covering calendar coverage, stale feeds, missing shapes, impossible stop times, duplicate routes, route direction coverage, and stop coordinate outliers.
- Add conflict dashboards and reusable resolution workflows. Show canonical stops/routes with competing source claims, weak matches, missing visual geometry, authority-rule conflicts, and license blockers.

## P1: Scalable OSM and Map Outputs

- Keep OSM PBF import chunked and resumable. Keep the previous active visual datasets available while the next import builds.
- Add vector tile or PMTiles export. This is needed for Germany/Europe route layers and dense editing views.
- Add route-scope and mode-specific map generalization. Different zooms should use different detail levels and route classes.
- Improve OSM route candidate indexing. Use stronger SQL/PostGIS filtering before loading route geometry into Python.
- Add OSM diffs later. Minutely/hourly/daily diffs can update route and address layers without full country rebuilds.

## P2: Data Platform Hardening

- Add explicit read/write transaction boundaries for all long requests and jobs.
- Add API pagination for large result sets.
- Add import logs and source-run history.
- Add database maintenance commands: analyze, vacuum, reindex, orphan cleanup.
- Add test fixtures that do not mutate the live development database.
- Add observability: query timings, job timings, row counts, cache hit rates, and per-stage routing metrics.

## P2: Better Map and Editing Workflows

- Add canonical stop and route detail side panels.
- Add candidate map preview for stop matching, not only route matching.
- Add unmatched/matched/weak/proposed visual layers with source filters.
- Keep calculated journey geometry and stop markers always on top.
- Add editable match queues for stops, station complexes, routes, and operators.
- Add a route-layer diff view after rebuilds.

## P3: Additional Formats and Live Data

- Add NeTEx import.
- Add GTFS-Realtime ingestion for service alerts and trip updates.
- Add SIRI profile support where national APIs expose it.
- Add GBFS/shared mobility only after core public transport data is stable.
- Model temporary closures and disruptions as validity-windowed events, not as modifications to base route geometry.

## Open Optimization List

Not yet implemented, or only partially implemented:

- RAPTOR/CSA routing core.
- Precomputed public-transport transfer graph.
- Durable PostgreSQL route-search cache.
- In-flight identical search coalescing.
- Hub-aware long-distance routing.
- Local-transport-only routing profile.
- Access/egress legs excluded from transfer budget at the search-state level.
- Better pruning for off-course exploration and dominated labels.
- SQL/array-based canonical stop extraction.
- Incremental route-layer rebuild.
- Route-layer shadow tables/versioned activation.
- Stop-to-stop OSM route fallback for missing routes and detours.
- Detour/temporary variant classification.
- PostGIS-first OSM route candidate filtering.
- Vector tiles or PMTiles for large route layers.
- Alembic migrations.
- Persistent query/stage timing diagnostics.

## Recommended Next Sprint

1. Finish the route-layer rebuild currently in progress and verify route-pattern/trip-pattern link counts.
2. Restart/reload the server so it picks up the current checkout fixes.
3. Add route-search diagnostics and timing instrumentation around the address access, direct, one-transfer, and round-search stages.
4. Implement transfer graph precomputation and exclude access/egress walking from the transfer count.
5. Add a hub-aware city-to-city search path for long-distance requests.
6. Add a local-only routing profile using route scopes.
7. Convert route-layer rebuild to shadow/versioned tables or incremental updates.
8. Add Alembic migrations and stop running routine schema checks during normal app/worker startup.
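Sprint item 3, together with the P0 journey-diagnostics item, calls for per-stage timings and candidate counts. The following is a minimal sketch that assumes a hypothetical `SearchDiagnostics` record, filled in by the router for each request. The stage names follow item 3. The class is illustrative and does not describe an existing module in this codebase.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class SearchDiagnostics:
    """Hypothetical per-request record of stage timings and candidate counts."""

    stage_seconds: dict[str, float] = field(default_factory=dict)
    candidate_counts: dict[str, int] = field(default_factory=dict)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        # Record wall-clock time for one search stage, even if the stage raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stage_seconds[name] = time.perf_counter() - start

    def count(self, name: str, n: int) -> None:
        # Accumulate candidate counts (e.g. access stops, stop pairs searched).
        self.candidate_counts[name] = self.candidate_counts.get(name, 0) + n


# Example: wrap two of the stages named in sprint item 3.
diag = SearchDiagnostics()
with diag.stage("address_access"):
    access_stops = ["stop_a", "stop_b", "stop_c", "stop_d"]  # placeholder candidates
    diag.count("access_stops", len(access_stops))
with diag.stage("direct"):
    diag.count("stop_pairs", len(access_stops) * 4)  # e.g. 4 access x 4 egress stops
```

Storing `stage_seconds` and `candidate_counts` alongside each job or search result is one possible route toward the "Persistent query/stage timing diagnostics" item.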
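Sprint items 4 and 6 both depend on what a search round counts. The sketch below is a simplified RAPTOR-style round-based search, under these assumptions:

- Each trip has one time per stop, with no separate arrival and departure.
- Trips are sorted and never overtake each other.
- Route filtering for a local-only profile uses a hypothetical `Route.scope` field.

A round is one vehicle boarding. Access walking, egress walking, and footpath transfers therefore never use up a round. All names here are illustrative, not the project's API.

```python
import math
from dataclasses import dataclass
from typing import Optional

INF = math.inf


@dataclass(frozen=True)
class Route:
    route_id: str
    scope: str                          # hypothetical, e.g. "local" or "long_distance"
    stops: tuple[str, ...]              # stop sequence of the route pattern
    trips: tuple[tuple[int, ...], ...]  # per trip: time (s) at each stop, sorted


def round_based_search(routes, footpaths, access, egress, depart, max_transfers,
                       allowed_scopes: Optional[set[str]] = None):
    """Return Pareto-optimal (transfers, arrival) pairs at the destination.

    access/egress map stop -> walking seconds from the origin / to the destination.
    Rounds count vehicle boardings only, so walking never uses the transfer budget.
    """
    best: dict[str, float] = {}
    # Round 0: access walking only; no vehicle boarded, no transfer used.
    prev = {stop: depart + walk for stop, walk in access.items()}
    best.update(prev)
    pareto, best_arrival = [], INF
    for k in range(1, max_transfers + 2):  # k boardings = k - 1 transfers
        cur: dict[str, float] = {}
        for route in routes:
            if allowed_scopes is not None and route.scope not in allowed_scopes:
                continue  # e.g. local-only profile skips long-distance scopes
            trip = None
            for i, stop in enumerate(route.stops):
                # Alight if the current trip improves the arrival at this stop.
                if trip is not None and trip[i] < best.get(stop, INF):
                    best[stop] = cur[stop] = trip[i]
                # Board the earliest trip reachable from the previous round's label.
                t_prev = prev.get(stop)
                if t_prev is not None and (trip is None or t_prev <= trip[i]):
                    nxt = next((t for t in route.trips if t[i] >= t_prev), None)
                    if nxt is not None and (trip is None or nxt[i] < trip[i]):
                        trip = nxt
        # Footpath transfers extend this round without adding a boarding.
        for stop, t in list(cur.items()):
            for other, walk in footpaths.get(stop, ()):
                if t + walk < best.get(other, INF):
                    best[other] = cur[other] = t + walk
        # Egress walking to the destination address does not count as a transfer.
        arrival = min((t + egress[s] for s, t in cur.items() if s in egress), default=INF)
        if arrival < best_arrival:
            best_arrival = arrival
            pareto.append((k - 1, arrival))
        if not cur:
            break
        prev = cur
    return pareto


routes = [
    Route("R1", "local", ("A", "B", "C"), ((1000, 1300, 1600),)),
    Route("R2", "long_distance", ("C", "E"), ((1700, 2000),)),
]
access = {"A": 300}            # walk from origin address to stop A
egress = {"C": 600, "E": 120}  # walk from stop to destination address
print(round_based_search(routes, {}, access, egress, depart=600, max_transfers=2))
# [(0, 2200), (1, 2120)]
print(round_based_search(routes, {}, access, egress, 600, 2, allowed_scopes={"local"}))
# [(0, 2200)]
```

All access stops seed round 0 together, and egress is checked across every stop reached in a round. As a result, the 4 access and 4 egress stops from the caveats list cost a single search, not 16 separate stop-pair searches.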
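Sprint item 7, like the P0 shadow-table item, builds the new layer next to the active one and swaps it in with a single transaction of renames. The sketch below uses Python's built-in `sqlite3` so that it runs without a database server. The table and column names are hypothetical. PostgreSQL also supports transactional DDL, so the same `ALTER TABLE ... RENAME TO` sequence can be committed atomically there too. Note that in PostgreSQL the renames briefly take an exclusive lock on the table.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn: sqlite3.Connection):
    # Explicit BEGIN/COMMIT (conn must use isolation_level=None); roll back on error.
    conn.execute("BEGIN")
    try:
        yield
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def rebuild_route_layer(conn: sqlite3.Connection, rows) -> None:
    """Build a replacement layer in a shadow table, then promote it atomically."""
    # Build phase: the active table is not touched while the shadow table fills.
    with transaction(conn):
        conn.execute("DROP TABLE IF EXISTS route_pattern_links_shadow")
        conn.execute("CREATE TABLE route_pattern_links_shadow (route_pattern_id TEXT, trip_id TEXT)")
        conn.executemany("INSERT INTO route_pattern_links_shadow VALUES (?, ?)", rows)
    # Promotion phase: both renames become visible together or not at all.
    with transaction(conn):
        conn.execute("DROP TABLE IF EXISTS route_pattern_links_old")
        conn.execute("ALTER TABLE route_pattern_links RENAME TO route_pattern_links_old")
        conn.execute("ALTER TABLE route_pattern_links_shadow RENAME TO route_pattern_links")
        conn.execute("DROP TABLE route_pattern_links_old")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE route_pattern_links (route_pattern_id TEXT, trip_id TEXT)")
conn.execute("INSERT INTO route_pattern_links VALUES ('p1', 'old_trip')")
rebuild_route_layer(conn, [("p1", "t1"), ("p2", "t2")])
print(conn.execute("SELECT COUNT(*) FROM route_pattern_links").fetchone()[0])  # 2
```

If the promotion fails, the rollback keeps the old active table, and the shadow table stays available for inspection or a retry.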