# Product and Engineering Backlog

Last updated: 2026-07-01

This backlog reflects the current Germany-scale PostGIS prototype. The target remains a Europe-scale mobility data workbench that builds canonical stops, stations, routes, route geometry, timetable links, transfer rules, routing graph data, address search, and coverage evidence from many public sources.

OSM-derived geometry is the preferred visual authority. GTFS, NeTEx, realtime, and official APIs are timetable, validation, routing, and gap-detection inputs. GTFS shapes are still valuable evidence, especially for missing OSM relations and temporary detours.

## Current State

- PostgreSQL/PostGIS is the active development database path; SQLite remains a legacy/test fallback.
- Germany OSM and Germany GTFS/DELFI-scale imports are supported.
- OSM address indexing is available and address search is bbox-aware without being bbox-limited.
- Jobs and job events exist for imports, route matching, route-layer rebuilds, address indexing, relabeling, deletes, and maintenance.
- Job rows expose a generic details overlay with planned/current/done phases, event log, metadata, and a compact queue snapshot.
- A first QA dashboard skeleton exists for source discovery, import health, GTFS validation, canonical stop/link coverage, route matching, and publication readiness.
- The GTFS harmonization target architecture is documented in `docs/gtfs_harmonization.md`.
- GTFS source management is presented as a separate `GTFS Harmonization` UI module; OSM/map inputs are presented as a separate `Mapping Data` module.
- Journey search consumes the active harmonized transit snapshot instead of exposing a raw GTFS source selector.
- Route-layer rebuild runs through the queue, but it is still coarse-grained and can take minutes on national datasets.
- The route-layer builder links canonical GTFS stops, OSM stops, OSM route relations, GTFS route patterns, and trip-pattern links.
- Journey search is progressive and can publish intermediate results, but the underlying routing algorithm is still a prototype.
- Walk and drive routing use the OSM-derived routing layer when available.

## Current Caveats

- Journey search is not yet a full RAPTOR/CSA-style router.
- Address endpoints can multiply the search space: current behavior can use up to 4 access stops and 4 egress stops, creating up to 16 transit stop-pair searches per transfer stage.
- Progressive stages still recompute too much. Searching `up to 2 transfers` repeats direct and one-transfer work before deeper expansion.
- Walking access/egress legs are represented separately in journey output, but the search engine still needs a cleaner transfer budget model in which access/egress walking never consumes public-transport transfers.
- Route-search caches are in-process only. They do not survive a server restart, do not deduplicate identical searches already running in another thread/process, and only help once a stage/search has completed.
- Route-layer rebuild currently clears and rebuilds derived tables. Until the rebuild completes, visual route-pattern link tables can be incomplete.
- Timetable reachability should not depend on visual route-pattern links. The code has been patched in this checkout, but a running server must be reloaded before it uses that fix.
- Canonical stop extraction on national feeds is CPU/memory heavy and does too much Python-side grouping.
- OSM stop-linking and OSM route-candidate indexing are still large spatial/batch operations.
- GTFS detours are not classified as first-class route variants yet.
- Local-transport-only routing is not a first-class profile yet.
- Proper Alembic migrations are still missing; runtime schema maintenance should be reduced to an explicit migration/maintenance path.
- The source and job database tables are still shared between harmonization, mapping, and routing; the current split is a product/UI boundary, not yet a separate service or database boundary.
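
To make the search-space caveats concrete, here is a back-of-the-envelope count. It takes the 4 access and 4 egress stops (16 stop pairs per transfer stage) from the caveats, and additionally assumes that each progressive stage k re-runs every transfer level from 0 to k; that recomputation model and the function name are hypothetical.

```python
def stop_pair_searches(access_stops: int, egress_stops: int, max_transfers: int,
                       recompute_lower_stages: bool = True) -> int:
    """Count transit stop-pair searches for a progressive search (hypothetical cost model)."""
    pairs = access_stops * egress_stops  # every access stop paired with every egress stop
    if recompute_lower_stages:
        # Assumption: stage k re-runs transfer levels 0..k, so level work adds up as 1 + 2 + ... + (K + 1).
        return sum(pairs * (k + 1) for k in range(max_transfers + 1))
    # Incremental search: each transfer level is searched once and reused by later stages.
    return pairs * (max_transfers + 1)


print(stop_pair_searches(4, 4, 2))                                # 96
print(stop_pair_searches(4, 4, 2, recompute_lower_stages=False))  # 48
```

Under these assumptions, reusing lower stages halves the work for `up to 2 transfers`, and every term scales with the product of access and egress candidates.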

## P0: Routing Performance and Correctness

These items directly address slow or failed searches such as `Berlin, Alexanderplatz` to `Heidelberg, Blumenstrasse 36`.

- Replace the demo round-expansion router with a timetable-native algorithm.
  Preferred direction: RAPTOR or CSA over preloaded arrays/tables, with rounds representing public-transport boardings rather than ad hoc SQL expansion.
- Precompute a transfer graph.
  Store station-internal transfers, nearby walking transfers, platform/stop-place links, and allowed transfer times by mode/source/station.
- Separate access/egress from transfer count.
  Walking from an address to the first stop, and from the last stop to an address, should never count as a vehicle transfer.
- Add a durable journey cache.
  Cache normalized requests, address-to-stop candidates, stop-to-stop stage results, and common station-pair results in PostgreSQL, and deduplicate identical in-flight requests.
- Add hub-aware long-distance routing.
  For long-distance OD pairs, search local access to likely hubs, then trunk rail/regional candidates, then local egress. Candidate hubs can be ranked by station importance, service frequency, route scope, distance, and direction.
- Add a local-transport-only profile.
  Implement a Deutschlandticket-like profile that excludes long-distance route scopes and still supports regional rail, S-Bahn, subway, tram, bus, ferry, and walking transfers.
- Add admissible pruning.
  Bound exploration by best known arrival, remaining distance, direction/off-course penalty, transfer budget, service frequency, and maximum tolerated detour.
- Add journey diagnostics.
  Return searched stages, candidate counts, prune reasons, access/egress stops, service date, source feeds, transfer stops, and whether a no-route result means no timetable path exists or a search limit was hit.
- Add arrive-by search.
  This is important for route quality and for comparing against operator/DB route planners.
- Add route profile controls in the UI.
  Options: `fastest`, `earliest arrival`, `fewest transfers`, `local only`, `walk`, `drive`, `arrive by`, `via`, `avoid`, and transfer buffer controls.
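
A compact, simplified RAPTOR-style round scan, sketched in Python to show how access and egress walking stay outside the transfer budget: round k covers journeys with k boardings, while the walk from the address to the first stop is applied before round 1 and the walk to the destination address after each round. The data model (one time per stop per trip, trips sorted with no overtaking, a transfer footpath map that is already transitively closed) and all names are illustrative assumptions, not the project's schema; times in the example are minutes.

```python
import math
from dataclasses import dataclass


@dataclass
class Route:
    stops: list[str]        # stop sequence of one route pattern
    trips: list[list[int]]  # per trip: time at each stop, sorted by departure, no overtaking


def raptor(routes, footpaths, access, egress, depart, max_transfers):
    """Return {transfers: earliest arrival at the destination}, keeping only improvements."""
    INF = math.inf
    # Round 0: walk from the origin address to access stops. No boarding, so no transfer is used.
    prev = {stop: depart + walk for stop, walk in access.items()}
    best = dict(prev)   # earliest known arrival at each stop over all rounds
    reached = {}        # arrivals that used at least one vehicle
    marked = set(prev)  # stops improved in the previous round
    results, best_dest = {}, INF
    for k in range(1, max_transfers + 2):  # k boardings = k - 1 transfers
        cur, improved = dict(prev), set()
        for route in routes:
            start = next((i for i, s in enumerate(route.stops) if s in marked), None)
            if start is None:
                continue
            trip = None
            for i in range(start, len(route.stops)):
                stop = route.stops[i]
                # Alight: keep the arrival only if it beats every earlier label (local pruning).
                if trip is not None and trip[i] < best.get(stop, INF):
                    cur[stop] = best[stop] = reached[stop] = trip[i]
                    improved.add(stop)
                # Board: take the earliest trip catchable from last round's arrival at this stop.
                ready = prev.get(stop, INF)
                if ready < INF and (trip is None or ready <= trip[i]):
                    trip = next((t for t in route.trips if t[i] >= ready), trip)
        # Transfer walks, relaxed once from this round's vehicle arrivals.
        for stop, arrival in [(s, cur[s]) for s in improved]:
            for other, walk in footpaths.get(stop, ()):
                if arrival + walk < best.get(other, INF):
                    cur[other] = best[other] = reached[other] = arrival + walk
                    improved.add(other)
        # Egress: walk from a vehicle-reached stop to the destination address; again not a transfer.
        dest = min((reached[s] + walk for s, walk in egress.items() if s in reached), default=INF)
        if dest < best_dest:
            results[k - 1] = best_dest = dest
        if not improved:
            break
        prev, marked = cur, improved
    return results


routes = [
    Route(["A", "B"], [[10, 20]]),  # first leg
    Route(["C", "D"], [[25, 35]]),  # second leg after a 3-minute walk B -> C
    Route(["A", "D"], [[6, 60]]),   # slow direct service
]
print(raptor(routes, {"B": [("C", 3)]}, access={"A": 5}, egress={"D": 4}, depart=0, max_transfers=1))
# {0: 64, 1: 39}
```

Both results survive because the one-transfer journey arrives earlier; the 5-minute access walk and 4-minute egress walk never appear in the transfer count. A production version would use arrays indexed by stop and route IDs, separate arrival and departure times, and the precomputed transfer graph.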

## P0: Queue and Rebuild Robustness

- Move runtime schema maintenance out of normal app startup.
  The current checkout avoids redundant PostgreSQL DDL, but explicit migrations are still needed.
- Add Alembic migrations.
  Use migrations for PostGIS columns, indexes, route-layer tables, routing tables, and cache tables.
- Make route-layer rebuild use shadow tables or versioned rows.
  Build replacement rows without deleting the readable active layer first; atomically promote the new version when complete.
- Make route-layer rebuild incremental.
  Rebuild only affected route patterns after new matches, stop-link decisions, source updates, or OSM diffs.
- Add stale worker and stale pid reconciliation.
  Worker status should never report a pid as running unless the current server can verify it.
- Improve cancellation.
  Long PostgreSQL statements need cancellable phases and visible progress rather than only a queued/running state.
- Improve progress granularity and timings.
  The UI can display job events now, but long PostgreSQL statements still need finer checkpoints, elapsed times, estimated remaining work, and cancellable sub-phases.
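
A minimal sketch of shadow-table promotion, using Python's built-in `sqlite3` only so it runs anywhere. The table name `route_pattern_links` and the `_shadow`/`_old` suffixes are hypothetical. PostgreSQL DDL is transactional as well, so the same rename-and-drop sequence inside one transaction swaps the active layer atomically; there, `ALTER TABLE ... RENAME` takes an `ACCESS EXCLUSIVE` lock, so readers wait briefly instead of seeing a half-built layer.

```python
import sqlite3


def promote_shadow(conn: sqlite3.Connection, table: str) -> None:
    """Swap the fully built `<table>_shadow` into place in one transaction.

    `table` is interpolated into SQL, so it must come from code, never from user input.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(f'ALTER TABLE "{table}" RENAME TO "{table}_old"')
        conn.execute(f'ALTER TABLE "{table}_shadow" RENAME TO "{table}"')
        conn.execute(f'DROP TABLE "{table}_old"')
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # the previous active table stays readable
        raise


# isolation_level=None: sqlite3 opens no implicit transactions; BEGIN/COMMIT above are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE route_pattern_links (pattern_id TEXT, trip_id TEXT)")
conn.execute("INSERT INTO route_pattern_links VALUES ('p1', 't_old')")

# Build the replacement while readers keep querying the active table.
conn.execute("CREATE TABLE route_pattern_links_shadow (pattern_id TEXT, trip_id TEXT)")
conn.executemany("INSERT INTO route_pattern_links_shadow VALUES (?, ?)", [("p1", "t1"), ("p1", "t2")])

promote_shadow(conn, "route_pattern_links")
print(conn.execute("SELECT trip_id FROM route_pattern_links ORDER BY trip_id").fetchall())
# [('t1',), ('t2',)]
```

Views or foreign keys that reference the active table need separate handling, because PostgreSQL binds them to the table object rather than to its name.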

## P1: Route Layer, Detours, and Geometry Provenance

- Classify GTFS route variants.
  Group trips by route, direction, shape, stop sequence, service date span, and trip frequency. Mark rare/temporary shapes as detours or temporary variants rather than replacing the canonical visual route.
- Add stop-by-stop OSM path fallback.
  When an OSM route relation is missing or a GTFS shape is a detour, assemble geometry between matched consecutive stops using mode-constrained OSM paths.
- Cache stop-to-stop route geometry.
  Key by mode, origin canonical stop, destination canonical stop, direction constraints, and graph version.
- Store geometry provenance per route pattern.
  Examples: `osm_route_relation`, `gtfs_shape`, `stop_to_stop_osm_path`, `manual_override`, `detour_variant`.
- Respect directionality.
  Bus/car paths need oneway handling; tram/rail paths need topology and direction evidence; reverse links must not be assumed valid.
- Add route-pattern detail inspection.
  Show OSM geometry, GTFS shapes, linked trips, linked stops, direction evidence, confidence, and variant/detour status.
- Add generalized route geometries.
  Store high-detail inspection geometry and simplified map geometry.
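
One way to start the variant classification, sketched in Python: trips are grouped by route, direction, shape, and stop sequence; each group is weighted by scheduled trip-days; the heaviest group becomes the canonical variant; and groups below a share threshold are flagged as `detour_variant` candidates. The `service_days` field (days the trip runs within the feed window) and the 10% threshold are assumptions to tune against real feeds.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    route_id: str
    direction_id: int
    shape_id: str
    stop_ids: tuple[str, ...]
    service_days: int  # assumed precomputed from calendar.txt / calendar_dates.txt


def classify_variants(trips, detour_share=0.10):
    """Label each (route, direction, shape, stop sequence) variant."""
    weights = Counter()
    for t in trips:
        weights[(t.route_id, t.direction_id, t.shape_id, t.stop_ids)] += t.service_days
    groups = defaultdict(list)
    for key in weights:
        groups[key[:2]].append(key)  # compare variants within one (route, direction)
    labels = {}
    for keys in groups.values():
        total = sum(weights[k] for k in keys)
        canonical = max(keys, key=weights.__getitem__)
        for k in keys:
            share = weights[k] / total if total else 0.0
            if k == canonical:
                labels[k] = "canonical"
            elif share < detour_share:
                labels[k] = "detour_variant"  # rare or temporary: keep it, but never replace the visual route
            else:
                labels[k] = "variant"         # regular alternative, e.g. a short-turn pattern
    return labels


main = ("A", "B", "C")
trips = [Trip("r1", 0, "s1", main, 30)] * 3 + [
    Trip("r1", 0, "s2", ("A", "X", "C"), 5),  # short-lived detour
    Trip("r1", 0, "s3", ("A", "B"), 20),      # short-turn pattern
]
for key, label in classify_variants(trips).items():
    print(key[2], label)
# s1 canonical
# s2 detour_variant
# s3 variant
```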

## P1: Canonical Stops, Stations, and Addresses

- Optimize canonical stop extraction.
  Push more grouping/linking into SQL, avoid loading all scheduled stops into Python, batch inserts, and keep stable canonical IDs when possible.
- Build a canonical stop alias table.
  Persist normalized names, multilingual names, station codes, IBNR/EVA/UIC/IFOPT, stop_area IDs, OSM IDs, and source-specific aliases.
- Improve station-complex modeling.
  Separate public stop place, station complex, platforms/tracks, entrances, bus bays, and nearby stop groups.
- Add canonical stop detail overlay.
  Show linked GTFS stops, linked OSM stops/stations, source names, confidence, distances, and manual overrides.
- Add manual canonical stop link/unlink decisions.
  Persist stop matching decisions the same way route matching decisions are persisted, so source updates do not overwrite reviewed links.
- Improve address result folding.
  Prefer street-level suggestions for dense house-number ranges, but preserve exact address selection when a full address is typed.
- Precompute address access candidates.
  Store nearest useful public-transport stops per address/street point, with mode/source/radius metadata.
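
The alias table needs a stable normalized-name key. A minimal standard-library sketch: Unicode case folding (which also maps `ß` to `ss`), a German umlaut transliteration step, accent stripping, and punctuation collapsing. The transliteration table and the exact rules are assumptions; abbreviations such as `Hbf`/`Hauptbahnhof` would still need their own curated alias rows.

```python
import re
import unicodedata

# German transliteration applied before generic accent stripping (an assumption for German data).
_GERMAN_UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def normalize_stop_name(name: str) -> str:
    """Normalized alias key: case-folded, transliterated, accent-free, punctuation collapsed."""
    s = unicodedata.normalize("NFC", name).casefold()  # casefold() also maps "ß" to "ss"
    s = s.translate(_GERMAN_UMLAUTS)
    # Decompose remaining accented letters and drop the combining marks ("è" -> "e").
    s = "".join(ch for ch in unicodedata.normalize("NFKD", s) if not unicodedata.combining(ch))
    # Replace runs of punctuation, underscores, and whitespace with a single space.
    return re.sub(r"[\W_]+", " ", s).strip()


print(normalize_stop_name("Berlin, Alexanderplatz"))       # berlin alexanderplatz
print(normalize_stop_name("Heidelberg, Blumenstraße 36"))  # heidelberg blumenstrasse 36
print(normalize_stop_name("Düsseldorf Hbf") == normalize_stop_name("Duesseldorf  HBF"))  # True
```

Codes such as IBNR/EVA/UIC/IFOPT and OSM IDs would be matched exactly; only free-text names go through a key like this.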

## P1: More GTFS Sources and Deduplication

- Import more GTFS feeds where they improve authority or coverage.
  DB long-distance/regional feeds, state feeds, and neighboring-country feeds touching Germany are useful test cases.
- Add source priority and authority ranking.
  Decide which source is more authoritative for stops, operators, routes, calendars, and geometry evidence.
- Deduplicate operators/agencies.
  Merge agency/operator records with provenance and aliases instead of treating each GTFS `agency.txt` row as a separate operator.
- Turn QA summary counters into review queues.
  Drill down from each bad/warn metric into concrete sources, stops, routes, links, and conflicts.
- Add GTFS feed QA reports.
  Report calendar coverage, stale feeds, missing shapes, impossible stop times, duplicate routes, route direction coverage, and stop coordinate outliers.
- Add conflict dashboards and reusable resolution workflows.
  Show canonical stops/routes with competing source claims, weak matches, missing visual geometry, authority-rule conflicts, and license blockers.
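
Source priority can begin as a per-attribute ranking that picks a winning claim and keeps the differing losing claims for the conflict dashboard. This Python sketch uses hypothetical source keys and rankings; the only rule taken from this backlog is that OSM is preferred for visual geometry.

```python
# Hypothetical rankings: earlier in the list = more authoritative for that attribute.
PRIORITY = {
    "stop_name": ["official_api", "delfi_gtfs", "state_gtfs", "osm"],
    "geometry": ["osm", "state_gtfs", "delfi_gtfs"],  # OSM is the preferred visual authority
}


def resolve_claims(claims, priority=PRIORITY):
    """claims: iterable of (source, attribute, value).

    Returns (winners, conflicts): winners maps attribute -> (value, source);
    conflicts lists losing claims whose value differs from the winner.
    """
    ranked = {}
    for source, attribute, value in claims:
        order = priority.get(attribute, [])
        rank = order.index(source) if source in order else len(order)  # unranked sources lose
        ranked.setdefault(attribute, []).append((rank, source, value))
    winners, conflicts = {}, []
    for attribute, entries in ranked.items():
        entries.sort(key=lambda e: e[0])  # stable sort: ties keep input order
        _, source, value = entries[0]
        winners[attribute] = (value, source)
        conflicts += [(attribute, s, v) for _, s, v in entries[1:] if v != value]
    return winners, conflicts


claims = [
    ("osm", "stop_name", "Alexanderplatz"),
    ("delfi_gtfs", "stop_name", "Berlin, Alexanderplatz"),
    ("delfi_gtfs", "geometry", "gtfs_shape:42"),
    ("osm", "geometry", "osm_route_relation:42"),
]
winners, conflicts = resolve_claims(claims)
print(winners["stop_name"])  # ('Berlin, Alexanderplatz', 'delfi_gtfs')
print(winners["geometry"])   # ('osm_route_relation:42', 'osm')
print(conflicts)  # [('stop_name', 'osm', 'Alexanderplatz'), ('geometry', 'delfi_gtfs', 'gtfs_shape:42')]
```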

## P1: Scalable OSM and Map Outputs

- Keep OSM PBF import chunked and resumable.
  Keep previous active visual datasets available while the next import builds.
- Add vector tile or PMTiles export.
  Needed for Germany/Europe route layers and dense editing views.
- Add route-scope and mode-specific map generalization.
  Different zooms should use different detail levels and route classes.
- Improve OSM route candidate indexing.
  Use stronger SQL/PostGIS filtering before loading route geometry into Python.
- Add OSM diffs later.
  Minutely/hourly/daily diffs can update route and address layers without full country rebuilds.
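
For zoom-dependent generalization, a common starting point is to simplify Web Mercator (EPSG:3857) geometry with a tolerance of about half a tile pixel per zoom level and to hide minor route scopes below a minimum zoom. The sketch computes Mercator units per pixel for 256 px tiles; the scope names and minimum zooms are hypothetical. The resulting tolerance could be passed to PostGIS `ST_SimplifyPreserveTopology` on EPSG:3857 geometry.

```python
import math

EARTH_RADIUS_M = 6_378_137  # WGS84 semi-major axis used by Web Mercator
TILE_SIZE_PX = 256

# Hypothetical minimum zoom per route scope.
MIN_ZOOM = {"long_distance_rail": 5, "regional_rail": 7, "tram": 11, "bus": 12}


def mercator_units_per_pixel(zoom: int) -> float:
    """Projected EPSG:3857 units per pixel; ground metres are this value times cos(latitude)."""
    return 2 * math.pi * EARTH_RADIUS_M / (TILE_SIZE_PX * 2 ** zoom)


def generalization_plan(zoom: int, pixel_tolerance: float = 0.5) -> dict[str, float]:
    """Route scopes visible at `zoom`, each with a simplification tolerance in EPSG:3857 units."""
    tolerance = mercator_units_per_pixel(zoom) * pixel_tolerance
    return {scope: tolerance for scope, min_zoom in MIN_ZOOM.items() if zoom >= min_zoom}


print(round(mercator_units_per_pixel(0), 2))  # 156543.03
print(sorted(generalization_plan(8)))          # ['long_distance_rail', 'regional_rail']
```

Mode-specific behavior could be added by scaling the tolerance per scope, for example keeping rail slightly more detailed than bus at the same zoom.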

## P2: Data Platform Hardening

- Add explicit read/write transaction boundaries for all long requests and jobs.
- Add API pagination for large result sets.
- Add import logs and source-run history.
- Add database maintenance commands: analyze, vacuum, reindex, orphan cleanup.
- Add test fixtures that do not mutate the live development database.
- Add observability: query timings, job timings, row counts, cache hit rates, and per-stage routing metrics.
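
For API pagination over large tables, keyset (seek) pagination avoids the growing cost of `OFFSET` on deep pages: the client passes the last key it saw and the query resumes after it through the index. A runnable sketch with Python's built-in `sqlite3`; the `stops` table is hypothetical, and the same SQL works in PostgreSQL with the driver's placeholder style (for example `%s` in psycopg).

```python
import sqlite3


def fetch_page(conn, after_id=None, limit=100):
    """Return (rows, next_cursor); next_cursor is None when no further page is expected."""
    if after_id is None:
        rows = conn.execute(
            "SELECT id, name FROM stops ORDER BY id LIMIT ?", (limit,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, name FROM stops WHERE id > ? ORDER BY id LIMIT ?", (after_id, limit)
        ).fetchall()
    # A full page may have more rows after it; a short page is the last one.
    next_cursor = rows[-1][0] if rows and len(rows) == limit else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO stops (name) VALUES (?)", [(f"stop {i}",) for i in range(1, 6)])

cursor, seen = None, []
while True:
    rows, cursor = fetch_page(conn, after_id=cursor, limit=2)
    seen += [row[0] for row in rows]
    if cursor is None:
        break
print(seen)  # [1, 2, 3, 4, 5]
```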

## P2: Better Map and Editing Workflows

- Add canonical stop and route detail side panels.
- Add candidate map preview for stop matching, not only route matching.
- Add unmatched/matched/weak/proposed visual layers with source filters.
- Keep calculated journey geometry and stop markers always on top.
- Add editable match queues for stops, station complexes, routes, and operators.
- Add route-layer diff view after rebuilds.
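
A route-layer diff view can start from a per-pattern fingerprint compared across two rebuilds. This Python sketch assumes each layer version maps a route-pattern ID to its geometry provenance and coordinate list; the fingerprint rounds coordinates so floating-point noise does not show up as a change. Names and the rounding precision are illustrative.

```python
import hashlib


def fingerprint(provenance: str, coords: list[tuple[float, float]], digits: int = 6) -> str:
    """Stable hash of provenance plus coordinates rounded to `digits` decimals."""
    text = provenance + ";" + ";".join(f"{x:.{digits}f},{y:.{digits}f}" for x, y in coords)
    return hashlib.sha256(text.encode()).hexdigest()


def diff_layers(old: dict, new: dict) -> dict[str, list[str]]:
    """old/new: {route_pattern_id: (provenance, coords)} for two layer versions."""
    old_fp = {k: fingerprint(*v) for k, v in old.items()}
    new_fp = {k: fingerprint(*v) for k, v in new.items()}
    return {
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "removed": sorted(old_fp.keys() - new_fp.keys()),
        "changed": sorted(k for k in old_fp.keys() & new_fp.keys() if old_fp[k] != new_fp[k]),
    }


old = {
    "p1": ("osm_route_relation", [(13.4132, 52.5219), (13.4201, 52.5250)]),
    "p2": ("gtfs_shape", [(8.6724, 49.4037), (8.6900, 49.4100)]),
}
new = {
    "p1": ("osm_route_relation", [(13.41320000001, 52.5219), (13.4201, 52.5250)]),  # float noise only
    "p2": ("stop_to_stop_osm_path", [(8.6724, 49.4037), (8.6900, 49.4100)]),       # provenance changed
    "p3": ("osm_route_relation", [(9.1829, 48.7758), (9.2000, 48.7800)]),
}
print(diff_layers(old, new))
# {'added': ['p3'], 'removed': [], 'changed': ['p2']}
```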

## P3: Additional Formats and Live Data

- Add NeTEx import.
- Add GTFS-Realtime ingestion for service alerts and trip updates.
- Add SIRI profile support where national APIs expose it.
- Add GBFS/shared mobility only after core public transport data is stable.
- Model temporary closures and disruptions as validity-windowed events, not modifications to base route geometry.
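
Validity-windowed disruptions can live in their own table with a half-open validity window and be resolved at query time, so base geometry is never edited. A Python sketch; the field names and `kind` values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Disruption:
    route_pattern_id: str
    kind: str             # e.g. "closure" or "detour" (hypothetical values)
    valid_from: datetime  # inclusive
    valid_to: datetime    # exclusive, so back-to-back windows do not overlap

    def active_at(self, when: datetime) -> bool:
        return self.valid_from <= when < self.valid_to


def active_disruptions(events, route_pattern_id, when):
    """Disruptions overlaying one route pattern at `when`; base geometry is left untouched."""
    return [e for e in events if e.route_pattern_id == route_pattern_id and e.active_at(when)]


events = [
    Disruption("p1", "closure", datetime(2026, 7, 10), datetime(2026, 7, 20)),
    Disruption("p1", "detour", datetime(2026, 7, 20), datetime(2026, 8, 1)),
]
print([e.kind for e in active_disruptions(events, "p1", datetime(2026, 7, 20, 8, 0))])  # ['detour']
```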

## Open Optimization List

Not yet implemented, or only partially implemented:

- RAPTOR/CSA routing core.
- Precomputed public-transport transfer graph.
- Durable PostgreSQL route-search cache.
- In-flight identical search coalescing.
- Hub-aware long-distance routing.
- Local-transport-only routing profile.
- Access/egress legs excluded from transfer budget at the search-state level.
- Better pruning for off-course exploration and dominated labels.
- SQL/array-based canonical stop extraction.
- Incremental route-layer rebuild.
- Route-layer shadow tables/versioned activation.
- Stop-to-stop OSM route fallback for missing routes and detours.
- Detour/temporary variant classification.
- PostGIS-first OSM route candidate filtering.
- Vector tiles or PMTiles for large route layers.
- Alembic migrations.
- Persistent query/stage timing diagnostics.

## Recommended Next Sprint

1. Finish the route-layer rebuild currently in progress and verify route-pattern/trip-pattern link counts.
2. Restart/reload the server so it picks up the current checkout fixes.
3. Add route-search diagnostics and timing instrumentation around address access, direct, one-transfer, and round-search stages.
4. Implement transfer graph precomputation and exclude access/egress walking from transfer count.
5. Add a hub-aware city-to-city search path for long-distance requests.
6. Add a local-only routing profile using route scopes.
7. Convert route-layer rebuild to shadow/versioned tables or incremental updates.
8. Add Alembic migrations and stop doing routine schema checks during normal app/worker startup.