update readme and addresses preprocessing

Will Bradley 2022-09-01 13:12:13 -07:00
parent 4f7b9ce93a
commit 1474ebe2f0
9 changed files with 93322 additions and 48 deletions


@ -1,31 +1,47 @@
* Open the original data in QGIS * Open the original data in QGIS
* Format OSM fields with QGIS functions to have proper capitalization and full spellings without extraneous whitespace, based on original fields * Format OSM fields with QGIS functions to have proper capitalization and full spellings without extraneous whitespace, based on original fields. For example, OSM uses names like North Main Street, not N MAIN ST. All fields are of the QGIS type "text", even if they hold numbers.
* You can use the Attribute Table's Field Calculator for this; you can copy-paste the `qgis-functions.py` file into the Function Editor and then use the Expression tab to create new, formatted virtual fields. Don't worry if the field name limit is too short, it can be fixed in JOSM. * You can use the Attribute Table's Field Calculator for this; you can copy-paste the `qgis-functions.py` file into the Function Editor and then use the Expression tab to create new, formatted virtual fields. Don't worry if the field name limit is too short, it can be fixed in JOSM.
* For addresses: * For addresses:
* addr:housenumber * The Addresses shapefile is recorded in the ESRI:102659 CRS; you may need to reproject it (the export step below expects `EPSG:4326 - WGS84`).
* addr:street expanding N MAIN ST to North Main Street * `ADD_NUM` becomes the virtual `addr:housenumber` (or `addr:house` temporarily, avoiding addr:house which is a real tag)
* addr:city * `SADD` becomes the virtual `addr:street` (or `addr:stree` temporarily) via the `getformattedstreetnamefromaddress("SADD")` custom expression
* addr:postcode * `POST_COMM` becomes the virtual `addr:city` via the `title("POST_COMM")` expression (we care about the postal community of an address, not which municipality governs the place)
* `POST_CODE` becomes `addr:postcode` (or `addr:postc` temporarily)
* For roads: * For roads:
* name (with nice casing and spelling like North Main Street not N MAIN ST) * `NAME` becomes the virtual `name` via the `getformattedstreetname("NAME")` custom expression
* `surface=asphalt` * `surface=asphalt` added manually in JOSM
* `maxspeed=10 mph` * `maxspeed=10 mph` added manually in JOSM
* Export to Geojson or Shapefile * Export to GeoJSON, **selecting only the OSM-formatted fields we want** and deselecting the rest.
* Open in JOSM * Ensure the export file is in the `EPSG:4326 - WGS84` CRS.
* Select and remove all relations from the geojson/shapefile layer: the data often has one relation per road and this is improper for OSM import. * Open in JOSM. We suggest starting with roads and then addresses, so the addresses can be placed in context.
* Highlight a small region to work on: one neighborhood or smaller. For this import, we are assuming that only newly-constructed small residential areas will be imported, not main roads or commercial areas or areas with significant existing map data. * In the Roads dataset, select and remove all relations from the geojson/shapefile layer: the data often has one relation per road and this is improper for OSM import.
* Select a small region to work on: one neighborhood or smaller. For this import, we are assuming that only newly-constructed small residential areas will be imported, not main roads or commercial areas or areas with significant existing map data.
* Download the area you're working on from OSM, into a new Data Layer (not your geojson layer.)
* Select all features to be imported at this time and leave them selected until the merge step below.
* Select all ways for roads, or all nodes for addresses. Make sure you aren't about to mass-edit the nodes of a road: deselect the nodes if this happens. * Select all ways for roads, or all nodes for addresses. Make sure you aren't about to mass-edit the nodes of a road: deselect the nodes if this happens.
* Ensure the tags are correct and good. (QGIS has a character limit and sometimes doesn't like colons.) * Ensure the tags are correct and good. (QGIS has a character limit and sometimes doesn't like colons, so double check that `addr:house` is `addr:housenumber`, `addr:postc` is `addr:postcode`, `addr:stree` is `addr:street`, etc.)
* Add tags like highway=residential, surface=asphalt, etc, as needed. * Mass-add new tags like `highway=residential`, `surface=asphalt`, etc, as indicated.
* Remove any spurious tags that may have been brought over in the import (if it's not in the OSM Wiki, we don't want it.) * Remove any spurious tags that may have been brought over in the import (if it's not in the OSM Wiki, we don't want it.)
* Press ctrl-shift-M to merge into the OSM data layer. There will be a warning, but click OK; we will be extra careful about validating the merge in the next steps.
* For addresses, remove any address nodes that seem to not reflect reality or be placed far from the street bearing their name: it's better to not have 123 Adams Street mapped at all, than to claim that 123 Adams Street is hovering over someone's newly-built house at 321 Franklin Avenue, 200 feet away from Adams Street. (Cities often won't remove old addresses, leading to confusion when new streets are built.)
* For roads, highlight multiple street segments which have the same name and press C to combine them: the county data has one way per road segment and that's excessive for OSM. * For roads, highlight multiple street segments which have the same name and press C to combine them: the county data has one way per road segment and that's excessive for OSM.
* Download the area you're working on from OSM, into a new Data Layer.
* Highlight all features to be imported at this time and press ctrl-shift-M to merge into the OSM data layer
* Check the edges of the imported areas to ensure new roads are merged with any preexisting roads * Check the edges of the imported areas to ensure new roads are merged with any preexisting roads
* Check the import area to ensure no incorrect overlaps * Check the import area to ensure no incorrect overlaps
* Use the JOSM validator to ensure no errors in imported data * Use the JOSM validator to ensure no errors in imported data. Warnings about existing data separate from the import can be ignored.
* If there are duplicate house numbers in the data, investigate and remove the less likely node, or both nodes. For example, `4650 Ramsell Road` is duplicated in the source data, but the easternmost copy is on the "odd" side of the street and between 4653 and 4663, so it's more likely to actually be 4651, 4655, 4657, 4659, or 4661. We have no way of knowing, so we can either delete it entirely or simply delete the housenumber tag and leave it as an address without a number for a future editor to review. (We may submit incomplete data, just not wrong data.) We then leave the westernmost copy alone, since 4650 fits neatly in between 4640/4644 and 4654/4660.
* Click upload * Click upload
* Make sure there are no erroneous Relations or other unwanted objects about to be uploaded. * Make sure there are no erroneous Relations or other unwanted objects about to be uploaded.
* Use a descriptive changeset message like "Roads/Addresses in The Villages #villagesimport" * Use a descriptive changeset message like "Roads/Addresses in The Villages #villagesimport"
* Set the Source to be "Sumter County GIS" * Set the Source to be "Sumter County GIS"
* Review imported data in Achavi or Osmcha to ensure it looks proper * You can easily copy-paste the below into the Settings tab:
```
comment=Roads/Addresses in The Villages #villagesimport
import=yes
website=https://wiki.openstreetmap.org/wiki/The_Villages_Road_and_Address_Import
source=Sumter County GIS
source:url=https://gitlab.com/zyphlar/the-villages-import
```
* Review imported data in Achavi or Osmcha to ensure it looks proper.
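The tag check and the duplicate house number check described in the steps above can also be done on the exported GeoJSON before opening JOSM. The following is only a minimal sketch under stated assumptions: it assumes the export is a standard GeoJSON FeatureCollection whose properties use the truncated names listed above, and the function names `fix_truncated_keys` and `find_duplicate_addresses` are hypothetical helpers, not part of this repository. The sample features are placeholders with no geometry.

```python
from collections import defaultdict

# Truncated QGIS field names (from the steps above) mapped to the intended OSM keys.
TRUNCATED_KEYS = {
    "addr:house": "addr:housenumber",
    "addr:stree": "addr:street",
    "addr:postc": "addr:postcode",
}


def fix_truncated_keys(feature_collection):
    """Rename truncated property keys in place and return the collection."""
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        for short, full in TRUNCATED_KEYS.items():
            # Only rename when the full key is not already present.
            if short in props and full not in props:
                props[full] = props.pop(short)
        feature["properties"] = props
    return feature_collection


def find_duplicate_addresses(feature_collection):
    """Map (housenumber, street) to the feature indexes that share that address."""
    seen = defaultdict(list)
    for index, feature in enumerate(feature_collection.get("features", [])):
        props = feature.get("properties") or {}
        key = (props.get("addr:housenumber"), props.get("addr:street"))
        if None not in key:  # skip features missing either tag
            seen[key].append(index)
    return {key: idxs for key, idxs in seen.items() if len(idxs) > 1}


# Placeholder data for illustration only (geometry omitted).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"addr:house": "4650", "addr:stree": "Ramsell Road"}},
        {"type": "Feature", "geometry": None,
         "properties": {"addr:house": "4650", "addr:stree": "Ramsell Road"}},
        {"type": "Feature", "geometry": None,
         "properties": {"addr:house": "4654", "addr:stree": "Ramsell Road"}},
    ],
}

fixed = fix_truncated_keys(sample)
print(find_duplicate_addresses(fixed))
```

The script only reports duplicates; deciding which node to keep still needs the manual review described above, since the data alone cannot say which copy is wrong.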


@ -1 +0,0 @@
UTF-8

Binary file not shown.


@ -1 +0,0 @@
GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]


@ -1,26 +0,0 @@
<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
<qgis version="3.22.10-Białowieża">
<identifier></identifier>
<parentidentifier></parentidentifier>
<language></language>
<type>dataset</type>
<title></title>
<abstract></abstract>
<links/>
<fees></fees>
<encoding></encoding>
<crs>
<spatialrefsys>
<wkt></wkt>
<proj4></proj4>
<srsid>0</srsid>
<srid>0</srid>
<authid></authid>
<description></description>
<projectionacronym></projectionacronym>
<ellipsoidacronym></ellipsoidacronym>
<geographicflag>false</geographicflag>
</spatialrefsys>
</crs>
<extent/>
</qgis>

Binary file not shown.

Binary file not shown.

File diff suppressed because it is too large.


@ -1,14 +1,42 @@
from qgis.core import * from qgis.core import *
from qgis.gui import * from qgis.gui import *
import re
@qgsfunction(args='auto', group='Custom', referenced_columns=[]) @qgsfunction(args='auto', group='Custom', referenced_columns=[])
def getstreetfromaddress(value1, feature, parent): def getformattedstreetname(value1, feature, parent):
parts = value1.split()
parts = map(formatstreetname, parts)
return " ".join(parts)
@qgsfunction(args='auto', group='Custom', referenced_columns=[])
def getformattedstreetnamefromaddress(value1, feature, parent):
parts = value1.split() parts = value1.split()
parts.pop(0) # Ignore the first bit (e.g. "123" in "123 N MAIN ST") parts.pop(0) # Ignore the first bit (e.g. "123" in "123 N MAIN ST")
parts = map(formatstreetname, parts) parts = map(formatstreetname, parts)
return " ".join(parts) return " ".join(parts)
def formatstreetname(name): def formatstreetname(name):
# Ordinal suffixes like "123TH" are lowercased to "123th"
if re.search("[0-9]+TH", name):
return name.capitalize()
if re.search("[0-9]+ND", name):
return name.capitalize()
if re.search("[0-9]+ST", name):
return name.capitalize()
if re.search("[0-9]+RD", name):
return name.capitalize()
# Weird names like 123D we keep upper
if re.search("[0-9]+[A-Z]+", name):
return name
# Prefixes we want to keep uppercase
if name == "US":
return "US"
if name == "SR":
return "SR"
if name == "CR":
return "CR"
if name == "C":
return "C"
# Directions # Directions
if name == "N": if name == "N":
return "North" return "North"
@ -71,4 +99,9 @@ def formatstreetname(name):
return "Way" return "Way"
if name == "XING": if name == "XING":
return "Crossing" return "Crossing"
# Irish names
if name == "MCCRAY":
return "McCray"
if name == "MCKOWN":
return "McKown"
return name.capitalize() return name.capitalize()
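For checking the formatting rules outside QGIS, the logic above can be approximated in plain Python. This is a sketch, not the repository's code: the names `format_token` and `format_street` are hypothetical, the `EXPANSIONS` table holds only a few entries (the `"ST"` entry is assumed from the README's "N MAIN ST" example, since that part of the function is not shown in this diff), and the guard against an empty input is an addition.

```python
import re

# Rules mirrored from the diff above, checked in the same order.
ORDINAL = re.compile(r"[0-9]+(TH|ND|ST|RD)")   # e.g. 123RD -> 123rd
NUMBER_LETTER = re.compile(r"[0-9]+[A-Z]+")    # e.g. 123D stays uppercase
KEEP_UPPER = {"US", "SR", "CR", "C"}

# Partial, illustrative table; the real function lists many more abbreviations.
EXPANSIONS = {
    "N": "North",
    "ST": "Street",      # assumed from the README example
    "XING": "Crossing",
    "MCCRAY": "McCray",
    "MCKOWN": "McKown",
}


def format_token(token):
    """Format one whitespace-separated piece of an uppercase street name."""
    if ORDINAL.search(token):
        return token.capitalize()
    if NUMBER_LETTER.search(token):
        return token
    if token in KEEP_UPPER:
        return token
    return EXPANSIONS.get(token, token.capitalize())


def format_street(raw, drop_house_number=False):
    """Format a full street string, optionally dropping a leading house number."""
    parts = raw.split()
    if drop_house_number and parts:
        parts = parts[1:]  # e.g. drop "123" from "123 N MAIN ST"
    return " ".join(format_token(part) for part in parts)


print(format_street("N MAIN ST"))
print(format_street("123 N MAIN ST", drop_house_number=True))
```

A lookup table keeps each new abbreviation to a one-line change, at the cost of hiding the rule order that the chain of `if` statements makes explicit.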