Sort a polygon feature class spatially using Python

I was manually adjusting a custom grid for a map series this morning, a task which I have not had to do for several years now, when the time came to export the maps. By this point, my grid numbers were completely messed up because of all the shifting and reshaping I had to do.

It would make sense to number the grid from left to right, top to bottom. However, the ability to spatially sort a feature class in ArcGIS is hidden behind an Advanced Licence in the Sort tool.

License: For the Field(s) parameter, sorting by the Shape field or by multiple fields is only available with an Desktop Advanced license. Sorting by any single attribute field (excluding Shape) is available at all license levels.

I had this prickly feeling in the back of my head that I had found a workaround for this before. I went searching through the blog (since I have posted about things twice because I forgot about the first post), but I didn’t find it.

Undeterred, I searched through my gists (unrelated note – why does using the “user:<username> <search term> ” trick in the GitHub search bar now flag as spam?!) and I found it!

	#
	# @date 13/07/2015
	# @author Cindy Williams
	#
	# Sorts a polygon feature class by descending X and ascending Y,
	# to obtain the spatial order of the polys from left to right,
	# top to bottom. Bypasses the Advanced licence needed to
	# do this in the Sort tool.
	#
	# For use in the Python window in ArcMap.
	#

	import arcpy

	mxd = arcpy.mapping.MapDocument("CURRENT")
	lyr = arcpy.mapping.ListLayers(mxd, "DDP")[0]

	# Get the extent of each feature, and store as tuple (name, x coord of top right, y coord of top right)
	ddp = [(row[0], row[1].extent.XMax, row[1].extent.YMax) for row in arcpy.da.SearchCursor(lyr, ("Name", "SHAPE@"))]

	# Sort by x descending
	ddp.sort(key=lambda row:row[1], reverse=True)

	# Sort by y ascending
	ddp.sort(key=lambda row:row[2])

	# Reverse the list to get the L->R, T->B order
	dct = {}
	# Must still be optimised
	for i, j in enumerate(reversed(ddp)):
	# Key is name, value is index + 1
	dct[j[0]] = i + 1

	# Write values back to feature class
	with arcpy.da.UpdateCursor(lyr, ("Name", "MapNum")) as cursor:
	for row in cursor:
	row[1] = dct[row[0]]
	cursor.updateRow(row)

view raw

sortFeatureClassByExtent.py

hosted with ❤ by GitHub

Link for mobile users is here.

I’m pretty sure there is a much better way to do this, but I need this so rarely that I don’t see the need to rewrite it. It also would work much better as a function. It takes a polygon feature class and sorts the selected features from left to right, top to bottom.

Description of this hastily drawn, blurry PowerPoint graphic: Chaotic, unsorted grid on the left. Slightly less chaotic, sorted grid on the right.

A rant about attachments

I remember when attachments were first introduced in ArcGIS Desktop (~~10.2 I think?~~ whoops it was ArcGIS 10). It was a very useful feature, and more functionality was added over the years.

It also made mobile data capture even easier. The fieldworkers would go out, do their assessments, and attach multiple photos to their points. However, attachments with Collector has caused me so much frustration. Specifically, syncing with attachments.

The nature of the work we do (and the economic environment we are in) means that by default, I take the maps offline so that the fieldworkers can carry out their assessments, and then sync back to AGOL when they are on lunch break (or whenever they can pick up WiFi). I discovered a few years ago that once one hits a certain threshold (like 20 attachments in the map), there are going to be problems syncing.

It will just outright fail, or take very long and may need to be attempted a number of times. Why is this? I don’t know. Over the years, I’ve encountered this issue on all types of devices – the latest iPhones, low-end Android tablets, high-end Android tablets, mid-range Android phones…

What it seems like to me is that Collector “expects” a certain connection speed, and when it doesn’t get it, it times out and rolls back the sync. Fair enough – I’ve found multiple delta tables on devices I’ve needed to recover the databases from due to failed sync attempts. On a current project, they are using rugged devices which have really awful network chips (as in, I need to stand about 1 or 2m away from the access point so that I can take the maps offline). Naturally, at the end of the first day, each device had dozens of features with multiple attachments each, which refused to sync.

They have been out in the field for 2 weeks. Everyday, I have to manually retrieve the databases from the device, recover them, and push them out into appropriate geodatabases once I’ve determined what’s inside them.

So clear

I can deal with all of that, because Python is a tool that I maaay have mentioned here before. What I cannot deal with is the fact that attachments are still lost during geoprocessing. The fact that it was added as an environment setting in ArcGIS 10.5 and has been available in ArcGIS Pro for a while is of little comfort to me as I currently have access to neither.

Fine. I store the GlobalIDs in another field, merge the features together into their correct feature classes, enable attachments and insert the records from the corresponding attachment tables. Of course, I forget that the relationship class is now messed up, as it’s linking through the (now incorrect) GlobalID fields instead of the fields I stored the original IDs in.

After staring at the screen cross-eyed, I then realise that I only need to provide the attachments as jpgs in a folder, which I can extract from the tables using the original IDs and write into subfolders based on the feature type. I don’t actually need to link them back together since the technician does not need to view the photos to complete the work in ArcMap. /endrant

Inserting spaces into CamelCase

I have some trauma related to regular expressions from my university days. Yes, they are super useful. Do I enjoy using them? Does anyone?

Image result for regular expressions meme

I routinely modify and create new instances of the AMIS GIS data model. Depending on the client and the type of assets they have, there can be 4 feature datasets, containing about 5 feature classes each, or 9 feature datasets with up to 50 feature classes spread across the database.

Inevitably at some point in the process of losing my mind, I will forget that the alias is lost when creating a new feature class, or randomly copying it over or whatever. I then end up with dozens of feature classes with really ugly looking abbreviated layer names in CamelCase. So for posterity, and so that I will never ever forget this ridiculously easy thing again:

	#
	# @date 05/10/2017
	# @author Cindy Jayakumar
	#
	# Change the alias of feature classes to the
	# title case version of the CamelCase name
	#
	# e.g. WaterPumpStation => Water Pump Station

	import arcpy
	import re

	gdb = r"C:\Some\Arb\Folder\test.gdb"

	def alterFCAlias(name):
	return re.sub(r"\B([A-Z])", r" \1", name)

	for root, _, fcs in arcpy.da.Walk(gdb):
	for fc in fcs:
	alias = alterFCAlias(fc)
	arcpy.AlterAliasName(fc, alias )
	print("{0} => {1}".format(fc, alias))

view raw

alterCamelCaseAlias.py

hosted with ❤ by GitHub

	# @credit https://stackoverflow.com/a/199215

	import re

	text = "ThisIsATest"
	re.sub(r"\B([A-Z])", r" \1", text)

view raw

insertSpaceBetweenCamelCase.py

hosted with ❤ by GitHub

One thing I’ve omitted from the script is that I usually store the feature class names with a prefix, e.g. fc_name = wps_WaterPumpStation. In this case, I would use split on the feature class name before passing it to the alterFCAlias function i.e. fc_name.split("_")[1].

Link for mobile users is here.

Aaaaaaand I’ve just realised that I’ve forgotten this basic task so many times over the last few years that I actually already have a blog post about it, except for some reason I was updating the layer names in ArcMap every time instead of resetting it once on the feature classes themselves.

Image result for what were you thinking meme

Create points without XY from a table in ArcPy

I had a point feature class containing features with unique names. I also had a table containing hundreds of records, each “linking” to the point feature classes via name. Normally, I could create points for the records by using Make Query Table.

I say “linking” because from looking at the data, I could see which records belonged to which point, but unfortunately there were spelling mistakes and different naming conventions e.g. if the point was called “ABC Sewage Treatment Works”, some of its matching records in the table would be “ABC WWTP”, “AB-C Waste Water Treatment Plant”.

	'''
	@author Cindy Jayakumar
	@date 31/01/2017

	– Inserts a new point with the selected geometry in to_lyr
	– Adds attributes to that point from from_lyr
	– Updates the record in from_lyr

	Uses selected features

	For the python window in ArcMap
	'''

	import arcpy

	from_lyr = r'C:\Some\Arb\Folder\work.gdb\tbl_test'
	mxd = arcpy.mapping.MapDocument("CURRENT")
	out_lyr = arcpy.mapping.ListLayers(mxd, "lyr")[0]

	'''
	– Select the single point in to_lyr that will have its geometry duplicated
	– Select the rows in from_lyr that you want to be inserted into to_lyr
	'''
	def duplicateAssets(to_lyr):
	# Matching point is selected in to_lyr
	geom = [geo[0] for geo in arcpy.da.SearchCursor(to_lyr, "SHAPE@")][0]
	# Create insert cursor on same layer
	inscursor = arcpy.da.InsertCursor(to_lyr, ("SHAPE@", "Unique_ID"))

	with arcpy.da.UpdateCursor(from_lyr, ("Unique_ID", "Matched")) as cursor:
	for row in cursor:
	# Build the new point to be inserted
	point = [geom, row[0]]
	inscursor.insertRow(point)
	print("Inserted " +str(int(row[0])))
	# Update the record with the name of the layer the point was inserted into
	row[1] = to_lyr.datasetName
	cursor.updateRow(row)
	del inscursor

	duplicatePoint(out_lyr)

view raw

duplicatePoint.py

hosted with ❤ by GitHub

How to run VBA code from a Python script

I recently modified a script I wrote to extract data from a Word document to a csv file. The modified script had to iterate over multiple docs and extract data from certain tables based on certain keywords and fields.

I used the python-docx module to do this, but hit an obstacle when I realised that it could not (as yet) parse Word’s content controls. Since I only had 9 documents, I opened each, pasted some VBA code pilfered off StackOverflow to remove all content controls from the document.

While that worked temporarily, my next step is of course to schedule the script to automatically pull the data out once the folder is updated with the new batch of docs for the month. A solution suggested entails the code being saved inside the doc so it can be called via com.

I’m not happy with that solution because I would still need to open each document and insert the code. What I need to do now is fiddle around some more so that the code can be saved inside the script and then run on each document as needed.

Replace the delimiter in a csv file using Python

Recently I had to wrangle some csv files, including some data calculations and outputting a semi-colon delimited file instead of a comma-delimited file.

	'''

	@date 19/07/2016
	@author Cindy Williams-Jayakumar

	Replaces the delimiter in a csv file
	using the pathlib library in Python 3
	'''

	import csv
	from pathlib import Path

	folder_in = Path(r'C:\Some\Arb\Folder\in')
	folder_out = Path(r'C:\Some\Arb\Folder\out')

	for incsv in folder_in.iterdir():
	outcsv = folder_out.joinpath(incsv.name)
	with open(str(incsv),'r') as fin, open(str(outcsv), 'w') as fout:
	reader = csv.DictReader(fin)
	writer = csv.DictWriter(fout, reader.fieldnames, delimiter='\|')
	writer.writeheader()
	writer.writerows(reader)

view raw

changeDelimiterInCSV.py

hosted with ❤ by GitHub

Convert a list of field names and aliases from Excel to table using ArcPy

I went digging through my old workspace and started looking at some of my old scripts. My style of coding back then is almost embarrassing now 🙂 but that’s just the process of learning. I decided to post this script I wrote just before ArcGIS released their Excel toolset in 10.2.

	'''
	@date 09/07/2013
	@author Cindy Williams

	Converts a spreadsheet containing field names and aliases
	into a file geodatabase table.
	'''
	import arcpy

	xls = r"C:\Some\Arb\Folder\test.xlsx\Fields$"
	tbl = r"C:\Some\Arb\Folder\test.gdb\tbl_Samples"

	with arcpy.da.SearchCursor(xls,("FIELD_NAME","ALIAS")) as cursor:
	for row in cursor:
	arcpy.management.AddField(tbl, row[0], "DOUBLE", field_alias=row[1])
	print("Adding field {}…".format(row[0]))
	print("Fields added successfully.")

view raw

createTableFieldsFromXLS.py

hosted with ❤ by GitHub

From what I can recall, I needed to create a file geodatabase table to store records of microbial sample data. Many of the field names were the chemical compound themselves, such as phosporus or nitrogen, or bacterial names. For brevity’s sake, I had to use the shortest field names possible while still retaining the full meaning.

I set up a spreadsheet containing the full list of field names in column FIELD_NAMES and their aliases in ALIAS. I created an empty table in a file gdb, and used a SearchCursor on the spreadsheet to create the fields and fill in their aliases.

This solution worked for me at the time, but of course there are now better ways to do this.

Reverse geocode spreadsheet coordinates using geocoder and pandas

I had a spreadsheet of coordinates, along with their addresses. The addresses were either inaccurate or missing. Without access to an ArcGIS licence, and knowing the addresses were not available on our enterprise geocoding service, I sought to find a quicker (and open-source) way.

	import geocoder
	import pandas as pd

	xls = r'C:\Some\Arb\Folder\coords.xls'
	out_xls = r'C:\Some\Arb\Folder\geocoded.xls'
	df = pd.read_excel(xls)

	for index, row in df.iterrows():
	g = geocoder.google([row[3], row[2]], method='reverse')
	df.set_value(index, 'Street Address', g.address)

	df.to_excel(out_xls, 'Geocoded')

view raw

reverseGeocodeXLScoords.py

hosted with ❤ by GitHub

I used the geocoder library to do this. I used it previously when I still had an ArcGIS Online account and a Bing key to check geocoding accuracy amongst the three providers.

Since I don’t have those luxuries anymore, I used pandas to read in the spreadsheet and reverse geocode the coordinates found in the the third and fourth columns. I then added a new column to the data frame to contain the returned address, and copied the data frame to a new spreadsheet.

Database access via Python

In my ongoing quest to do absolutely everything through Python, I’ve been looking a lot lately at manipulating databases. I’ve been using arcpy to access GIS databases for years, and last year I finally got around to using pyodbc (and pypyodbc) for accessing SQL Server databases.

Now that I’m in an Oracle environment, Oracle has provided the cx_Oracle library to directly connect to databases. I have yet to test that though. What I’m interested in at the moment is creating and accessing databases for personal use.

I considered MongoDB for a while, but I don’t think I want to go NoSQL yet. This is why I have been experimenting with SQLite (through the sqlite3 library), as it is included in the Python install, and has the delightful SpatiaLite extension. The slogan goes against my one of my mottos (Spatial is Special) while supporting my other motto (Everything is Spatial).

Filter a pandas data frame using a mask

After using pandas for quite some time now, I started to question if I was really using it effectively. After two MOOCs in R about 2 or 3 years ago, I realised that because my GIS work wasn’t in analysis, I would not be able to use it properly.

Similarly, because pandas is essentially the R of Python, I thought I wouldn’t be able to use all the features it had to offer. As it stands, I’m still hovering around in the data munging side of pandas.

	import pandas as pd

	in_xls = r"C:\Some\Arb\Folder\test.xlsx"
	columns = [0, 2, 3, 4, 6, 8]

	# Use a function to define the mask
	# to create a subset of the data frame
	def mask(df, key, value):
	return df[df[key] == value]

	pd.DataFrame.mask = mask

	# Out of the 101 rows, only 50 are stored in the data frame
	df = pd.read_excel(in_xls,0, parse_cols=columns).mask('Create', 'Y')

view raw

filterDataFrameUsingMask.py

hosted with ❤ by GitHub

I used a pandas mask to filter a spreadsheet (or csv) based on some value. I originally used this to filter out which feature classes need to be created from a list of dozens of templates, but I’ve also used it to filter transactions in the money tracking app I made for my household.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31