Inserting spaces into CamelCase

I have some trauma related to regular expressions from my university days. Yes, they are super useful. Do I enjoy using them? Does anyone?


I routinely modify and create new instances of the AMIS GIS data model. Depending on the client and the type of assets they have, there can be 4 feature datasets, containing about 5 feature classes each, or 9 feature datasets with up to 50 feature classes spread across the database.

Inevitably at some point in the process of losing my mind, I will forget that the alias is lost when creating a new feature class, or randomly copying it over or whatever. I then end up with dozens of feature classes with really ugly looking abbreviated layer names in CamelCase. So for posterity, and so that I will never ever forget this ridiculously easy thing again:


#
# @date 05/10/2017
# @author Cindy Jayakumar
#
# Change the alias of feature classes to the
# title case version of the CamelCase name
#
# e.g. WaterPumpStation => Water Pump Station
import arcpy
import os
import re

gdb = r"C:\Some\Arb\Folder\test.gdb"

def alterFCAlias(name):
    # Insert a space before each capital letter that is not at the start
    return re.sub(r"\B([A-Z])", r" \1", name)

for root, _, fcs in arcpy.da.Walk(gdb):
    for fc in fcs:
        alias = alterFCAlias(fc)
        # Walk yields bare names, so build the full path for AlterAliasName
        arcpy.AlterAliasName(os.path.join(root, fc), alias)
        print("{0} => {1}".format(fc, alias))


# @credit https://stackoverflow.com/a/199215
import re

text = "ThisIsATest"
# \B matches a non-boundary position inside the word,
# so the first capital does not get a leading space
print(re.sub(r"\B([A-Z])", r" \1", text))  # This Is A Test

One thing I’ve omitted from the script is that I usually store the feature class names with a prefix, e.g. fc_name = wps_WaterPumpStation. In this case, I would use split on the feature class name before passing it to the alterFCAlias function i.e. fc_name.split("_")[1].
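A minimal sketch of that, using a made-up feature class name:


fc_name = "wps_WaterPumpStation"
# Drop the prefix before building the alias
alias = alterFCAlias(fc_name.split("_")[1])  # => "Water Pump Station"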


Aaaaaaand I’ve just realised that I’ve forgotten this basic task so many times over the last few years that I actually already have a blog post about it, except for some reason I was updating the layer names in ArcMap every time instead of resetting it once on the feature classes themselves.


I passed my registration exams (and other news)

I know it’s already a month into the new year, and I haven’t been active on the blog for a while, but that’s mostly because I had nothing to post. However, that will all be changing because in January I returned to Aurecon. This time around, I’m fulfilling a technology expert role within the Asset Management team, so I’ll be designing and driving our technology strategy going forward. I’ve got a few interesting things lined up, particularly around spatial data warehouses and ontologies, so I’ll be posting about it a lot here while I work through my ideas.

I also got my SAGC exam results today – 100% for Paper A and 96% for Paper F, so once the fees are paid, I will be registered as a Geomatics Technologist. When I’m finished with my BscHons next year, I’ll be able to upgrade to Practitioner as I’m just short of the academic requirement.

Speaking of Hons, I just registered for the 2nd (final) year, and am currently studying for 3 exams next week. To be honest, the studying has been painful, simply because many of the concepts we cover are outdated; having been in industry for almost 6 years now, it is irritating to have to study these things again. I’m just pushing through to get that piece of paper. Hopefully I get a good research topic for the project (or can convince one of the supervisors to take me on with my own topic). I’ll worry about that after the exams.

I’ll probably be able to return to my normal posting schedule in March. I just thought it was about time I post something.


Aaaaand now I’m thinking about how much I miss Community. #SixSeasonsAndAMovie

Using a query table to represent a 1:M relationship spatially

I first discovered (and used) the Make Query Table tool during my second week at Aurecon (March 13 2012, according to my OneNote). This was about a month before I started using ModelBuilder (to combat the frustrations of ArcMap), and about 6 months before I started Python (to combat the frustrations of ModelBuilder).

I’m just giving a bit of context because this was before everything clicked into place for me. Before this point, I treated everything I learnt in my studies and during my internship as separate silos of information: GIS, Databases, Programming.

I did not even realise until after I was working at Aurecon that the query expressions used in ArcMap are SQL, despite me studying all those things. It just shows how one’s mindset can block progress, and how I allowed the awful experience of learning Computer Science in Afrikaans to stop me from letting things “click” for so long.

Back to the tool. Joins in ArcGIS are notoriously slow, and 1:M joins are not allowed (technically they are, but in the sense that an arbitrary matching feature will be joined). Naturally, the relationship between the GIS and the asset register is 1:M.

For example, a single point is used to represent the physical, spatial location of an asset, reservoir WRV-00001. In the asset register, this reservoir is unbundled into various components – storage tank, building, fence etc. Each of these assets has its own unique asset ID, but they all share the same GIS ID.

I now need to represent all these assets spatially. The points/lines will all be on top of each other, but that’s fine. The Make Query Table tool does exactly this, but it is…quirky. I’ve compiled a list of things to remember when using this tool (supplemented by this question on GIS.SE), with a short arcpy sketch after the list:

  1. Tables/feature classes in the relationship should be stored within the same database: I tend to remember this step only after I add the inputs and the tool shouts at me.
  2. Add the feature class first in the multivalue parameter: The format of the input relies on the format of the first argument in the multivalue parameter control. The feature class should be added first to ensure that the output is a layer, otherwise it will be a table view.
  3. Enclose table names and field names in quotes: For example, I wish to join the asset register table ar to the point layer points using their common field GIS_ID. By default, the tool encloses my whole expression in quotes "points.GIS_ID" = "ar.GIS_ID". This will cause the tool to fail. Add extra quotes around the field names "points"."GIS_ID" = "ar"."GIS_ID".
  4. Choose the NO_KEY_FIELD option: Trying to add key fields causes some erratic behaviour (???). Just don’t do it. By selecting this option, the existing ObjectID field from the input will be used.
  5. The output layer will appear to have no symbology: Go into the Layer properties, click the Symbology tab, click the existing symbol, then OK, OK. It’s a bug.
  6. Persist to disk!: Remember to export the layer to a feature class, otherwise the layer will only exist in the map document.
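Putting those points together, here is a minimal sketch of the call from Python – the geodatabase path and names are hypothetical:


import arcpy

gdb = r"C:\Some\Arb\Folder\assets.gdb"     # hypothetical geodatabase
tables = [gdb + "\\points", gdb + "\\ar"]  # feature class first (point 2)
qry = '"points"."GIS_ID" = "ar"."GIS_ID"'  # extra quotes around the field names (point 3)

# NO_KEY_FIELD keeps the existing ObjectID (point 4)
lyr = arcpy.management.MakeQueryTable(tables, "qry_assets", "NO_KEY_FIELD", "", "", qry)

# Persist to disk (point 6)
arcpy.management.CopyFeatures(lyr, gdb + "\\ftr_assets_unbundled")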

Create feature classes from a pandas data frame

I had a large CAD drawing which I had brought into ArcGIS, converted to a feature class and classified into groups of features using a 3 letter prefix. I also had a spreadsheet containing a long list of those prefixes, along with additional columns of information for each prefix, including feature class name and shape type.

I wanted to create the feature classes for dozens of these prefixes, based on the values in my converted feature class, and a template feature class for the field structure. The Select geoprocessing tool could have easily split out all the different features per prefix for me, but that would have kept the CAD feature class structure, and not the structure that I wanted.

I figured this would be a good time to get into pandas (and eventually geopandas, maybe).


#
# @date 24/06/2015
# @author Cindy Williams
#
# Creates feature classes by looking up the
# name in a pandas data frame
#
# For use as a standalone script
#
import arcpy
import pandas as pd

# Set workspace
arcpy.env.workspace = r"C:\Some\Arb\Folder\work.gdb"
# Template feature class
fc_template = "ftr_template"
# Spreadsheet containing the values
xl_workbook = r"C:\Some\Arb\Folder\assets.xlsx"
lyr_source = arcpy.management.MakeFeatureLayer("ftr_source", "lyr_source")
field_category = "Category"

# Get projection from template feature class
sr = arcpy.Describe(fc_template).spatialReference

# Create data frame from the first sheet, using the first 7 columns
df = pd.read_excel(xl_workbook, 0, parse_cols=6, index_col="Prefix")

# Get the list of unique categories in the source layer
categories = list(set(row[0] for row in arcpy.da.SearchCursor(lyr_source, field_category)))

for cat in categories:
    print("Processing " + cat)
    qry = """ "{0}" = '{1}' """.format(field_category, cat)
    # Look up the category in the data frame and return the matching feature class name
    fc_name = df.loc[cat, "Feature Class Name"]
    try:
        arcpy.management.CreateFeatureclass(arcpy.env.workspace,
                                            fc_name,
                                            "POINT",
                                            fc_template,
                                            "#",
                                            "#",
                                            sr)
        print("Feature class created: " + fc_name)
        lyr_cat = arcpy.management.MakeFeatureLayer(in_features=lyr_source,
                                                    out_layer="lyr_cat",
                                                    where_clause=qry)
        arcpy.management.Append(lyr_cat, fc_name, "NO_TEST")
    except Exception as e:
        print(e)
    print("Finished " + cat)

print("Script complete.")

In the read_excel call, I load the first 7 columns of the first sheet in the workbook into a pandas data frame. I set the index column to the column called “Prefix”, so those values will be used for the lookup instead of the default integer index pandas assigns.

In the df.loc lookup, the prefix value from the feature class is used to look up the corresponding feature class name in the pandas data frame. Once the feature class has been successfully created, a selection layer of the matching features is appended into the new feature class. I could instead use a SearchCursor and store the matching features in a Python list to be copied into the new feature class, but that’s something I will test at another time (a rough sketch of the idea follows).
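For reference, an untested sketch of that alternative, assuming the same layer and query as in the script above. Note that this copies geometry only, so the attributes that Append carries over would be lost:


# Collect the matching geometries in a Python list via a SearchCursor
geoms = [row[0] for row in arcpy.da.SearchCursor(lyr_source, "SHAPE@", where_clause=qry)]
# Write them out in one go (the target feature class must not already exist)
arcpy.management.CopyFeatures(geoms, fc_name)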

Developing an asset management GIS data maintenance methodology: Part 5 – My preparation for the way forward

(This is Part 5 of a week long series of posts on the current project I am working on, to develop a reasonable GIS data maintenance strategy for Asset Management data. Read Part 1, 2, 3, 4.)

The work I’ve done over the last few weeks (and years) is all leading to one point. A single system, with all the data topologically correct, standardised and easily accessible.

That single system would be an enterprise geodatabase, with versioning enabled so that the Desktop team can maintain the data without fear of conflicts. Using SDE in this manner automatically puts checks in place – for example, I could review the reconciled versions before moving the edits to default.

Archiving would be set up, and the database would be regularly backed up. Map services would be published and be made available to the non-GIS users such as the Asset team. I’d prepare the services for consumption in their app of choice: ESRI Maps for SharePoint, ArcGIS for AutoCAD, a JavaScript viewer, an Excel document with the attribute tables embedded…

The services would have the sync capability enabled, so that when they go out in the field, a Collector map could be easily configured for data capture in an offline environment. Since they visit areas which are routinely out of cell coverage, this would be ideal (and better than carrying around a printed mapbook).

While I am busy with this year’s updates, I am keeping all of this in mind. Every little bit I can do now is a little bit less that I have to do later. Once this foundation is in place, we can start looking at more advanced aspects, such as turning all this data into geometric networks, and creating custom tools which can automatically calculate the remaining useful life, asset depreciation and answer all the questions that could possibly be asked.

Developing an asset management GIS data maintenance methodology: Part 4 – Data handover and the audit process

(This is Part 4 of a week long series of posts on the current project I am working on, to develop a reasonable GIS data maintenance strategy for Asset Management data. Read Part 1, Part 2 and Part 3 here).

Once all the new data has been processed and captured, or existing data flagged as asset upgrades, we have to extract several attributes for use by the asset guys. This includes the all-important GIS ID, which will be used to link the asset register back to the GIS. The asset register contains a mountain of information about the status of the assets, remaining useful life and other financial information. In the GIS data, we are purely concerned with geometry: is this asset in its correct physical location, with the correct dimensions?

We will carry a few other attributes, such as material type, diameter of pipe, name (if relevant) and town. Once I’ve compiled all the data into this structure, I convert it to a spreadsheet filtered by feature type and including the lengths of the line data. The GIS length is used over the length supplied by the client as it represents the situation on the ground.

Once everything has been submitted to the client, a short while later the auditors will appear. Using some algorithm, they select a sample of the assets and send us a list of GIS IDs or asset names. I link this list up to our GIS and provide a KML file of the locations. This proves that we know where those assets listed on the asset register are (and that no one is trying to fabricate assets). Sometimes the auditors will only provide the asset names – that makes it a bit harder for me to link since we don’t always have the name attribute. It’s the reason why the GIS ID should appear in all documentation.
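As a rough sketch (with made-up names, paths and IDs), that linking step can be as simple as selecting the sampled GIS IDs and exporting them – note that the Layer To KML tool writes a compressed KMZ:


import arcpy

# Hypothetical sample of GIS IDs received from the auditors
ids = ["WRV-00001", "WRP-00101"]
qry = '"GIS_ID" IN ({0})'.format(", ".join("'{0}'".format(i) for i in ids))

lyr = arcpy.management.MakeFeatureLayer(r"C:\Some\Arb\Folder\work.gdb\ftr_assets",
                                        "lyr_audit", qry)
arcpy.conversion.LayerToKML(lyr, r"C:\Some\Arb\Folder\audit_sample.kmz")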

For one client, I went along with the project leader and sat with an auditor in front of ArcMap. I had to physically look up the assets as she read out the GIS ID or asset name. At that time, the data was not in a good state to begin with, so fortunately she had only sampled assets that I could easily retrieve. As part of this methodology development, I’m documenting the state of the data as it was received, the problems it causes (including unnecessary delays) and the estimated time it would take to clean up the full database for each client.

That’s the part I’m most excited about in this process. I can finally get my hands into the data and standardise it, so if there are any auditor queries, or someone wants to know something quickly about a particular asset, I will know exactly where to find it.

Developing an asset management GIS data maintenance methodology: Part 3 – Data (Processing, updating, working the magic)

(This is Part 3 of a week long series of posts on the current project I am working on, to develop a reasonable GIS data maintenance strategy for Asset Management data. Read Part 1 and Part 2 here.)

GIS data. ESRI has a tab in their help documentation dedicated to the topic. As a GIS person, I may have become accustomed to receiving (and accepting) data in almost every conceivable format, because I know that by some process, I can get it into GIS.

Here are a few examples of data I have received over the last few weeks, and the conversion process followed for each:

  • As-built drawings (dwg/dxf), coordinated, but missing a constant: Set the coordinate system in ArcCatalog. Add the study area boundary to ArcMap, followed by the Polyline layer from the CAD file. Right click the Polyline layer > Properties > Transformations > Enable Transformations > Coordinates. Click OK and go back into the Properties to enable transformations again because ArcMap only remembers it the second time. Add the constant to the coordinates, and press OK. Hope that the drawing falls in the right place. When it does, convert the Polyline layer to a feature class. Project it, then inspect the attribute table. Hope that the CAD technician has placed matching elements in the same layer. Extract the features by unique attribute, ignoring any extraneous data of type Circle/Arc, or any layout items.
  • Dozens of KML files, each containing a single point: Batch convert the KML files to feature classes. While this does create multiple GDBs, it ensures that each file is checked before being extracted for processing. If I’m fairly confident in the data, I will run a script to convert the KMLs and merge them into a single feature class at the same time (see the sketch after this list).
  • PDF files of maps which have been drawn on and scanned: Convert the relevant pages from the PDF to JPG. Georeference the JPG if it contains identifying features such as cadastre (with erf numbers). If there is no line data, it may be OK to eyeball it when digitising.
  • A0 hard copies of maps/as-builts with no digital copy: Eyeball it.
  • Spreadsheets with road names and intersections (no GIS IDs): Format the spreadsheet so that it can be converted (no spaces in field names, remove unnecessary columns etc) to a file gdb table. Hope that the unique combination of road names and intersections will match perfectly with the road feature class.
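A hedged sketch of that KML conversion script – the paths are made up, and it assumes each KML contains points (KML To Layer creates a file gdb per KML, with the features under a Placemarks dataset):


import glob
import os

import arcpy

kml_folder = r"C:\Some\Arb\Folder\kml"   # hypothetical input folder
out_folder = r"C:\Some\Arb\Folder\conv"  # one file gdb per KML is created here

fcs = []
for kml in glob.glob(os.path.join(kml_folder, "*.kml")):
    name = os.path.splitext(os.path.basename(kml))[0]
    arcpy.conversion.KMLToLayer(kml, out_folder, name)
    fcs.append(os.path.join(out_folder, name + ".gdb", "Placemarks", "Points"))

# Merge the individual point feature classes into a single feature class
# (assumes merged.gdb already exists)
arcpy.management.Merge(fcs, os.path.join(out_folder, "merged.gdb", "ftr_kml_points"))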

Sometimes the CAD files are not coordinated, in which case I send them back. Sometimes we get old shapefiles, which have long since lost their unique GIS IDs. One time, I received a personal geodatabase (!!!) containing feature classes with a single ID attribute each. Their “matching” attribute tables were stored in separate dbf (!!!) files per folder per service. These dbfs contained many attributes: everything besides the IDs needed to join the data back to the shapes. This is where I had to “work the magic” to get anything usable out.

I haven’t covered all the scenarios, but that’s just about getting the data into GIS. Once it’s in, the data needs to be digitised (if it’s new assets which have been added), or the previous datasets must be inspected and the relevant features extracted (if an asset has been upgraded).

Some discretion needs to be used throughout this process. Time constraints and the current state of the data for a municipality will determine the level of detail which is captured. By putting this methodology in place, I am hoping to change that approach so that in the future, a standard amount of data is captured in a standardised way.

Due to the growing amount of features we were being asked to record as assets, I decided to create a spreadsheet (which will eventually become a table in a database) to separate the services and to specify the prefixes needed for the GIS IDs. A GIS ID is composed of a prefix, a dash and a string of numbers. For example, the prefix for Water Reticulation Pipeline is WRP, so a feature in this feature class may have the GIS ID WRP-00101. Whenever a new asset is added, I run a script to autopopulate the next GIS IDs.


import arcpy

lyr = arcpy.management.MakeFeatureLayer(r"C:\Some\Arb\Folder\work.gdb\ftr_wrp_Pipe", "lyr_pipe")
field = "GIS_ID"

def getWhereQry(exp):
    return """ "{0}" IS {1} """.format(field, exp)

# Gets the previous highest GIS ID value. Set n = 0 for a new feature class
n = max(int(row[0].split("-")[1])
        for row in arcpy.da.SearchCursor(lyr, field, where_clause=getWhereQry("NOT NULL")))

numstring = "00000"
prefix = "WRP"
sep = "-"

# Populate blank GIS IDs with new values
with arcpy.da.UpdateCursor(lyr, field, where_clause=getWhereQry("NULL")) as cursor:
    for row in cursor:
        n += 1
        # Left-pad n to 5 digits (equivalent to str(n).zfill(len(numstring)))
        numstring_length = len(numstring) - len(str(n))
        row[0] = prefix + sep + numstring[:numstring_length] + str(n)
        cursor.updateRow(row)

Currently, the list contains over 70 feature types we need to maintain. Each service has its own feature dataset. For example, fds_WaterSupply contains ftr_wrp_Pipe, ftr_why_Hydrant, ftr_wva_Valve etc. The naming convention is not only for consistency (all Water Supply prefixes start with W, all Stormwater prefixes start with SW), but also for the eventual transition to a SQL Server database. This way, the feature classes will be grouped according to the service they belong to, because SQL Server Management Studio displays the feature class tables in alphabetical order and ignores feature datasets. (One of our clients actually pointed out this helpful tip.)

I am enforcing certain topology rules based on the requests of the asset guys: roads are captured intersection to intersection, sewer pipelines manhole to manhole, water pipelines road intersection to road intersection, and parking areas as polygons which are converted to centroids with the polygon area attached to the point. I don’t have the actual topology set up, because at this stage it would add unnecessary complexity. Rather, this capturing convention will become a habit, and as we clean up the older datasets, we will automatically be cleaning up topology errors as well.

Despite speaking at length about the data, I have only scratched the surface of what we do with the asset data when it comes to us. In Part 4, I will talk about what happens to the data once it’s been processed by GIS.

Developing an asset management GIS data maintenance methodology: Part 2 – Designing the workflow

(This is Part 2 of a week long series of posts on the current project I am working on, to develop a reasonable GIS data maintenance strategy for Asset Management data. Read Part 1 here.)

Yesterday, I gave a very brief overview of what maintaining asset management data in GIS involves. Of course, the reality is much more complicated. Over the years that I have been involved with this process, my role has grown from simple data capture, to having full control of the data and the workflow needed to maintain it.

As the amount of work this year increased dramatically, I realised that I would need to document the process that I’ve had in my head. The workload would need to be shared with the Desktop GIS team. The first thing I did was to create a coherent folder structure for data storage. While we are transitioning the system, all of our data is still file-based. This already poses a challenge, as multiple people would need to access the data, and would most likely keep copies on their own devices to avoid data loss/slow access times on the network.

This is what I came up with:

[Image: the folder structure]

and this is the script which creates that basic structure:


import os

folder_assetarea = r"C:\Assets\AreaA"  # The main folder for Area A
year = "2015"  # The year to create the structure for
folder_year = os.path.join(folder_assetarea, year)

folder_list = [["conversion", ["cad_gis", "ftr_shp", "kml_gdb",
                               "pdf_jpg", "shp_gdb", "tbl_xls",
                               "xls_tbl", "jpg_gis", "gps_gis",
                               "gis_cad"]],
               ["data", ["recd", "sent"]],
               ["jpg", ["georef", "maps"]],
               ["mxd", ["site_mapbooks"]],
               ["pdf"],
               ["report", ["csv", "doc", "txt", "xls"]],
               ["workspace"]]

for fld in folder_list:
    if len(fld) > 1:
        # Folder with subfolders
        for sub in fld[1]:
            subfolder_assetarea = os.path.join(folder_year, fld[0], sub)
            if os.path.exists(subfolder_assetarea):
                print("Folder {} exists".format(subfolder_assetarea))
            else:
                os.makedirs(subfolder_assetarea)
                print("Created {}".format(subfolder_assetarea))
    else:
        # Folder with no subfolders
        subfolder_assetarea = os.path.join(folder_year, fld[0])
        if os.path.exists(subfolder_assetarea):
            print("Folder {} exists".format(subfolder_assetarea))
        else:
            os.makedirs(subfolder_assetarea)
            print("Created {}".format(subfolder_assetarea))

It is still my intention to optimise that script and integrate it into my Asset Management Python Toolbox, but I haven’t had the time to do it yet. That structure may look like overkill, but it’s the way I keep my sanity, and it allows me to build checks into this process. If at any point the data needs to be reverted, I can retrieve the previous version. For example, if we receive a CAD drawing, the workflow is as follows (a short sketch of the conversion steps appears after the list):

  1. Save the CAD file under data\recd\date_assetperson
  2. Make a copy of the CAD file under conversion\cad_gis\date
  3. Georeference the CAD file using the projection of the local municipality/study area
  4. Convert the CAD layers (normally only the polylines, sometimes the points) to feature classes in a file gdb in the same folder
  5. Project the converted feature classes to a file gdb in workspace\date
  6. Process the data further in that working gdb
  7. Final working data is pulled from all the working gdbs and sorted in current.gdb
  8. Final data for handover is pushed into Municipality.gdb
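As promised, a hedged sketch of steps 4 and 5 – the paths are made up, and the spatial reference is an assumption (I use whatever projection applies to the study area):


import arcpy

dwg = r"C:\Assets\AreaA\2015\conversion\cad_gis\20150601\asbuilt.dwg"  # hypothetical
gdb = r"C:\Assets\AreaA\2015\workspace\20150601\working.gdb"           # hypothetical
sr = arcpy.SpatialReference(32734)  # assumed: WGS 1984 UTM Zone 34S

# Step 4: convert the Polyline layer of the drawing to a feature class
arcpy.conversion.FeatureClassToFeatureClass(dwg + "\\Polyline", gdb, "ftr_cad_polyline")

# Step 5: project the converted feature class into the working gdb
arcpy.management.Project(gdb + "\\ftr_cad_polyline", gdb + "\\ftr_cad_polyline_prj", sr)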

This process has already saved me time over the last few weeks, where I had to fall back on a previous version of a feature class taken from a CAD drawing due to changing requirements. This is also why I am designing this workflow in an agile way – the requirements are constantly changing, and the data is different for each municipality. I’ve had to add more folders/feature datasets since I drew this up a month ago, and I’m still ironing out the kinks in the communication with the rest of the team.

That brings me to the next aspect of this workflow: the OneNote Notebook. The Asset GIS Notebook contains an Overview section at the top level, which has the contact details of the GIS team members and Asset Management team members. It also contains a breakdown of the folder structure with detailed explanations, as well as links to relevant external documentation, such as the Prefixes spreadsheet (more about that in Part 3).

For each municipal/study area folder in the Assets folder, there is a corresponding section group in the Notebook. This section group contains a General section (technical information such as projection, major towns in the region, project costing details) as well as sections per year. The year section contains all the tasks for the current year, such as

  1. Convert data received
  2. Create mapbooks for site
  3. Generate report

etc. Some of the tasks will be quite specific, depending on the state of the data and the client requirements. There is also a Paper Trail subsection, for all email/Lync/phone correspondence with the asset team. Any answers to questions we have about the data are recorded in this section, not only to cover the GIS Team for audit purposes, but also in case a team member must pick up a task where I have left off.

Of course, it would be terrible to lose all of this hard work. In lieu of a better system, I have MacGyvered a backup system of sorts, where each day before I leave, I sync the network folder to my local OneDrive for Business folder using SyncToy, which then syncs to the cloud. It’s not ideal, but it’s better than what I had before (which was nothing. If that network drive failed…)

Although there are other team members helping with the data capture portion of the work, and who have contributed to the development of the workflow, I still retain responsibility for the process. After they have finalised their data capture, I check the data in the feature classes, assign new GIS IDs (another tool for the toolbox) and load the data into its final structure (also another tool for the toolbox).

Tomorrow in Part 3, I will talk about the kind of data we work with. It will probably be even longer than this post, and will hopefully shed some light on why designing this workflow has been challenging.

Developing an asset management GIS data maintenance methodology: Part 1 – An Overview

(This is Part 1 of a week long series of posts on the current project I am working on, to develop a reasonable GIS data maintenance strategy for Asset Management data).

I’ve been meaning to write this series of posts for a while, and now that we are in the middle of “asset management” season at work, I figured that now was as good a time as any. For the last few years, I have assisted the asset management team in my unit with GIS support. Each year, they update the civil (and sometimes electrical) infrastructure asset registers of several local municipalities. This involves unbundling upgrade projects completed in the last financial year, amongst other things.

I’m not involved in that aspect of the work, so I’m not going to try to explain what they do. On my side, this is what would normally happen:

  1. I receive data (shapefiles, spreadsheets, pdfs, CAD drawings, hard copy maps that have been drawn on)
  2. I process/convert/work some magic on the data to get it into a usable GIS format (more on that in the next post)
  3. I give back a spreadsheet containing the new shape data from the upgrades, including the GIS IDs
  4. Sometimes I’ll give a mapbook of the area, if requested.

The shape data (lengths of lines) and the unique GIS IDs per feature are integrated into the asset register, and serve as a link between the GIS and the completed register. It’s very important that this link be maintained – I’ve had to find GIS data for records on an asset register where a stakeholder at the municipality had decided to remove the GIS IDs, which is why I put “work some magic” in point number 2 in the list above.

This year is different though. We are trying to transition to an integrated system, where instead of all the data lying in random spreadsheets on the project server, and random gdbs on my server, all the data should be stored in a central location for easy access (and auditing purposes). This may seem to be an obvious approach, but there’s about 7 years’ worth of data which needs to be transitioned (excluding the GIS data).

It’s an enormous task, and I’ve been given control over the GIS side. That means that I get to take the data I’ve gathered over the last 3 years and push it through some sort of transformation process, while also overseeing the current tasks and carrying out some of the work which can’t be done by the Desktop team (yet). I’m also writing a Python toolbox with a set of tools to automate some of the functions, such as generating GIS IDs based on a specific convention, automating the mapbook production, and creating the correct folder structure and feature class structure per municipality.

The slideshow below is what first triggered me to write some posts on this topic.

I also saw this article, and had a subsequent conversation with the author on Twitter about it. GIS can add a great deal of value to many asset management processes (and many other fields as well, but let’s stay focussed here). I’m breaking away from my normal schedule and will be posting every day this week as follows:

  • Part 2 – Designing the workflow
  • Part 3 – Data: Processing, updating, working the magic
  • Part 4 – Data handover and the audit process
  • Part 5 – My preparation for the way forward

I’m very excited to write these posts, because I will cover everything that makes up an excellent Enterprise GIS solution: some programming stuff (more Python in my life is always better), some database stuff (SQL and file gdb), some GIS stuff (ensuring all those shapes are in the right place) and some interconnected stuff (eventually mobile apps and cloud services).