I went digging through my old workspace and started looking at some of my old scripts. My style of coding back then is almost embarrassing now 🙂 but that’s just the process of learning. I decided to post this script I wrote just before ArcGIS released their Excel toolset in 10.2.
From what I can recall, I needed to create a file geodatabase table to store records of microbial sample data. Many of the field names were the chemical compound themselves, such as phosporus or nitrogen, or bacterial names. For brevity’s sake, I had to use the shortest field names possible while still retaining the full meaning.
I set up a spreadsheet containing the full list of field names in column
FIELD_NAMES and their aliases in
ALIAS. I created an empty table in a file gdb, and used a
SearchCursor on the spreadsheet to create the fields and fill in their aliases.
This solution worked for me at the time, but of course there are now better ways to do this.
I had a spreadsheet of coordinates, along with their addresses. The addresses were either inaccurate or missing. Without access to an ArcGIS licence, and knowing the addresses were not available on our enterprise geocoding service, I sought to find a quicker (and open-source) way.
I used the geocoder library to do this. I used it previously when I still had an ArcGIS Online account and a Bing key to check geocoding accuracy amongst the three providers.
Since I don’t have those luxuries anymore, I used pandas to read in the spreadsheet and reverse geocode the coordinates found in the the third and fourth columns. I then added a new column to the data frame to contain the returned address, and copied the data frame to a new spreadsheet.
After using pandas for quite some time now, I started to question if I was really using it effectively. After two MOOCs in R about 2 or 3 years ago, I realised that because my GIS work wasn’t in analysis, I would not be able to use it properly.
Similarly, because pandas is essentially the R of Python, I thought I wouldn’t be able to use all the features it had to offer. As it stands, I’m still hovering around in the data munging side of pandas.
I used a pandas mask to filter a spreadsheet (or csv) based on some value. I originally used this to filter out which feature classes need to be created from a list of dozens of templates, but I’ve also used it to filter transactions in the money tracking app I made for my household.
I needed to add a field to dozens of layers. Some of the layers contained the field already, some of them contained a similar field, and some of them did not have the field. I did not want to batch Add Field, because not only would it fail on the layers which already had the field, but it is super slow and I would then still have to transfer the existing values from the old field to the new field.
I wrote the tool in the ArcMap Python window, because I found it easier to load my layers into an mxd first as they were lying all over the place. The new field to be added is set up as a NumPy array, with the relevant dtype.
The script loops over all the layers in the document, adding the field via the Extend Table tool, and then transferring the values from the old field to the new field. Deleting the old field at the end would be an appropriate step to include, but I didn’t, purely because I’ve lost data that way before.
Back when I was doing a lot of GIS work for civil and electrical asset management, a common task I had was to ensure that all civil structures represented as a polyline (reticulation and bulk pipelines, eletrical feeder cables, road surfaces etc) were split at the physical road intersections as seen on the latest available aerial imagery, as well as where they crossed each other.
This task wasn’t too much trouble if the lines were already drawn across each other, as more often than not, the lines crossed at the road intersections. The result was usually achieved by using the Planarize option on the Advanced Editing toolbar. However, it became a problem for new areas which had to be digitised by hand (no CAD drawings or shapefiles received), or lines which had to be corrected.
I would have to visually check on the imagery where the road intersections were, and then manually split each polyline. I ran through some ideas, including digitising all road intersections as points (which would be good data to have anyway), and then splitting all the lines at the closest intersection by using the Split Line at Point tool.
I did however want to create a method of splitting the line multiple times on the fly, instead of just splitting it once (which is the option provided by the Split button on the Editor toolbar). I wanted to be able to select a line, click as many points along the line as I wanted, and then split the line into those points.
The tool iterates over all the points created by the user in the FeatureSet. For each point, a temporary diagonal is created to cut the original polyline. This new part is saved in a list and the tool runs again on the original polyline (minus the new part). Once all the points have been processed, the original polyline is removed from the feature class, and the new parts are inserted. All feature attributes are maintained.
My intention with this tool was to expand it with more features, fix the bugs (it would randomly not work) and upload it as a toolbox. However, the need to do that fell away. I also realised I was starting to get into a space where I was forcing as much code as I could into my work in order to get through the day.
I wrote this script for a former colleague last year. She had received a Word document containing about 600 tables which must have been dumped out of a database somewhere. The tables had the same header, and each table represented an “incident”, with dates, details etc.
She was requested to “put it into Excel”. After she had manually copied the first table into matching columns in a spreadsheet, she came to me. This type of thing is normally a task we would give to a student, as it has nothing to do with GIS. Nevertheless, when I saw the repeating structure I was sure I could come up with something to do this automatically.
The script finds all the tables in the document, and grabs the header from the first table to serve as the headings in the spreadsheet. It then iterates over all the tables, skips the header row and populates the spreadsheet with all the rows from the various tables.
It took about 15 minutes to write (had to play around with accessing the table elements correctly) and less than a minute to extract the data. That’s the amount of time it would have taken to copy 5 of the tables manually. At that pace it would have taken about 4 days to complete the process.
It’s been too long since I’ve posted some code. A few months ago I had a requirement to basically create a UML view of a file gdb and present each feature class as tables for inclusion in a functional specification Word document.
Sadly, since the demise of ArcGIS Diagrammer, there has never been something to take its place. The actual request I received was to take screenshots of the properties dialog box of each feature class in ArcCatalog, and paste those into the document with appropriate headings.
I recognised this request for the total waste of my time it would be, and promptly set about looking for an alternative. I realised I had python-docx already installed.
For each feature class in the geodatabase, a new paragraph is started with the name of the feature class in bold. A line break is inserted, followed by a table. The name and type of each field is added as a new row into the table, and a page break is inserted to start the properties of the next feature class on a new page.
It took me about an hour to look up alternative methods and to put this script together. Most of the time was spent on getting the cells in the table to insert properly, using the age-old method of trial and error. I did it this way to save myself the pain of manually inserting screenshots, knowing that if the format of the feature classes changed I would have to do new screenshots repeatedly.