Convert a list of field names and aliases from Excel to table using ArcPy

I went digging through my old workspace and started looking at some of my old scripts. My style of coding back then is almost embarrassing now 🙂 but that’s just the process of learning. I decided to post this script I wrote just before ArcGIS released their Excel toolset in 10.2.

From what I can recall, I needed to create a file geodatabase table to store records of microbial sample data. Many of the field names were the chemical compound themselves, such as phosporus or nitrogen, or bacterial names. For brevity’s sake, I had to use the shortest field names possible while still retaining the full meaning.

I set up a spreadsheet containing the full list of field names in column FIELD_NAMES and their aliases in ALIAS. I created an empty table in a file gdb, and used a SearchCursor on the spreadsheet to create the fields and fill in their aliases.

This solution worked for me at the time, but of course there are now better ways to do this.

Advertisements

Write contents of Word tables to a spreadsheet

I wrote this script for a former colleague last year. She had received a Word document containing about 600 tables which must have been dumped out of a database somewhere. The tables had the same header, and each table represented an “incident”, with dates, details etc.

She was requested to “put it into Excel”. After she had manually copied the first table into matching columns in a spreadsheet, she came to me. This type of thing is normally a task we would give to a student, as it has nothing to do with GIS. Nevertheless, when I saw the repeating structure I was sure I could come up with something to do this automatically.

The script finds all the tables in the document, and grabs the header from the first table to serve as the headings in the spreadsheet. It then iterates over all the tables, skips the header row and populates the spreadsheet with all the rows from the various tables.

It took about 15 minutes to write (had to play around with accessing the table elements correctly) and less than a minute to extract the data. That’s the amount of time it would have taken to copy 5 of the tables manually. At that pace it would have taken about 4 days to complete the process.

Convert spreadsheets to GDB tables and display XY

I’m sick today, so I decided to choose one of the simpler scripts for today’s post. This one automates a common task of the GIS professional: converting a spreadsheet of coordinates to GIS. The script assumes you’ve formatted each sheet appropriately.

For each sheet in the workbook, the script creates a file geodatabase table, and displays the coordinates as a layer in ArcMap using the WGS 1984 spatial reference. The script is very basic, and therefore will be easy to change to suit your own data.

Create feature classes from a pandas data frame

I had a large CAD drawing which I had brought into ArcGIS, converted to a feature class and classified groups of features using a 3 letter prefix. I also had an spreadsheet containing a long list of those prefixes, along with additional columns of information for that prefix, including feature class name and shape type.

I wanted to create the feature classes for dozens of these prefixes, based on the values in my converted feature class, and a template feature class for the field structure. The Select geoprocessing tool could have easily split out all the different features per prefix for me, but that would have kept the CAD feature class structure, and not the structure that I wanted.

I figured this would be a good time to get into pandas (and eventually geopandas, maybe).

In Line 28, I load the first 7 columns on the first sheet in the workbook into a pandas data frame. I set the index column to the column called “Prefix”, so those values will be used for the lookup instead of the default int index pandas assigns.

In Line 37, the prefix value from the feature class is used to look up the corresponding feature class name in the pandas data frame. Once the feature class has been successfully created, a selection layer of the matching features is appended into the new feature class. I could use a SearchCursor and store the matching features in a python list to be copied into the new feature class, but that’s something I will test at another time.

EX101x: Final recap

When I signed up for this MOOC, I expected more Python. I’ll quickly summarise the last 6 weeks of the course.

Highlights:

  1. Learning how to correctly use the Data Model introduced in Excel 2013. The way Felienne explained how to use it was quite simple, and realising that one uses a join between two ranges to create the model was an eye-opener.
  2. Double click the value field in a pivot table to get a new sheet with a copy of that data filtered on it: I can’t remember why I wrote this down, but I know it’s useful.
  3. More uses for named ranges
  4. Data checks: definitions and different types
  5. Using wild cards in COUNTIF: Seriously, I don’t know how I didn’t know this.
  6. DataNitro exists: This would have helped me about 2 years ago.

Lowlights:

  1. I was quite disappointed that Python only appeared in Week 7. More Python please!
  2. I was also disappointed that DataNitro was used, instead of accessing everything through the command line.
  3. Language barrier: I know the course coordinator is not a native English speaker. This created many ambiguous questions in the quizzes, as well as some weird phrasing in the videos.
  4. Fellow MOOC-ers not being constructive: I’ve mentioned on this blog several times that I don’t understand the need for many MOOCs to have peer assessments. One in particular took it a step further with mandatory forum posts. While I’m glad that EX101x did not do this, just opening up the “Discussion” below each quiz reminded me of why I don’t like the forums on these things.
    Many of the comments were about the way Felienne was dressed, or her “strange look”, or the way she spoke. Yes, the videos could have been better – maybe she didn’t have to take up almost half the frame, leaving the spreadsheet off to the side and hard to look at. What do her piercings have to do with that? Good grief.
  5. Not enough Python: I cannot stress this enough. I appreciate that I learnt some new things about Excel, but as a power user, I think I was expecting more analysis out of the course.
  6. Bringing in neo4j: What an interesting programme, but to bring it in at such a late stage (basically at the end of the course) did not make sense. I tried to run it but after getting errors I just abandoned it. I already had 81% so I chose not to complete the neo4j exercises. I feel like after they introduced Python to the people crash-course style, the different commands needed for neo4j was needlessly confusing.

Suggestions:

  1. More Python. More Python. More Python.
  2. The videos need to be redone in a clearer way.
  3. The course needs to be structured differently. Python should be introduced much sooner.
  4. Neo4j seems like a nice tool on its own. It should also have been introduced earlier in the course, to allow for more time for the people still getting to grips with Python.

Overall, I did enjoy this MOOC. I enjoyed it enough to pony up for a verified certificate:

EX101

If DelftX offers a follow-up course, I would definitely take it.

Using INDEX & MATCH instead of VLOOKUP

I’ve been using Excel since Office 97. I was about 8 years old, and had no idea what I was doing. By the time I had progressed to VLOOKUPs and PivotTables, I was a bit older (and wiser, I’d like to think).

It was to my chagrin that I discovered about 2 years ago, that instead of creating a clusterbomb of VLOOKUP/CHOOSE (sometimes with some nested IFs sprinkled in for good measure), I could have been doing INDEX/MATCH all along.

=INDEX(x, MATCH(y, z, 0))

x = Range to return a value from
y = Value to look up
z = Range to look up the value in

I’ve made a small spreadsheet to demonstrate. Please ignore the fact that copying the formulas from Excel Online has changed them to the American region structure, hence the semi-colons as delimiters instead of commas.

Cell E2 does the standard VLOOKUP

=VLOOKUP(D2;$A$2:$B$9;2;FALSE)

while F2 uses INDEX/MATCH to achieve the same thing

=INDEX($B$2:$B$9; MATCH(D2;$A$2:$A$9;0))

That looks a bit more complex than the VLOOKUP. The real power is demonstrated in the next example. To achieve E3, VLOOKUP does the following

=VLOOKUP(D3; CHOOSE({2\1};$A$2:$A$9;$B$2:$B$9);2;FALSE)

while INDEX/MATCH follows the same structure as before in F3

=INDEX($A$2:$A$9; MATCH(D3;$B$2:$B$9;0))

This is why I am now more inclined to use INDEX/MATCH. The ability to do a lookup in any column, and return a value from any column, is great. This really helps for those times when I’m not just rather loading the data from the spreadsheets into Python dictionaries and pulling it apart there.

EX101x: Week 1 and 2 recap

I haven’t been as consistent with these recaps as I would like to be, but as I get closer to my first exam in 5 years, I’ve been cooling it on the MOOC front. As a result, I’m ending up completing two weeks in one most of the time now.

Week 1 focussed on using conditional functions like IF, COUNTIF, SUMIF etc. Week 2 was about using lookup and search functions like VLOOKUP, INDEX and MATCH. A few things which I did for the first time:

  1. Import data into a sheet from a web feed: I’ve never had a need to use this function, and I’m not sure if I have the need now, but it’s good to know.
  2. Using a time criteria for COUNTIF: So useful, and I can’t believe that in my 15 years of Excel usage, I have never needed to do something like this.
  3. A use case for setting the range_lookup for VLOOKUP as TRUE: I use VLOOKUP a lot (or used to, before I switched to doing most of my analysis in Python). I also use a lot of nested IF statements for ranges of items, and some combination of INDEX/MATCH.

    At school, we were always taught “FALSE with VLOOKUP” because every scenario always required exact matches. I never though to question that, nor did I have the need to, because with all the GIS data that needs pre-processing, we always require exact matches.

    Now that I know that by creating a lookup table and using TRUE, VLOOKUP can be used to match value ranges the same way I would use a nested IF statement, I can see how that would really have helped me out on a few projects a while ago.

A big part of why I also focussed on learning more Python wherever I could was because I was tired of exiting ArcGIS, manipulating data in Excel, then bringing it back into ArcGIS. With Python, I could do all the data manipulation I wanted right there, and then take it into Excel for non-GIS people.

I’m also really pleased that I can still learn things in Excel to increase my productivity. One becomes so accustomed to doing things a certain way, that if that method is fast enough, one tends not to look for a replacement.