Leaving the GIS industry: bye for now

The GIS industry’s loss is software engineering’s gain. All the best Alex! Reblogging to highlight an excellent farewell post.

Alex Tereshenkov

After working for more than 10 years in the geospatial industry, I have decided to change the field and focus on pure software engineering. I have quite enjoyed using GIS tools to solve various computational problems and software development tools to build useful GIS products and services for my customers. However, as my interests started shifting towards clean code and programming per se – I have noticed I was reading Refactoring by Martin Fowler in bed way more often than The ESRI guide to GIS analysis by Andy Mitchell – I thought it would be useful to try changing the career path.

As I won’t be using any of the GIS software any longer, I won’t be able to post any new practical material that would be useful to GIS analysts and developers. However, in this post I would like to share some of the last thoughts I have…

View original post 2,177 more words

A rant about attachments

I remember when attachments were first introduced in ArcGIS Desktop (~~10.2 I think?~~ whoops it was ArcGIS 10). It was a very useful feature, and more functionality was added over the years.

It also made mobile data capture even easier. The fieldworkers would go out, do their assessments, and attach multiple photos to their points. However, attachments with Collector have caused me so much frustration. Specifically, syncing with attachments.

The nature of the work we do (and the economic environment we are in) means that by default, I take the maps offline so that the fieldworkers can carry out their assessments, and then sync back to AGOL when they are on lunch break (or whenever they can pick up WiFi). I discovered a few years ago that once one hits a certain threshold (like 20 attachments in the map), there are going to be problems syncing.

It will just outright fail, or take very long and may need to be attempted a number of times. Why is this? I don’t know. Over the years, I’ve encountered this issue on all types of devices – the latest iPhones, low-end Android tablets, high-end Android tablets, mid-range Android phones…

What it seems like to me is that Collector “expects” a certain connection speed, and when it doesn’t get it, it times out and rolls back the sync. Fair enough – I’ve found multiple delta tables on devices I’ve needed to recover the databases from due to failed sync attempts. On a current project, they are using rugged devices which have really awful network chips (as in, I need to stand about 1 or 2m away from the access point so that I can take the maps offline). Naturally, at the end of the first day, each device had dozens of features with multiple attachments each, which refused to sync.

They have been out in the field for 2 weeks. Every day, I have to manually retrieve the databases from the device, recover them, and push them out into appropriate geodatabases once I’ve determined what’s inside them.

So clear

I can deal with all of that, because Python is a tool that I maaay have mentioned here before. What I cannot deal with is the fact that attachments are still lost during geoprocessing. The fact that it was added as an environment setting in ArcGIS 10.5 and has been available in ArcGIS Pro for a while is of little comfort to me as I currently have access to neither.

Fine. I store the GlobalIDs in another field, merge the features together into their correct feature classes, enable attachments and insert the records from the corresponding attachment tables. Of course, I forget that the relationship class is now messed up, as it’s linking through the (now incorrect) GlobalID fields instead of the fields I stored the original IDs in.

After staring at the screen cross-eyed, I then realise that I only need to provide the attachments as jpgs in a folder, which I can extract from the tables using the original IDs and write into subfolders based on the feature type. I don’t actually need to link them back together since the technician does not need to view the photos to complete the work in ArcMap. /endrant
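The folder export described above could be sketched roughly like this. It is only an illustration: `export_attachments`, the tuple layout and the sample values are hypothetical, and it assumes the rows have already been read from the attachment tables (for example with a search cursor) together with the stored original ID and the feature type.

```python
import os


def export_attachments(rows, out_root):
    """Write attachment blobs into one subfolder per feature type.

    `rows` is any iterable of (feature_type, original_id, att_name, data)
    tuples. That layout is an assumption for this sketch; with arcpy the
    tuples would typically come from a cursor over the attachment table,
    where the blob field is returned as a memoryview.
    """
    written = []
    for feature_type, original_id, att_name, data in rows:
        folder = os.path.join(out_root, feature_type)
        os.makedirs(folder, exist_ok=True)
        # Prefix the file name with the stored original ID so each photo
        # can still be traced back to its feature.
        path = os.path.join(folder, "{}_{}".format(original_id, att_name))
        with open(path, "wb") as f:
            f.write(bytes(data))  # accepts both bytes and memoryview
        written.append(path)
    return written
```

Because the function only needs an iterable of tuples, it can be tried with a few hand-made rows before pointing it at a real attachment table.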

Lidar – Accuracy Versus Resolution

Digital Coast GeoZone

[Update: In November 2014, ASPRS published new guidance for accuracy classes that includes recommendations for pulse spacing in addition to vertical accuracy.]

Resolution != Accuracy

It seems like every day I hear a statement about high-resolution lidar that bugs the heck out of me. I even hear it from our staff at the Center. So, I thought I’d write a little entry about it and see if any of them pick up on it. So, what I hear is, “We need high resolution lidar for X, Y, and Z.” Sometimes it’s for estimating sea level change impacts, sometimes it’s for habitat modeling; could be for nearly anything. While I do often hear it for sea level change, my example is wetlands delineation in relatively flat areas like South Carolina or Florida. To quote one friend (we’ll call him Randy) interested in wetlands:

It appears that most of it will be point spacing…

View original post 760 more words

The end is nigh…

I read a blog post a few weeks ago about the inevitable demise of ArcMap. When ArcGIS Pro launched a couple of years ago, I immediately started preparing for the end of ArcMap. By that, I mean I played around with Pro for a few weeks then put it away until the corporate overlords decided it was time to switch.

I had to sign up for a free trial the other day for something, and found this:

What happened to ArcMap?

After logging in and heading to the downloads, I found this:

What happened to ArcMap?!?!?!!

The forums pointed it out as well. I’ve been keeping a side eye on ArcGIS Pro development over the last few years, so I’m starting the transition from October, with the aim of using it as my daily driver by December. I’ve just started training our junior consultant who comes from a CAD background on GIS as well, so I may just start him out on ArcGIS Pro from the jump.

The search for the ultimate productivity system continues

I spend far too much time thinking about systems. I have a system for everything – my email accounts, my household, my work, my studies and for how all those systems interact. Sure, everyone has a system for those, you might say. For me though, it is my preferred form of procrastination, in the sense that I have a tangible result (a new system!) at the end of it, as opposed to just whiling away the time on YouTube.

Let’s take a look at the evolution of my productivity system. I started out with OneNote in 2011, integrating it with Outlook Tasks in 2012. That plodded along for a bit, until I realised that tracking tasks in Outlook is ridiculous and I needed something web-based. I fiddled around a bit with some to-do list managers and settled on Wunderlist. I kept trying to link it with OneNote, but that turned out to be a useless fight.

I then started messing around with Trello, essentially duplicating what I was doing in Wunderlist. I waffled between the two until Microsoft bought Wunderlist. Microsoft has now created To-Do, which will eventually replace Wunderlist but has very few features at the time of writing. There’s also Planner, which is the Office 365 version of a combination of Trello and Jira I guess? I don’t know.

I feel like I’m in limbo. At work, I’m currently using Planner with the team to track project tasks, linked to the relevant Notebooks, Groups and SharePoint libraries (accessible through the OneDrive mobile apps). I did link Wunderlist to Planner, but that just seemed weird.

I’m still using Wunderlist for daily task reminders, bill reminders, shopping list and food prep. I’m using Trello to track the food inventory of my household. I’m (kind of) using the normal Outlook calendar to track appointments. In other words, my system is a mess and I need to throw everything out and start over!

Using itertools module within arcpy workflows

Alex Tereshenkov

arcpy_chain_cursors.ipynb


View original post
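For readers who cannot open the notebook, here is a minimal sketch of the general idea the file name hints at: chaining several cursors into one stream of rows with `itertools.chain`. This is not the code from the original post. `fake_cursor`, `iter_rows` and the sample rows are hypothetical stand-ins, so the example runs without arcpy.

```python
from itertools import chain


def fake_cursor(rows):
    """Stand-in for a search cursor: yields one row tuple at a time."""
    for row in rows:
        yield row


def iter_rows(*cursors):
    """Lazily iterate over the rows of several cursors as one sequence."""
    return chain.from_iterable(cursors)


# Hypothetical sample data standing in for two feature classes.
roads = fake_cursor([(1, "Main Rd"), (2, "Oak Ave")])
rivers = fake_cursor([(1, "Berg")])

all_rows = list(iter_rows(roads, rivers))

# With arcpy available, the same pattern would look something like:
#   chain.from_iterable(arcpy.da.SearchCursor(fc, fields) for fc in feature_classes)
```

The rows are pulled one cursor at a time, so nothing is loaded into memory until it is consumed.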

Thoughts on spatial data warehousing

I’ve been thinking quite a bit lately about how to store spatial data. It’s something I’ve covered here and my attitude towards this topic has evolved over the years.

The organisations I’ve worked in have mountains of spatial data accumulated over the years. The data is stored in shapefiles, geodatabases, normal databases, spreadsheets, documents, reports, photos…Why is it this way? It doesn’t have to be this way. It shouldn’t be this way!

In the course of my research for a topic for my project for next year, I’ve homed in on the methods for implementing an enterprise geoportal within an existing spatial data infrastructure. However, I feel like my focus is shifting to the data that the geoportal is trying to expose to a larger audience.

The concepts of a spatial data warehouse and a spatially enabled operational data store have been intriguing me. A regular GIS task involves comparing spatial data across a time period, analysing trends and presenting the results in a map or report. Why aren’t we storing this historical data in a SDW that’s optimised for reporting?

Non-spatial data can come from a variety of sources as well – spreadsheets, other databases etc. Another common GIS task is to spatially enable these datasets. Why are we not storing the outputs in a spatially enabled operational data store in an open format like GML?

I think it’s because to plan and implement a SDW/S-ODS takes time (and money). With a normal EDW, the organisation will not need much convincing to see the benefit of implementing one. “Spatial” is still seen as an “add-on”, or a “nice-to-have”.

The issue with names

I recently underwent a name change, and though I have yet to make it official (who wants to waste a Saturday at Home Affairs?), I have been thinking about the implications of my name change.

Now that my surname contains a hyphen and is 18 characters long (with my full name now 26 characters), I’ve been wondering how I should abbreviate it. Some hasty Googling shows that there is no standard for this. My entire life I just assumed that the first part of the surname takes precedence so the initials remain the same. In other words, Cindy Lee Williams (CLW) becomes Cindy Lee Williams-Jayakumar (CLW).

I toyed around with the idea of dropping my middle name (itself having been an issue with people assuming I’m Cindy-Lee and not Cindy Lee) to become Cindy Williams-Jayakumar (CW), but the thought of having only two initials terrified me.

I came across this blog post which calls out the assumptions programmers make when building systems which need to accept names (I’m guessing that’s about 95% of all systems). Now that my name has become slightly more complicated, I’m going to be more aware of my own assumptions when writing code, and not just when it comes to validating names.

I’ve also decided to be a bit more difficult and use CWJ as my initials. I had CLW for 27 years, it was time for a change.
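To make the ambiguity concrete, here is a small sketch showing how the three sets of initials mentioned above fall out of different assumptions. The `initials` function and its parameters are hypothetical, and it deliberately makes assumptions of its own (names split on spaces, surnames split on hyphens), which is exactly the kind of thing that breaks on real names.

```python
def initials(given_names, surname, keep_middle=True, split_surname=False):
    """Build initials under one particular set of assumptions about names."""
    given = given_names.split()
    if not keep_middle:
        given = given[:1]  # keep only the first given name
    surname_parts = surname.split("-") if split_surname else [surname]
    return "".join(part[0].upper() for part in given + surname_parts if part)


# First part of the surname takes precedence: CLW
print(initials("Cindy Lee", "Williams-Jayakumar"))
# Middle name dropped: CW
print(initials("Cindy Lee", "Williams-Jayakumar", keep_middle=False))
# Middle name dropped, both surname parts kept: CWJ
print(initials("Cindy Lee", "Williams-Jayakumar", keep_middle=False, split_surname=True))
```

None of the three results is more correct than the others, which is why a system is better off asking people for their preferred initials than deriving them.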

Database access via Python

In my ongoing quest to do absolutely everything through Python, I’ve been looking a lot lately at manipulating databases. I’ve been using arcpy to access GIS databases for years, and last year I finally got around to using pyodbc (and pypyodbc) for accessing SQL Server databases.

Now that I’m in an Oracle environment, Oracle has provided the cx_Oracle library to directly connect to databases. I have yet to test that though. What I’m interested in at the moment is creating and accessing databases for personal use.

I considered MongoDB for a while, but I don’t think I want to go NoSQL yet. This is why I have been experimenting with SQLite (through the sqlite3 library), as it is included in the Python install, and has the delightful SpatiaLite extension. The slogan goes against one of my mottos (Spatial is Special) while supporting my other motto (Everything is Spatial).
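As a rough illustration of how little is needed to get going with sqlite3, here is a minimal sketch using an in-memory database. The `sites` table, its columns and the sample rows (with approximate coordinates) are made up for the example, and it uses plain SQLite only, since loading SpatiaLite depends on how the extension is installed.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding a few sample point records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT, lon REAL, lat REAL)"
    )
    # Illustrative rows only, not real project data.
    conn.executemany(
        "INSERT INTO sites (name, lon, lat) VALUES (?, ?, ?)",
        [
            ("Cape Town", 18.42, -33.92),
            ("Durban", 31.02, -29.86),
            ("Johannesburg", 28.05, -26.20),
        ],
    )
    conn.commit()
    return conn


def sites_south_of(conn, latitude):
    """Return the names of sites south of the given latitude, sorted by name."""
    cursor = conn.execute(
        "SELECT name FROM sites WHERE lat < ? ORDER BY name", (latitude,)
    )
    return [row[0] for row in cursor.fetchall()]


conn = build_db()
southern = sites_south_of(conn, -30)
conn.close()
```

Swapping `":memory:"` for a file path gives a single-file database that can be copied around like any other document.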

A rant about utilisation

Last week I posted a script that easily extracted a series of repeating tables from Word to Excel using my favourite programming language of the last 4 years, Python. I’d like to expand on the last paragraph I wrote:

It took about 15 minutes to write (had to play around with accessing the table elements correctly) and less than a minute to extract the data. That’s the amount of time it would have taken to copy 5 of the tables manually. At that pace it would have taken about 4 days to complete the process.

I was quite irritated when I wrote this script, and part of the reason is the same reason I have been railing against utilisation as a metric for billing. The person who requested this task probably reckoned it would take about a day for my former colleague to get the data into Excel. The actual time, based on my estimate above, would have been 4 days. In reality, it turned out to only be an hour’s work in total (my time and my colleague’s time). How do you bill that?

I would say split the difference and bill it as 2 days’ work – only an extra day on the expectation, while still 2 days short of the actual time it would have taken. This way one would be 2 days “ahead”, with time to do research, or catch up on other projects where the budget is low.

The catch with doing things that way is that you would need to keep track of when to submit work. If you give the work after an hour, but then book 2 days to the project, the next week the person who requested the work is either going to come question you, reject your hours, let it pass because you’ve done favours for them before, or not even pick up that the hours were booked because they aren’t doing the project management part of their job correctly. Guess which option happens most often?

What really happened of course is that my colleague only billed for that 1 hour, because the person requesting the work checked in after 2 hours to hear “if it’s done yet”. I’m no expert on what running a business or being a project manager should look like, but I think I have a good idea of what it shouldn’t look like.

What is the alternative to using the billable hour and utilisation as a measure? I don’t know, I didn’t study management and/or finance. This is just one example I have from a time when I was in a purely technical role, in a company where output was based on utilisation. I’m now in more of a hybrid role, where output is based on “did you do it before the deadline?”. I’ll be able to judge more clearly as time passes.

Everything is Spatial

Geodevelopment, GIS, and other things geo

Spatial Thoughts & Life