Recapping FOSS4G Boston 2017
FOSS4G (Free and Open-Source Software for Geospatial) is an annual conference focused on open-source geospatial software, where developers and users across the private sector, government, and academia share and learn from each other’s work.
The international gathering rotates on a three-year cycle among Europe, North America, and another continent (2016 was in Bonn, Germany; 2015 in Seoul, South Korea). This year’s conference was held in Boston, Massachusetts, and Zhenyang Hua and I were lucky enough to attend.
Rather than giving a play-by-play recap of each session, I wanted to share some common themes and highlight some of the projects I found most interesting.
Until recently, running geoprocessing tasks required offloading the job to a server-side platform through something like a Web Processing Service (WPS). Increasingly, innovative JS libraries are giving us the power to bring such tasks down to the client.
Here are some other notable projects using JS to do advanced geoprocessing:
- rbush (and similarly, kdbush): Builds a spatial index using an r-tree structure. Very fast bounding box and nearest-neighbor queries.
- tile-reduce: Performs massively parallel operations on vector tile data and aggregates results.
- supercluster: Ultra-fast clustering for large sets of point data. Works with GeoJSON point features to generate a tree of clusters at each zoom level.
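To give a feel for what a spatial index buys you, here is a minimal sketch in Python. It is not rbush's R-tree and does not use any of the libraries above; it swaps in a much simpler uniform grid, and the `GridIndex` class and its method names are made up for illustration. The idea is the same, though: bucket features by location so a bounding box query only has to look at nearby candidates instead of scanning everything.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Uniform-grid point index (a simple stand-in for an R-tree)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(x, y, item), ...]

    def _cell(self, x, y):
        # Map a coordinate to the grid cell that contains it.
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, x, y, item):
        self.cells[self._cell(x, y)].append((x, y, item))

    def search(self, min_x, min_y, max_x, max_y):
        """Return items whose point lies inside the box (edges inclusive)."""
        col0, row0 = self._cell(min_x, min_y)
        col1, row1 = self._cell(max_x, max_y)
        hits = []
        # Only visit cells that overlap the query box.
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                for x, y, item in self.cells.get((col, row), ()):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        hits.append(item)
        return hits


index = GridIndex(cell_size=1.0)  # cell size in degrees, chosen arbitrarily
index.insert(-71.06, 42.36, "Boston")
index.insert(7.10, 50.73, "Bonn")
index.insert(126.98, 37.57, "Seoul")
print(index.search(-80, 35, -60, 50))  # ['Boston']
```

A fixed grid works poorly when data is very unevenly distributed, which is one reason real libraries use tree structures that adapt to the data.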
3D in the Browser
As I’ve mentioned in previous blog posts, ever since the introduction of WebGL, I’ve dreamed of the day we’d see plugin-free, 3D maps in a browser. After what I saw at FOSS4G, I believe the technology is mature and that day is here.
Advancements in 3D data mean we’re modeling the real world more and more realistically every day. No longer is it acceptable to add a height value, extrude a 2D geometry, and call it “3D”. New data formats and viewing libraries mean that BIM and CAD models, 3D polygons, elevation models, photogrammetric models, and point clouds can all be rendered in their full glory on the web.
Building a 3D data pipeline involves a lot of pieces. Here are some key libraries:
- cesium: 3D map display library, maintained by AGI. See also: ol-cesium, to integrate with OpenLayers.
- 3D Tiles: Open standard for storing 3D data, developed by Cesium. Uses adaptive subdivision to efficiently store even sparse data in hierarchical levels of detail. Currently seeking approval as an OGC community standard.
- entwine and greyhound: Process point cloud datasets into a spatially indexed, optimized tree structure (Entwine) and stream them over the web (Greyhound).
- plas.io and potree: WebGL point cloud renderers.
Broken record here: modern browsers are powerful. We used to let servers render map tiles for us, and send them down to the client as static JPEG/PNG images. Now we can leverage the power of the browser and do that rendering dynamically on the client.
Vector tiles package vector features (geometries plus attributes, much like GeoJSON) into tiles, typically encoded in a compact protocol buffer binary format, and have several advantages over raster tiles. We can use a data-driven approach to dynamically style features. Features can be interactive, and retain their attributes (for querying, info popups, etc.). Vector tiles also allow us to create maps with a continuous zoom or that adapt to 3D environments, where raster tiles typically restrict zooming to a fixed set of scales.
But vector tiles aren’t useful in every scenario. Tiling the data means that only a subset of the data is loaded in the browser; this is great for performance, but a server-side component may still be required to compute statistics, etc. Chris Whong, of NYC Planning Labs, also noted that some datasets don’t simplify well and result in a large number of features in each tile as you zoom out (the example he gave was a parcel dataset)—in such cases, it may still be useful to render raster tiles server-side or use a hybrid approach that switches between vector/raster tiles based on scale.
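To make the "only a subset of the data is loaded" point concrete, here is a small Python sketch of how a map client can work out which tile covers a given location. It assumes the common XYZ ("slippy map") addressing scheme on a Web Mercator grid, which many tile services use but which none of the talks spelled out; the function name `lonlat_to_tile` is my own.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 lon/lat."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon=180 and latitudes beyond the Mercator limit stay in the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Which zoom-10 tile would a client request to draw downtown Boston?
print(lonlat_to_tile(-71.0589, 42.3601, 10))
```

At zoom 10 the world is split into 1024 x 1024 tiles, so a client showing one city only ever requests a handful of them. That is also why anything that needs the whole dataset, like summary statistics, tends to stay on the server.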
See Chris Whong’s GitHub page for an incredibly comprehensive list of libraries for authoring, serving, and rendering vector tiles. Coming soon, PostGIS version 2.4 will have the ability to export vector tiles directly from the database!
P.S. – Vector tiles can be used to turn maps into art. At JS.Geo, Hanbyul Jo from Mapzen told us how she fed vector tiles to a 3D printer and used other manufacturing processes to turn maps into chocolate, ice cubes, and other beautiful objects. She started a project called tile-exporter to convert vector tiles to .OBJ files for 3D printing. Also, peruse Andy Woodruff’s “Expressive Cartography with Code”, for more examples of using code to create beautiful maps.
In 2014, Amazon introduced a new cloud service called “Lambda”, and Microsoft and Google soon followed with Azure Functions and Google Cloud Functions, respectively. These capabilities have set off a new trend in cloud infrastructure, commonly referred to as “serverless”. Of course, there is still a server managed behind the scenes, but the idea of these services is that you bundle up a script or other small piece of code, which will be given a small amount of resources and limited time to run. Execution of the functions is then triggered by an external stimulus (scheduled time, file uploaded to storage bucket, etc.). And the real value is this: you pay only for the resources you use. Rather than paying for an always-on machine sitting in the cloud waiting for a request, resources for serverless functions are spun up on demand, and you’re only charged for the brief time of that execution.
As Andrew Thompson from CARTO put it, this is a fundamental shift from the machine as a unit of scale to the function as a unit of scale. If you can engineer your solution to run within the constraints imposed (low CPU, low RAM, limited execution time), the cost savings can be immense, and your solution can scale to great lengths. Tools like Serverless Framework exist to help manage architectures like this.
Several companies discussed using AWS Lambda to develop serverless geoprocessing architectures. CARTO demonstrated how they made their Super Bowl map for Twitter: using Lambda functions to pull from Twitter’s search API, convert the data, and push it to CARTO using their SQL API. Development Seed is helping NASA move their massive and ever-growing data stores to the cloud using a processing pipeline built on AWS Lambda. I was most inspired by Azavea, who are writing Lambda functions using Mapbox’s rasterio Python library to run remote sensing and other image processing algorithms on demand for their users.
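For a sense of how small that unit of scale is, here is a rough Python sketch of a function triggered by a file landing in a storage bucket. It is not code from any of the talks. It assumes the event shape AWS uses for S3 notifications, and the bucket name, object key, and the name `handler` are placeholders I made up. It only reads the event, so it runs locally without any AWS libraries.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Entry point the platform calls once per trigger (here, a file upload)."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
    # A real function would do its short, bounded geoprocessing work here.
    return {"processed": uploaded}


# Simulate the trigger locally with a hand-built event (hypothetical names).
fake_event = {
    "Records": [
        {"s3": {"bucket": {"name": "imagery"},
                "object": {"key": "scenes/tile+1.tif"}}}
    ]
}
print(handler(fake_event, None))
```

Everything the function needs comes in through the event, and nothing persists between invocations, which is what lets the platform run as many copies in parallel as there are uploads.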
“Coding as a First Resort”
This idea from Thursday’s keynote, given by Joe Cheng of RStudio, really struck me. Typically, in geoprocessing, we think of coding as a last resort. Only when the GUI or built-in tools presented to us don’t do what we want do we turn to writing a script to do it our way. Instead, we should start with code: all of our analysis should be done with code. A GUI is a form of abstraction given to you by the author of the application, but code lets you create your own abstractions in a way that’s repeatable and sharable.
Cheng presents this as a core tenet of the R language: R is about interacting with and exploring your data, working with it as you code without knowing what you’ll find. But I’ll go further and say that R is just one example of a powerful way to write code to do geoprocessing with relative ease. Work that’s being done in the open source community (Turf, Python and JS extensions for PostGIS, serverless architectures, vector tiles, WebGL, and more) is changing the way we think about working with spatial data and giving us powerful tools on new platforms.