Geospatial Processing with Julia

yeesian@MaptimeBoston

Overview

  • Why Julia
  • Geospatial processing with Julia
  • Recent Development
  • Opportunities and Challenges

Why another language?

  • programming languages as mediating devices: strike a balance between human and computer needs
    • Ill-specifying: internal contradictions, or systematically flawed (unsafe, erroneous).
    • Under-specifying: not enough detail to analyze validity/meaning of programs.
    • Over-specifying: needless idiosyncrasies, so implementations are overly constrained.

http://graydon2.dreamwidth.org/1839.html

Two Cultures of Computing

  • Batch Processing ("expensive hardware, rarely reprogrammed")
  • Interactive Computing (using the computer as a notebook or a whiteboard)
  • Ousterhout's Dichotomy (one "script" and one "system" language)
    • [script] and [system]
    • R and RCpp
    • Python and Cython
    • Matlab and MEX files
    • ...

http://graydon2.dreamwidth.org/3186.html

Beyond Ousterhout's Dichotomy

  • The "Goldilocks" perspective:

    • reject the notion that human interactivity and batch performance are in inherent tension
    • trade some efficiency for features (interactivity, safety, reflection, debugging simplicity, etc)
  • Spoiler: it involves, to a large extent, Lisp.

    • tidy definition: for experimentation and fast implementation.
    • easy to inspect, modify, visualize and searching the insides of running systems.
    • high quality REPLs, sophisticated mixed-mode editors, browsers, debuggers and GUIs

http://graydon2.dreamwidth.org/189377.html

Why Today, and not Yesterday

In a sense, 20 years after the web smashed the language landscape to pieces, we've come to a place where the criteria for a "fast language" has been nudged forwards over the line that was holding back a lot of techniques that used to be considered "too expensive to use in production" -- say, ubiquitous virtual dispatch, garbage collection, latent types -- became "acceptable costs", culturally.

http://graydon2.dreamwidth.org/189377.html

What about current libraries: JavaScript

http://maptime.io/anatomy-of-a-web-map/#84

What about current libraries: Python

What about current libraries: Python

Geospatial Processing with Julia

  • Follows the hands-on example at the end of https://github.com/jwass/maptime-boston-python.
  • Introduce the following libraries (work-in-progress):
    • ArchGDAL.jl (similar to rasterio and fiona)
    • GeoDataFrames.jl (similar to geopandas)

variables and values

In [1]:
x = 1
x, typeof(x)
Out[1]:
(1,Int64)
In [2]:
x = 1.0
x, typeof(x)
Out[2]:
(1.0,Float64)
In [3]:
x = "1.0"
x, typeof(x)
Out[3]:
("1.0",ASCIIString)
In [4]:
x = true
x, typeof(x)
Out[4]:
(true,Bool)

data-type conversions

In [5]:
for x in (Float64(1),
          string(1.0),
          parse(Float64, "1.0"),
          parse(Int, "1"),
          Int(true))
    @show x, typeof(x)
end
(x,typeof(x)) = (1.0,Float64)
(x,typeof(x)) = ("1.0",ASCIIString)
(x,typeof(x)) = (1.0,Float64)
(x,typeof(x)) = (1,Int64)
(x,typeof(x)) = (1,Int64)

control-flow (if-else and for-loops)

In [6]:
for word in ("julia", "at", "map", "time", "boston")
    if contains(word, "t")
        println("t: $word")
    elseif contains(word, "a")
        println("a: $word")
    else
        println("*: $word")
    end
end
a: julia
t: at
a: map
t: time
t: boston

defining functions

In [7]:
function printword(word)
    if contains(word, "t")     println("t: $word")
    elseif contains(word, "a") println("a: $word")
    else                       println("*: $word")
    end
end

for word in ("julia", "at", "map", "time", "boston")
    printword(word)
end
a: julia
t: at
a: map
t: time
t: boston

collections (arrays, tuples, dictionaries)

In [11]:
for x in (("julia", "at", "map", "time", "boston"),
          ["julia", "at", "map", "time", "boston"],
          Dict("language" => "julia",
               "event" => "maptime",
               "location" => "boston"))
    @show x
    for item in x
        println("\t $item")
    end
end
x = ("julia","at","map","time","boston")
	 julia
	 at
	 map
	 time
	 boston
x = ASCIIString["julia","at","map","time","boston"]
	 julia
	 at
	 map
	 time
	 boston
x = Dict("event"=>"maptime","location"=>"boston","language"=>"julia")
	 "event"=>"maptime"
	 "location"=>"boston"
	 "language"=>"julia"

Other Learning Resources

For more, check out the following resources:

Recap: Python Geospatial Libs

Now: Julia Geospatial Libs

GDAL.jl

work in progress at https://github.com/visr/GDAL.jl

In [12]:
using GDAL
GDAL.allregister() # register the drivers

dataset = GDAL.openex("data/rodents.geojson", GDAL.GDAL_OF_VECTOR, C_NULL, C_NULL, C_NULL)
Out[12]:
Ptr{GDAL.GDALDatasetH} @0x00007fd704d161f0
In [13]:
layer = GDAL.datasetgetlayerbyname(dataset, "OGRGeoJSON")
Out[13]:
Ptr{GDAL.OGRLayerH} @0x00007fd704d117a0
In [14]:
feature = GDAL.getnextfeature(layer) # first feature
@show GDAL.getfieldasinteger64(feature, 0) # id
@show GDAL.getfieldasstring(feature, 1) # OPEN_DT
GDAL.destroy(feature)
GDAL.getfieldasinteger64(feature,0) = 48
GDAL.getfieldasstring(feature,1) = "11/09/2011 02:54:02 PM"
In [15]:
feature = GDAL.getnextfeature(layer) # second feature
@show GDAL.getfieldasinteger64(feature, 0)
@show GDAL.getfieldasstring(feature, 1)
GDAL.destroy(feature)
GDAL.getfieldasinteger64(feature,0) = 67
GDAL.getfieldasstring(feature,1) = "03/05/2014 08:46:53 AM"
In [16]:
GDAL.close(dataset)
GDAL.destroydrivermanager()

ArchGDAL.jl

work-in-progress at https://github.com/yeesian/ArchGDAL.jl

In [17]:
import ArchGDAL

@time ArchGDAL.registerdrivers() do
    ArchGDAL.read("data/rodents.geojson") do dataset
        println(dataset)
    end
end
GDAL Dataset (Driver: GeoJSON/GeoJSON)
File(s): data/rodents.geojson 
Number of feature layers: 1
  Layer 0: OGRGeoJSON (wkbPoint), nfeatures = 10756

  0.520685 seconds (392.96 k allocations: 17.541 MB, 2.62% gc time)
In [18]:
@time ArchGDAL.registerdrivers() do
    ArchGDAL.read("data/rodents.geojson") do dataset
        for (i,feature) in enumerate(ArchGDAL.getlayer(dataset, 0))
            i > 2 && break
            println(feature)
        end
    end
end
Feature
  (index 0) geom => POINT
  (index 0) id => 48
  (index 1) OPEN_DT => 11/09/2011 02:54:02 PM

Feature
  (index 0) geom => POINT
  (index 0) id => 67
  (index 1) OPEN_DT => 03/05/2014 08:46:53 AM

  0.294920 seconds (97.84 k allocations: 4.423 MB)

GeoDataFrames

experiment-in-progress at https://github.com/yeesian/GeoDataFrames.jl

In [19]:
using GeoDataFrames; const GD = GeoDataFrames

neighborhoods = GD.read("data/Boston_Neighborhoods.shp")
DataFrames.head(neighborhoods)
Out[19]:
geometry0AcresNameOBJECTIDSHAPE_areaSHAPE_len
1Geometry: MULTIPOLYGON (((-71.1259265672231 42.2720044534673 ... 34)))1605.56181523Roslindale16.99382726723e753563.9125971
2Geometry: POLYGON ((-71.104991583 42.3260930173557,-71.10487 ... 557))2519.23531679Jamaica Plain21.09737890396e856349.9371614
3Geometry: POLYGON ((-71.0904337145803 42.3357612955289,-71.0 ... 289))350.85216018Mission Hill31.52831200976e717918.7241135
4Geometry: POLYGON ((-71.098108339852 42.3367217099475,-71.09 ... 475))188.61119227Longwood Medical Area48.21590353542e611908.7571476
5Geometry: POLYGON ((-71.0666286565676 42.3487740128554,-71.0 ... 554))26.53973299Bay Village51.15607076939e64650.63549327
6Geometry: POLYGON ((-71.0583778032908 42.3498224214285,-71.0 ... 285))15.63984554Leather District6681271.6720133237.14053697
In [20]:
rodents = GD.read("data/rodents.geojson")
DataFrames.head(rodents)
Out[20]:
geometry0idOPEN_DT
1Geometry: POINT (-71.0721 42.3224)4811/09/2011 02:54:02 PM
2Geometry: POINT (-71.0617 42.3477)6703/05/2014 08:46:53 AM
3Geometry: POINT (-71.0464 42.3357)8610/19/2011 03:21:20 PM
4Geometry: POINT (-71.0168 42.3834)9505/31/2012 01:37:49 PM
5Geometry: POINT (-71.1059 42.3099)9611/20/2015 09:37:19 AM
6Geometry: POINT (-71.0765 42.342)18609/20/2012 09:11:27 AM
In [21]:
function numrodents(neighborhood)
    sum([ArchGDAL.contains(neighborhood.ptr, r.ptr) for r in rodents[:geometry0]])
end

@time numrodents(neighborhoods[1,:geometry0])
  1.530536 seconds (174.90 k allocations: 5.028 MB, 0.62% gc time)
Out[21]:
156
In [22]:
@time neighborhoods[:numrodents] = Int[numrodents(n) for n in neighborhoods[:geometry0]]
DataFrames.head(neighborhoods)
 42.293394 seconds (2.75 M allocations: 55.753 MB, 0.03% gc time)
Out[22]:
geometry0AcresNameOBJECTIDSHAPE_areaSHAPE_lennumrodents
1Geometry: MULTIPOLYGON (((-71.1259265672231 42.2720044534673 ... 34)))1605.56181523Roslindale16.99382726723e753563.9125971156
2Geometry: POLYGON ((-71.104991583 42.3260930173557,-71.10487 ... 557))2519.23531679Jamaica Plain21.09737890396e856349.9371614491
3Geometry: POLYGON ((-71.0904337145803 42.3357612955289,-71.0 ... 289))350.85216018Mission Hill31.52831200976e717918.7241135157
4Geometry: POLYGON ((-71.098108339852 42.3367217099475,-71.09 ... 475))188.61119227Longwood Medical Area48.21590353542e611908.75714764
5Geometry: POLYGON ((-71.0666286565676 42.3487740128554,-71.0 ... 554))26.53973299Bay Village51.15607076939e64650.6354932781
6Geometry: POLYGON ((-71.0583778032908 42.3498224214285,-71.0 ... 285))15.63984554Leather District6681271.6720133237.1405369710

our first plot

In [23]:
@time GeoDataFrames.plot(neighborhoods,
    label=:Name,
    plt = Plots.plot(bg = :white))
[Plots.jl] Initializing backend: pyplot
 22.263985 seconds (18.15 M allocations: 793.389 MB, 1.63% gc time)
Out[23]:

switching to a different backend

for reference: https://juliaplots.github.io/backends/

In [24]:
using Plots
plotlyjs()
WARNING: using Plots.PyPlot in module Main conflicts with an existing identifier.
Out[24]:
Plots.PlotlyJSBackend()
In [25]:
@time GeoDataFrames.plot(neighborhoods,
    label=:Name,
    plt = Plots.plot(bg = :white))