Geospatial Processing with Julia

yeesian@MaptimeBoston

Overview

  • Why Julia
  • Geospatial processing with Julia
  • Recent Development
  • Opportunities and Challenges

Why another language?

  • programming languages as mediating devices: strike a balance between human and computer needs
    • Ill-specifying: internal contradictions, or systematically flawed (unsafe, erroneous).
    • Under-specifying: not enough detail to analyze validity/meaning of programs.
    • Over-specifying: needless idiosyncrasies, so implementations are overly constrained.

http://graydon2.dreamwidth.org/1839.html

Two Cultures of Computing

  • Batch Processing ("expensive hardware, rarely reprogrammed")
  • Interactive Computing (using the computer as a notebook or a whiteboard)
  • Ousterhout's Dichotomy (one "script" and one "system" language)
    • [script] and [system]
    • R and RCpp
    • Python and Cython
    • Matlab and MEX files
    • ...

http://graydon2.dreamwidth.org/3186.html

Beyond Ousterhout's Dichotomy

  • The "Goldilocks" perspective:

    • reject the notion that human interactivity and batch performance are in inherent tension
    • trade some efficiency for features (interactivity, safety, reflection, debugging simplicity, etc)
  • Spoiler: it involves, to a large extent, Lisp.

    • tidy definition: for experimentation and fast implementation.
    • easy to inspect, modify, visualize and searching the insides of running systems.
    • high quality REPLs, sophisticated mixed-mode editors, browsers, debuggers and GUIs

http://graydon2.dreamwidth.org/189377.html

Why Today, and not Yesterday

In a sense, 20 years after the web smashed the language landscape to pieces, we've come to a place where the criteria for a "fast language" has been nudged forwards over the line that was holding back a lot of techniques that used to be considered "too expensive to use in production" -- say, ubiquitous virtual dispatch, garbage collection, latent types -- became "acceptable costs", culturally.

http://graydon2.dreamwidth.org/189377.html

What about current libraries: JavaScript

http://maptime.io/anatomy-of-a-web-map/#84

What about current libraries: Python

What about current libraries: Python

Geospatial Processing with Julia

  • Follows the hands-on example at the end of https://github.com/jwass/maptime-boston-python.
  • Introduce the following libraries (work-in-progress):
    • ArchGDAL.jl (similar to rasterio and fiona)
    • GeoDataFrames.jl (similar to geopandas)

variables and values

In [1]:
x = 1
x, typeof(x)
Out[1]:
(1,Int64)
In [2]:
x = 1.0
x, typeof(x)
Out[2]:
(1.0,Float64)
In [3]:
x = "1.0"
x, typeof(x)
Out[3]:
("1.0",ASCIIString)
In [4]:
x = true
x, typeof(x)
Out[4]:
(true,Bool)

data-type conversions

In [5]:
for x in (Float64(1),
          string(1.0),
          parse(Float64, "1.0"),
          parse(Int, "1"),
          Int(true))
    @show x, typeof(x)
end
(x,typeof(x)) = (1.0,Float64)
(x,typeof(x)) = ("1.0",ASCIIString)
(x,typeof(x)) = (1.0,Float64)
(x,typeof(x)) = (1,Int64)
(x,typeof(x)) = (1,Int64)

control-flow (if-else and for-loops)

In [6]:
for word in ("julia", "at", "map", "time", "boston")
    if contains(word, "t")
        println("t: $word")
    elseif contains(word, "a")
        println("a: $word")
    else
        println("*: $word")
    end
end
a: julia
t: at
a: map
t: time
t: boston

defining functions

In [7]:
function printword(word)
    if contains(word, "t")     println("t: $word")
    elseif contains(word, "a") println("a: $word")
    else                       println("*: $word")
    end
end

for word in ("julia", "at", "map", "time", "boston")
    printword(word)
end
a: julia
t: at
a: map
t: time
t: boston

collections (arrays, tuples, dictionaries)

In [11]:
for x in (("julia", "at", "map", "time", "boston"),
          ["julia", "at", "map", "time", "boston"],
          Dict("language" => "julia",
               "event" => "maptime",
               "location" => "boston"))
    @show x
    for item in x
        println("\t $item")
    end
end
x = ("julia","at","map","time","boston")
	 julia
	 at
	 map
	 time
	 boston
x = ASCIIString["julia","at","map","time","boston"]
	 julia
	 at
	 map
	 time
	 boston
x = Dict("event"=>"maptime","location"=>"boston","language"=>"julia")
	 "event"=>"maptime"
	 "location"=>"boston"
	 "language"=>"julia"

Other Learning Resources

For more, check out the following resources:

Recap: Python Geospatial Libs

Now: Julia Geospatial Libs

GDAL.jl

work in progress at https://github.com/visr/GDAL.jl

In [12]:
using GDAL
GDAL.allregister() # register the drivers

dataset = GDAL.openex("data/rodents.geojson", GDAL.GDAL_OF_VECTOR, C_NULL, C_NULL, C_NULL)
Out[12]:
Ptr{GDAL.GDALDatasetH} @0x00007fd704d161f0
In [13]:
layer = GDAL.datasetgetlayerbyname(dataset, "OGRGeoJSON")
Out[13]:
Ptr{GDAL.OGRLayerH} @0x00007fd704d117a0
In [14]:
feature = GDAL.getnextfeature(layer) # first feature
@show GDAL.getfieldasinteger64(feature, 0) # id
@show GDAL.getfieldasstring(feature, 1) # OPEN_DT
GDAL.destroy(feature)
GDAL.getfieldasinteger64(feature,0) = 48
GDAL.getfieldasstring(feature,1) = "11/09/2011 02:54:02 PM"
In [15]:
feature = GDAL.getnextfeature(layer) # second feature
@show GDAL.getfieldasinteger64(feature, 0)
@show GDAL.getfieldasstring(feature, 1)
GDAL.destroy(feature)
GDAL.getfieldasinteger64(feature,0) = 67
GDAL.getfieldasstring(feature,1) = "03/05/2014 08:46:53 AM"
In [16]:
GDAL.close(dataset)
GDAL.destroydrivermanager()

ArchGDAL.jl

work-in-progress at https://github.com/yeesian/ArchGDAL.jl

In [17]:
import ArchGDAL

@time ArchGDAL.registerdrivers() do
    ArchGDAL.read("data/rodents.geojson") do dataset
        println(dataset)
    end
end
GDAL Dataset (Driver: GeoJSON/GeoJSON)
File(s): data/rodents.geojson 
Number of feature layers: 1
  Layer 0: OGRGeoJSON (wkbPoint), nfeatures = 10756

  0.520685 seconds (392.96 k allocations: 17.541 MB, 2.62% gc time)
In [18]:
@time ArchGDAL.registerdrivers() do
    ArchGDAL.read("data/rodents.geojson") do dataset
        for (i,feature) in enumerate(ArchGDAL.getlayer(dataset, 0))
            i > 2 && break
            println(feature)
        end
    end
end
Feature
  (index 0) geom => POINT
  (index 0) id => 48
  (index 1) OPEN_DT => 11/09/2011 02:54:02 PM

Feature
  (index 0) geom => POINT
  (index 0) id => 67
  (index 1) OPEN_DT => 03/05/2014 08:46:53 AM

  0.294920 seconds (97.84 k allocations: 4.423 MB)

GeoDataFrames

experiment-in-progress at https://github.com/yeesian/GeoDataFrames.jl

In [19]:
using GeoDataFrames; const GD = GeoDataFrames

neighborhoods = GD.read("data/Boston_Neighborhoods.shp")
DataFrames.head(neighborhoods)
Out[19]:
geometry0AcresNameOBJECTIDSHAPE_areaSHAPE_len
1Geometry: MULTIPOLYGON (((-71.1259265672231 42.2720044534673 ... 34)))1605.56181523Roslindale16.99382726723e753563.9125971
2Geometry: POLYGON ((-71.104991583 42.3260930173557,-71.10487 ... 557))2519.23531679Jamaica Plain21.09737890396e856349.9371614
3Geometry: POLYGON ((-71.0904337145803 42.3357612955289,-71.0 ... 289))350.85216018Mission Hill31.52831200976e717918.7241135
4Geometry: POLYGON ((-71.098108339852 42.3367217099475,-71.09 ... 475))188.61119227Longwood Medical Area48.21590353542e611908.7571476
5Geometry: POLYGON ((-71.0666286565676 42.3487740128554,-71.0 ... 554))26.53973299Bay Village51.15607076939e64650.63549327
6Geometry: POLYGON ((-71.0583778032908 42.3498224214285,-71.0 ... 285))15.63984554Leather District6681271.6720133237.14053697
In [20]:
rodents = GD.read("data/rodents.geojson")
DataFrames.head(rodents)
Out[20]:
geometry0idOPEN_DT
1Geometry: POINT (-71.0721 42.3224)4811/09/2011 02:54:02 PM
2Geometry: POINT (-71.0617 42.3477)6703/05/2014 08:46:53 AM
3Geometry: POINT (-71.0464 42.3357)8610/19/2011 03:21:20 PM
4Geometry: POINT (-71.0168 42.3834)9505/31/2012 01:37:49 PM
5Geometry: POINT (-71.1059 42.3099)9611/20/2015 09:37:19 AM
6Geometry: POINT (-71.0765 42.342)18609/20/2012 09:11:27 AM
In [21]:
function numrodents(neighborhood)
    sum([ArchGDAL.contains(neighborhood.ptr, r.ptr) for r in rodents[:geometry0]])
end

@time numrodents(neighborhoods[1,:geometry0])
  1.530536 seconds (174.90 k allocations: 5.028 MB, 0.62% gc time)
Out[21]:
156
In [22]:
@time neighborhoods[:numrodents] = Int[numrodents(n) for n in neighborhoods[:geometry0]]
DataFrames.head(neighborhoods)
 42.293394 seconds (2.75 M allocations: 55.753 MB, 0.03% gc time)
Out[22]:
geometry0AcresNameOBJECTIDSHAPE_areaSHAPE_lennumrodents
1Geometry: MULTIPOLYGON (((-71.1259265672231 42.2720044534673 ... 34)))1605.56181523Roslindale16.99382726723e753563.9125971156
2Geometry: POLYGON ((-71.104991583 42.3260930173557,-71.10487 ... 557))2519.23531679Jamaica Plain21.09737890396e856349.9371614491
3Geometry: POLYGON ((-71.0904337145803 42.3357612955289,-71.0 ... 289))350.85216018Mission Hill31.52831200976e717918.7241135157
4Geometry: POLYGON ((-71.098108339852 42.3367217099475,-71.09 ... 475))188.61119227Longwood Medical Area48.21590353542e611908.75714764
5Geometry: POLYGON ((-71.0666286565676 42.3487740128554,-71.0 ... 554))26.53973299Bay Village51.15607076939e64650.6354932781
6Geometry: POLYGON ((-71.0583778032908 42.3498224214285,-71.0 ... 285))15.63984554Leather District6681271.6720133237.1405369710

our first plot

In [23]:
@time GeoDataFrames.plot(neighborhoods,
    label=:Name,
    plt = Plots.plot(bg = :white))
[Plots.jl] Initializing backend: pyplot
 22.263985 seconds (18.15 M allocations: 793.389 MB, 1.63% gc time)
Out[23]:

switching to a different backend

for reference: https://juliaplots.github.io/backends/

In [24]:
using Plots
plotlyjs()
WARNING: using Plots.PyPlot in module Main conflicts with an existing identifier.
Out[24]:
Plots.PlotlyJSBackend()
In [25]:
@time GeoDataFrames.plot(neighborhoods,
    label=:Name,
    plt = Plots.plot(bg = :white))

Plotly javascript loaded.

To load again call

init_notebook(true)

[Plots.jl] Initializing backend: plotlyjs
  6.675909 seconds (6.51 M allocations: 285.039 MB, 1.54% gc time)
Out[25]:

making a chloropleth

In [26]:
@time GeoDataFrames.plot(neighborhoods,
    label = :Name,
    plt = Plots.plot(bg = :white),
    fillvalue = :numrodents,
    legend = false)
  0.312263 seconds (1.11 M allocations: 55.259 MB, 4.18% gc time)
Out[26]:

choosing a color palette

In [27]:
@time GeoDataFrames.plot(neighborhoods,
    label = :Name,
    plt = Plots.plot(bg = :white),
    fillvalue = :numrodents,
    legend = false,
    colorgrad = Plots.cgrad(:blues))
  0.177945 seconds (1.04 M allocations: 52.231 MB, 6.05% gc time)
Out[27]:

For more options: check out the documentation at https://juliaplots.github.io/.

other tools for interactivity

In [28]:
using Interact
pyplot()

@manipulate for cg in [:inferno, :heat, :blues]
    GeoDataFrames.plot(neighborhoods,
        label = :Name,
        plt = Plots.plot(bg = :white),
        fillvalue = :numrodents,
        legend = false,
        colorgrad = Plots.cgrad(cg))
end
Out[28]:

If you were to make a change, what might it be?

Source: https://xkcd.com/833/

If you were to make a change, what might it be?

Source: https://twitter.com/MapOfTheWeek/status/588055199719251970

If you were to make a change, what might it be?

Source: http://xkcd.com/1138/ (and discussion)

If you were to make a change, what might it be?

Source: http://maptimeboston.github.io/web-maps-101/#52

Recap: Julia Geospatial Libs

Coming Up: Julia Geospatial Libs

Coming Up: Julia Geospatial Libs

Coming Up: A grammar of data manipulation

work-in-progress at https://github.com/davidagold/jplyr.jl and https://github.com/yeesian/SQLQuery.jl/tree/parse-expr

In [29]:
using SQLQuery

@sqlquery tbl |> select(a,b)
Out[29]:
SELECT a,
       b
  FROM tbl
In [30]:
@sqlquery tbl |> filter(a > 100, b == "maptime")
Out[30]:
SELECT *
  FROM tbl
 WHERE a > 100
   AND b == 'maptime'
In [31]:
@sqlquery tbl |> orderby(desc(a))
Out[31]:
  SELECT *
    FROM tbl
ORDER BY a DESC

you compose "verbs" using "pipes" |>

In [32]:
@sqlquery tbl |>
    select(*, result = sqrt(column1, col2), a = min(2,column3)) |> 
    filter(result > 1000) |>
    orderby(desc(a))
Out[32]:
  SELECT *
    FROM (SELECT *
            FROM (SELECT *,
                         sqrt(column1,col2) AS result,
                         min(2,column3) AS a
                    FROM tbl)
           WHERE result > 1000)
ORDER BY a DESC

Behind the Scenes

In [33]:
expression = :(tbl |>
    select(*, result = sqrt(column1, col2), a = min(2,column3)) |> 
    filter(result > 1000) |>
    orderby(desc(a)))
Out[33]:
:(((tbl |> select(*,result=sqrt(column1,col2),a=min(2,column3))) |> filter(result > 1000)) |> orderby(desc(a)))
In [34]:
dump(SQLQuery._sqlquery(expression))
SQLQuery.OrderbyNode{SQLQuery.FilterNode{SQLQuery.SelectNode{Symbol}}} 
  input: SQLQuery.FilterNode{SQLQuery.SelectNode{Symbol}} 
    input: SQLQuery.SelectNode{Symbol} 
      input: Symbol tbl
      args: Array(Any,(3,))
        1: Symbol *
        2: Expr 
          head: Symbol kw
          args: Array(Any,(2,))
          typ: Any
        3: Expr 
          head: Symbol kw
          args: Array(Any,(2,))
          typ: Any
    args: Array(Expr,(1,)) [:(result > 1000)]
  args: Array(Any,(1,))
    1: Expr 
      head: Symbol call
      args: Array(Any,(2,))
        1: Symbol desc
        2: Symbol a
      typ: Any
In [35]:
SQLQuery.translatesql(SQLQuery._sqlquery(expression))
Out[35]:
"  SELECT *\n    FROM (SELECT *\n            FROM (SELECT *,\n                         sqrt(column1,col2) AS result,\n                         min(2,column3) AS a\n                    FROM tbl)\n           WHERE result > 1000)\nORDER BY a DESC"

Applying it to Practice

In [36]:
macro inspect(args...)
    ArchGDAL.registerdrivers() do
        ArchGDAL.read("data/test-2.3.sqlite") do dataset
            sqlcommand = SQLQuery.translatesql(SQLQuery._sqlquery(args))
            ArchGDAL.executesql(dataset, sqlcommand) do results
                GeoDataFrames.geodataframe(results)
            end
        end
    end
end

@inspect towns |>
    select(*) |>
    limit(5)
Out[36]:
geometry0PK_UIDNamePeoplesLocalCouncCountyRegion
1Geometry: POINT (427002.77 4996361.33)1Brozolo435100
2Geometry: POINT (367470.48 4962414.5)2Campiglione-Fenile1284100
3Geometry: POINT (390084.12 5025551.73)3Canischio274100
4Geometry: POINT (425246.99 5000248.3)4Cavagnolo2281100
5Geometry: POINT (426418.89 4957737.37)5Magliano Alfieri1674100
In [37]:
@sqlquery towns |>
    select(name, peoples) |>
    filter(peoples > 350000) |>
    orderby(desc(peoples))
Out[37]:
  SELECT *
    FROM (SELECT *
            FROM (SELECT name,
                         peoples
                    FROM towns)
           WHERE peoples > 350000)
ORDER BY peoples DESC
In [38]:
@inspect towns |>
    select(name, peoples) |>
    filter(peoples > 350000) |>
    orderby(desc(peoples))
Out[38]:
namepeoples
1Roma2546804
2Milano1256211
3Napoli1004500
4Torino865263
5Palermo686722
6Genova610307
7Bologna371217
8Firenze356118
In [39]:
@sqlquery towns |>
    select(ntowns = count(*),
           smaller = min(peoples),
           bigger = max(peoples),
           totalpeoples = sum(peoples),
           meanpeoples = sum(peoples) / count(*))
Out[39]:
SELECT count(*) AS ntowns,
       min(peoples) AS smaller,
       max(peoples) AS bigger,
       sum(peoples) AS totalpeoples,
       sum(peoples) / count(*) AS meanpeoples
  FROM towns
In [40]:
@inspect towns |>
    select(ntowns = count(*),
           smaller = min(peoples),
           bigger = max(peoples),
           totalpeoples = sum(peoples),
           meanpeoples = sum(peoples) / count(*))
Out[40]:
ntownssmallerbiggertotalpeoplesmeanpeoples
18101332546804570061477036
In [41]:
@inspect highways |>
    select(PK_UID,
           npts = npoint(geometry),
           astext(startpoint(geometry)),
           astext(endpoint(geometry)),
           x(nthpoint(geometry,2)),
           y(nthpoint(geometry,2))) |>
    orderby(desc(npts))
Out[41]:
PK_UIDnptsST_AsText(ST_StartPoint(geometry))ST_AsText(ST_EndPoint(geometry))ST_X(ST_PointN(geometry,2))ST_Y(ST_PointN(geometry,2))
17746758POINT(632090.156998 4835616.546126)POINT(663300.737479 4795631.803342)632086.00966488334.835625748753577e6
27755120POINT(663292.190654 4795627.307765)POINT(632085.166691 4835620.171885)663295.99249545324.795626489419861e6
31534325POINT(668247.593086 4862272.349444)POINT(671618.13304 4854179.734158)668232.52928495384.862273561966714e6
42053109POINT(671613.424233 4854121.472532)POINT(654264.259259 4855357.41189)671610.52361430314.854129554368173e6
57732755POINT(619601.675367 4855174.599496)POINT(668724.797158 4862015.941886)619593.71153968534.855174743988363e6
67672584POINT(669230.644526 4861399.656095)POINT(656778.219794 4841754.820045)669235.29231719174.861388341674283e6
72072568POINT(654262.489833 4855356.779528)POINT(671604.674669 4854161.831221)654264.19751972664.8553668330274895e6
81512333POINT(678698.183542 4835739.644472)POINT(671608.752851 4854176.222572)678649.26990537664.835784869429852e6
91492206POINT(671618.220589 4854185.937448)POINT(678650.789552 4835773.241197)671629.62334458674.854190205654052e6
107692200POINT(657836.87225 4842388.82151)POINT(668753.698596 4861983.767314)657846.63465086984.842390798813466e6
113642057POINT(664216.052197 4910551.772908)POINT(667841.057054 4863143.709969)664158.72246650794.9105169433443e6
127471992POINT(642334.928386 4910026.621566)POINT(652123.994165 4868904.786248)642336.63923157914.910026658828413e6
134921915POINT(778873.993104 4885381.495322)POINT(697280.883854 4850452.857642)778875.61072597714.885383951525334e6
146981879POINT(1075657.324742 4665962.507322)POINT(1088521.809982 4641907.256511)1.0756515959733103e64.665966841282304e6
157701834POINT(671524.375018 4854054.315521)POINT(671605.495811 4854157.963121)671511.13392708724.854045698825569e6
165411805POINT(768687.708934 4875800.342242)POINT(727246.015043 4847936.331512)768668.92669344184.87588074814125e6
174141794POINT(625250.079541 4789515.994231)POINT(668731.310305 4848638.152835)625323.02221870144.789529259853636e6
184881794POINT(625250.079541 4789515.994231)POINT(668731.310305 4848638.152835)625323.02221870144.789529259853636e6
195371727POINT(696416.823534 4849747.49765)POINT(752357.078757 4867509.169718)696416.89381167624.849745114356477e6
204951678POINT(695366.937305 4849890.749257)POINT(745339.690695 4839447.009088)695334.40642783784.849880253898576e6
217721637POINT(647197.376462 4862795.32992)POINT(619427.347367 4855345.192365)647179.8581081284.862792628927675e6
224971622POINT(745213.378733 4839490.103422)POINT(794928.023983 4851103.971745)745228.26835604484.839462011895631e6
232901599POINT(771225.986098 4823572.33269)POINT(687099.41214 4823942.325964)771227.66571565874.823574847990383e6
241301521POINT(726585.57513 4775073.367238)POINT(671429.902096 4854168.307884)726587.17457142844.775130678919593e6
25371482POINT(668539.140339 4858263.038813)POINT(629686.499389 4802622.574554)668539.01916250494.858267814741329e6
267661470POINT(676469.53734 4849480.120832)POINT(671602.595908 4854175.191147)676468.98013086034.849474041332171e6
277171457POINT(1070154.669552 4450928.712064)POINT(1027109.998032 4461662.354314)1.0701546695516466e64.450928712063506e6
286761452POINT(597074.183727 4906394.152034)POINT(610608.952173 4856418.991794)597072.89662311124.906365525872452e6
294241446POINT(665274.669503 4847056.310631)POINT(654847.253359 4785649.133626)665122.73500110454.847119316587441e6
307631439POINT(685560.007889 4847315.867006)POINT(738015.356591 4883984.372051)685624.45474338464.847298580681927e6
In [42]:
@sqlquery highways |>
    select(PK_UID,
           npts = npoint(geometry),
           astext(startpoint(geometry)),
           astext(endpoint(geometry)),
           x(nthpoint(geometry,2)),
           y(nthpoint(geometry,2))) |>
    orderby(desc(npts))
Out[42]:
  SELECT *
    FROM (SELECT PK_UID,
                 ST_NumPoints(geometry) AS npts,
                 ST_AsText(ST_StartPoint(geometry)),
                 ST_AsText(ST_EndPoint(geometry)),
                 ST_X(ST_PointN(geometry,2)),
                 ST_Y(ST_PointN(geometry,2))
            FROM highways)
ORDER BY npts DESC

Opportunities and Challenges

Opportunities and Challenges

Opportunities and Challenges

Opportunities and Challenges

Learn tools, and use tools, but don't accept tools. Always distrust them; always be alert for alternative ways of thinking. This is what I mean by avoiding the conviction that you "know what you're doing". -- Bret Victor, The Future of Programming

Opportunities and Challenges

  • Roger Bivand, The R Development Process
    • The very specific and uncodified skill set in R development and maintenance, covering not only a mix of programming languages, but also attitudes to checking the consequences of commits
    • Recruitment is problematic; doing R, CRAN or R-Forge maintenance does not bring tenure, and very often insufficient understanding for the dilemmas faced by maintainers is shown
    • R will benefit from the maturing of alternatives such as Julia [and Python], and movements between projects among developers should be seen as a useful sharing of experience

Some Ideas for New Packages

  1. LeafletJS.jl -- Julia interface to leaflet.js visualization library
  2. MapboxJS.jl -- Julia interface to mapbox.js visualization library
  3. MapStyle.jl -- specification for styling plots (e.g. CartoCSS, or Mapbox GL Style)
  4. MapTiles.jl -- fetch map tiles from different services
  5. GDALBenchmarks.jl -- provide benchmark datasets to measure performance of GDAL/wrappers
  6. Better Raster Processing Tools