Category Archives: GDAL

Remarks on SVN-trac to GitHub migration

GRASS GIS is an open source geoinformation system which is developed by a globally distributed team of developers. Besides the source code developers also message translators, people who write documentation, those who report bugs and wishes and more are involved.

1. Early days… from pre-Internet to CVS and SVN

While GRASS GIS is under development since 1982 (no typo!) it has been put into a centralized source code management system in December 1999. Why so late? Because the World Wide Web (WWW) became available in the 1990s along with tools like browsers and such, followed by the development of distributed source code management tools. We moved on 29th Dec 1999 (think Y2K bug) the entire code into our instance of CVS (Concurrent Versioning System). With OSGeo being founded in 2006, we migrated the CVS repository to SVN (Subversion for the source code management) and trac (bug and wish tracker) on 8 Dec 2007.

2. Time to move on: git

Now, after more than 10 years using SVN/trac time had come to move on and join the large group of projects managing their source code in git (see also our related Wiki page on migration). Git comes with numerous advantages, yet we needed to decide which hosting platform to use. Options where github.com, gitlab.com, gitlab or gitea on OSGeo infrastructure, or other platforms. Through a survey we found out that the preference among contributors is GitHub. While not being open source itself it offers several advantages: it is widely known (good to get new developers interested and involved), numerous OSGeo projects are hosted there under the GitHub “OSGeo organization“.

If all fails (say, one day GitHub no longer being a reasonable choice) the import of our project from GitHub to GitLab is always possible. Indeed, we meanwhile mirror our code on OSGeo’s gitea server.

Relevant script code and migration ticket:

Relevant steps:

  • migrated SVN trunk -> git master
  • migrated and tagged release branches (milestones)
  • deleted “develbranch6” (we compared it to “releasebranch_6_4” and didn’t discover relevant differences)
  • Fix commit messages (yes, we really wanted to be brave, updating decades of commit messages!):
    • references to old RT tracker tickets (used Dec 2000 – Dec 2006)
    • references to old GForge tracker tickets (used Jan 2007 – Dec 2008)
    • references to other trac tickets (#x -> https://trac.osgeo.org/…)

3. Source code migration: the new git repositories

  • github repository “grass” (repo)

    • Source code from 1999 to present day (SVN-trunk -> git-master)
    • all 7.x release branches
  • github repository “grass-legacy” (repo)

    • separate repository for older GRASS GIS releases (3.2, 4.x, 5.x, 6.x), hence source code now available in git since 1987!
  • github repository “grass-addons” (repo)

    • repository for addons
  • github repository “grass-promo” (repo)
    • repository for promotional material
  • github repository “grass-website” (repo)
    • repository for upcoming new Website

4. Remarks on the “grass-legacy” repository

What special about it:

  • the source code goes back to 1987!
  • file timestamps (which I tried to preserve for decades :-) have been used to reconstruct the source code history (e.g., releasebranch_3_2)
  • junk files removed (plenty of leftover old binary files, files consisting of a special char only etc)
  • having this grass-legacy repo available in parallel to the main grass repo which contains the  recent source code we have a continuous source code coverage from 1987 to today in git.
  • size is about 250MB

What’s missing

  • the 4.3 source code doesn’t have distinct timestamps. Someone must have once packaged without mtime preservation… a pity. Perhaps a volunteer may fix that by carrying over the timestamps from GRASS GIS 4.2 in case the md5sum of a file is identical (or so).

5. Trac issue migration

A series of links had to be updated. Martin Landa invested days and days on that (thanks!!). He used the related GDAL efforts as a basis (Even Rouault: thanks!). As the date for the trac migration we selected 2007-12-09 (r25479) as it was the first SVN commit (after the years in CVS). The migration of trac bugs to github (i.e. transfer of trac ticket content) required several steps:

Link updates in the ticket texts:

  • links to other tickets (now to be pointed to full trac URL). Note that there were many styles of referring in the commit log message which had to be parsed accordingly
  • links to trac wiki (now to be pointed to full trac URL)
  • links source code in SVN (now to be pointed to full trac URL)
  • images and attachments (now to be pointed to full trac URL)

Transferring:

  • “operating system” trac label into the github issue text itself (following the new issue reporting template)
  • converting milestones/tickets/comments/labels
  • converting trac usernames to Github usernames
  • setting assignees if possible, set new “grass-svn2git” an assignee otherwise
  • slowing down transfer to match the 60 requests per second API limit rate at github

6. Fun with user name mapping

Given GRASS GIS’ history of 35+ years we had to invest major effort in identifying and mapping user names throughout the decades (see also bug tracker history). The following circumstances could be identified:

  • user present in CVS but not in SVN
  • user present in SVN but not in CVS
  • user present in both with identical name
  • user present in both with different name (well, in our initial CVS days in 1999 we often naivly picked our surnames like “martin”, “helena”, “markus”, “michael” … cute yet no scaling very much over the years!) as some were changed in the CVS to SVN migration in 2007, leading to
    • colliding user names
  • some users already having a github account (with mostly different name again)

We came up with several lookup tables, aiming at catching all variants. Just a “few” hours to dig in old source code files and in emails for finding all the missing email addresses…

7. Labels for issues

We cleaned up the trac component of the bug reports, coming up with the following categories which have to be visually grouped by color since the label list is just sorted alphabetically in github/gitlab:

  • Issue category:
    • bug
    • enhancement
  • Issue solution (other than fixing and closing it normally):
    • duplicate
    • invalid
    • wontfix
    • worksforme
  • Priority:
    • blocker
    • critical
    • feedback needed
  • Components:
    • docs
    • GUI
    • libs
    • modules
    • packaging
    • python
    • translations
    • unittests
    • Windows specific

Note that the complete issue migration is still to be done (as of Nov. 2019). Hopefully addressed at the GRASS GIS Community Sprint Prague 2019.

8. Setting up the github repository

In order to avoid users being flooded by emails due to the parsing of user contributions which normally triggers an email from github) we reached out to GitHub support in order to temporarily disable these notifications until all source code and selected issues were migrated.

The issue conversion rate was 4 min per trac bug to be converted and uploaded to github. Fairly slow but likely due to the API rate limit imposed and the fact that the migration script above generates a lot of API requests rather than combined ones..
Note to future projects to be migrated: use the new gihub import API (unfortunately we got to know about its existence too late in our migration process).

Here out timings which occurred during the GRASS GIS project migration from SVN to github:

  • grass repo: XX hours (all GRASS GIS 7.x code)
  • grass-legacy repo: XX hours (all GRASS GIS 3.x-6.x code)
  • NNN issues: XX hours – forthcoming.

9. New issue reporting template

In order to guide the user when reporting new issues, we will develop a small template – forthcoming.

10. Email notifications: issues to grass-dev and commits to grass-commit

We changed the settings from SVN post-hook to Github commit notifications and they flow in smoothly into the grass-commit mailing list. Join it to follow the development.

Overall, after now several months of using our new workflow we can state that things work fine.

GDAL 2.1 packaged for Fedora 23 and 24

GDAL logoThe new GDAL 2.1 is now also packaged for Fedora 23 and 24 which is possible due to the tireless efforts of various Fedora packagers.

Repo: https://copr.fedorainfracloud.org/coprs/neteler/GDAL/

Installation Instructions:

su

# Fedora 23+24:
# install this extra repo
dnf copr enable neteler/GDAL

# A) in case of update, simply
dnf update

# B) in case of new installation (gdal-devel is optional)
dnf install gdal gdal-python gdal-devel

GDAL/OGR 1.11.0 released

The new version 1.11.0 of GDAL/OGR (http://www.gdal.org/) which offers major new features has been released. GDAL/OGR is a C++ geospatial data access library for raster and vector file formats, databases and web services.  It includes bindings for several languages, and a variety of command line tools.

Highlights:

More complete information on the new features and fixes in the 1.11.0 release can be found at http://trac.osgeo.org/gdal/wiki/Release/1.11.0-News

The new release can be downloaded from:

50th ICA-OSGeo Lab established at Fondazione Edmund Mach (FEM)

We are pleased to announce that the 50th ICA-OSGeo Lab has been established at the GIS and Remote Sensing Unit (Piattaforma GIS & Remote Sensing, PGIS), Research and Innovation Centre (CRI), Fondazione Edmund Mach (FEM), Italy. CRI is a multifaceted research organization established in 2008 under the umbrella of FEM, a private research foundation funded by the government of Autonomous Province of Trento. CRI focuses on studies and innovations in the fields of agriculture, nutrition, and environment, with the aim to generate new sharing knowledge and to contribute to economic growth, social development and the overall improvement of quality of life.

The mission of the PGIS unit is to develop and provide multi-scale approaches for the description of 2-, 3- and 4-dimensional biological systems and processes. Core activities of the unit include acquisition, processing and validation of geo-physical, ecological and spatial datasets collected within various research projects and monitoring activities, along with advanced scientific analysis and data management. These studies involve multi-decadal change analysis of various ecological and physical parameters from continental to landscape level using satellite imagery and other climatic layers. The lab focuses on the geostatistical analysis of such information layers, the creation and processing of indicators, and the production of ecological, landscape genetics, eco-epidemiological and physiological models. The team pursues actively the development of innovative methods and their implementation in a GIS framework including the time series analysis of proximal and remote sensing data.

The GIS and Remote Sensing Unit (PGIS) members strongly support the peer reviewed approach of Free and Open Source software development which is perfectly in line with academic research. PGIS contributes extensively to the open source software development in geospatial (main contributors to GRASS GIS), often collaborating with various other developers and researchers around the globe. In the new ICA-OSGeo lab at FEM international PhD students, university students and trainees are present.

PGIS is focused on knowledge dissemination of open source tools through a series of courses designed for specific user requirement (schools, universities, research institutes), blogs, workshops and conferences. Their recent publication in Trends in Ecology and Evolution underlines the need on using Free and Open Source Software (FOSS) for completely open science. Dr. Markus Neteler, who is leading the group since its formation, has two decades of experience in developing and promoting open source GIS software. Being founding member of the Open Source Geospatial Foundation (OSGeo.org, USA), he served on its board of directors from 2006-2011. Luca Delucchi, focal point and responsible person for the new ICA-OSGeo Lab is member of the board of directors of the Associazione Italiana per l’Informazione Geografica Libera (GFOSS.it, the Italian Local Chapter of OSGeo). He contributes to several Free and Open Source software and open data projects as developer and trainer.

Details about the GIS and Remote Sensing Unit at http://gis.cri.fmach.it/

Open Source Geospatial Foundation (OSGeo) is a not-for-profit organisation founded in 2006 whose mission is to support and promote the collaborative development of open source geospatial technologies and data.

International Cartographic Association (ICA) is the world authoritative body for cartography and GIScience. See also the new ICA-OSGeo Labs website.

Processing Landsat 8 data in GRASS GIS 7: Import and visualization

banner_landsat_rgb

The Landsat 8 mission is a collaboration between the U.S. Geological Survey (USGS) and National Aeronautics and Space Administration (NASA) which continues the acquisition of high-quality data for observing land use and land cover change.

The Landsat 8 spacecraft which was launched in 2013 carries they following key instruments:

  • OLI: the Operational Land Imager which collects data in the visible, near infrared, and shortwave infrared wavelength regions as well as a panchromatic band. With respect to Landsat 7 two new spectral bands have been added: a deep-blue band for coastal water and aerosol studies (band 1), and a band for cirrus cloud detection (band 9). Furthermore, a Quality Assurance band (BQA) is also included to indicate the presence of terrain shadowing, data artifacts, and clouds.
  • TIRS: The Thermal Infrared Sensor continues thermal imaging and is also intended to support emerging applications such as modeling evapotranspiration for monitoring water use consumption over irrigated lands.

The data from Landsat 8 are available for download at no charge and with no user restrictions.

For our analysis example, we’ll obtain (freely – thanks to NASA and USGS!) a Landsat 8 scene from http://earthexplorer.usgs.gov/

First of all, you should register.

1. Landsat 8 download procedure

1. Enter Search Criteria:

  • path/row tab, enter Type WRS2: Path: 16, Row: 35
  • Date range: 01/01/2013 – today
  • Click on the “Data sets >>” button

2. Select Your Data Set(s):

  • Expand the entry + Landsat Archive
    [x] L8 OLI/TIRS
  • Click on the “Results >>” button

(We jump over the additional criteria)

4. Search Results

From the resulting list, we pick the data set:

earthexplorer_selection_lsat8Entity ID: LC80160352013134LGN03
Coordinates: 36.04321,-79.28696
Acquisition Date: 14-MAY-13

Using the “Download options”, you can download the data set (requires login). Select the choice:
[x] Level 1 GeoTIFF Data Product (842.4 MB)

You will receive the file “LC80160352013134LGN03.tar.gz”.

2. Unpacking the downloaded Landsat 8 dataset

To unpack the data, run (or use a graphical tool at your choice):

tar xvfz LC80160352013134LGN03.tar.gz

A series of GeoTIFF files will be extracted: LC80160352013134LGN03_B1.TIF, LC80160352013134LGN03_B2.TIF, LC80160352013134LGN03_B3.TIF, LC80160352013134LGN03_B4.TIF, LC80160352013134LGN03_B5.TIF, LC80160352013134LGN03_B6.TIF, LC80160352013134LGN03_B7.TIF, LC80160352013134LGN03_B8.TIF, LC80160352013134LGN03_B9.TIF, LC80160352013134LGN03_B10.TIF, LC80160352013134LGN03_B11.TIF, LC80160352013134LGN03_BQA.TIF

We may check the metadata with “gdalinfo“:

gdalinfo LC80160352013134LGN03_B1.TIF
Driver: GTiff/GeoTIFF
Files: LC80160352013134LGN03_B1.TIF
Size is 7531, 7331
Coordinate System is:
PROJCS["WGS 84 / UTM zone 17N",
  GEOGCS["WGS 84",
  DATUM["WGS_1984",
  SPHEROID["WGS 84",6378137,298.257223563,
...
Pixel Size = (30.000000000000000,-30.000000000000000)
...

3. Want to spatially subset the Landsat scene first?

If you prefer to cut out a smaller area (subregion), check here for gdal_translate usage examples.

4. Import into GRASS GIS 7

Note: While this Landsat 8 scene covers the area of the North Carolina (NC) sample dataset, it is delivered in UTM rather than the NC’s state plane metric projection. Hence we preprocess the data first in its original UTM projection prior to the reprojection to NC SPM.

Using the Location Wizard, we can import the dataset easily into a new location (in case you don’t have UTM17N not already created earlier):

grass70 -gui

grass7_loc_wizard1
grass7_loc_wizard2
grass7_loc_wizard3
grass7_loc_wizard4
grass7_loc_wizard5
grass7_loc_wizard6
grass7_loc_wizard7
grass7_loc_wizard8
grass7_loc_wizard9

 

 

 

Now start GRASS GIS 7 and you will find the first band already imported (the others will follow shortly!).

For the lazy folks among us, we can also create a new GRASS GIS Location right away from the dataset on command line:

grass70 -c LC80160352013134LGN03_B10.TIF ~/grassdata/utm17n

5. Importing the remaining Landsat 8 bands

The remaining bands can be easily imported with the raster import tool:

grass7_import1

The bands can now be selected easily for import:

grass7_import2

  • Select “Directory” and navigate to the right one
  • The available GeoTIFF files will be shown automatically
  • Select those you want to import
  • You may rename (double-click) the target name for each band
  • Extend the computation region accordingly automatically

Click on “Import” to get the data into the GRASS GIS location. This takes a few minutes. Close the dialog window then.

In the “Map layers” tab you can select the bands to be shown:

grass7_visualize1

6. The bands of Landsat 8

(cited from USGS)

Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) images consist of nine spectral bands with a spatial resolution of 30 meters for Bands 1 to 7 and 9. New band 1 (ultra-blue) is useful for coastal and aerosol studies. New band 9 is useful for cirrus cloud detection. The resolution for Band 8 (panchromatic) is 15 meters. Thermal bands 10 and 11 are useful in providing more accurate surface temperatures and are collected at 100 meters. Approximate scene size is 170 km north-south by 183 km east-west (106 mi by 114 mi).

Landsat 7 Wavelength (micrometers) Resolution (meters) Landsat 8 Wavelength (micrometers) Resolution (meters)
Band 1 – Coastal aerosol 0.43 – 0.45 30
Band 1 – Blue 0.45 – 0.52 30 Band 2 – Blue 0.45 – 0.51 30
Band 2 – Green 0.52 – 0.60 30 Band 3 – Green 0.53 – 0.59 30
Band 3 – Red 0.63 – 0.69 30 Band 4 – Red 0.64 – 0.67 30
Band 4 (NIR) 0.77 – 0.90 30 Band 5 – Near Infrared (NIR) 0.85 – 0.88 30
Band 5 (SWIR 1) 1.55 – 1.75 30 Band 6 – SWIR 1 1.57 – 1.65 30
Band 7 (SWIR 2) 2.09 – 2.35 30 Band 7 – SWIR 2 2.11 – 2.29 30
Band 8 – Panchromatic 0.52 – 0.90 15 Band 8 – Panchromatic 0.50 – 0.68 15
Band 9 – Cirrus 1.36- 1.38 30
Band 6 – Thermal Infrared (TIR) 10.40 -12.50 60* (30) Band 10 – Thermal Infrared (TIRS) 1 10.60 – 11.19 100* (30)
Band 11 – Thermal Infrared (TIRS) 2 11.50- 12.51 100* (30)
* ETM+ Band 6 is acquired at 60-meter resolution. Products processed after February 25, 2010 are resampled to 30-meter pixels. * TIRS bands are acquired at 100 meter resolution, but are resampled to 30 meter in delivered data product.

7. Natural color view (RGB composite)

Due to the introduction of a new “Cirrus” band (#1), the BGR bands are now 2, 3, and 4, respectively. See also “Common band combinations in RGB” for Landsat 7 or Landsat 5, and Landsat 8.

From Digital Numer (DN) to reflectance:
Before creating an RGB composite, it is important to convert the digital number data (DN) to reflectance (or optionally radiance). Otherwise the colors of a “natural” RGB composite do not look convincing but rather hazy (see background in the next screenshot). This conversion is easily done using the metadata file which is included in the data set with i.landsat.toar:

grass7_landsat_toar0
grass7_landsat_toar1
grass7_landsat_toar2
grass7_landsat_toar3

Now we are ready to create a nice RGB composite (hint 2015: i.landsat.rgb has been renamed to i.colors.enhance):

grass7_landsat_rgb0

grass7_landsat_rgb1

Select the bands to be visually combined:

grass7_visualize2

… and voilà !

grass7_landsat_rgb2

8. Applying the Landsat 8 Quality Assessment (QA) Band

One of the bands of a Landsat 8 scene is named “BQA” which contains for each pixel a decimal value representing a bit-packed combination of surface, atmosphere, and sensor conditions found during the overpass. It can be used to judge the overall usefulness of a given pixel.

We can use this information to easily eliminate e.g. cloud contaminated pixels. In short, the QA concept is (cited here from the USGS page):

Cited from http://landsat.usgs.gov/L8QualityAssessmentBand.php‎

For the single bits (0, 1, 2, and 3):
0 = No, this condition does not exist
1 = Yes, this condition exists.

The double bits (4-5, 6-7, 8-9, 10-11, 12-13, and 14-15) represent levels of confidence that a condition exists:
00 = Algorithm did not determine the status of this condition
01 = Algorithm has low confidence that this condition exists (0-33 percent confidence)
10 = Algorithm has medium confidence that this condition exists (34-66 percent confidence)
11 = Algorithm has high confidence that this condition exists (67-100 percent confidence).

Detailed bit patterns (d: double bits; s: single bits):
d – Bit 15 = 0 = cloudy
d – Bit 14 = 0 = cloudy
d – Bit 13 = 0 = not a cirrus cloud
d – Bit 12 = 0 = not a cirrus cloud
d – Bit 11 = 0 = not snow/ice
d – Bit 10 = 0 = not snow/ice
d – Bit 9 = 0 = not populated
d – Bit 8 = 0 = not populated
d – Bit 7 = 0 = not populated
d – Bit 6 = 0 = not populated
d – Bit 5 = 0 = not water
d – Bit 4 = 0 = not water
s – Bit 3 = 0 = not populated
s – Bit 2 = 0 = not terrain occluded
s – Bit 1 = 0 = not a dropped frame
s – Bit 0 = 0 = not fill

Usage example 1: Creating a mask from a bitpattern

We can create a cloud mask (bit 15+14 are set) from this pattern:
cloud: 1100000000000000

Using the Python shell tab, we can easily convert this into the corresponding decimal number for r.mapcalc:

Cited from http://landsat.usgs.gov/L8QualityAssessmentBand.php‎

Welcome to wxGUI Interactive Python Shell 0.9.8

Type "help(grass)" for more GRASS scripting related information.
Type "AddLayer()" to add raster or vector to the layer tree.

Python 2.7.5 (default, Aug 22 2013, 09:31:58) 
[GCC 4.8.1 20130603 (Red Hat 4.8.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> int(0b1100000000000000)
49152

Using this decimal value of 49152, we can create a cloud mask:

# set NULL for cloudy pixels, 1 elsewhere:
r.mapcalc "cloudmask = if(LC80160352013134LGN03_BQA == 49152, null(), 1 )"

# apply this mask
r.mask cloudmask

In our sample scene, there are only tiny clouds in the north-east, so no much to be seen. Some spurious cloud pixels are scattered over the scene, too, which could be eliminated (in case of false positives) or kept.

Usage example 2: Querying the Landsat 8 BQA map and retrieve the bitpattern

Perhaps you prefer to query the BQA map itself (overlay the previously created RGB composite and query the BSA map by selecting it in the Layer Manager). In our example, we query the BQA value of the cloud:

Using again the Python shell tab, we can easily convert the decimal number (used for r.mapcalc) into the corresponding binary representation to verify with the table values above.

>>> x=61440
>>> print(bin(x & 0xffffffff))
0b1111000000000000

Hence, bits 15,14,13, and 12 are set: cloudy and not a cirrus cloud. Looking at the RGB composite we tend to agree :-) Time to mask out the cloud!

wxGUI menu >> Raster >> Mask [r.mask]

Or use the command line, as shown already above:

# remove existing mask (if active)
r.mask -r

# set NULL for cloudy pixels, 1 elsewhere:
r.mapcalc "cloudmask = if(LC80160352013134LGN03_BQA == 61440, null(), 1 )" --o

# apply the new mask
r.mask cloudmask

The visual effect in the RGB composite is minimal since the cloud is white anyway (as NULL cells, too). However, it is relevant for real calculations such as NDVI (vegetation index) or thermal maps.

We observe dark pixels around the cloud originating from thin clouds. In a subsequent identification/mask step we may eliminate also those pixels with a subsequent filter.

See also Processing Landsat 8 data in GRASS GIS 7: RGB composites and pan sharpening

Scaling up globally: 30 years of FOSS4G development

Scaling up globally: 30 years of FOSS4G development. Keynote at FOSS4G-CEE 2013, Romania

In my presentation I briefly review 3 decades of Open Source GIS development, from the ’80th to the present.

winGRASS 6.4.0RC2 now available!

The new winGRASS 6.4.0RC2 is now available! It has been packaged into the OSGeo4W installer which also provides QGIS, Mapserver, GDAL and other goodies.

OSGeo4W has two modes – express and advanced:

  • “Express” gives you a short list of suggested packages to select from which have been widely tested.
  • “Advanced” gives a long detailed list of packages and subpackages including development versions and so forth. For the moment GRASS is available in the “advanced” install. After a period of testing it may be moved to the “express” section.

OSGeo4W Installer download:
http://download.osgeo.org/osgeo4w/osgeo4w-setup.exe
Please download it and test. The wizard will guide you through the installation process.

Find GRASS in
Advanced -> … -> Select Packages
All -> Desktop -> grass & tcltk: ActiveTcl wxpython (just click to select) -> Next …

Feedback is appreciated in two “channels”:

OSGeo4W installer related issues:

GRASS related issues:

Update 24-Jan-2009: The package was fixed to support the new wxpython graphical user interface.

BTW: The OSGeo4W team is seeking a new GRASS package maintainer, we hope to find a candidate soon – please speak up.

GDAL/OGR 1.6.0 Released

The GDAL/OGR project team is pleased to announce the release of GDAL/OGR 1.6.0. This release represents 11 months of new feature development since the GDAL/OGR 1.5.0 release, as well as many bug fixes. The release source code is available as described at:

http://trac.osgeo.org/gdal/wiki/DownloadSource

The new release includes the following feature highlights, with detailed information on changes at:

http://trac.osgeo.org/gdal/wiki/Release/1.6.0-News

New Raster Drivers:

  • BLX Magellan Topo Driver (New)
  • EIR Erdas Imagine Raw Driver (New)
  • Oracle GeoRaster Driver (New)
  • GRIB Driver (now part of standard distribution)
  • LCP (FARSITE) Driver (New)
  • Terralib driver (New)

New Vector Drivers:

  • Geoconcept Export Driver (New)
  • INGRES RDBMS Driver (New)
  • XPlane/Flightgear Driver (New)

General:

  • RFC 20: Preliminary support for alternate axis orientations in SRS.
  • On the fly access to data in .zip and .gz compressed files.
  • RFC 22: Support for reading from, and warping with RPC (Rational Polynomial Coefficients) from various formats.
  • Gaussian and Mode resampling added for overview generation.
  • Added raster to vector (polygon) conversion utility.
  • Added raster proximity utility.
  • Added raster sieve filtering utility.
  • Added -expand rgb/rgba ability to gdal_translate
  • Added cutline / blend support to gdalwarp.
  • Added -segmentize option to ogr2ogr to break long line segments.
  • RFC 23: Preliminary support for character set transcoding (in OGR for now).
  • RFC 21: OGR SQL supports type casting and field name aliasing

Building a cluster for GRASS GIS and other software from the OSGeo stack

Nov 2008

Lucky to have (access to) a cluster? Here some notes on how to do geospatial number crunching on it a.k.a. HPC (High Performance Computing).

Preparing the disks
We decided to use the ext3 file system. An initial problem was the formatting of the RAID5 disk set since it exceeded the file system specifications. Then, setting the ext3 block size to 4k instead of 1k we could format it.

Storage: a home for GIS data
The disks are available via NFS to all nodes (blades in our case). All raw/original data sets and the GRASS database are sitting in an NFS exported directory which I even link on my laptop to easily add/access/modify stuff.

Front-end machine and blades configuration
The cluster is a (currently) 56CPU blades system, we’ll expand to 128 CPUs later this year (16 blades with 2 procs a 4 core and 16GB RAM per blade). Additionally, we have a front-end machine to run the job manager and to link in further disks, all via NFS.
The blades are configured diskless, i.e. that once started, they receive their operating system from the front end machine via network (10GB/s ethernet). Like this, we have a single directory on the front end which contains all software, this is then propagated to all blades. Very convenient. We use Scientific Linux (the LiveDVD copied onto the disk, there is a special directory to store your modifications which are then merged in on the fly once you boot the blades, pretty cool concept). The job software is (SUN) Grid Engine, also free software. Job control with GRASS I have described here:

http://grass.osgeo.org/wiki/Parallel_GRASS_jobs
-> Grid Engine

GRASS: Avoiding replicated import of large data files through virtual linking
New in GRASS 6.4 is that you can just register a geodata file on the fly with r.external. Altogether I have 1.4TB of new GIS data from our province, naturally I didn’t want to by a new disk array just for my provincial GRASS location! Here r.external comes handy to minimize the “import” to a few bytes. As expected, it leverages GDAL to get data into GRASS, the overhead is minimal.

Power consumption
Power consumption is measured, too: The entire system consumes around 2000W (each blade less than 200W), so it’s going into the direction of “green” computing (there is no such thing!). If we had a solar panel at least…

Outcome
All in all a very nice solution. I made a stress test and removed all internal switches and shut down the blades while I was processing 8000 MODIS satellite maps. Everything survived and the Grid engine job manager collected the crashed jobs and restarted them without complaining. All resulting maps are collected in the target GRASS mapset and could be even exported to common GIS formats, if needed.
If you want to run Web Processing Services (e.g., pyWPS), you can likewise send each session to a node, giving you enormous possibilities for your customers.

Edit 2014:

“Big data” challenges on a cluster – limits and our solutions to overcome them:

  • 2008: internal 10Gb network connection way too slow
    • Solution: TCP jumbo frames enabled (MTU > 8000) to speed up the internal NFS transfer
  • 2009: hitting an ext3 filesystem limitation (not more than 32k subdirectories allowed but having more files in cell_misc/ as each GRASS GIS raster map consists of multiple files)
    • Solution: adopting XFS filesystem [yikes! …. all to be reformated, i.e. some terabyes had to be “parked” temporarily]
  • 2012: Free inodes on XFS exceeded
    • Solution: Update XFS version [err, reformat everything again]
  • 2013: I/O saturation in NFS connection between chassis and blades
    • Solution: reduction to one job per blade (queue management), 21 blades * 2.5 billion input pixels + 415 million output pixels
  • 2014: GlusterFS saturation
    • Solution: Buy and use a new 48 port switch, ti implement 8-channel trunking (= 8 Gb/s)