Spatial proximity searching using lat/longs

posted July 12th 2006 at 0434 EDT in All, Articles, Python, Programming, mysql, SQL

This article covers how to search a database via latitude and longitude to find matches based on spatial proximity (or distance). This code is for mysql (also some in python), but it should be easily adaptable to other languages.

The example below will center around searching for information based on zip code data. You can download a file with zip codes mapped to latitude and longitudes at cfdynamics. I have a sample loadZipData.py which I used to populate my database table from the text file.

So we start out with a zip code database table


CREATE TABLE `zip` (
  `zip` varchar(12) NOT NULL DEFAULT '',
  `latitude` float NOT NULL DEFAULT '0',
  `longitude` float NOT NULL DEFAULT '0',
  `city` varchar(50) DEFAULT NULL,
  `state` varchar(50) DEFAULT NULL,
  `county` varchar(50) DEFAULT NULL,
  `zip_class` varchar(50) DEFAULT NULL,
  PRIMARY KEY  (`zip`)
);
 

For simplicity i'll query the database for a single zip code so we have a latitude and longitude to work with


>>> results = db.getSql("select * from zip where zip = '21401' ")
>>> print results[0]
>>> {'city': 'ANNAPOLIS', 'zip': '21401', 'zip_class': 'STANDARD', 'longitude': -76.610100000000003, 'county': 'ANNE ARUNDEL','state': 'MD', 'latitude': 38.967300000000002}
 

Now that we know the latitude and longitude of annapolis maryland, we can construct a query to find things near there. (This query below will work for any lat/long pair it obviously doesn't have to be the one for my hometown above)

The formula (src) to find a distance is as follows


degrees(acos(
        sin(radians(latitude))
        * sin( radians(:1))
        + cos(radians(latitude))
        * cos( radians(:1))
        * cos( radians(longitude - :2) )
        ) ) * 69.09)
 

Where :1 is the parameter of the latitude we are searching near, and :2 is the longitude we are searching near. We will use truncate(value,1) to get our data back in the format 4.2 miles (ie: with one decimal place).

We will also use the sql phrase HAVING distance < 5 to filter our matches to only those zip codes within 5 miles of our search target.

This yields the following python code to make a mysql query



>>> matches = db.getSql("""
SELECT zip,city,state,
truncate((degrees(acos(
        sin(radians(latitude))
        * sin( radians(%s))
        + cos(radians(latitude))
        * cos( radians(%s))
        * cos( radians(longitude - %s) )
        ) ) * 69.09),1) as distance
FROM zip
HAVING distance < 5
ORDER BY distance"
"" , ( results[0].latitude,
results[0].latitude, results[0].longitude))
>>> len(matches)
27
 

So we see that there are 27 other zip codes within 5 miles of annapolis. (ok, 26 when you subtract the one we started with).


>>> for z in matches: print z.zip,z.city+",",z.state,z.distance, "miles away"
...
21401 ANNAPOLIS, MD 0.0 miles away
21061 GLEN BURNIE, MD 0.3 miles away
20751 DEALE, MD 0.7 miles away
21122 PASADENA, MD 0.7 miles away
21037 EDGEWATER, MD 0.9 miles away
21062 GLEN BURNIE, MD 0.9 miles away
21098 HANOVER, MD 0.9 miles away
21123 PASADENA, MD 0.9 miles away
21404 ANNAPOLIS, MD 0.9 miles away
21411 ANNAPOLIS, MD 0.9 miles away
21412 ANNAPOLIS, MD 0.9 miles away
20764 SHADY SIDE, MD 1.1 miles away
21146 SEVERNA PARK, MD 1.1 miles away
21113 ODENTON, MD 1.3 miles away
21144 SEVERN, MD 1.3 miles away
21106 MAYO, MD 1.5 miles away
21077 HARMANS, MD 1.7 miles away
21108 MILLERSVILLE, MD 2.0 miles away
20776 HARWOOD, MD 2.4 miles away
21403 ANNAPOLIS, MD 3.0 miles away
20779 TRACYS LANDING, MD 3.2 miles away
20778 WEST RIVER, MD 3.4 miles away
20765 GALESVILLE, MD 3.5 miles away
21225 BROOKLYN, MD 3.5 miles away
21032 CROWNSVILLE, MD 3.7 miles away
21012 ARNOLD, MD 4.4 miles away
21140 RIVA, MD 4.5 miles away
 

And thats how the searches are done in my new Motorcycle Classified Ad website.

9 Responses

  1. #1 Tom Chivertont
    2 years, 4 months ago

    If you are lucky enough to have oracle, they have a spatial pack you can buy that offers native functions for this and more

  2. #2 PaulH
    2 years, 4 months ago

    since you’re using centroids of zipcode polygons, what you’re doing is kind of coarse. if you dont care that much, you can simplify this greatly by using a simple bounding box search using the original search coords +/- your seach radius as the min/max for your WHERE clause. or more siomply put a box instead of a circular buffer.

    postgreSQL & postGIS is a far better choice for spatial data. btw did you know that mysql now has spatial functionality? it’s pretty primitive but it has it.

  3. #3 Javier Arturo Rodriguez
    2 years, 4 months ago

    PostgreSQL does spatial querying as well using custom data types. I used it for a ViendoMexico.com prototype.
    Nice query, though.

  4. #4 jehiah
    2 years, 4 months ago

    @PaulH : Yes a spatial database would be better if there is a heavy need for processing spatial data. You also have a good point that a simple box’d area search would be simpler. Hopefully the solution I proposed is exactly between those two. (thats what I needed anyway - an accurate search that worked in my existing database setup).

    The zip code example is what I needed, but as we all know; thats not the only lat/long data that people need to search; it’s just a common way to do it. After all, who knows the lat/long of their home town?

  5. […] Jehiah (of relative date function fame) now has some information about spatial proximity searching: This article covers how to search a database via latitude and longitude to find matches based on spatial proximity (or distance). This code is for mysql (also some in python), but it should be easily adaptable to other languages. […]

  6. #6 web-ology.com » links for 2006-08-12
    2 years, 3 months ago

    […] Spatial proximity searching using lat/longs (tags: python maps database sql mysql proximity distance howto programming) […]

  7. #7 Jon-Carlos Rivera
    2 years, 2 months ago

    Good article! I used the exact same formula in an e-commerce site I developed about a year back only I had optimized the query a bit more. You have inspired me to write a small article on the method I ended up using in mysql. If you are interested in a simple method to increase in query speed check out my article at: http://www.voxclandestina.com/2006-09-01/optimizing-spatial-proximity-searches-in-sql/

  8. #8 Rick MacConnell
    1 year, 11 months ago

    Thanks for the example. I was able to use it to do a proximity search in mysql with no problems. I might try to optimize it to preselect based on a bounding box and then do the distance calculation from that subset if performance becomes an issue.

  9. #9 Lucas
    1 year, 5 months ago

    For certain queries - this calculation results to NULL. Try the following lat & long in this calculation:

    80017 | +39.702142 | -104.806670

    I had to bracket the whole ‘truncate’ statement with an IFNULL() to get it to work. Strange but true.

Leave a comment