Parse the Craigslist stream.

by James on September 14, 2010

So I am trying to find a house to rent. Every day I check Craigslist, but this gets old fast! So finally I am like… I can write a script for that. I fire up a bash prompt and get to work.

Here is what I came up with. It is a simple script that curls an RSS feed, strips out all the HTML mumbo jumbo, looks for a list of specific cities that I am willing to move to, saves the results to a file, and then finally emails me the list. The next time the script runs, it checks the saved list so that it does not email me the same results twice. I am using mutt to send the email from the bash script.


#!/bin/bash

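axold=0

# Start each run with a fresh update file, and make sure the list of
# already-emailed listings exists so the very first run does not fail.
rm -f update.txt
touch oldlist.txt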
# The rss feed for my house search
rss="http://sandiego.craigslist.org/search/apa?query=house&srchType=A&minAsk=&maxAsk=&bedrooms=2&format=rss"

curl -s "$rss" | grep -E '<item rdf:about=|<title>' | sed s'/<item rdf:about="//'g \
        | sed s'/<title><!\[CDATA\[/ /'g | sed s'/">//'g  \
		| sed s'/]]><\/title>//'g \
		| sed s'/<title>craigslist san diego | apts\/housing for rent search &#x22;house&#x22;<\/title>/ /' \
		| sed -n -e ":a" -e "$ s/html\n/html /gp;N;b a" \
		| grep -i 'ocean beach\|point loma\|pacific beach\|southpark\|south park\|oceanbeach' > tmp.txt


# 1st loop for checking the new listings against the old
while read -r newline
do
	# 2nd loop loads the list of old matches to check
	while read -r line
	do
		if [[ "$newline" == "$line" ]]; then
			axold=1
			break
		fi
	done < oldlist.txt

	# Not seen before: queue it for the email and remember it
	if [ "$axold" != 1 ]; then
		echo "$newline" >> update.txt
		echo "$newline" >> oldlist.txt
	fi

	axold=0
done < tmp.txt

rm -f tmp.txt

#send email if update.txt exists
if [ -f update.txt ];then
   mutt -n -s "New Craigslist Post"  -- "james@zlabx.com" < update.txt
fi

rm -f update.txt
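
The two nested loops re-read the whole old list for every new listing. As a rough sketch of a shorter alternative (not what the script above uses), grep can do the same comparison in one pass. The function name new_listings is made up for this example, and it assumes a grep that supports -F, -x, -v and -f, which GNU and BSD grep both do.

```bash
#!/bin/bash

# Hypothetical helper: print every line of the candidates file ($2) that is
# not already present in the seen file ($1).
new_listings() {
    local seen_file=$1
    local candidates_file=$2

    # Make sure the seen file exists so the first run works.
    touch "$seen_file"

    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # awk then drops duplicates that appear within the candidates themselves.
    grep -Fxv -f "$seen_file" "$candidates_file" | awk '!seen[$0]++'
}
```

In the script this could stand in for both loops, for example `new_listings oldlist.txt tmp.txt > update.txt` followed by `cat update.txt >> oldlist.txt`. Because update.txt is then always created, the email check would need `[ -s update.txt ]` (file exists and is not empty) instead of `[ -f update.txt ]`.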

Now let's make it check Craigslist every 15 minutes.

[zlabx]$ crontab -e
  */15 * * * * /path/to/my/script/housesearch.sh
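
One thing to watch: the script uses relative file names (oldlist.txt, tmp.txt, update.txt), and cron typically starts jobs in your home directory, so that is where those files end up. If you would rather keep them next to the script, a crontab entry along these lines should work. This is only a sketch: the log file name housesearch.log is my own invention, and it assumes the script has been made executable with chmod +x.

```
# Change into the script's directory first, then run it and keep a log
*/15 * * * * cd /path/to/my/script && ./housesearch.sh >> housesearch.log 2>&1
```

The log is handy because anything a cron job prints is otherwise mailed to the local user or lost, which makes a failing curl or mutt hard to notice.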

One comment

Hey!

This was super useful. I made a few modifications and posted it to github: https://github.com/ohmarchitect/clwatch

Thanks,
-OA

by Ohm Architect on June 28, 2014 at 8:27 am.
