Parse the Craigslist stream.
by James on September 14, 2010
So I am trying to find a house to rent. Every day I check Craigslist, but this gets old fast! So finally I thought: I can write a script for that. I fired up a bash prompt and got to work.
Here is what I came up with. It is a simple script that curls an RSS feed, strips out all the HTML mumbo jumbo, looks for a list of specific cities that I am willing to move to, saves the results to a file, and then finally emails me the list. The next time the script runs, it checks to make sure that it does not email me the same results. I am using mutt to send the email from the bash script.
#!/bin/bash

axold=0

# Start clean: remove any update list left over from a previous run
if [ -f update.txt ]; then
    rm -f update.txt
fi

# Make sure the list of already-seen listings exists (needed on the first run)
touch oldlist.txt

# The rss feed for my house search
rss="http://sandiego.craigslist.org/search/apa?query=house&srchType=A&minAsk=&maxAsk=&bedrooms=2&format=rss"

# Keep only the item URLs and titles, strip the markup, join each URL
# with its title on one line, then keep the neighborhoods I care about
curl -s "$rss" | grep -E '<item rdf:about=|<title>' | sed s'/<item rdf:about="//'g \
    | sed s'/<title><!\[CDATA\[/ /'g | sed s'/">//'g \
    | sed s'/]]><\/title>//'g \
    | sed s'/<title>craigslist san diego | apts\/housing for rent search "house"<\/title>/ /' \
    | sed -n -e ":a" -e "$ s/html\n/html /gp;N;b a" \
    | grep -i 'ocean beach\|point loma\|pacific beach\|southpark\|south park\|oceanbeach' > tmp.txt

# 1st loop for checking the new listings against the old
while read -r newline
do
    # 2nd loop loads the list of old matches to check
    while read -r line
    do
        if [[ "$newline" == "$line" ]]; then
            axold=1
            break
        fi
    done < oldlist.txt

    # Not seen before: queue it for the email and remember it
    if [ "$axold" != 1 ]; then
        echo "$newline" >> update.txt
        echo "$newline" >> oldlist.txt
    fi
    axold=0
done < tmp.txt

rm -f tmp.txt

# send email if update.txt exists
if [ -f update.txt ]; then
    mutt -n -s "New Craigslist Post" -- "james@zlabx.com" < update.txt
fi

rm -f update.txt
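The two nested loops above do a line-by-line "have I seen this before?" check. As a rough sketch of the same idea, grep can do the comparison in one pass. The function name filter_new and its two file arguments are hypothetical, not part of the script above, and this assumes a grep that treats an empty pattern file as "no patterns" (GNU grep does).

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print the lines of
# the candidates file ($2) that are not yet in the seen file ($1), and
# append them to the seen file so they are skipped next time.
filter_new() {
    local seen=$1 candidates=$2 fresh

    # Create the seen file on the first run so grep has something to read
    touch "$seen"

    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # The awk filter drops repeats within the candidates file itself.
    fresh=$(grep -Fxv -f "$seen" "$candidates" | awk '!dup[$0]++')

    if [ -n "$fresh" ]; then
        printf '%s\n' "$fresh" | tee -a "$seen"
    fi
}
```

With something like this, the loop section could shrink to filter_new oldlist.txt tmp.txt > update.txt, followed by a test that update.txt is not empty (-s rather than -f) before calling mutt.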
Now let's make it check Craigslist every 15 minutes by adding an entry to my crontab.
[zlabx]$ crontab -e
*/15 * * * * /path/to/my/script/housesearch.sh
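One thing to watch for: the script reads and writes oldlist.txt, tmp.txt, and update.txt by relative name, and cron usually starts jobs from the user's home directory rather than the script's directory. A minimal sketch of one way to handle that is to change into the script's own directory first. The helper name dir_of is hypothetical, and this assumes the script is not reached through a symlink.

```shell
#!/bin/bash
# Hypothetical helper: print the absolute directory that contains the
# file given as $1. Runs in a subshell so the caller's cwd is untouched.
dir_of() {
    ( cd "$(dirname "$1")" && pwd )
}

# At the top of housesearch.sh this could be used as:
#   cd "$(dir_of "${BASH_SOURCE[0]}")" || exit 1
```

After that, the relative file names resolve next to the script no matter where cron starts it.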
One comment
Hey!
This was super useful. I made a few modifications and posted it to github: https://github.com/ohmarchitect/clwatch
Thanks,
-OA
by Ohm Architect on June 28, 2014 at 8:27 am.