About a year ago, I decided I wanted to visit Zion National Park with my brother. I thought planning a May trip to a national park two months in advance was very early and should be enough time to get the back country and campsite permits I wanted.
I was wrong. Yet with the help of Node, Casper, and MongoDB, I still found a way to book 3 nights at Watchman campsite and have the trip I wanted.
Back to the story
Everything was fully booked up: Watchman campground, Narrows overnight, Buckskin Gulch overnight. I learned the hard way that if you want to do any national park backcountry hikes, 6 months to a year in advance is more appropriate.
Despite this setback, I was determined to figure out a way to at least stay at the Watchman campsite. Since everything was booked up so far in advance, I figured there might be a few cancellations. Looking over the reservation system, I saw no option to be notified if somebody cancels their reservation. I could check the website every chance I got and hope to stumble across a cancellation, but I thought that was leaving too much to chance.
Then it struck me: What if I wrote a program to check the permit system every 5 minutes and email me if it detected an empty slot on the days that I wanted?
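In outline, the idea looks something like the sketch below. This is not the code from my repo; `checkAvailability` and `notify` are hypothetical callbacks standing in for the scraper and the email sender, and dates are assumed to be plain strings.

```javascript
// Sketch only: checkAvailability and notify are hypothetical callbacks
// standing in for the scraper and the email sender.
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Pure helper: return the wanted dates that show up as available.
// Assumption: dates are plain strings in whatever format the scraper emits.
function findOpenDates(wantedDates, availableDates) {
  const available = new Set(availableDates);
  return wantedDates.filter((date) => available.has(date));
}

// Run one check: scrape, compare, and notify only when something opened up.
async function checkOnce(wantedDates, checkAvailability, notify) {
  const availableDates = await checkAvailability();
  const openDates = findOpenDates(wantedDates, availableDates);
  if (openDates.length > 0) {
    await notify(openDates);
  }
  return openDates;
}

// Repeat the check on a fixed interval; returns the timer so it can be cleared.
function startPolling(wantedDates, checkAvailability, notify, intervalMs = FIVE_MINUTES_MS) {
  return setInterval(() => {
    checkOnce(wantedDates, checkAvailability, notify).catch((err) => {
      console.error('Availability check failed:', err.message);
    });
  }, intervalMs);
}
```

Keeping the comparison in its own small function means it can be checked without touching the network or sending any email.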
Writing the program
Since there was no public API for what I wanted to do, I knew that some sort of headless browser would be the solution. I chose Casper since I had heard a lot about it before, but looking back on it, a headless browser that works more seamlessly with Node, like node-horseman or zombie, would have been more appropriate.
My first step was to reproduce in Casper what I was doing manually to check for cancellations on the NPS Zion Narrows website. If you click around, you’ll quickly find that trying to find out if there are permits available really sucks. You have to click through 5-6 hard refreshes to figure that out. The system for NPS campsites, recreation.gov, isn’t much better.
I then looked at the HTML structure of the pages to work out how to parse out the information I was looking for. It was what you’d imagine: tables nested within tables nested in divs with maybe a class or id thrown in, but never on the specific container you want.
Once I got the initial process working, I started integrating this Casper job into a Node server. I found a neat cron-like Node task scheduler called node-schedule that works quite nicely. It was during this integration that I resorted to spawning the Casper task as a child process and capturing its output from stdout. The next time I write a program like this, I’d probably choose a different headless browser framework so I could avoid that.
Since I’m paranoid, I wanted to be able to confirm that the program was still running at any given moment, so I decided to save the results to a Mongo database and added some routes to view the results of the last few iterations of the program.
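For anyone curious what the child-process part looks like, here is a minimal sketch using Node's built-in `child_process` module. It is an illustration rather than the code from my repo: the helper names are made up, and it assumes the Casper script prints a single JSON document to stdout.

```javascript
const { spawn } = require('child_process');

// Spawn a command and resolve with everything it wrote to stdout.
function runAndCapture(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = '';
    let stderr = '';

    // Decode as UTF-8 so multi-byte characters split across chunks stay intact.
    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });

    // 'error' fires if the process could not be started at all.
    child.on('error', reject);

    // 'close' fires once the process has exited and its streams have ended.
    child.on('close', (code) => {
      if (code === 0) {
        resolve(stdout);
      } else {
        reject(new Error(`${command} exited with code ${code}: ${stderr.trim()}`));
      }
    });
  });
}

// Assumption: the Casper script prints one JSON document to stdout.
async function runScraper(scriptPath) {
  const output = await runAndCapture('casperjs', [scriptPath]);
  return JSON.parse(output);
}
```

A scheduled job can then call `runScraper` on each tick and treat a rejected promise as a failed run.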
The first alert email I received told me that Watchman campsite had an opening for the dates that I wanted! Within 5 minutes of receiving the email I booked it. My brother and I had a great time. Unfortunately, I never received an alert for the Narrows or Buckskin Gulch overnight permits because none became available in the online system.
You can check out the code I wrote at my GitHub repo zion-scraper.
Ideas for improvement
- Use a headless browser that blends more seamlessly with Node.
- Refactor the code to be less specific so that it could be used to search for any campsite on any date on the websites I learned how to scrape.
- Refactor to remove the need to store web scraper state in the global object with the use of more functional programming (pure functions).
- Better error handling for situations when the structure of the web page changes.
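As a rough sketch of where the last two ideas could go, the parsing step might become a pure function that fails loudly when the page no longer looks the way it expects. Everything here is hypothetical: the `PageStructureError` name, the "Date" and "Available" column headers, and the assumption that the scraper has already reduced the table to rows of cell text.

```javascript
// Hypothetical error type for "the page no longer looks the way the scraper expects".
class PageStructureError extends Error {
  constructor(message) {
    super(message);
    this.name = 'PageStructureError';
  }
}

// Pure function: turn rows of cell text into { date, available } records,
// throwing if the expected columns are missing or a count is not a number.
// Assumption: the first row is a header containing "Date" and "Available".
function parseAvailabilityRows(rows) {
  if (!Array.isArray(rows) || rows.length === 0) {
    throw new PageStructureError('No table rows found');
  }
  const [header, ...body] = rows;
  const dateCol = header.indexOf('Date');
  const availableCol = header.indexOf('Available');
  if (dateCol === -1 || availableCol === -1) {
    throw new PageStructureError(`Unexpected header: ${header.join(' | ')}`);
  }
  return body.map((cells) => {
    const available = Number.parseInt(cells[availableCol], 10);
    if (Number.isNaN(available)) {
      throw new PageStructureError(`Non-numeric availability: ${cells[availableCol]}`);
    }
    return { date: cells[dateCol], available };
  });
}
```

Because the function takes its input as an argument and keeps nothing in global state, a changed page layout surfaces as a named error that the server can log or email about, instead of a silent run of "nothing available" results.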