Help Us Save Hardwood Paroxysm: a Bloggissist's Plea

EDIT: Call off the dogs. Matt was able to recover everything after a lot of hard work. We can stop cache-hunting now, although I have to say I'm pretty impressed that we collected over 525 posts in the inbox of our cache-dump email account. Excellent crowdsourcing. Sorry it ended up being unnecessary, but if the website had been unrecoverable, it would have been pretty important to pull things from the cache before the caches expired. Thanks to everyone who was a part of this, and my apologies to anyone who feels it was a waste of time.

I woke up today and went to Hardwood Paroxysm, intending to look up an old piece I read every now and then for inspiration. Imagine my surprise when I found, well... nothing. I immediately checked Twitter and heard the news: the server got hacked, the entire blog was deleted, and things looked grim. Very sad story. I've actually had some limited experience trying to recover lost websites before. Specifically, I ran a forum in high school whose website was unexpectedly wiped. We tried to save as many posts as we could, but we didn't get much. Most of it (including the tales of Spiderdude, a bro-ified Spider-Man knock-off that only a high schooler like me would find funny) was lost to the endless ether of the internet. In trying to recover everything, though, I became at least a little more knowledgeable about how to recover a site when the server-side data unexpectedly vanishes. For the uninitiated, here are two key points to keep in mind.

  • Caches have everything... sort of. There are three main caches that crawl virtually everything on the web and keep records for varying lengths of time: Google, Yahoo, and the Wayback Machine are my mainstays. There are quite a lot more, but those three tend to have everything you need, with the others coming into play only later in the process. Accessing files cached by Google is simple: search for something, hover over a result, then click the "Cached" link that comes up on the right side of the page, as seen on the far right side of the image below.

  • Time is of the essence. This is why I say "sort of." Caches have a catch: they've got a relatively quick churn rate, so a webpage that no longer exists only stays in Google's cache for a limited amount of time. How long varies with how popular the website is. I'm not sure what the algorithm is, exactly, but once a page has been gone for a while, Google picks up on it and removes it from the cache. The Wayback Machine doesn't work like that; however, it captures pages quite a bit less often than the Google and Yahoo caches, so it may not be as useful for this exercise.

Why is this relevant? Because we can still back up Hardwood Paroxysm. There are two ways to do this: either by sifting through the RSS feeds of people who don't delete old articles, or by downloading articles from cache data. I've already started the second process, but given the incredible amount of material amassed by the Hardwood Paroxysm crew, there's absolutely no way I can do it alone. And that's where you come in. After the jump, I outline the ways you can help save Hardwood Paroxysm's archives and preserve the content of one of the best basketball blogs to ever grace the web. Let's get to it.

 • • •

From my local drive, I was able to save many of the key style elements from HP's page: the logo, the CSS stylesheets, et cetera. I was also able to save the text of HP's last 10 posts, which is good, because most caches don't seem to have them. In thinking about how best to organize the task of sifting through caches for over a thousand posts, I came up with what I think is a relatively good structure for backing up HP before the content churns out of the cache. We start by searching the Google cache for specific HP authors, assigning one or two authors per person so that work isn't duplicated. We collect the text of each article by copying the cached page into an email and sending it to an account set up specifically to receive old HP articles, so they're saved somewhere we know they won't vanish any time soon. Then we turn the archives over to the HP writers so they can repopulate the blog with content. That may have been a bit hard to follow, so here's an easier-to-follow instruction manual:

STEP 1: PICK AN AUTHOR

Here is an incomplete list of Hardwood Paroxysm authors, provided to me by Matt Moore in no particular order. Authors marked "Articles backed up" already have their work saved:

  1. Rob Mahoney -- (Articles to be found by Mogias)
  2. David Sparks -- (Articles backed up.)
  3. Zach Harper -- (Articles to be found by AJ)
  4. Jared Wade -- (Articles to be found by Alex Dewey)
  5. Matt Moore -- (Articles to be found by Iz)
  6. Scott Leedy -- (Articles to be found by Adam Koscielak)
  7. Curtis Harris -- (Articles to be found by Jordan White)
  8. James Herbert -- (Articles backed up.)
  9. Jovan Buha
  10. Steve McPherson -- (Articles backed up.)
  11. Sean Highkin -- (Articles to be found by Jordan White)
  12. Danny Chau -- (Articles to be found by Blake Potosh)
  13. Connor Huchton -- (Articles to be found by Moglas)
  14. Jared Dubin -- (Articles backed up.)
  15. Jon Nichols
  16. Amin Vafa
  17. Eric Maroun -- (Articles backed up.)
  18. Noam Schiller -- (Articles to be found by Ian Dougherty)
  19. Conrad Kaczmarek -- (Articles backed up.)
  20. Andrew Lynch -- (Articles backed up.)
  21. Joey Whelan -- (Articles to be found by Aaron McGuire)
  22. Josh Tucker -- (Articles to be found by Tim)

 

Please comment on this post with your name and the name of an author if you'd like to take on the task of backing up their work. We'll put your name next to theirs in the list, so that nobody else duplicates your work in getting their articles back. Once you've got an author, move on to Step 2.

STEP 2: LOOK UP THEIR WORK

Go to Google and search for the following string:

site:http://www.hardwoodparoxysm.com "[author name]"

This should bring up several pages of results listing their posts on Hardwood Paroxysm.
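If you're working through several authors, you can generate the Step 2 searches instead of typing them by hand. Here's a minimal Python sketch, assuming Google's usual `https://www.google.com/search?q=...` URL format. The function names are hypothetical helpers made up for this example; the site and author names come from the list above.

```python
from urllib.parse import urlencode

# Site from the Step 2 search string; helper names below are hypothetical.
SITE = "http://www.hardwoodparoxysm.com"


def search_query(author: str) -> str:
    """Return the Step 2 query string for one author."""
    return f'site:{SITE} "{author}"'


def search_url(author: str) -> str:
    """Return a Google search URL that runs the Step 2 query.

    urlencode percent-encodes the quotes, colons and slashes and turns spaces into '+'.
    """
    return "https://www.google.com/search?" + urlencode({"q": search_query(author)})


if __name__ == "__main__":
    # Authors with no volunteer assigned yet in the Step 1 list.
    for name in ["Jovan Buha", "Jon Nichols", "Amin Vafa"]:
        print(search_url(name))
```

Paste each printed URL into your browser and you'll land on the same results page the manual search gives you.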

STEP 3: EXAMINE THAT CACHE, DOGGS

So, you have a list of articles. Hover over an article link and a panel will appear on the right side of the screen, showing a screenshot of what the article once looked like and a link that reads "Cached". Click the link. If the cached page takes a long time to load, click through to the text-only version instead; the link is in the right corner of the box at the top of the screen. Like this:

STEP 4: EMAIL 

This is the important part. Copy the text of the article, including the title, author, and date-stamp, into your email program. Then send it to savehp@gothicginobili.com, with the subject line formatted like this:

[Article Author] - [Article Title]

Then go back to your Google search from Step 2, open the author's next article, and repeat.
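For anyone who'd rather script the formatting part of Step 4, here's a rough Python sketch using the standard library's `email.message.EmailMessage`. It only builds the message in the format described above (subject "[Article Author] - [Article Title]", with the title, author and date-stamp kept alongside the text). `build_backup_email` is a hypothetical helper, and the sample title and sender address are placeholders, not real HP data.

```python
from email.message import EmailMessage

# Destination address from Step 4.
BACKUP_ADDRESS = "savehp@gothicginobili.com"


def build_backup_email(author: str, title: str, date: str, body: str, sender: str) -> EmailMessage:
    """Build (but do not send) one backup email in the Step 4 format.

    Hypothetical helper for illustration only.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = BACKUP_ADDRESS
    # Subject line follows "[Article Author] - [Article Title]".
    msg["Subject"] = f"{author} - {title}"
    # Keep the title, author, and date-stamp together with the article text.
    msg.set_content(f"{title}\nBy {author}\n{date}\n\n{body}")
    return msg


if __name__ == "__main__":
    example = build_backup_email(
        author="Rob Mahoney",
        title="Example Article Title",  # placeholder, not a real HP post
        date="April 15",
        body="Pasted article text goes here.",
        sender="you@example.com",  # placeholder sender address
    )
    print(example)
```

Actually sending the message would mean handing it to something like `smtplib.SMTP.send_message` with your own mail server settings, which this sketch deliberately leaves out.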

 • • •

It's a tedious, mind-numbing process. But it's probably the easiest and most organized way to get HP's content stored before it gets churned out from the cache. If anyone has better ideas, I'd definitely be up for editing this post or altering the strategy. But I thought it'd be good to start something now, before articles start dropping like flies and the cache gets emptied. It's all at our fingertips right now, if we can organize enough and get it before it goes. This whole thing sucks, but hopefully we can minimize the damage and recover as much as we can. Good luck, campers.

37 comments on “Help Us Save Hardwood Paroxysm: a Bloggissist's Plea”

  1. I am going through my own archive and have all of that covered. Huge thanks to everyone for helping out with this today.

  2. I have all of my stuff saved on my computer in more or less the form it took when posted, so don't make my articles a priority. Mainly I'd have to find the image files again and relink some videos. And I sometimes change things a bit once I've copied them into the CMS and read them over, but that's no big deal.

    • Thanks a ton. The database was restored, so the exercise in crowdsourcing can end, but thanks a lot for all your help.

    • It looks like Matt was able to get the host company to restore the database, so we can stop the cache hunting. Thanks a TON for all the work. A successful crowdsourcing requires a bunch of people who are willing to do stuff and I'm very thankful there were dudes like you around to grab this stuff, even if it ended up being a bit superfluous to the final solution.

    • It looks like Matt was able to get the host company to restore the database, so we can stop the cache hunting. Thanks a TON for all the work, you recovered a hell of a lot of posts haha.

  3. If people can help out with recovering picture and video files, that would be a huge help. There must be thousands of those floating around.

  4. I'd totally love to help, but I sadly lack the time. Instead, I'll just dish the link around and see if anyone else can lend a hand. The awesomeness that is HP must not disappear.

  5. I would love to help, but I am busy until Sunday afternoon with school and work. I seriously almost cried when I went to HP today.

  6. If I remember correctly, Trey Kirby and Holly Mackenzie both wrote for HP at some point too. I'm sure there are a few other writers we're missing.

  7. David Sparks is done, but I came across him mentioned in a Josh Tucker article, and he seems to have quite a few entries. If you want to add him to the list, I can do him, but it might take me a while.

        • Belay that: it looks like Matt was able to get his web host to restore the pre-hack database. The cache hunting can end, because all the posts are there. Thank god. Thanks a ton for all the time spent on this project.

          • I have to say, you did a great job with this project. This really could've been necessary, and maybe it still will turn out to be helpful to HP.

            But seriously, who uses "belay" outside of a military order or a business meeting?

            "So you'll have fries, a burger and a sundae, then?"
            "On second thought, belay that sundae."
            "With what? Sprinkles? M&Ms? Sprees? You remember Sprees? Sour as shit, but still chewy? Is that what you want me to belay your sundae with?"

  8. Pingback: Smoke These Joints: Week of April 15 | I GO HARD NOW
