Warning to regular readers of this blog: SEVERE Nerd Alert.
A lot of folks I know who started their blogs out on Blogger have used HaloScan for commenting since before Blogger implemented comments. Since HaloScan is shutting down in the next few days, you might want to move all your old comments over to Blogger.
There’s really no practical reason why someone at Blogger can’t write some sort of comments parser to handle the XML files that HaloScan spits out, but so far, they haven’t. If you want to get it done right now, the only way I found to make it work is a ridiculously cumbersome process.
Basically, that process is to import everything into a WordPress blog where it can all be properly combined, then re-export it, run it through a Python script, and upload it back into Blogger.
I’ve decided to write up the entire procedure I went through both as an exercise in writing documentation and in order to help anyone else who’s crazy enough to want to try this. If you think you have the patience for this (or would just like to see exactly how insane I am), hit the “read the rest” link that follows.
I will warn you that there’s a pretty decent degree of difficulty on this: There is at least some command line usage involved. There is setting up of a local host on your computer (albeit a dead-easy to use one). There is a LOT of trial and error in this process, and you have to be comfortable with recognizing when things just didn’t work and you need to start over, or at least take several steps back.
This is also very time consuming. The main reason I had time to futz with all this is that I am currently unemployed.
Also, be sure to read all the way through these instructions and check out the known issues before you get started. There might be a dealbreaker in there, and I’d really hate for anyone to get halfway through this ridiculous process and realize that they wasted half a day on something they can’t use.
I’d like to say up front that this would absolutely not be possible without the work of Justin Watt (you’ll see why in Steps 5 and 7), and if this works, you should totally donate to his beer fund.
If you can accept all those caveats, here are the instructions I’ve put together based on how I got this to (finally) work. I’ve tried to make it as clear as possible, but some of this stuff gets pretty complicated.
Step 1: Install XAMPP on your computer.
XAMPP is a free local server with PHP and MySQL tools built right into it, and which works on Windows, OS X, and Linux (and Solaris if you REALLY want to get out there).
You can also run it as a webserver, but for the purposes of this set of instructions, I’m actually keeping it off-line so that the blog I’m doing this on for a friend of mine remains unpublished (since he only allows selected readers on Blogger).
Step 2: Install WordPress on your XAMPP Local Host.
Great set of instructions here for Windows XP. The main difference for setting it up for OS X is actually in the installation of XAMPP, which the XAMPP website covers pretty simply. Note that when you’re in Applications > XAMPP folder, you’ll see a shortcut to the “htdocs” folder that you’ll want to dump all the WordPress stuff into.
One thing I did notice when I did a Get Info on it is that the “htdocs” folder is marked read-only for “everyone”, and you’ll want to make sure it’s marked read/write so that your XAMPP server can access it.* On the Mac, hit command-I to Get Info on the folder, then at the bottom of the window that opens up you’ll see dropdown menus that will allow you to change the permissions easily.
*- Again, I’m assuming you will NOT be putting the WordPress workaround on the Web, because there are huge security issues with marking a file as read/write for everyone on a live server, and I would STRONGLY recommend against doing this if you’re working with a live server.
Step 3: Import your Blogger blog into WordPress.
On the sidebar of WordPress’s admin page, there’s a Tools > Import feature, and one of the types of blogs you can choose is Blogger. You’ll have to sign in with your Google Account to authorize the import, but once you’ve done that, the rest of the process is automated.
I encountered two minor issues with the importer. The first was that there were about 15 or so posts that didn’t come over, which had to be manually re-added. Out of 2300, I was pretty much okay with that, but going through and figuring out which posts were missed was kind of a pain in the ass.
The second issue was that for some reason the WordPress tool to import from Blogger pulled an extra “>” in at the beginning of every. single. post. from a BlogSpot blog. It’s a little annoying, but it’s also kind of good as an indicator of what posts have been imported and/or reimported.
I will note – about a year ago I imported the blog you’re reading now to WordPress from a Blogger blog I’d been publishing via FTP for years, and didn’t have the “>” issue. I don’t know if it’s a new bug in the import tool or if it’s something to do with BlogSpot, but the issue was there.
Step 4: Make sure all your Blogger posts have the Post Number somewhere in them.
The easiest way to do that is to go into your Blogger template and add the following bit of code right after the <$BlogItemBody$> string:
<font color="[your blog’s background color]">postID=<$BlogItemNumber$></font>
Making it the same color as your background will make it visible to the script that needs to pull the post ID number, but invisible to anyone actually looking at your site (unless they happen to highlight it). I tried doing this as an anchor, but the script wasn’t able to pull it; it’s got to be right in the actual post.
If you don’t mind the postID for every post being visible to your readers while you perform all this nonsense, you can just put in postID=<$BlogItemNumber$> .
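To make it a little clearer what that marker is for, here is a rough Python sketch of how a script could fish the post ID back out of a post’s HTML. This is NOT Justin Watt’s actual code (his scripts are PHP); the function name and the sample HTML are made up for illustration, and the only thing taken from the steps above is the “postID=” marker followed by a number.

```python
import re

# Assumed marker format: the literal text "postID=" followed by digits,
# as produced by the template snippet in Step 4.
POST_ID_PATTERN = re.compile(r"postID=(\d+)")


def extract_post_id(post_html):
    """Return the Blogger post ID embedded in a post body, or None if absent."""
    match = POST_ID_PATTERN.search(post_html)
    return match.group(1) if match else None


# Hypothetical post body, with the marker hidden in background-colored text.
sample = '<p>Hello world</p><font color="#ffffff">postID=1234567890</font>'
print(extract_post_id(sample))
```

A pattern like this only sees text that is actually in the post body, which fits with the marker having to sit right in the post.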
Step 5: Run the “wp-get-blogger-post-IDs” script.
This page has two totally invaluable PHP scripts for this process written by Justin Watt: “wp-get-blogger-post-IDs” and “import-haloscan.” Run step 2 in that page’s instructions to download, install, and run the “wp-get-blogger-post-IDs” script. This will pull in all your post IDs that you set up a minute ago so that the comments can be matched to the appropriate post.
Here’s the bad news if you have a protected blog: You WILL have to make your Blogger blog temporarily available to anyone if it’s not already through the Settings > Permissions tab on Blogger.
You don’t have to allow any search engine indexing or anything, it just takes the “this blog is open to invited readers only” wall down temporarily, so unless someone knows your URL and specifically goes to look at it in the short bit where the wall is down, there’s nothing to be concerned about.
The good news is that Blogger automatically preserves your readers list so that the second you’re done getting all the info you need, you can immediately turn that protection back on.
NOTE: I did notice when running the PHP scripts on the XAMPP local server that they can be a little slow, so your wall of protection may need to be down for an hour or more, depending on how many posts you have.
Step 6: Put your HaloScan Export files in the “htdocs/wordpress” folder and number them sequentially.
If you have more than one HaloScan export file, go ahead and number them sequentially so they can be imported as “export1.xml”, “export2.xml”. Then place those files in your main htdocs/wordpress folder.
I will note, when I imported 1400 comments spread across 2 exports to this blog via WordPress, it did it just fine, but it choked on trying to import all 8,000+ comments at once on the blog I’m working with on this giant mess.
For my friend’s blog, I wound up just importing each export file one at a time, throwing all the others in a folder I marked “exports” so I knew where they were, but the script would ignore them. You can keep them numbered sequentially so you can keep track of which ones you’ve already imported, just only have one at a time in the main “wordpress” folder.
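If you have a pile of export files, a few lines of Python can work out the sequential names for you. This is only a sketch: the input filenames are invented, and it just prints the old-name/new-name pairs rather than renaming anything, so you can eyeball the plan before touching your files.

```python
def plan_export_names(filenames):
    """Pair each export file, in sorted order, with export1.xml, export2.xml, ..."""
    return [
        (old, f"export{number}.xml")
        for number, old in enumerate(sorted(filenames), start=1)
    ]


# Hypothetical HaloScan export filenames.
for old, new in plan_export_names(["haloscan-b.xml", "haloscan-a.xml"]):
    print(old, "->", new)
```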
Step 7: Run import-haloscan.php.
Remember this page where you got the “wp-get-blogger-post-IDs” script? Well, the second script on that page, import-haloscan.php, is the second piece of this, located in that page’s Step 3. Follow that page’s instructions on how to download, install, and run that script.
The “import-haloscan” script takes all the post IDs you brought in and matches them up with the comments in your HaloScan Export file(s).
Step 8: Check and make sure your comments imported correctly.
Make sure the number of comments you imported for each post matches up. You may have a very few missing comments – When I did it for this blog, I lost 7 out of around 1400 comments, and frankly, I’d rather have 99.5% than none.
However, if you’re missing a ton of comments, then you might want to try deleting all the comments (which can be done in bulk from the “comments” tab on the sidebar) and reimporting each export file one at a time.
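Counting comments by hand gets old fast. Once you have a WordPress export file in hand (Tools > Export in the WordPress admin), something like the sketch below can tally comments per post so you can compare against your HaloScan totals. It assumes the export nests each comment as a “wp:comment” element directly inside its post’s “item” element; the sample XML and the namespace URL in it are stand-ins, and the code matches on the element name alone so it shouldn’t care which namespace version your export uses.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_comments(wxr_text):
    """Map each post title to the number of comment elements under its <item>."""
    root = ET.fromstring(wxr_text)
    counts = Counter()
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        counts[title] += sum(
            1 for child in item if local_name(child.tag) == "comment"
        )
    return counts


# Tiny stand-in for a real export file; the namespace URL is an assumption.
sample = """<rss xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
<item><title>First post</title>
<wp:comment><wp:comment_id>1</wp:comment_id></wp:comment>
<wp:comment><wp:comment_id>2</wp:comment_id></wp:comment>
</item>
<item><title>Second post</title></item>
</channel></rss>"""

for title, total in count_comments(sample).items():
    print(title, total)
```

Posts that share a title get lumped together here, so treat the numbers as a sanity check rather than gospel.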
Step 9: Export from WordPress.
Now that you’ve gotten all your posts and comments linked up and in one place, it’s time to start getting them back over to Blogger.
Go to Tools > Export, and hit the “Download Export File” button. All your posts and their comments will export as a big XML file to your default download directory.
Step 10: Download the Google Blog Converters App Engine.
The Data Liberation Front has put together a series of Python scripts that will translate the XML WordPress puts out into something that Blogger can understand. You can download a big old folder of scripts from their Google Code page.
Note that you do need to have a recent version of Python installed for it to work, but most recent OS’s come with a version that will work pre-installed. If you don’t have Python installed, here’s a link to the Python site which will give you more info on how to make that happen.
Step 11: Fire up your command line.
On OS X, Terminal works fantastically for this because you can just drag and drop the files you need.
Once Terminal is up and running, drag the “wordpress2blogger.sh” script from the “bin” folder in the big downloaded folder o’ scripts into the terminal window. You’ll see a plus sign to let you know that the script is able to be added, and then the script’s name will just show up in the window.
Then, drag in the XML document that exported from WordPress into the terminal window using the same procedure. Once both are added, hit enter. The script will think for a minute, then spit out an enormous amount of text into the terminal window.
Step 12: Create the document to upload to Blogger.
Edited to add 02.18.10: Excellent tip from Kevin in the comments that will allow you to skip part of this step:
When executing the command line version, you can automatically capture the terminal output instead of letting it scroll by and then re-selecting/editing. On any *nix system like OSX or linux you just redirect the output into a file with “>”.
sh wordpress2blogger.sh [your WordPress export file].xml > mynewfile.txt
Back to our regular programming….
In Terminal, go to Shell > Export Text As. This will export everything in the Terminal window as a .txt file. However, you’ll need to go in and do a couple things before it’s ready for upload. If you’re using another command line interface, you can also just do a select all on all text and paste it into a blank document.
Open this .txt document in your favorite text editor – I prefer TextWrangler because it’s got an option to soft-wrap text so you don’t have to scroll sideways for miles.
At the top of the document, select everything before the “<?xml version=’1.0’…” and delete it, since that’s just stuff that was only relevant to the terminal.
Go down to the very bottom of the document, and make sure you delete the “[your username]’s-Computer:~ [your username]$”. This is also only something that is useful to the Terminal.
Once you have deleted both of these items, do a Save As… and make sure to save it as a .xml file, and use a name that will allow you to distinguish it as the file that needs to be uploaded to Blogger, like “WordPress Export For Upload To Blogger.xml”.
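If you’d rather not trim the file by hand, the same cleanup can be sketched in a few lines of Python. This assumes the junk at the top comes before the “<?xml” declaration and that the trailing prompt line contains no “>” character (true for a prompt like the one above, but check yours). The sample transcript is invented.

```python
def extract_xml(transcript):
    """Keep the text from the XML declaration through the last '>' character."""
    start = transcript.find("<?xml")
    if start == -1:
        raise ValueError("no XML declaration found in transcript")
    # Assumption: the shell prompt after the XML contains no '>' of its own.
    end = transcript.rfind(">")
    return transcript[start:end + 1] + "\n"


# Hypothetical Terminal export: login banner, XML output, trailing prompt.
sample = (
    "Last login: Mon Jan  4 10:00:00 on ttys000\n"
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    "<feed><entry>hello</entry></feed>\n"
    "my-Computer:~ me$ "
)
print(extract_xml(sample))
```

You would still read the exported .txt file in, run it through the function, and write the result out with a .xml extension.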
Step 13: Upload the file to Blogger…in a test blog.
I would strongly, strongly recommend setting up a test blog before you re-upload everything to your main blog, since in order to do so, you’re basically going to have to nuke your main blog.
I set up a test blog on BlogSpot that I restricted so that only my friend and I could see it, then uploaded the XML file generated in Step 12. This allowed me to check that all the posts and comments had made it over – Which was good because the first time I tried it, I realized I’d screwed something up and managed to only import comments prior to 2004, and had to go back several steps.
If your upload succeeds and everything looks good in your test blog…
Step 14: Back up, then nuke the content on your main Blogger Blog.
Again, I cannot emphasize enough: Back up, back up, back up. Things go sideways. You want a backup of everything. To back up your Blogger Blog, go to Settings > Basic and at the top there’s a link to Export Blog. Click that, and then click the big old button that says “Download Blog.”
Make sure you note where that file is and possibly rename it something like “Backed up main blog” so you can find it if things go wrong.
Once you have that file completely downloaded, you will need to delete all your existing posts so that you don’t wind up with either a) duplicates or b) posts with comments which won’t import because they were marked as duplicates.
To do this go to Posting > Edit Posts. Click on Select All, and you’ll be told that you’ve selected all the visible posts on the screen, and asked if you’d like to select all [however many] posts you have. Click to select all [however many] of your posts, then scroll down to the bottom of the screen and click “Delete Selected.”
Your template will be unaffected; this will just get rid of all your content (Don’t panic, we’re bringing it back in with…)
Step 15: Upload the file to your main Blogger blog.
If you got it working for your test blog, this should work for your main blog. You may have to remove some residual HaloScan commenting code (and add some Blogger code back in) from your template to get all the comments to show up properly, but you should be good to go, except for the Known Issues listed below.
Known Issues:
1. This only works with comments that have actually been exported by HaloScan. Once you’ve upgraded to Echo, it spits out a totally different type of XML file that cannot be read by the “import-haloscan” script, and unfortunately I’m not enough of a code monkey yet to remedy this myself.
2. The Python parser to go from the exported WordPress to your Blogger re-upload seems to only parse the GMT dates/times that WP spits out, not the actual times stuff was posted (the WP-generated export file contains both pieces of data). Depending on where you live, you can wind up with all your posts up to 12 hours off. For me and my friend, this wasn’t a big issue, but for some people whose blogs are more timestamp-sensitive, this may be a dealbreaker.
3. If you have a blog with a restricted readership, be sure to note that in Step 5 you will need to make it temporarily available to everyone.
4. Two minor issues with WordPress’s Blogger Import tool (failing to import a very few old posts; randomly adding a “>” to every single imported post from BlogSpot) are detailed in Step 3.
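To put some numbers on Known Issue 2, here is a small sketch of how far a GMT timestamp can drift from the local time a post actually went up. The date layout and the UTC offset are assumptions for the example; plug in your own time zone to see how far off your posts would land.

```python
from datetime import datetime, timedelta

# Assumed layout of the date strings in the export file.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def gmt_to_local(gmt_string, utc_offset_hours):
    """Shift a GMT timestamp by a fixed UTC offset and return it as a string."""
    gmt = datetime.strptime(gmt_string, DATE_FORMAT)
    return (gmt + timedelta(hours=utc_offset_hours)).strftime(DATE_FORMAT)


# A post stamped 3:30 AM GMT on New Year's Day, written six hours behind GMT.
print(gmt_to_local("2010-01-01 03:30:00", -6))  # 2009-12-31 21:30:00
```

In that example the post lands in a different year depending on which timestamp gets used, which is why this can matter for timestamp-sensitive blogs.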
I am absolutely open to suggestions of how I could have done this more easily, but I did quite a bit of digging around and couldn’t even find instructions for a process this ridiculous and cumbersome, let alone anything simpler.
Hope this helps a few people out, or at least inspires the folks at Blogger to finally put together a HaloScan comment importer. Because this method is completely insane.