Gem to import a WordPress XML dump into your Rails app
Will Bradley 05bf79536d Updating shortcode gem version 2014-03-18 14:09:38 -07:00

README.rdoc

= Wordpress-import

This little project is an importer for WordPress XML dumps into Rails.

It's been somewhat customized for one particular project; you probably want to fork this and modify it to fit your app's schema.

It's a fork of Marc Remolt's Refinerycms-wordpress-import ( https://github.com/mremolt/refinerycms-wordpress-import )

You can find the source code on GitHub: https://github.com/zyphlar/wordpress-import

Keep in mind that links to other pages of your blog are copied unchanged, because WordPress exports them as plain <a> tags.
If your new site uses a different URL structure, those links WILL break! For example, if you used
the popular WP permalink structure "YYYY-MM/slug", be warned that Refinery just uses "blog/slug",
so your internal links will still point to the old WP URLs.
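
If you need to fix those internal links after the import, a small rewrite pass over the post bodies can help. The following is only a sketch and not part of this gem: the method name rewrite_wp_links, the host www.mysite.com and the exact old URL pattern are assumptions that you would adapt to your own permalink structure.

```ruby
# Hypothetical helper, not part of wordpress-import.
# Rewrites old WordPress-style links ("/YYYY-MM/slug") on the given host
# to the "blog/slug" structure. Links to other hosts are left untouched.
def rewrite_wp_links(html, host)
  pattern = %r{(href=["'])https?://#{Regexp.escape(host)}/\d{4}-\d{2}/([^"'/]+)/?(["'])}
  html.gsub(pattern) { "#{$1}/blog/#{$2}#{$3}" }
end

body = '<a href="http://www.mysite.com/2011-05/hello-world/">Hello</a>'
puts rewrite_wp_links(body, "www.mysite.com")
```

Run something like this only on a copy of your data first, since a regular expression cannot know about every link format a blog has used over the years.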

== Prerequisites

TODO


== Installation

Just add the gem to your project's Gemfile:

  gem 'wordpress-import'

Or if you want to stay on the bleeding edge: 

  gem 'wordpress-import', :git => 'https://github.com/zyphlar/wordpress-import.git'

and run

  bundle


== Usage

Importing the XML dump is done via rake tasks:

  rake wordpress:reset_blog 

This one deletes all data from the blog-related tables (taggings, tags, blog_comments,
blog_categories, blog_posts, blog_categories_blog_posts).
Run it first if you want a clean import of your old blog.

  rake wordpress:import_blog[file_name] 

This one does all the heavy work of parsing the dump and importing the data into refinery tables.
The parameter is the path to the dump file. A Mac user reported that ~ did not work in the path.
I'll have a look at it, but until then, please spell out the full path instead.
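
A likely reason for the ~ problem (an assumption, not checked against this gem's code) is that the shell only expands ~ at the start of a word, so inside the square brackets it reaches the rake task as a literal character. Ruby does not expand ~ when opening a file, but File.expand_path does. A sketch of how a forked task could normalize its argument, with dump_path being a made-up helper name:

```ruby
# Hypothetical helper, not part of wordpress-import.
# File.expand_path turns a leading "~" into the user's home directory
# and makes relative paths absolute.
def dump_path(file_name)
  File.expand_path(file_name)
end

puts dump_path("~/wordpress.xml")
```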

If you don't want to import draft posts, set the environment variable ONLY_PUBLISHED to true:

  rake wordpress:import_blog[file_name] ONLY_PUBLISHED=true

The task will then skip all posts that are not published.
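
To illustrate the idea, here is a minimal sketch of such a filter. It is not the gem's actual code: posts are modelled as plain hashes, the method names are made up, and it assumes the post status comes straight from the WordPress export, where published posts carry the status "publish".

```ruby
# Hypothetical sketch, not the gem's actual code.
# The env parameter defaults to ENV so the filter can be tried without
# touching the real environment.
def only_published?(env = ENV)
  env["ONLY_PUBLISHED"] == "true"
end

def posts_to_import(posts, env = ENV)
  return posts unless only_published?(env)
  posts.select { |post| post[:status] == "publish" }
end

posts = [
  { title: "Hello world", status: "publish" },
  { title: "Unfinished",  status: "draft" }
]
titles = posts_to_import(posts, { "ONLY_PUBLISHED" => "true" }).map { |post| post[:title] }
puts titles.inspect
```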

  rake wordpress:reset_and_import_blog[file_name]

This one combines the two previous tasks: it resets the blog tables and then runs the import.
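
For those who fork the gem: rake hands the value in square brackets to the task as a named argument. Below is a minimal sketch of how a combined task could be wired up. It uses a made-up wordpress_sketch namespace so it cannot clash with the real tasks, and the task bodies only record what they would do instead of touching any tables.

```ruby
require "rake"

# Hypothetical sketch, not the gem's actual task definitions.
module WordpressTasksSketch
  extend Rake::DSL

  LOG = []

  namespace :wordpress_sketch do
    task :reset_blog do
      LOG << "reset blog tables"
    end

    # [:file_name] declares the argument passed as task[file_name]
    task :import_blog, [:file_name] do |_task, args|
      LOG << "import #{args[:file_name]}"
    end

    # The combined task invokes the two tasks above, in order
    task :reset_and_import_blog, [:file_name] do |_task, args|
      Rake::Task["wordpress_sketch:reset_blog"].invoke
      Rake::Task["wordpress_sketch:import_blog"].invoke(args[:file_name])
    end
  end
end

Rake::Task["wordpress_sketch:reset_and_import_blog"].invoke("dump.xml")
puts WordpressTasksSketch::LOG.inspect
```

Note that rake runs each task only once per process unless it is re-enabled, which is why the combined task can safely invoke the others.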

If you also want to import the CMS part of WordPress, three more rake tasks manage
the import into RefineryCMS Pages:

  rake wordpress:reset_pages

This task deletes all data from the CMS tables, ensuring a clean import. Otherwise existing
pages could break the import because of duplicate IDs.

  rake wordpress:import_pages[file_name] 

This task imports all the WordPress pages into Refinery. The page structure (parent - child)
is preserved. 
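
As an illustration of what preserving the structure means, here is a sketch that rebuilds a page tree from a flat list. It is not the gem's actual code: pages are plain hashes, build_page_tree is a made-up name, and it assumes each page carries its own id and its parent's id as found in the WordPress export, where a parent id of 0 marks a top-level page.

```ruby
# Hypothetical sketch, not the gem's actual code.
def build_page_tree(pages)
  # Group the pages by the id of their parent
  children = Hash.new { |hash, key| hash[key] = [] }
  pages.each { |page| children[page[:parent_id]] << page }

  # Recursively attach each page's children, starting at the top level
  attach = lambda do |parent_id|
    children[parent_id].map do |page|
      { title: page[:title], children: attach.call(page[:id]) }
    end
  end
  attach.call(0)
end

pages = [
  { id: 1, parent_id: 0, title: "About" },
  { id: 2, parent_id: 1, title: "Team" },
  { id: 3, parent_id: 0, title: "Contact" }
]
tree = build_page_tree(pages)
puts tree.inspect
```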

If you want to skip the draft pages, add the ONLY_PUBLISHED parameter to this task, 
just like with wordpress:import_blog.

  rake wordpress:import_pages[file_name] ONLY_PUBLISHED=true

If you want to clean the tables and import in one task:

  rake wordpress:reset_and_import_pages[file_name]

Finally, if you want to reset and import all data including media (see below):

  rake wordpress:full_import[file_name]


== Importing media files

The WP XML dump contains absolute links to media files linked inside posts, like:

  www.mysite.com/wordpress/wp-content/uploads/2011/05/cv.txt

The dump does NOT contain the files themselves! To get them imported, this gem downloads the files
from the given URLs and imports them to refinery. So for a working media import, the old site with
the media URLs must still be online.

After importing the files, this gem replaces the old links in pages and blog posts with the
newly generated ones. It parses all existing records, searching for the right pattern. That
means you have to import pages and posts FIRST to get the URLs replaced.
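
The replacement step can be pictured roughly like this. It is a sketch under assumptions, not the gem's actual code: replace_media_links is a made-up name, the record body is a plain string, and the new path /system/resources/cv.txt is invented for the example.

```ruby
# Hypothetical sketch, not the gem's actual code.
# url_map maps each old WordPress media URL to the URL of the imported file.
def replace_media_links(body, url_map)
  return body if url_map.empty?
  # Longest URLs first, so a URL that is a prefix of another cannot win
  pattern = Regexp.union(url_map.keys.sort_by { |url| -url.length })
  body.gsub(pattern) { |old_url| url_map[old_url] }
end

url_map = {
  "http://www.mysite.com/wordpress/wp-content/uploads/2011/05/cv.txt" => "/system/resources/cv.txt"
}
body = '<a href="http://www.mysite.com/wordpress/wp-content/uploads/2011/05/cv.txt">My CV</a>'
puts replace_media_links(body, url_map)
```

Since the replacement only touches records that already exist, it also shows why pages and posts have to be imported before the media.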

Now to the rake tasks for media import: 

  rake wordpress:reset_media

This task deletes all data from the media tables (images and resources), ensuring a clean import.

  rake wordpress:import_and_replace_media[file_name] 

This task imports all the WordPress media into Refinery. After the import it parses all
pages and blog posts, replacing the legacy links with the current refinery ones.

If you want to clean the tables and import in one task:

  rake wordpress:reset_import_and_replace_media[file_name] 

== Usage on ZSH

One more hint for users of zsh (like myself): 

The square brackets following the rake task need to be escaped on zsh, as they have a 
special meaning there. So the syntax is:

  rake wordpress:reset_and_import_blog\[file_name\]

Ugly, but it works. This is the case for all rake tasks by the way, not just mine. 


== Feedback

This is still a very new gem. It manages to import my own blog and a standard WordPress 3.1 dump with some sample posts.
The first feedback is quite good, so it seems the gem doesn't eat the machines it is installed on.

If you want to help make it even more stable, please throw your own WP dumps against it
and see what happens. If you encounter any bugs, please file a bug report here on GitHub.
A sample dump that breaks this gem would be really helpful in that case.

For extra karma, fork it, fix it yourself and send a pull request! ;-)