= Wordpress-import

This small project imports WordPress XML dumps into Rails.

It has been customized for one particular project, so you will probably want to fork it and modify it to fit your app's schema.

It is a fork of Marc Remolt's Refinerycms-wordpress-import ( https://github.com/mremolt/refinerycms-wordpress-import ).

You can find the source code on GitHub: https://github.com/zyphlar/wordpress-import

Keep in mind that links to other pages of your blog are copied verbatim, because WordPress exports them as plain <a> tags.
If your new site (blog) structure uses different URLs, those links WILL break! For example, if you used
the popular WP blog URL structure "YYYY-MM/slug", be warned that Refinery just uses "blog/slug".
Your internal site links will then still point to the old WP URLs.
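
If you want to repair such links after the import, something along these lines might help. This is only a rough sketch and not part of the gem: the method name rewrite_internal_links, the host www.mysite.com and the exact shape of the old permalinks are assumptions, so adjust the pattern to your own URL scheme.

```ruby
# Hypothetical helper (not provided by this gem): rewrites old WordPress
# permalinks of the form "http://HOST/YYYY-MM/slug" to "/blog/slug".
def rewrite_internal_links(html, old_host)
  pattern = %r{https?://#{Regexp.escape(old_host)}/\d{4}-\d{2}/([^"'/\s]+)/?}
  # The block form of gsub keeps the replacement free of backslash escapes.
  html.gsub(pattern) { "/blog/#{Regexp.last_match(1)}" }
end

old_body = '<a href="http://www.mysite.com/2011-05/hello-world/">Hello</a>'
puts rewrite_internal_links(old_body, "www.mysite.com")
```

Links that do not match the date-based pattern (media files, external sites) are left untouched.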

== Prerequisites

TODO


== Installation

Just add the gem to your project's Gemfile:

    gem 'wordpress-import'

Or, if you want to stay on the bleeding edge:

    gem 'wordpress-import', :git => 'https://github.com/zyphlar/wordpress-import.git'

and run

    bundle


== Usage

Importing the XML dump is done via rake tasks:

    rake wordpress:reset_blog

This task deletes all data from the blog-related tables (taggings, tags, blog_comments,
blog_categories, blog_posts, blog_categories_blog_posts).
Run it first if you want a clean import of your old blog.

    rake wordpress:import_blog[file_name]

This task does the heavy work of parsing the dump and importing the data into the Refinery tables.
The parameter is the path to the dump file. A Mac user reported that ~
did not work in the path. I'll have a look at it, but until then, please don't use it.
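
One likely cause (an assumption, not confirmed by the report) is that the shell does not expand ~ inside the square brackets, so the task receives the literal string. A possible workaround in your own fork is to expand the path in Ruby before opening the file. The helper name expand_dump_path below is made up for illustration; File.expand_path itself is standard Ruby.

```ruby
# Hypothetical helper (not part of this gem): turn "~/dump.xml" into an
# absolute path before the importer opens the file.
def expand_dump_path(file_name)
  # File.expand_path resolves a leading "~" using the HOME environment variable.
  File.expand_path(file_name)
end

puts expand_dump_path("~/dumps/wordpress.xml")
```

Absolute paths pass through unchanged, so the helper is safe to apply to every file name.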

If you don't want to import draft posts, set the environment variable ONLY_PUBLISHED to true:

    rake wordpress:import_blog[file_name] ONLY_PUBLISHED=true

The task will then skip all posts that are not published.
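
To illustrate the idea, here is a minimal sketch of how such a flag could be evaluated. It is not the gem's actual code: the Post struct and both method names are hypothetical, and treating only the literal string "true" as "on" is an assumption. What is certain is that rake copies VAR=value arguments into ENV, and that WordPress marks published items with the status "publish".

```ruby
# Hypothetical sketch, not the gem's implementation.
Post = Struct.new(:title, :status)

# Assumption: only the exact string "true" switches the filter on.
def only_published?(env = ENV)
  env["ONLY_PUBLISHED"] == "true"
end

# Returns all posts, or only the published ones when the flag is set.
def posts_to_import(posts, env = ENV)
  return posts unless only_published?(env)
  posts.select { |post| post.status == "publish" }
end

posts = [Post.new("Hello", "publish"), Post.new("WIP", "draft")]
puts posts_to_import(posts, { "ONLY_PUBLISHED" => "true" }).map(&:title).inspect
# prints ["Hello"]
```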

    rake wordpress:reset_and_import_blog[file_name]

This task combines reset_blog and import_blog.

If you also want to import the CMS part of WordPress, three more rake tasks manage
the import into RefineryCMS pages:

    rake wordpress:reset_pages

This task deletes all data from the CMS tables, ensuring a clean import. Otherwise, existing
pages could break the import because of duplicate IDs.

    rake wordpress:import_pages[file_name]

This task imports all the WordPress pages into Refinery. The page structure (parent - child)
is preserved.
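
As a rough picture of what preserving the structure means: a WordPress dump lists pages flat, each with its own ID and the ID of its parent (0 for top-level pages). The sketch below rebuilds a nested tree from such records. WpPage and build_tree are hypothetical names for illustration, not the gem's classes, and the sketch assumes the parent references contain no cycles.

```ruby
# Hypothetical sketch, not the gem's implementation.
WpPage = Struct.new(:id, :parent_id, :title)

# Groups the flat list by parent ID once, then walks down from the roots.
def build_tree(pages, parent_id = 0, by_parent = pages.group_by(&:parent_id))
  by_parent.fetch(parent_id, []).map do |page|
    { title: page.title, children: build_tree(pages, page.id, by_parent) }
  end
end

pages = [
  WpPage.new(1, 0, "About"),
  WpPage.new(2, 1, "Team"),
  WpPage.new(3, 0, "Contact")
]
p build_tree(pages)
```

Here "Team" ends up as a child of "About", while "Contact" stays at the top level.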

If you want to skip draft pages, add the ONLY_PUBLISHED parameter to this task,
just like with wordpress:import_blog.

    rake wordpress:import_pages[file_name] ONLY_PUBLISHED=true

If you want to clean the tables and import in one task:

    rake wordpress:reset_and_import_pages[file_name]

Finally, if you want to reset and import all data, including media (see below):

    rake wordpress:full_import[file_name]


== Importing media files

The WP XML dump contains absolute links to media files linked inside posts, like:

    www.mysite.com/wordpress/wp-content/uploads/2011/05/cv.txt

The dump does NOT contain the files themselves! To import them, this gem downloads the files
from the given URLs and imports them into Refinery. So for a working media import, the old site with
the media URLs must still be online.

After importing the files, this gem replaces the old links in pages and blog posts with the
newly generated ones. It scans all existing records for the matching pattern. That
means you have to import pages and posts FIRST to get the URLs replaced.

Now to the rake tasks for media import:

    rake wordpress:reset_media

This task deletes all data from the media tables (images and resources), ensuring a clean import.

    rake wordpress:import_and_replace_media[file_name]

This task imports all the WordPress media into Refinery. After the import, it parses all
pages and blog posts, replacing the legacy links with the current Refinery ones.
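
The replacement step can be pictured roughly like this. It is a simplified sketch, not the gem's code: replace_media_links, the mapping hash and the new path /system/resources/cv.txt are all made up for illustration.

```ruby
# Hypothetical sketch, not the gem's implementation: swap every old media
# URL found in a record body for the URL of the newly imported file.
def replace_media_links(body, url_map)
  url_map.reduce(body) do |text, (old_url, new_url)|
    # A String pattern is matched literally; the block avoids escape handling.
    text.gsub(old_url) { new_url }
  end
end

url_map = {
  "http://www.mysite.com/wordpress/wp-content/uploads/2011/05/cv.txt" =>
    "/system/resources/cv.txt" # assumed new location, for illustration only
}
body = '<a href="http://www.mysite.com/wordpress/wp-content/uploads/2011/05/cv.txt">CV</a>'
puts replace_media_links(body, url_map)
```

Because the replacement works on records that already exist, it only has an effect on pages and posts imported beforehand.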

If you want to clean the tables and import in one task:

    rake wordpress:reset_import_and_replace_media[file_name]

== Usage on ZSH

One more hint for zsh users (like myself):

The square brackets following the rake task need to be escaped on zsh, as they have a
special meaning there. So the syntax is:

    rake wordpress:reset_and_import_blog\[file_name\]

Ugly, but it works. This applies to all rake tasks that take arguments, by the way, not just mine.


== Feedback

This is still a very new gem. It manages to import my own blog and a standard WordPress 3.1 dump with some sample posts.
The first feedback is quite good, so it seems the gem doesn't eat the machines it is installed on.

If you want to help make it even more stable, please throw your own WP dumps at it
and see what happens. If you encounter any bugs, please file a bug report here on GitHub.
A sample dump that breaks this gem would be really helpful in that case.

For extra karma, fork it, fix it yourself and send a pull request! ;-)