First, log in at http://www.blogger.com and open the settings (Einstellungen in the German interface).
There, open the section "Other" (Sonstiges).
In that section you will find a link "Export blog" (Blog exportieren).
After confirming the dialog that follows, you get one big XML file.
I got a file named blog-05-26-2012.xml. This file contains everything from your blog:
- Layout
- Users
- Configuration
- All postings (incl. comments, date, labels, ...)
- Locales
- Meta description
- Timezone, timestamp format
- ...
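To get a rough idea of how much the export holds, you can count the `<entry>` elements before processing anything. This is only a sketch: `count_entries` is a hypothetical helper name, and it relies on `grep -o`, which GNU and BSD grep support but POSIX does not require. The number includes configuration entries and comments, not only posts.

```shell
# Hypothetical helper: count every <entry> element in a Blogger export.
# $1: path to the export file (for example blog-05-26-2012.xml)
count_entries() {
  # grep -o prints each match on its own line, so wc -l counts matches,
  # even when the whole export sits on a single line.
  grep -o "<entry>" "$1" | wc -l | tr -d ' '
}
```

Usage: `count_entries blog-05-26-2012.xml`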
First extract only the lines with the xml-tag "entry":
grep "<entry>" blog-05-26-2012.xml > blog.entry.xml
Then put every entry on a new line:
sed 's/<entry>/\n<entry>/g' blog.entry.xml > blog.newline.xml
Now you have some lines with configuration details. You can remove them with this command:
grep -v "<email>noreply@blogger.com</email>" blog.newline.xml | grep "<author>" > blog.posts.xml
Now this XML contains a lot of tags:
- id
- author
- title
- content
- link
- published
- updated
- uri
- category
- name
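The three filtering commands above can also be chained into one pipeline without the intermediate files. The following is a minimal sketch, not a replacement for the steps above: `extract_posts` is a hypothetical name, and the backslash-newline in the sed replacement is used here because `\n` in a replacement is only understood by GNU sed.

```shell
# Hypothetical helper: print one <entry> per line, real posts only.
# $1: path to the Blogger export file
extract_posts() {
  # 1. keep only lines that contain entries
  # 2. start a new line before every <entry>
  # 3. drop entries written by noreply@blogger.com (configuration details)
  # 4. keep only lines that have an author, which removes the feed header
  grep "<entry>" "$1" \
    | sed 's/<entry>/\
<entry>/g' \
    | grep -v "<email>noreply@blogger.com</email>" \
    | grep "<author>"
}
```

Usage: `extract_posts blog-05-26-2012.xml > blog.posts.xml`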
If you want to get a file with one line per post like "date**title**content" you can use the following command:
cat blog.posts.xml | sed $'s/<title type=\'text\'>/gruzelwurbel/g' | sed $'s/<\/title><content type=\'html\'>/gruzelwurbel/g' | sed 's/<\/content>/gruzelwurbel/g' | sed 's/<published>/gruzelwurbel/g' | sed 's/<\/published>/gruzelwurbel/g' | awk -F gruzelwurbel '{printf("%s**%s**%s\n",$2,$4,$5)}'
-> 2008-01-01T00:00:00.000-08:00**Gästebuch**Um einen Kommentar im Gästebuch zu hinterlassen bitte "Kommentar veröffentlichen" anklicken.
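The same transformation can be written with a single sed call that takes several `-e` expressions. This is a sketch under the same assumptions as the command above: each post sits on one line, and `<published>` comes before `<title>` and `<content>`. If the element order in your export differs, the awk field numbers have to change. `posts_to_lines` is a hypothetical name.

```shell
# Hypothetical helper: turn one-entry-per-line XML into date**title**content.
# $1: path to the posts file (for example blog.posts.xml)
posts_to_lines() {
  # Replace the tags around date, title and content with one marker word,
  # then let awk split each line on that marker.
  sed -e "s/<title type='text'>/gruzelwurbel/g" \
      -e "s/<\/title><content type='html'>/gruzelwurbel/g" \
      -e 's/<\/content>/gruzelwurbel/g' \
      -e 's/<published>/gruzelwurbel/g' \
      -e 's/<\/published>/gruzelwurbel/g' "$1" \
    | awk -F gruzelwurbel '{printf("%s**%s**%s\n",$2,$4,$5)}'
}
```

The marker only has to be a string that never occurs in your posts; any other unlikely word works just as well.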
HTML is escaped with &lt; and &gt;. To undo this, the following two sed commands can be used:
cat file | sed 's/&lt;/</g' | sed 's/&gt;/>/g' > newfile
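Post content may contain more escaped characters than the two angle brackets. The sketch below assumes the standard XML entities `&quot;` and `&amp;` also occur in your export, which I have not checked against every blog. `unescape_html` is a hypothetical name. `&amp;` is decoded last, so that a doubly escaped `&amp;lt;` ends up as `&lt;` and not as `<`.

```shell
# Hypothetical helper: decode common XML entities from stdin to stdout.
unescape_html() {
  # In a sed replacement a bare & means "the matched text",
  # so the literal ampersand has to be written as \&.
  sed -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&quot;/"/g' \
      -e 's/&amp;/\&/g'
}
```

Usage: `unescape_html < file > newfile`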