list2xml script – Mnemosyne vocabulary flashcards!

Sat, 07 August 2010

With the October SAT coming up, I had to work on my vocabulary. I found some wordlists on the Internet and in books, but I didn’t find it very convenient to learn from a list.

Flash cards seem to work really well. I found a program called ‘mnemosyne’ that manages flash cards. You assign grades to each card depending on how familiar you are with the question. It then schedules the cards to reappear at appropriate times (‘bad’ cards appear soon, ‘good’ cards appear later).

Now I needed some way to put words into this thing without having to use its ‘Add cards’ feature (I’m rather lazy, and besides, it would take forever that way). I found an SAT cards database online that was in the mnemosyne format, but it was huge, and the other lists I had found online and in books seemed more ‘accurate’. This called for automation!

I wrote a shell script that takes a newline-separated list of words on standard input and writes a file in the mnemosyne XML format to standard output, with the words as the ‘questions’ and their definitions as the ‘answers’ (these are the definitions as given by WordNet – I’ve found them to be the best, with good examples and synonyms too):-

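#!/bin/sh
# Usage: ./list2xml.sh CATEGORY < wordlist > cards.xml

echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
echo "<mnemosyne core_version=\"1\" time_of_start=\"1224014401\" >"
echo "<category active=\"0\">"
echo " <name>$1</name>"
echo "</category>"

# One card per input line: the word is the question, its WordNet entry the answer.
while IFS= read -r word
do
    echo
    echo "<item>"
    echo "<cat>$1</cat>"
    echo "<Q>$word</Q>"
    echo "<A>"
    # Trim the DICT status lines (4 before, 3 after; needs GNU head) and
    # escape &, < and > so the definition cannot break the XML.
    curl -s "dict://dict.org/d:$word:wn" | head --lines=-3 | tail --lines=+5 |
        sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
    echo "</A>"
    echo "</item>"
    echo
done

echo "</mnemosyne>"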
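
The head/tail pair in the script is what strips the DICT protocol chatter around each definition. Here is a small sketch of just that step, run on a canned reply so it needs no network. The sample text and the names sample_reply and trim_reply are made up for illustration; the only thing taken from the script is the assumption that a reply has four status lines before the entry and three lines after it.

```shell
#!/bin/sh
# Illustrative stand-in for a DICT reply; real server text will differ.
sample_reply() {
    cat <<'EOF'
220 example banner
250 ok
150 1 definitions retrieved
151 "zeal" wn "WordNet"
zeal
    n 1: a feeling of strong eagerness
.
250 ok
221 bye
EOF
}

# Same trimming as the script: drop the last 3 lines, then start at line 5.
# Note: a negative count for head is a GNU coreutils extension.
trim_reply() {
    head --lines=-3 | tail --lines=+5
}
```

Running `sample_reply | trim_reply` should leave only the headword line and the definition line. If a word has no WordNet entry the reply is presumably shorter, so the fixed offsets would likely produce an empty or odd answer for that card.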

The first argument is the name of the mnemosyne ‘category’ you want the cards to be put into. For example, you can run it like this (assuming you named the script list2xml.sh and have permission to execute it):-

cat wordlist | ./list2xml.sh SAT-Words-1 > satWords1.xml
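
Since the lookups go over the network, it may be worth checking afterwards that every word actually became a card. A rough sketch of such a check follows; check_card_count is a hypothetical helper, not part of mnemosyne, and it assumes each `<item>` tag sits on its own line, as the script writes it.

```shell
#!/bin/sh
# Hypothetical check: compare the number of non-empty lines in the word list
# with the number of <item> elements in the generated XML.
check_card_count() {
    words=$(grep -c . "$1")
    items=$(grep -c '^<item>$' "$2")
    if [ "$words" -eq "$items" ]; then
        echo "ok: $items cards"
    else
        echo "mismatch: $words words, $items cards" >&2
        return 1
    fi
}
```

With the file names from the example above, that would be `check_card_count wordlist satWords1.xml`. It only counts cards; it does not tell you whether a definition came back empty.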

There are lots of lists online you could convert into the one-word-per-line format the script expects, using a vim macro or sed or something. You could even scan in some books and run them through OCR. :P
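
As an example of the sed route, here is a minimal sketch that cleans up a list whose lines look like "12. Abate - to lessen" or "abate: to lessen". That input layout is just an assumption, and normalize_wordlist is a made-up name, so adjust the patterns to whatever your list really looks like.

```shell
#!/bin/sh
# Hypothetical helper: turn a messy list into one lowercase word per line.
# Assumes each line is optional numbering, then the headword, then anything else.
normalize_wordlist() {
    # 1st expression: strip leading whitespace and numbering such as "12." or "3)".
    # 2nd expression: cut everything from the first character that is not a
    # letter or hyphen, which leaves only the headword.
    sed -e 's/^[[:space:]]*[0-9]*[.)]*[[:space:]]*//' \
        -e 's/[^[:alpha:]-].*$//' |
    tr '[:upper:]' '[:lower:]' |
    grep -v '^$' |
    sort -u
}
```

You could then chain it straight into the script with something like `normalize_wordlist < rawlist.txt | ./list2xml.sh SAT-Words-1 > satWords1.xml` (rawlist.txt being whatever file you saved the list as). The `sort -u` at the end also drops duplicate words, so you don't end up with the same card twice.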

Here’s a picture from mnemosyne with a script-converted category open:-

