Google Sitemaps needs a list of URLs to optimize crawling. Usually, this is no problem, since Google supplies a script you can run on your server to build that list.
But that fails if the content of your site is stored not in HTML files but in TXT files, which is how DokuWiki stores its pages. So here is what I did to build that list of URLs.
We do all of this in the directory where DokuWiki stores the pages:
<code bash>
cd /home/hdocs/beta.linuxbasics.org/data
find ./ -iname "*.txt"
</code>
gives us
<code>
./wiki/syntax.txt
./wiki/dokuwiki.txt
./wiki/playground.txt
./start.txt
./tutorials/pre/start.txt
...
</code>
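which is almost the list of URLs, except that each line starts with './' where the URL needs 'http://LinuxBasics.org/', and each line ends with '.txt', which the URL does not have.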
The stream editor 'sed' can help us with those replacements. Perl's s/// operator is modeled on sed's 's' command, so if you know Perl, this will look familiar:
<code bash>
sed -e 's#^\./#http://LinuxBasics.org/#g ; s/\.txt$//g'
</code>
The first substitution uses '#' as the delimiter instead of '/'. Why? Because it looks much better than the version with slashes, where every slash in the URL has to be escaped: “s/^\.\//http:\/\/LinuxBasics.org\//g”
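To convince yourself that both delimiter styles do the same thing, you can feed a couple of sample lines through each variant and compare the results. This is only a small sketch: the two sample paths are taken from the find output above, and the variable names (sample, with_hash, with_slash) are made up for this illustration.

```shell
# Two sample lines, as find prints them
sample='./wiki/syntax.txt
./start.txt'

# '#' as delimiter: the slashes in the URL need no escaping
with_hash=$(printf '%s\n' "$sample" |
    sed -e 's#^\./#http://LinuxBasics.org/#; s/\.txt$//')

# '/' as delimiter: every slash in pattern and replacement is escaped
with_slash=$(printf '%s\n' "$sample" |
    sed -e 's/^\.\//http:\/\/LinuxBasics.org\//; s/\.txt$//')

# Report whether both variants produced identical output
if [ "$with_hash" = "$with_slash" ]; then
    echo "both variants agree:"
    printf '%s\n' "$with_hash"
else
    echo "the variants differ" >&2
fi
```

The dots are escaped here ('\.') so that they match a literal dot only; an unescaped dot matches any character.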
<code bash>
find ./ -iname "*.txt" | sed -e 's#^\./#http://LinuxBasics.org/#g ; s/\.txt$//g'
</code>
gives us what we want:
<code>
http://LinuxBasics.org/wiki/syntax
http://LinuxBasics.org/wiki/dokuwiki
http://LinuxBasics.org/wiki/playground
http://LinuxBasics.org/start
http://LinuxBasics.org/tutorials/pre/start
http://LinuxBasics.org/tutorials/pre/md5sum
</code>
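If you want to rerun this whenever pages are added, the steps above could be wrapped in a small shell function. The following is a sketch under a few assumptions, not a finished tool: the function name build_url_list is made up, the base URL is assumed to contain neither '#' nor '&' (both are special inside the sed expression), and the list is sorted only to make the output reproducible. It uses 'find .' rather than 'find ./', which prints the same './...' paths with GNU find.

```shell
#!/bin/sh
# build_url_list DATA_DIR BASE_URL
# Prints one URL per line for every .txt page found below DATA_DIR.
# (hypothetical helper for illustration; not part of DokuWiki)
build_url_list() {
    data_dir=$1
    base_url=$2
    # Subshell: the cd does not change the caller's working directory
    (
        cd "$data_dir" || exit 1
        find . -iname "*.txt" |
            # Turn the leading './' into the base URL, drop the '.txt' suffix
            sed -e "s#^\./#${base_url}/#; s/\.txt\$//" |
            sort
    )
}
```

It could then be called like this, with urllist.txt being just an example name for the output file: `build_url_list /home/hdocs/beta.linuxbasics.org/data http://LinuxBasics.org > urllist.txt`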