Added support for the danbooru.donmai.us board.
Looks like Sankaku Complex has finalised their layout, so I updated the spider accordingly and introduced a handful of new features.
It's all on the code page.
I moved everything over from blog.krhis.net to krhis.net and gave the site a bit of a facelift.
I've put some torrents up:
ftp://krhis.net/ is up. There's no uploading and a 10kbs cap, so don't expect much.
Sankaku Complex changed their layout earlier today, so I updated the spider. I have also updated the other boards to v0.3 to better handle failed connections. Check out the code page.
I got my name on Cops last night.
wlsort.py is a Python script for managing word lists. Without any arguments, it removes junk (e.g. empty lines), removes duplicates, and sorts a word list. If needed, it can also generate a word list with various combinations of upper/lower case, append digits, and enforce a minimum/maximum word length.
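A minimal sketch of the default behaviour described above, written for Python 3. This is not the actual wlsort.py; the function names `clean_wordlist` and `case_variants` and their parameters are made up for illustration.

```python
from itertools import product


def clean_wordlist(lines, min_len=1, max_len=None):
    """Strip whitespace, drop junk lines and duplicates, then sort."""
    words = set()
    for line in lines:
        word = line.strip()
        if len(word) < min_len:
            continue  # with min_len >= 1 this also drops empty lines
        if max_len is not None and len(word) > max_len:
            continue
        words.add(word)  # the set takes care of duplicates
    return sorted(words)


def case_variants(word):
    """Return every upper/lower-case combination of a word, sorted."""
    # Characters without case (digits, punctuation) yield a single choice.
    choices = [{ch.lower(), ch.upper()} for ch in word]
    return sorted("".join(combo) for combo in product(*choices))
```

For example, `clean_wordlist(open("words.txt"))` would return the cleaned list for a hypothetical `words.txt`, and `case_variants("ab")` gives the four spellings of a two-letter word.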
Features:
Image board spiders under code have been updated to use urllib2 and select a random user agent. I have included a collection of common user agents spanning a variety of operating systems, browsers, versions, and bots. I guess anything is better than 'Python-urllib/2.6'.
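Roughly, picking a random user agent looks like the sketch below. It assumes Python 3, where urllib2's `Request` lives in `urllib.request`. The user agent strings are placeholders rather than the collection shipped with the spiders, and `build_request` is a hypothetical helper name.

```python
import random
import urllib.request

# Placeholder strings for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Windows NT 6.1) ExampleBrowser/2.0",
    "ExampleBot/0.1",
]


def build_request(url, agents=USER_AGENTS):
    """Return a Request whose User-Agent header is picked at random."""
    headers = {"User-Agent": random.choice(agents)}
    return urllib.request.Request(url, headers=headers)
```

The request can then be passed to `urllib.request.urlopen()` in place of a bare URL, which replaces the default 'Python-urllib' agent with the chosen one.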
Ok, first of all the image board admins out there are going to hate me for this.
With that out of the way, I have created four scripts in Python that spider the tags you specify and collect working URLs linking directly to the content (images and so on), generating a neat .lst file that you can feed to wget. Four popular image boards are supported: for now you can spider Gelbooru, Imouto, Konachan, and Sankaku Complex.
Just specify one or more tags like so:
$ python gelbooru.py ice_cream maid
After it's done and you have your list, run wget:
$ wget --limit-rate=10k -c -i tag.lst
Please don't abuse this. Throttle your downloads and let them run overnight. You can expect these scripts to break if a board admin drastically changes the layout.
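The list file itself is nothing fancy: wget -i just wants one URL per line. A rough Python 3 sketch of that last step, assuming the spider has already collected its URLs; `format_url_list` and `write_url_list` are hypothetical names, not functions from the scripts.

```python
def format_url_list(urls):
    """Return the list file body: one URL per line, duplicates removed."""
    # dict.fromkeys keeps the first occurrence of each URL, in order.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    return "".join(url + "\n" for url in unique)


def write_url_list(path, urls):
    """Write the URLs to a list file that wget -i can read."""
    with open(path, "w") as handle:
        handle.write(format_url_list(urls))
```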
Ok, that's it. My last post to this blog was on 2008-07-21 and it was regarding the PiQ team and Newtype USA. Both are distant memories.
That was 431 days... or 1 year, 2 months, and 4 days ago.
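For anyone who wants to check the numbers, here is a small Python 3 sketch of the date arithmetic, taking 2009-09-25 as the date of this post. `ymd_between` is a hypothetical helper written for this example.

```python
import calendar
from datetime import date


def ymd_between(start, end):
    """Split the gap between two dates into whole years, months and leftover days."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not complete yet
    # Move start forward by the whole months, clamping the day if needed.
    index = start.year * 12 + (start.month - 1) + months
    year, month = index // 12, index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    anchor = date(year, month, day)
    years, months = divmod(months, 12)
    return years, months, (end - anchor).days


last_post = date(2008, 7, 21)
this_post = date(2009, 9, 25)
total_days = (this_post - last_post).days
breakdown = ymd_between(last_post, this_post)
```

Here `total_days` comes out to 431 and `breakdown` to (1, 2, 4).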
Time for a complete overhaul. To start with, I have wiped all previous entries from the database. I did have plans to carry them over, but I think starting fresh would be better. Secondly, I'm ditching WordPress in exchange for something lighter. It's a great blogging platform, but I just don't have time to do the maintenance, not to mention the amount of overhead needed to generate a single page.
I've been using WordPress for over 3 years and received 73,770 total visits from 2006-07-10 to 2009-09-25. That's pretty amazing considering that it comes out to 63 visits per day, even after subtracting spiders.
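The per-day figure is just the visit total divided by the number of days in that range. A quick Python 3 check, using only the dates and total quoted above:

```python
from datetime import date

total_visits = 73770
days = (date(2009, 9, 25) - date(2006, 7, 10)).days  # length of the range in days
visits_per_day = total_visits / days
```

Rounded to the nearest whole number, `visits_per_day` is 63.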