There's a mailing list for discussion of sitescooper. To join, send a mail to <sitescooper-request /at/ netnoteinc.com> with the single word subscribe in the message body. To leave the list, send a mail to the same address with the word unsubscribe in the message body. Note: the mail addresses on this page are "spam-protected", so you need to change each " /at/ " to an @ sign before sending mail to them!
If you have a site you think others will like, mail the .site file to the list, and I'll stick it in the distribution -- and list your name in the CREDITS section. Same goes for bug patches!
If you find a bug, send a bug report to the list (or to me) and I'll try to get around to fixing it. It could take a while though, as I don't get paid for this stuff. BTW, I really like bugfix patches, if you feel like submitting one after finding a bug ;)
Some of the post-processing and HTML cleanup code includes ideas and code shamelessly stolen from http://pilot.screwdriver.net/ , Christopher Heschong's <chris at screwdriver.net> webpage-to-pilot conversion tool.
Included in the distribution is a copy of Algorithm::Diff, an implementation of the Longest Common Subsequence algorithm, Copyright 1998, 1999 M-J. Dominus (mjd-perl-diff /at/ plover.com).
Also Robb Canfield (robbc /at/ canfield.com) has kindly provided Table.pm, "a general purpose HTML table converter that tries, usually successfully, to convert wide tables to long lists. In general it copies the table headers and rotates them down for each row." It's used when you set "TableRender: list".
Both are free software; you can redistribute them and/or modify them under the same terms as Perl itself. Also included, under the Artistic license, are:
HTML-Parser 2.23, by Gisle Aas; © 1995-1999 Gisle Aas. All rights reserved.
Libwww-perl 5.45, © 1995-1999 Gisle Aas. All rights reserved, and © 1995 Martijn Koster. All rights reserved.
MIME-Base64 2.11, Copyright 1995-1999 Gisle Aas <gisle /at/ aas.no>.
URI 1.04, Copyright 1998-1999 Gisle Aas, Copyright 1998 Graham Barr.
These are included to ease the task of installation.
Here's a list of people who've contributed to sitescooper, either with .site files, patches, or suggested fixes and functionality:
Carsten Clasohm, <cc /at/ clasohm.com>: fix for diffing sites with newlines in the href tags, regional_germany sites.
michael d. ivey <ivey /at/ gweezlebur.com>: packaging sitescooper as a .deb, and general Debian compliance -- thanks!
Stefan Schwingeler <stefan /at/ schwingeler.de>: fix for ContentsSkipURL, regional_germany sites. Stefan and Carsten are responsible, between them, for all the sites in the regional_germany category -- thanks guys!
Pierre-Yves Letournel <e-py.letournel /at/ wanadoo.fr>: regional_francais: afp.site, le_monde.site, 01_informatique.site, lmi_hebdo.site, lmi_quotidien.site.
Jacques Turbé <jturbe /at/ cybercable.fr>: regional_francais: lemondecomplet.site, nouvelobs.site, libe_portrait_du_jour.site, libe_rebonds.site, libe_q.site, journaldunet_dossiers.site, echos_infos.site, and journaldunet.site. Jacques and Pierre-Yves have, between them, provided all the sites in regional_francais, which is great!
Jason Simpson <jason /at/ xio.com>: contributed seattletimes.site
Joe Pfeiffer <pfeiffer /at/ cs.nmsu.edu>: HTML rendering fixes, lots of sites
dLux <dlux /at/ dlux.hu>: sites for Debian Weekly News, Freshmeat, Hirnet, Linux.Hu, Palmcentral, updated Linux Today
Andrew Fletcher <fletch /at/ computer.org>: MacOS support
<spacehog /at/ knowfear.knowfear.net>: yahoo_top_stories.site
Jason C. Axley <jason /at/ axley.net>: installation instructions update for RedHat 6.0, and SRPM for the URI module.
Kennis Koldewyn <kennis.koldewyn /at/ wcom.com>: NY Times sites.
Michael Lapsley <mlapsley /at/ ndirect.co.uk>: fixed bug with "-refresh -fromcache".
Jason Yanowitz <yanowitz /at/ poboxes.com>: site file for The Guardian.
Kevin Olson <kevolson /at/ visi.com>: fixed bug with RichReader command-line.
Vince <reverso /at/ club-internet.fr>: contributed le_temps.site.
Dave Collins <Dave.Collins /at/ tiuk.ti.com>: fix for (no text to write) when text started with a quote char.
Albert K T Hui <avatar /at/ deva.net>: lots of regional_hk site files, and fixes to allow more 8-bit text; also worked around HTML abuse by Sing Tao Daily.
Alastair Rankine <arankine /at/ lucent.com>: fairfax_it.site
Kevin L. Dupree <kdupree /at/ flash.net>: image-only site support.
Andy Rabagliati <andyr /at/ wizzy.com>: csmonitor.site and KPilot support.
Memeteau, Michael <Michael.Memeteau /at/ autoeuropa.pt>: site files.
Derek Glidden <dglidden /at/ illusionary.com>: fixed lots of delinquent site files, added science_daily.site, spaceref.site.
Justin Henry <jhenry /at/ fjicl.com>: A fine selection of sites: updated salon.site; gist_tv.site; cats_cradle.site; clark_howard.site; morbid_fact_du_jour.site; news_observer.site; ny_times_handheld.site; roger_ebert.site; usa_today.site; weather24.site, and wral_tv.site.