Labels: primo, pipes

Web Page crawler

Last Updated: Oct 30, 2009 01:17


  • Description

    A collection of scripts and configuration files that convert web page content to XML. A pipe for loading the resulting XML into Primo is included. The crawler is based on the open source Swish-e spider.pl. See the README.txt file for more detail.


  • Author: John Osborn
  • Additional author(s):
  • Institution: University of Iowa
  • Year: 2009
  • License: BSD style
  • Short description: Use, modification and distribution of the code are permitted provided the copyright notice, list of conditions and disclaimer appear in all related material.
  • Link to terms: [Detailed license terms]
  • Skill required for using this code:
    Advanced
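
The conversion step named in the description (web page content to XML) can be sketched as follows. This is an illustration only, not the code shipped in WebCrawl.zip, which is written in Perl on top of spider.pl. The element names `record`, `url`, `title` and `body`, the class `PageTextExtractor` and the function `page_to_xml` are all hypothetical; the real record layout expected by the Primo pipe is described in README.txt.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class PageTextExtractor(HTMLParser):
    """Collect the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and stylesheet content
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def page_to_xml(url, html):
    """Return one XML record (hypothetical layout) for a fetched page."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()

    record = ET.Element("record")
    ET.SubElement(record, "url").text = url
    # Collapse runs of whitespace so the record holds clean single-spaced text.
    ET.SubElement(record, "title").text = " ".join("".join(parser.title_parts).split())
    ET.SubElement(record, "body").text = " ".join(" ".join(parser.text_parts).split())
    return ET.tostring(record, encoding="unicode")
```

Calling `page_to_xml(url, html)` with a page's address and its HTML source returns a single XML string. A crawler would produce one such record per page visited and hand the collected records to the loading step.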

State

Stable (in our environment)

Programming language

perl, SQL

Software requirements

perl, perl DBI

Screen captures

Author(s) homepage

Download

WebCrawl.zip

Working example

http://smartsearch.uiowa.edu/primo_library/libweb/action/search.do?frbg=&dum=true&vid=uiowa&vl(freeText0)=libweb+about&fn=search&mode=Basic&ct=search&srt=rank&indx=1&tab=default_tab

Using the following Ex Libris open interfaces

Changes

Version 1.2

Version 1.1

Release notes

Installation instructions

TO DO list

Known issues

Comments

Page Attachments

File Name      Comment   Size    Number of Downloads
WebCrawl.zip             19 kB   145

Added by John Osborn on May 01, 2009 20:33, last edited by Conf Admin on Oct 30, 2009 01:17
