{"id":60,"date":"2010-02-12T13:33:09","date_gmt":"2010-02-12T17:33:09","guid":{"rendered":"http:\/\/bender.library.american.edu:8083\/archives\/?p=60"},"modified":"2010-02-12T13:33:09","modified_gmt":"2010-02-12T17:33:09","slug":"american-universitys-web-harvesting-project-a-work-in-progress","status":"publish","type":"post","link":"https:\/\/blogs.library.american.edu\/archives\/american-universitys-web-harvesting-project-a-work-in-progress\/","title":{"rendered":"American University&#8217;s Web Harvesting Project: A Work in Progress"},"content":{"rendered":"<p>As we recently completed our first year of web harvesting, it seems a fitting time to make a progress report.\u00a0The original scope of this project was to document the online presence of student organizations and to collect web-only publications. We presented our proposal in the fall of 2008 just as AU was finalizing plans to launch its new website the following spring. In light of this, we expanded our scope to cover the University\u2019s entire website.\u00a0 American University selected the Internet Archive\u2019s Archive-It service for this project.\u00a0Archive-It has a user-friendly web interface through which you can set up and schedule crawls.\u00a0The Internet Archive stores the collected websites, generates reports, and offers technical support.\u00a0Because of the evanescent nature of the web, it is important to review the reports generated by Archive-It\u2019s crawler.\u00a0These reports document the successes and failures of each crawl.\u00a0By reviewing this data, we can identify crawler traps and write code to prevent future problems.\u00a0Over the course of the last year, we have conducted four major crawls and several smaller ones.\u00a0We began reaping the benefits of this project within several months of starting.\u00a0We have already received inquiries from students seeking copies of articles they had written for an online publication.\u00a0The publication\u2019s website was temporarily down, 
and the harvested version was the only source of their work.\u00a0The archived version of AU\u2019s website is available through the Archive-It site.\u00a0I invite you to browse the archives.\u00a0Start at the following site: <a href=\"http:\/\/www.archive-it.org\/public\/all_collections\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/www.archive-it.org\/public\/all_collections<\/a> and select one of AU\u2019s collections.\u00a0For those of you familiar with the Wayback Machine, note that it has data for <a href=\"http:\/\/www.american.edu\/\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/www.american.edu\/<\/a> only from 1996 to 2008.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As we recently completed our first year of web harvesting, it seems a fitting time to make a progress report.\u00a0The original scope of this project was to document the online presence of student organizations and to collect web-only publications. We presented our proposal in the fall of 2008 just as AU was finalizing plans 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,6],"tags":[],"class_list":["post-60","post","type-post","status-publish","format-standard","hentry","category-featured-collections","category-news"],"_links":{"self":[{"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/posts\/60","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/comments?post=60"}],"version-history":[{"count":0,"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/posts\/60\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/media?parent=60"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/categories?post=60"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.library.american.edu\/archives\/wp-json\/wp\/v2\/tags?post=60"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}