Monday, December 15, 2008

Cleaning up the existing site

At first I thought we'd be launching the redesign before our SACS review. The powers that be have other plans. Initially I was happy to have a few extra months to work and test and debug prior to launch. But then it hit me. I was gonna need to clean up the site content for the review. That means I've got to deal with the near-total lack of any real information architecture on the existing site.

As a first pass, I ran a site-wide link report. My first step was to get rid of the orphans. I quickly discovered two problems with this idea.

  1. The current site uses many JavaScript-driven pop-up windows using code generated by GoLive. Dreamweaver apparently doesn't know how to check these links, so all such content shows up as orphaned.
  2. Ditto for the Flash stuff. This is more surprising since Flash is also a Macromedia product from back in the day. This means the thousands of photos we have in the various Flash-driven galleries all show up as orphans.
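
One way around the first problem would be a homegrown orphan check that treats any quoted string ending in a site file extension as a link, whether it sits in an href or is passed to a JavaScript pop-up function. The sketch below is only an illustration under stated assumptions: the names `extract_targets` and `find_orphans` are made up, links are assumed to be written relative to the site root, and nothing here reflects what GoLive actually generates or how Dreamweaver checks links. It does nothing for content referenced only from inside Flash files.

```python
import re

# Heuristic: any quoted string ending in a known site file extension counts
# as a link target. This catches href="..." as well as URLs handed to
# JavaScript pop-up functions as string arguments.
TARGET_RE = re.compile(
    r"""["']([^"'<>\s]+\.(?:html?|asp|pdf|jpe?g|gif|png|swf))["']""",
    re.IGNORECASE,
)


def extract_targets(markup):
    """Return the set of file-like quoted strings found in a page's source."""
    return {match.group(1) for match in TARGET_RE.finditer(markup)}


def find_orphans(pages, entry_points=("index.html",)):
    """List files that no page references.

    `pages` maps a site-relative path to the file's text (use '' for
    binary files). Assumption: every link is written relative to the
    site root, so only a leading slash needs stripping.
    """
    referenced = set(entry_points)
    for text in pages.values():
        for target in extract_targets(text):
            referenced.add(target.lstrip("/"))
    return sorted(path for path in pages if path not in referenced)
```

A real site would also need each link resolved relative to the page it appears on, which this sketch skips.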

The initial report said that out of 24,000+ files, 13,000+ were orphans. More than half the files on the server showed up as not being linked to at all. Of those 24,000-something files, 10,626 were HTML/ASP files. The rest were images, PDFs, and stuff like that. We're now down to a total of 19,957 files, 9,413 of which are HTML/ASP. But 6,138 still show up as orphaned. I bet a few of those really are orphans. Probably no more than 200. And most of those would be images.

I'm primarily worried about indexable content that could turn up in a Google search but presents horribly outdated information. The trouble there is that not all of those files are orphaned. We're still linking to many of them. I guess the next step will be to search for obviously outdated files. Stuff with years in the file names, for example. Then I'll probably need to run another orphan check for freshly orphaned files once that content is cleaned up.
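
The hunt for files with years in their names could be roughed out like this. It's a minimal sketch, not a finished tool: `dated_files` is a made-up name, the root path is whatever local copy of the site you point it at, and it only looks for four-digit years from the 1900s or 2000s in the file name itself, so two-digit years and dated directory names would slip through.

```python
import re
from pathlib import Path

# A four-digit year (19xx or 20xx) that is not part of a longer digit run,
# so "img12345.jpg" does not count as a dated file.
YEAR_RE = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def dated_files(root, current_year=2008):
    """Return (relative path, newest old year) for files named with a past year."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        years = [int(year) for year in YEAR_RE.findall(path.name)]
        old_years = [year for year in years if year < current_year]
        if old_years:
            hits.append((path.relative_to(root).as_posix(), max(old_years)))
    return hits
```

Each hit is only a candidate for review. A file named for 2005 may still be current, and an outdated page may carry no year in its name at all.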

The good news is I've reduced the size of my local directory by roughly 45%, from about 3.4 gigs to 1.9 gigs. The majority of that was the files we're still hosting from last year's CIT conference. But I never need to update those, so there's no need to store them locally. Some of those PowerPoint files got crazy big.

Of course, the beta site currently takes up a total of 339 megs. But it's not quite complete. Still, I'll be surprised if it grows to anywhere near 1.9 gigs before launch. Due to simple changes like getting rid of tables for layout and abandoning the <font> tag in favor of CSS, we've shaved about 35k per page. We've also eliminated many pages. The beta site currently contains just 985 PHP files. That's about 10% of the HTML/ASP files the current site contains, but we've migrated way more than 10% of the content. One of the big changes in that regard is that we now link to the online catalog for curriculum and course descriptions. There goes 2 pages per degree program plus at least a page per course offered. I think the current site has a lot of redundancy in course description pages among the various program directories. Most of the remaining content will be database-driven.
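
For anyone who wants to check the math, the percentages in this post can be reproduced from the rounded figures quoted above. The helper name `percent` is made up for illustration, and comparing the 985 PHP files against the 9,413 HTML/ASP files (rather than all 19,957 files) is an assumption about which count the 10% refers to.

```python
def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Local directory size before and after cleanup, in gigabytes (rounded figures).
local_before_gb, local_after_gb = 3.4, 1.9
size_reduction = percent(local_before_gb - local_after_gb, local_before_gb)

# Files still reported as orphaned, out of all files currently on the server.
orphan_share = percent(6138, 19957)

# Beta site PHP files relative to the current site's HTML/ASP files.
beta_page_share = percent(985, 9413)

print(size_reduction, orphan_share, beta_page_share)
```

The rounded sizes work out to a bit over 44%, close to the roughly 45% quoted, and the orphan report still flags a little under a third of the files on the server.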
