Local and World News - www.news.cvfq.com


Sun, 30 Jul 2006 16:11:01

One of the most distinctive, and perhaps controversial, features of FeedJournal is that it can filter out the meat of an article published on the web.

How does it accomplish this? FeedJournal has four ways of retrieving the actual content for the next issue.

Actual Content
In the trivial case, a site (like this blog, for example) includes the full article text within its RSS feed. FeedJournal simply publishes the content; no surprises here. By the way, this is how all standard RSS aggregators work. The problem arises when a site publishes only summaries or teasers of the full article text. FeedJournal needs to deal with this because it is an offline RSS reader: users cannot click on their printed newspaper to read the full article.

Linked Content
The <link> tag inside the RSS feed specifies the URL of the full article. If the feed includes only summaries of the full articles, FeedJournal retrieves the text from this URL.
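As a minimal sketch, assuming a standard RSS 2.0 `<item>` element, here is how the `<link>` URL and the (possibly truncated) `<description>` text could be pulled out with Python's standard `xml.etree.ElementTree`. The function name `item_link_and_summary` and the sample item are hypothetical illustrations, not FeedJournal's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an RSS 2.0 item that carries only a teaser.
SAMPLE_ITEM = """
<item>
  <title>Mideast news</title>
  <link>http://www.iht.com/articles/2006/07/28/news/mideast.php</link>
  <description>A short teaser of the article...</description>
</item>
"""

def item_link_and_summary(item_xml: str) -> tuple:
    """Return (link, summary) for one RSS <item>; missing fields become ""."""
    item = ET.fromstring(item_xml.strip())
    # findtext returns None when the child element is absent.
    link = (item.findtext("link") or "").strip()
    summary = (item.findtext("description") or "").strip()
    return link, summary

link, summary = item_link_and_summary(SAMPLE_ITEM)
print(link)  # the URL to fetch when the feed holds only a summary
```

Fetching the page itself could then be done with `urllib.request.urlopen(link)`; the sketch stops at parsing so it runs without network access.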

Rewritten Link
In most cases, just following this link is not a good solution. The web page typically includes lots of irrelevant content, like a navigation menu, a blogroll, or other articles. FeedJournal lets the user write a regular expression for each feed that automatically rewrites the article’s URL to the URL of its printer-friendly version. For example, the URL of a full article in the International Herald Tribune is http://www.iht.com/articles/2006/07/28/news/mideast.php, while the link to the printer-friendly version is http://www.iht.com/bin/print_ipub.php?file=/articles/2006/07/28/news/mideast.php. By inserting bin/print_ipub.php?file=/ right after the host name, we reach the printer-friendly article. This version is much better suited for publishing in FeedJournal, because it contains more or less only the meat of the article.
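A per-feed rewrite rule like the IHT example above can be expressed as a regular-expression substitution. The sketch below uses Python's `re` module; the pattern, the replacement string, and the helper name `to_printer_friendly` are illustrative assumptions, not the actual rule syntax FeedJournal uses.

```python
import re

# Hypothetical per-feed rule: capture the path after the host name
# and splice the print script in front of it.
IHT_PATTERN = re.compile(r"^http://www\.iht\.com/(articles/.+)$")
IHT_REPLACEMENT = r"http://www.iht.com/bin/print_ipub.php?file=/\1"

def to_printer_friendly(url: str) -> str:
    """Rewrite an article URL to its printer-friendly form.

    URLs that do not match the pattern are returned unchanged, so the
    plain <link> target is used as a fallback.
    """
    return IHT_PATTERN.sub(IHT_REPLACEMENT, url)

print(to_printer_friendly(
    "http://www.iht.com/articles/2006/07/28/news/mideast.php"))
# -> http://www.iht.com/bin/print_ipub.php?file=/articles/2006/07/28/news/mideast.php
```

Returning non-matching URLs unchanged keeps one bad rule from breaking a whole feed; the article simply falls back to the linked page.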

Filtered Content
“More or less”, I said in the last sentence. There are usually some unwanted elements left in the printer-friendly version, like a header and a footer. These can be filtered out by letting FeedJournal begin the article after a specified substring in the HTML source. Likewise, another substring can be chosen to mark the end of the relevant content.
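The start/end filtering can be sketched as follows, assuming the markers are plain substrings (not regular expressions) matched against the raw HTML source. The function name `extract_between` and the sample markers are hypothetical, and the fallback when a marker is missing (leave that side of the document untouched) is a design assumption of this sketch.

```python
from typing import Optional

def extract_between(html: str, start_marker: Optional[str],
                    end_marker: Optional[str]) -> str:
    """Keep only the part of `html` after start_marker and before end_marker.

    A marker that is None or not found leaves that side of the document
    untouched, so a stale marker never discards the whole article.
    """
    begin = 0
    if start_marker:
        pos = html.find(start_marker)
        if pos != -1:
            # The article begins right after the start marker.
            begin = pos + len(start_marker)
    end = len(html)
    if end_marker:
        # Look for the end marker only after the article start.
        pos = html.find(end_marker, begin)
        if pos != -1:
            end = pos
    return html[begin:end]

page = "<div id='header'>IHT</div><!--start-->Article text.<!--end--><div>Footer</div>"
print(extract_between(page, "<!--start-->", "<!--end-->"))  # Article text.
```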

By applying these functions, it is possible to scoop, or extract, the meat of almost any article published on the web. Of course, this setup only needs to be done once per feed. To my knowledge, FeedJournal is the only aggregator that offers the functionality described in the last three sections.

Is this legal, you ask? Wouldn’t a site owner expect each user to actually visit the web site to read the content and click on all those fancy ads sprinkled all over it? Well, my stance is that if the content is freely available on the web, I am free to do whatever I want with it for my own purposes. Keep in mind that we are not actually republishing the site’s content; we are only filtering it for our own use. Essentially, I think of this as a pop-up or ad blocker running in your browser.

What is interesting to note is that some web sites have tried to include a paragraph in their copyright notice limiting the use of their content. Digg.com, for example, initially had a clause in its copyright notice effectively prohibiting RSS aggregators from using its RSS feeds! That clause has since been removed.

As long as FeedJournal is used for personal purposes, and the issues are not sold or made publicly available, I do not see any legal problems with the deep linking.



Fri, 04 Aug 2006 13:43:50
As I blogged about previously, I was using Shalom Help Maker to generate my help file. After spending some time with it and finally completing the user documentation, I was ready to insert it into FeedJournal. Then it suddenly hit me: this is not the right help format! The Danish Shalom Help Maker generates help files in the old Windows .HLP format, which has been obsolete for some years now. What I need is the CHM format, which I had until now deluded myself into believing I was working with. Ouch!

OK, there must be some way of converting my HLP file into CHM, right? Nope, at least nothing free, and all of the programs I tried generated an error during the conversion. Finally, after much hunting, I found a link to a freeware application on the excellent Joel on Software forums. The program was called HelpMaker, and sure enough, its conversion feature also choked on my HLP file, but I was able to copy and paste the content to generate a CHM file. HelpMaker also offers the possibility to compile to many different formats: CHM, HLP, MSDN format, RTF, and HTML. All in all, the process using HelpMaker was much easier than with HTML Help Workshop, the free and official vanilla solution.

I took the opportunity to compile a web output at the same time as the CHM file. The web-based help files are available here.

Now, if only I had realized my mistake earlier, I would have saved the many hours I spent working on the wrong format. Doh!


Fri, 30 Mar 2007 14:51:00
Hearst-Argyle Television, Inc. today announced three additions to its growing digital media team.

Fri, 30 Mar 2007 14:52:00


Fri, 30 Mar 2007 14:58:00
SoftJin, a customized EDA (Electronic Design Automation) tool development company, announces the immediate availability of a Bi-directional OpenAccess to OASIS Translator.

Fri, 30 Mar 2007 15:04:00


Fri, 30 Mar 2007 15:28:00
Universal Guardian Holdings, Inc. (OTCBB: UGHO), an emerging global leader in non-lethal protection products, integrated transportation and global supply chain security systems, and strategic security services to protect against terrorist, criminal, and security threats to governments and businesses worldwide, announced that it will present at and partner sponsor the United States Department of Defense RFID (Radio Frequency Identification) Summit to be held at the Hilton, Washington, D.C. on April 3-4, 2007.

Fri, 30 Mar 2007 15:29:00
PetroSun, Incorporated (PINKSHEETS: PSUD) announced today the following update to its Oil Patch report.
