Been doing a touch of screen scraping, scripting with Ruby, against a target that was ‘unwilling’. A few observations:
require 'mechanize'

@agent = Mechanize.new

# Reopen the singleton class of this one agent, so only it gets patched
class << @agent
  alias :orig_get :get
  alias :orig_fetch_page :fetch_page

  # remove the chaff characters (ASCII nulls)
  def get(options, parameters = [], referer = nil)
    page = orig_get(options, parameters, referer)
    page.body = page.body.gsub(/\x00/, "")
    page
  end

  def fetch_page(params)
    page = orig_fetch_page(params)
    page.body = page.body.gsub(/\x00/, "")
    page
  end
end
\x00 is the escape for ASCII null in the sample code. In my own script I selected and pasted the literal character from an HTML dump, with both vim and a GUI text editor, but it tends to be less than visible in the wild, so the escape is easier to read. YMMV.
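The same wrap-and-clean idea can be tried out without touching the network. This is only a minimal sketch under stated assumptions: FakeAgent, FakePage and StripNulls are hypothetical stand-ins, not part of Mechanize, and it uses Object#extend with super rather than alias to wrap a single object.

```ruby
# Hypothetical stand-in for a fetched page: just a mutable body.
FakePage = Struct.new(:body)

# Hypothetical stand-in for the agent; returns a body salted with nulls.
class FakeAgent
  def get(_url)
    FakePage.new("<ht\x00ml>\x00hi</html>")
  end
end

# Wraps #get: call the original via super, then strip ASCII nulls.
module StripNulls
  def get(*args)
    page = super
    page.body = page.body.delete("\x00")
    page
  end
end

agent = FakeAgent.new
agent.extend(StripNulls)   # patches this one object only

page = agent.get("http://example.invalid/")
p page.body                # prints "<html>hi</html>"
```

Extending the object keeps the patch local to one agent, as the singleton-class approach does, and avoids having to keep the orig_ aliases around.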
by noreply@blogger.com (Ben Weiner) - March 09, 2010 10:14 AM
The Open Clip Art Library has grown, from humble beginnings in early 2004, into a massive collection of over 24,000 scalable vector images, all created by 1200+ artists from around the world.
OCAL is a powerful platform: all work uploaded to the site is dedicated to the public through Creative Commons’ “Public Domain Dedication”. This means that anyone can download and use the entire SVG library for any purpose, including both free and commercial works!
OCAL now boasts an easily navigable collection, made possible by new thumbnail previews. It has now become much easier to search and download clip art that suits any situation. The new site layout includes an updated theme, from Andy Fitzsimon, that emphasizes user interaction by placing more importance on the portal to upload created work, as well as displaying selections from the ever-growing collection.
Behind the scenes, members of Fabricatorz, including, among others, Bassel Safadi, Michi, Ronaldo Barbachano, and Brad Phillips, have helped push The Open Clip Art Library onto the Aiki Framework. This new PHP + MySQL platform allows programmers to easily create and work with content management systems from the web.
Please help support the new Open Clip Art site launch by registering (if you haven’t already) and uploading artwork of your own!
Read the entire Announcement 2.0 here and at the Fabricatorz post.
Mark Shuttleworth, founder of the Ubuntu project, recently announced the commission of a new open font for Ubuntu and Canonical:
"We have commissioned a new font to be developed both for the logo’s of Ubuntu and Canonical, and for use in the interface. The font will be called Ubuntu, and will be a modern humanist font that is optimised for screen legibility. It will be published under an open font license, and considered part of the trade dress of Ubuntu, which will limit its relevance for software interfaces outside of Ubuntu but leave it free for use across the web and in printed documents.It will take a few months for the font to be finalised, initial elements will be final in the next week which will be sufficient for the logo and other bits and pieces, but I expect to see that font widely used in 10.10. The work has been commissioned from world-renowned fontographers Dalton Maag, who have expressed excitement at the opportunity to publish an open font and also a font that they know will be used daily by millions of people.
Initial coverage will be Western, Arabic, Hebrew and Cyrillic character sets, but over time we may be able to extend that to being a full Unicode font, with great kerning and hinting for print and screen usage globally. We are considering an internship program, to support aspiring fontographers from all corners of the world to visit London and work with Dalton Maag to extend the font to their own regional glyph set.
The critical test of the font is screen efficiency and legibility, and its character and personality are secondary to its fitness for that purpose. Nevertheless, our hope is that the font has a look that is elegant and expresses the full set of values for both Canonical and Ubuntu: adroitness, accountability, precision, reliability, freedom and collaboration. We’ll publish more as soon as we have it."
What's not to love about this announcement? Seriously, this is very promising... Talk about best practices for setting up an open font project and setting the pace for global derivative branding of a whole family of projects!
Just look at the multiple layers of awesomeness: commissioning very experienced typographers to design an original brand and interface font family for screen and print, with wide coverage of major scripts beyond the usual Western ones (Arabic, Hebrew and Cyrillic), along with plans to extend it even further via a global internship program. And all this serious work is to be released under a community-approved license allowing wide modification, redistribution and re-use, in tune with the Ubuntu ethos and with respect for Ubuntu's and Canonical's trademarks.
It's very encouraging to see all the attention, energy and resources dedicated to design and language coverage issues in the Ubuntu community: "Every computer user should be able to use their software in the language of their choice", as the Ubuntu philosophy puts it, is clearly not just empty words!
A big kudos to all involved!
The community as a whole has made a lot of progress in this difficult area over the past few years (decade?). Look at this previous post, from when Mark was challenging the community to help tackle the font challenges!
And in case you didn't already know about the work done for the visual identity Ubuntu has been using until now: there is Andy Fitzsimon's Ubuntu-title, with various branches from the community, but the new project will reach much further.
Do you enjoy using a range of libre graphics software, including open fonts and all the tools forming the open font design toolkit?
Then please consider supporting the upcoming LGM: the major yearly event where the community of developers and core contributors for libre graphics software come together. Your support will help even more useful interaction and progress happen during the LGM, to the benefit of the wider community.
For details see the create wiki and of course the preparation efforts by OSP (this year's organizer).
Since March 1, 2010, Windows users in the European Union have a choice of web browser software. The choice will be activated through the Windows Update system.
You can see what the Greek page looks like by following the browser choice link for the European Union.
Choose Mozilla Firefox first, because the software's primary concern is your security.
The website browserchoice.eu is provided by Microsoft. Its operation was imposed by the European Union when it ruled against Microsoft in a recent antitrust case.
In the terms of use of browserchoice.eu, Microsoft says the following on this matter:
NOTICES
The BrowserChoice.eu site was designed in accordance with a competition law decision of the European Commission in December 2009.
© 2009 Microsoft Corporation. All rights reserved.