I'll let my code do the talking. It shows off two parts of Ruby, the Dir class and the open-uri library, both of which are pretty cool. The whole thing took me about 15 minutes to code up, and I was impressed (no, not with myself, with the language!).
# Problem: I downloaded thousands of images
# from a public repository, image-net.org.
# I chose 70 to use in an experiment. I saved
# those 70 with their original filenames, but
# didn't record the original URLs, which I now
# realize I need for the purposes of an experiment.
# image-net has an API for getting URLs for an entire
# image set, but I only used a handful (< 1% in
# many cases) from any given set.
# The only link between the downloaded image and
# the API is the filename.
#
# Solution: Ruby.
#
# include the open-uri tools
require 'open-uri'
# Get list of files in the current folder
# Some of them I care about, some I don't
dir_contents = Dir.entries(Dir.pwd)
# Set up some arrays ...
imgs = [] # for image IDs, "[wnid]_[picnum]"
wnids = [] # for image set ids (only the wnid part)
# Parse filename for image IDs, wnids
dir_contents.each { |dc|
  if dc =~ /((n.+)_.+)\.JPEG/ then
    imgs.push($1) # save the image ID
    if (!wnids.include?($2)) then
      wnids.push($2) # save unseen wnid parts
    end
  end
}
# URL prefix
base = "http://www.image-net.org/api/text/imagenet.synset.geturls.getmapping?wnid="
# Roll through image sets
wnids.each { |wnid|
  # URI.open (Ruby 2.5+) is used here because a bare open()
  # no longer fetches URLs as of Ruby 3.0
  URI.open(base + wnid) { |page| # download the page, get handle
    page.each_line { |line| # for each line on the page
      line =~ /(.+)\s+(.+)/ # parse the img ID and URL
      if imgs.include?($1) then
        puts $2 # print the URL if I used it
      end
    }
  }
}
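The two parsing steps in the script (filename to image ID and wnid, then mapping line to URL) can be tried without touching the network. What follows is only a rough sketch under some assumptions: the helper names `parse_filenames` and `urls_for`, the sample filenames, and the example.com URLs are all made up for illustration, and the mapping page is assumed to hold one "image ID, whitespace, URL" pair per line, as the script's regex implies. It also swaps the `imgs` array for a Set so the lookup is faster; that is a change from the script, not part of it.

```ruby
require 'set'

# Hypothetical helper: pull image IDs ("[wnid]_[picnum]") and the
# unique wnids out of a list of filenames, using the script's regex.
def parse_filenames(filenames)
  img_ids = Set.new
  wnids = []
  filenames.each do |name|
    match = name.match(/((n.+)_.+)\.JPEG/)
    next unless match # skip ".", "..", and unrelated files
    img_ids << match[1]
    wnids << match[2] unless wnids.include?(match[2])
  end
  [img_ids, wnids]
end

# Hypothetical helper: given the text of one mapping page, return
# the URLs whose image ID is in img_ids.
def urls_for(mapping_text, img_ids)
  urls = []
  mapping_text.each_line do |line|
    id, url = line.split(' ', 2) # split on the first run of whitespace
    next if url.nil?             # blank or malformed line
    urls << url.strip if img_ids.include?(id)
  end
  urls
end

# Made-up sample data standing in for Dir.entries and a downloaded page
files = ['.', '..', 'n01440764_10026.JPEG', 'n02084071_1234.JPEG', 'notes.txt']
img_ids, wnids = parse_filenames(files)

page = "n01440764_10026 http://example.com/a.jpg\n" \
       "n01440764_99999 http://example.com/b.jpg\n"
puts urls_for(page, img_ids)
```

With the sample data above, only the first mapping line has an ID that appears in `files`, so only its URL is printed. In the real script, `files` would come from `Dir.entries(Dir.pwd)` and `page` from the downloaded mapping for each wnid.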