I'll let my code do the talking. It demonstrates two handy parts of Ruby: Dir, a core class for working with directories, and open-uri, a standard library for reading URLs as if they were files. Both are pretty cool. This took me about 15 minutes to code up ... I was impressed (no, not with myself, with the language!).
# Problem: I downloaded thousands of images
# from a public repository, image-net.org.
# I chose 70 to use in an experiment. I saved
# those 70 with their original filenames, but
# didn't record the original URLs, which I now
# realize I need for the purposes of an experiment.
# image-net has an API for getting URLs for an entire
# image set, but I only used a handful (< 1% in
# many cases) from any given set.
# The only link between the downloaded image and
# the API is the filename.
#
# Solution: Ruby.
#
# include the open-uri tools
require 'open-uri'
# Get list of files in the current folder
# Some of them I care about, some I don't
dir_contents = Dir.entries(Dir.pwd)
# Setup some arrays ...
imgs = [] # for image IDs, "[wnid]_[picnum]"
wnids = [] # for image set ids (only the wnid part)
# Parse filename for image IDs, wnids
dir_contents.each do |dc|
  if dc =~ /((n.+)_.+)\.JPEG/
    imgs.push($1)                # save the image ID
    unless wnids.include?($2)
      wnids.push($2)             # save unseen wnid parts
    end
  end
end
# URL prefix
base = "http://www.image-net.org/api/text/imagenet.synset.geturls.getmapping?wnid="
# Roll through image sets
wnids.each do |wnid|
  # Kernel#open stopped handling URLs in Ruby 3.0, so call URI.open
  URI.open(base + wnid) do |page|  # download the page, get handle
    page.each_line do |line|       # for each line on the page
      line =~ /(.+)\s+(.+)/        # parse the img ID and URL
      if imgs.include?($1)
        puts $2                    # print the URL if I used it
      end
    end
  end
end
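If I were to tidy this up, I'd probably pull the two parsing steps out of the loops so they can be tried without touching the network. Here is a rough sketch of that idea, not the script I actually ran. The helper names parse_image_ids and urls_for are made up for illustration, the stricter filename pattern assumes names look like "n01440764_10026.JPEG" (the letter n, digits, an underscore, more digits), and the example.com URLs are placeholder data.

```ruby
require 'set'

# Hypothetical helper: collect image IDs and wnids from a list of filenames.
# Assumes names of the form "<wnid>_<picnum>.JPEG", e.g. "n01440764_10026.JPEG".
def parse_image_ids(filenames)
  imgs  = Set.new   # full image IDs, "[wnid]_[picnum]"
  wnids = Set.new   # image set IDs (only the wnid part)
  filenames.each do |name|
    m = name.match(/\A((n\d+)_\d+)\.JPEG\z/)
    next unless m   # skip ".", "..", and anything that is not an image
    imgs  << m[1]
    wnids << m[2]   # a Set ignores duplicates, so no include? check is needed
  end
  [imgs, wnids]
end

# Hypothetical helper: given the text of one mapping page ("ID URL" per line),
# return the URLs whose ID is in wanted_ids.
def urls_for(mapping_text, wanted_ids)
  urls = []
  mapping_text.each_line do |line|
    id, url = line.split(' ', 2)   # split on the first run of whitespace
    url = url.to_s.strip
    urls << url if wanted_ids.include?(id) && !url.empty?
  end
  urls
end

# Made-up sample data standing in for Dir.entries and a downloaded page
imgs, wnids = parse_image_ids(["n01440764_10026.JPEG", "notes.txt", ".."])
page = "n01440764_10026 http://example.com/a.jpg\n" \
       "n01440764_99 http://example.com/b.jpg\n"
puts urls_for(page, imgs)
```

In the real script, the filenames would come from Dir.entries(Dir.pwd) and the page text from URI.open(base + wnid).read. Using a Set instead of an Array keeps the include? checks fast when a set has thousands of lines.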