Post #8 | Original poster | Posted on 2016-2-5 22:31:24
Last edited by fnaviwwo1 on 2016-3-22 23:50
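These are the scripts I used to scrape the dictionary. To avoid putting extra load on the server, please don't run them casually...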
Scraping the word list:
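- # encoding: utf-8
- require 'open-uri'
- # Collect the browse sub-page links listed under one letter.
- def get_list(a)
-   # URI.open replaces the bare open(url), which no longer accepts URLs on Ruby 3.0+
-   URI.open("http://dictionary.cambridge.org/browse/essential-british-english/#{a}")
-     .read.scan(/http[^"]*essential-british-english\/.\/[^"]+(?=")/)
- end
- def build_list(range = 'a'..'z')
-   range.flat_map { |x| get_list(x) }
- end
- # Collect the entry links found on one browse sub-page.
- def get_words(url)
-   p url
-   text = URI.open(url).read
-   text.scan(/http[^"]*\/dictionary\/essential-british-english\/[^"]+(?=")/)
- end
- # The block form closes (and flushes) the file once the list is written.
- File.open("wordlistt.txt", 'w') do |f|
-   f.puts build_list('a'..'z').flat_map { |x| get_words(x) }
- end
- p :ok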
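To see what the link pattern in get_list actually picks up without touching the server, here is a small offline sketch. The sample markup and the sub-page paths in it are made up for illustration (the real browse page may look different), and `extract_links` is just a hypothetical helper name.

```ruby
# Hypothetical sample markup; the real browse page may be structured differently.
SAMPLE_HTML = <<~HTML
  <a href="http://dictionary.cambridge.org/browse/essential-british-english/a/a-bit/">a bit</a>
  <a href="http://dictionary.cambridge.org/browse/essential-british-english/a/about/">about</a>
  <a href="http://dictionary.cambridge.org/help/">Help</a>
HTML

# Same pattern as in get_list: a URL containing
# essential-british-english/<one char>/<something>, up to the closing quote.
INDEX_LINK = /http[^"]*essential-british-english\/.\/[^"]+(?=")/

# Hypothetical helper: return every distinct URL in `html` matching `pattern`.
def extract_links(html, pattern)
  html.scan(pattern).uniq
end

links = extract_links(SAMPLE_HTML, INDEX_LINK)
puts links
```

Because `[^"]` cannot cross a double quote, each match stays inside a single attribute value, and the unrelated help link is skipped.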
Scraping the word definitions:
- # encoding: utf-8
- require 'open-uri'
- require 'nokogiri'
- # Return the HTML of the entry body (div.di) for one word page.
- def look__up(url)
-   doc = Nokogiri::HTML(URI.open(url))
-   doc.css("div.di").to_html
- end
- # Quick check on a single entry; press Enter to start the full run.
- puts look__up("http://dictionary.cambridge.org/dictionary/essential-british-english/give-in")
- #look__up("http://dictionary.cambridge.org/dictionary/essential-british-english/qualification")
- gets
- def task
-   Dir.mkdir('word') unless Dir.exist?('word')
-   list = File.readlines('wordlistt.txt').map(&:chomp)
-   p list.length
-   list.each { |x|
-     name = x[/[^\/]+$/]
-     filename = "word/#{name}.txt"
-     unless File.exist?(filename)   # skip entries saved on an earlier pass
-       p filename
-       p x
-       y = look__up(x)
-       File.write(filename, y)      # writes and closes the file in one step
-       #gets
-     end
-   }
- end
- # Retry after any error; stop once a full pass completes.
- loop do
-   begin
-     task
-     break
-   rescue => e
-     p e
-   end
-   sleep 5
- end
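Since the whole point is not to hammer the server, one possible variation is to pause between individual requests as well, not only after an error. The sketch below is only an illustration: `download_missing`, its `delay:` parameter and the stand-in fetcher are hypothetical names of mine, and the demo runs entirely offline in a temporary directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Save every URL that has no file yet, pausing between requests.
# `fetcher` is any callable that returns the page body for a URL
# (hypothetical interface, so the loop can be tried without network access).
def download_missing(urls, dir, fetcher, delay: 1.0)
  FileUtils.mkdir_p(dir)
  saved = []
  urls.each do |url|
    name = url[/[^\/]+$/]             # last path segment, as in the script above
    next if name.nil?
    path = File.join(dir, "#{name}.txt")
    next if File.exist?(path)         # already saved on an earlier run
    File.write(path, fetcher.call(url))
    saved << path
    sleep delay                       # be gentle with the server
  end
  saved
end

# Offline demonstration with a stand-in fetcher.
fake_fetcher = ->(url) { "<div class=\"di\">#{url}</div>" }
urls = [
  "http://dictionary.cambridge.org/dictionary/essential-british-english/give-in",
  "http://dictionary.cambridge.org/dictionary/essential-british-english/qualification",
]
dir = Dir.mktmpdir
first_run  = download_missing(urls, dir, fake_fetcher, delay: 0)
second_run = download_missing(urls, dir, fake_fetcher, delay: 0)
FileUtils.remove_entry(dir)
puts first_run.length
puts second_run.length
```

The second call saves nothing because both files already exist, which is the same resume-by-skipping idea the File.exist? check gives the script above.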
Formatting the output:
- # encoding: utf-8
- require 'nokogiri'
- # Read one saved entry, strip layout whitespace, and render it as text.
- def cc(file)
-   text = File.read(file, encoding: 'utf-8').gsub(/(^ *)|( (?= ))|(\n)|(\t)|( \n)/, '')
-   doc = Nokogiri::HTML(text)
-   parse(doc).gsub(/-\n /, "-\n")
- end
- # Recursively render a node, dispatching on its class attribute.
- def parse(node)
-   return '' if node.comment?
-   return node.text if node.text?
-   p_c = -> { node.children.map { |x| parse(x) }.join }   # render all children
-   case node['class']
-   when 'share-this-entry', 'di-title'
-     ''
-   when 'def-info'
-     "#{node.text.strip}| "
-   when 'sense-block'
-     "\n" + p_c.()
-   when "di-title cdo-section-title-hw"
-     "\n\n#{p_c.()}\n-----------\n"
-   when "cl"
-     "*#{p_c.()}*"
-   #when 'c2' then fail
-   when 'eg'
-     "\n #{p_c.()}"
-   else
-     p_c.()
-   end
- end
- # Preview a single entry; press Enter to build the full file.
- puts cc('./word/give.txt')
- gets
- File.open("celd.md", 'w') do |f|
-   f.print <<EOF
- Cambridge Essential English Dictionary
- ==========================================
- EOF
-   Dir["./word/*"].sort.each { |x| f.print cc(x) }
- end
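The recursive dispatch in parse can be tried without Nokogiri or any saved files. The sketch below uses a small Struct as a stand-in for a parsed HTML node, so `Node`, `text_node`, `element` and `render` are all hypothetical names, and the tree content is placeholder text rather than a real dictionary entry. It covers only a few of the class names handled above and renders children eagerly for brevity.

```ruby
# Stand-in for a parsed HTML node: text nodes carry `text`,
# elements carry a class name and a list of children.
Node = Struct.new(:klass, :text, :children) do
  def text?
    children.nil?
  end
end

def text_node(str)
  Node.new(nil, str, nil)
end

def element(klass, *children)
  Node.new(klass, nil, children)
end

# Render a node to plain text, choosing the layout from its class name.
def render(node)
  return node.text if node.text?
  inner = node.children.map { |child| render(child) }.join
  case node.klass
  when 'share-this-entry' then ''           # drop the share widget
  when 'cl'               then "*#{inner}*" # emphasise this span
  when 'eg'               then "\n #{inner}" # example on its own indented line
  when 'sense-block'      then "\n#{inner}"
  else inner                                # unknown class: keep the text only
  end
end

# Placeholder tree, not real dictionary content.
tree = element('sense-block',
  element('def', text_node('definition text')),
  element('eg',
    text_node('an example with '),
    element('cl', text_node('a collocation')),
    text_node(' in it')),
  element('share-this-entry', text_node('Share')))

rendered = render(tree)
puts rendered
```

Unknown classes fall through to the else branch, so new markup on the page degrades to plain text instead of raising an error.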