Parsing XLS and XLSX (MS Excel) files with Ruby?
Are there any gems able to parse XLS and XLSX files? I've found Spreadsheet and ParseExcel, but neither of them understands the XLSX format.
I just found roo, which can do the job. It reads basic spreadsheets, which is all I need.
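I recently needed to parse some Excel files with Ruby. The abundance of libraries and options turned out to be confusing, so I wrote a blog post about it.
The post has a table of the various Ruby libraries and what they support and, if you care about performance, a comparison of how the xlsx libraries perform.
It also has sample code for reading xlsx files with each supported library.
Here are some examples of reading xlsx files with different libraries.
rubyXL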
require 'rubyXL'
workbook = RubyXL::Parser.parse './sample_excel_files/xlsx_500_rows.xlsx'
worksheets = workbook.worksheets
puts "Found #{worksheets.count} worksheets"
worksheets.each do |worksheet|
  puts "Reading: #{worksheet.sheet_name}"
  num_rows = 0
  worksheet.each do |row|
    row_cells = row.cells.map { |cell| cell.value }
    num_rows += 1
  end
  puts "Read #{num_rows} rows"
end
roo
require 'roo'
workbook = Roo::Spreadsheet.open './sample_excel_files/xlsx_500_rows.xlsx'
worksheets = workbook.sheets
puts "Found #{worksheets.count} worksheets"
worksheets.each do |worksheet|
  puts "Reading: #{worksheet}"
  num_rows = 0
  workbook.sheet(worksheet).each_row_streaming do |row|
    row_cells = row.map { |cell| cell.value }
    num_rows += 1
  end
  puts "Read #{num_rows} rows"
end
creek
require 'creek'
workbook = Creek::Book.new './sample_excel_files/xlsx_500_rows.xlsx'
worksheets = workbook.sheets
puts "Found #{worksheets.count} worksheets"
worksheets.each do |worksheet|
  puts "Reading: #{worksheet.name}"
  num_rows = 0
  worksheet.rows.each do |row|
    row_cells = row.values
    num_rows += 1
  end
  puts "Read #{num_rows} rows"
end
simple_xlsx_reader
require 'simple_xlsx_reader'
workbook = SimpleXlsxReader.open './sample_excel_files/xlsx_500000_rows.xlsx'
worksheets = workbook.sheets
puts "Found #{worksheets.count} worksheets"
worksheets.each do |worksheet|
  puts "Reading: #{worksheet.name}"
  num_rows = 0
  worksheet.rows.each do |row|
    row_cells = row
    num_rows += 1
  end
  puts "Read #{num_rows} rows"
end
Here is an example of reading a legacy xls file using the spreadsheet library.
spreadsheet
require 'spreadsheet'
# Note: spreadsheet only supports .xls files (not .xlsx)
workbook = Spreadsheet.open './sample_excel_files/xls_500_rows.xls'
worksheets = workbook.worksheets
puts "Found #{worksheets.count} worksheets"
worksheets.each do |worksheet|
  puts "Reading: #{worksheet.name}"
  num_rows = 0
  worksheet.rows.each do |row|
    # Some cells are wrapper objects that expose #value; unwrap those
    row_cells = row.to_a.map { |v| v.respond_to?(:value) ? v.value : v }
    num_rows += 1
  end
  puts "Read #{num_rows} rows"
end
The roo gem works great for Excel (.xls and .xlsx) and is being actively developed.
I agree the syntax is neither great nor Ruby-like, but that can be easily achieved with something like:
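require 'roo'

class Spreadsheet
  def initialize(file_path)
    @xls = Roo::Spreadsheet.open(file_path)
  end

  # Yields each sheet name after making it the default sheet
  def each_sheet
    @xls.sheets.each do |sheet|
      @xls.default_sheet = sheet
      yield sheet
    end
  end

  # roo rows and columns are 1-based, so start from first_row / first_column
  def each_row
    @xls.first_row.upto(@xls.last_row) do |index|
      yield @xls.row(index)
    end
  end

  def each_column
    @xls.first_column.upto(@xls.last_column) do |index|
      yield @xls.column(index)
    end
  end
end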
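I'm using creek, which uses nokogiri. It is fast: it took 8.3 seconds on a 21x11250 xlsx table on my Macbook Air. Got it to work on ruby 1.9.3+. The output format for each row is a hash of row and column name to cell content: { "A1"=> "a cell", "B1"=> "another cell"} The hash makes no guarantee that the keys will be in the original column order. https://github.com/pythonicrubyist/creek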
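Because the keys are cell references, the column order can be recovered by sorting on the column letters. Here is a minimal sketch, assuming rows shaped like the example hash above. `column_index` and `row_to_array` are hypothetical helpers written for illustration, not part of creek.

```ruby
# Convert a column label such as "A", "Z" or "AA" to a zero-based index.
def column_index(letters)
  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) } - 1
end

# Turn a row hash keyed by cell reference into an array in column order.
# Columns missing from the hash are left as nil.
def row_to_array(row)
  row.each_with_object([]) do |(ref, value), cells|
    letters = ref[/\A[A-Z]+/]          # "B1" -> "B"
    cells[column_index(letters)] = value
  end
end

p row_to_array({ "B1" => "another cell", "A1" => "a cell" })
# prints ["a cell", "another cell"]
```

Placing values by index rather than sorting the keys as strings keeps "AA" after "Z" and preserves gaps for empty cells.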
dullard is another great one that uses nokogiri. It is super fast. Used 6.7 seconds on a 21x11250 xlsx table on my Macbook Air. Got it to work on ruby 2.0.0+. The output format for each row is an array: ["a cell", "another cell"] https://github.com/thirtyseven/dullard
simple_xlsx_reader, which has been mentioned, is great, though a bit slow. Used 91 seconds on a 21x11250 xlsx table on my Macbook Air. Got it to work on ruby 1.9.3+. The output format for each row is an array: ["a cell", "another cell"] https://github.com/woahdae/simple_xlsx_reader
Another interesting one is oxcelix. It uses ox's SAX parser, which is supposedly faster than both nokogiri's DOM and SAX parsers. It supposedly outputs a Matrix. I could not get it to work. Also, there were some dependency issues with rubyzip. Would not recommend it.
In conclusion, creek seems like a good choice. Other posts recommend simple_xlsx_parser as it has similar performance.
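Timings like the ones quoted above can be reproduced on your own files with Ruby's standard Benchmark module. This is only a sketch: `time_reader` is a hypothetical helper, and the block is where a real parser loop (for example the creek loop shown earlier) would go.

```ruby
require 'benchmark'

# Run the given block once and return [elapsed_seconds, row_count].
# The block is expected to return the number of rows it read.
def time_reader
  rows = nil
  elapsed = Benchmark.realtime { rows = yield }
  [elapsed, rows]
end

# Stand-in workload; replace the block body with a real parser loop.
elapsed, rows = time_reader { (1..11_250).count }
puts format('%d rows in %.3f s', rows, elapsed)
```

Results depend heavily on the machine, the Ruby version and the shape of the file, so compare libraries on the same input.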
Removed dullard as recommended as it's outdated and people are getting errors/having problems with it.
If you're looking for more modern libraries, take a look at Spreadsheet: http://spreadsheet.rubyforge.org/GUIDE_txt.html. I can't tell if it supports XLSX files, but considering that it is actively developed, I'm guessing it does (I'm not on Windows, or with Office, so I can't test).
At this point, it looks like roo is a good option again. It supports XLSX, allows (some) iteration by just using times with cell access. I admit, it's not pretty though.
Also, RubyXL can now give you a sort of iteration using their extract_data method, which gives you a 2d array of data, which can be easily iterated over.
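As an illustration of working with such a 2D array, the sketch below pairs each data row with a header row to get one hash per row. The sample table is made up, `rows_to_hashes` is a hypothetical helper, and the code assumes the first row holds the headers and that a blank row may come back as nil.

```ruby
# Pair each data row with the header row to get an array of hashes.
def rows_to_hashes(table)
  header, *body = table
  body.compact.map { |row| header.zip(row).to_h }
end

# Made-up sample standing in for the 2D array described above.
table = [
  ['name', 'qty'],
  ['apple', 3],
  nil,            # a blank row, skipped by compact
  ['pear', 5]
]

rows_to_hashes(table).each { |record| puts record.inspect }
```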
Alternatively, if you're trying to work with XLSX files on Windows, you can use Ruby's Win32OLE library that allows you to interface with OLE objects, like the ones provided by Word and Excel. However, as @PanagiotisKanavos mentioned in the comments, this has a few major drawbacks:
- Excel must be installed
- A new Excel instance is started for each document
- Memory and other resource consumption is far more than what is necessary for simple XLSX document manipulation.
But if you choose to use it, you can choose not to display Excel, load your XLSX file, and access it through it. I'm not sure if it supports iteration, however, I don't think it would be too hard to build around the supplied methods, as it is the full Microsoft OLE API for Excel. Here's the documentation: http://support.microsoft.com/kb/222101 Here's the gem: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/win32ole/rdoc/WIN32OLE.html
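A rough sketch of that approach follows. It only works on Windows with Excel installed. The OLE members used (Visible, DisplayAlerts, Workbooks.Open, Worksheets, UsedRange.Value, Close, Quit) belong to the Excel object model; `read_first_sheet` is a hypothetical helper name.

```ruby
# Read the used range of the first worksheet through Excel's OLE interface.
# Windows only; requires an installed copy of Excel.
def read_first_sheet(path)
  require 'win32ole'                 # loaded lazily, it exists only on Windows
  excel = WIN32OLE.new('Excel.Application')
  excel.Visible = false              # keep the Excel window hidden
  excel.DisplayAlerts = false        # suppress modal prompts
  workbook = excel.Workbooks.Open(File.expand_path(path).tr('/', '\\'))
  workbook.Worksheets(1).UsedRange.Value   # 2D array for a multi-cell range
ensure
  workbook&.Close(false)             # close without saving
  excel&.Quit                        # do not leave an Excel process behind
end

if Gem.win_platform? && ARGV[0]
  read_first_sheet(ARGV[0]).each { |row| puts row.inspect }
end
```

The ensure clause matters here: without it, every failed run leaves a hidden Excel instance running, which is one of the resource drawbacks listed above.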
Again, the options don't look much better, but there isn't much else out there, I'm afraid. It's hard to parse a file format that is a black box, and the few who managed to crack it didn't do so very visibly. Google Docs is closed source, and LibreOffice is thousands of lines of hairy C++.
I've been working heavily with both Spreadsheet and rubyXL these past couple weeks and I must say that both are great tools. However, one area that both suffer is the lack of examples on actually implementing anything useful. Currently I'm building a crawler and using rubyXL to parse xlsx files and Spreadsheet for anything xls. I hope the code below can serve as a helpful example and show just how effective these tools can be.
require 'find'
require 'rubyXL'

count = 0
Find.find('/Users/Anconia/crawler/') do |file| # walk every file under the given directory
  if file =~ /\.xlsx\z/ # only handle xlsx files (the dot is escaped so it matches literally)
    worksheets = RubyXL::Parser.parse(file).worksheets # all worksheets of the workbook
    worksheets.each do |worksheet| # iterate over each worksheet
      data = worksheet.extract_data.to_s # stringify the extracted data so it can be matched against a regex
      if data =~ /regex/
        puts file
        count += 1
      end
    end
  end
end
puts "#{count} files were found"
require 'find'
require 'spreadsheet'

Spreadsheet.client_encoding = 'UTF-8'
count = 0
Find.find('/Users/Anconia/crawler/') do |file| # walk every file under the given directory
  if file =~ /\.xls\z/ # only handle xls files (the dot is escaped so it matches literally)
    worksheets = Spreadsheet.open(file).worksheets # all worksheets of the workbook
    worksheets.each do |worksheet| # iterate over each worksheet
      worksheet.each do |row| # iterate over each row of the worksheet
        if row.to_s =~ /regex/ # rows must be converted to strings to match the regex
          puts file
          count += 1
        end
      end
    end
  end
end
puts "#{count} files were found"
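The two scripts above differ only in how a file is opened, so the directory walk itself can be shared. Below is a sketch that uses only the standard library. `excel_files` is a hypothetical helper that groups paths by extension with File.extname and leaves the actual parsing to whichever gem handles each format.

```ruby
require 'find'

# Walk a directory tree and group Excel files by format.
# Returns a hash like { xlsx: [...], xls: [...] }.
def excel_files(root)
  found = { xlsx: [], xls: [] }
  Find.find(root) do |path|
    next unless File.file?(path)
    case File.extname(path).downcase
    when '.xlsx' then found[:xlsx] << path   # hand these to rubyXL
    when '.xls'  then found[:xls]  << path   # hand these to Spreadsheet
    end
  end
  found
end

# Only runs when a directory is given on the command line.
unless ARGV.empty?
  excel_files(ARGV[0]).each do |format, paths|
    puts "#{paths.size} #{format} file(s) under #{ARGV[0]}"
  end
end
```

Comparing the extension instead of matching a regex against the whole path avoids accidental matches and also picks up upper-case extensions such as .XLS.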
The rubyXL gem parses XLSX files beautifully.
I couldn't find a satisfactory xlsx parser. RubyXL doesn't do date typecasting, Roo tried to typecast a number as a date, and both are a mess both in api and code.
So, I wrote simple_xlsx_reader. You'd have to use something else for xls, though, so maybe it's not the full answer you're looking for.
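Some background on why date typecasting is easy to get wrong: Excel typically stores a date as a plain serial number, and only the cell's number format marks it as a date. The conversion itself is short. This is a minimal sketch assuming the default 1900 date system; `excel_serial_to_date` is a hypothetical helper, not an API of any gem mentioned here.

```ruby
require 'date'

# Day zero of the 1900 date system. Using 1899-12-30 absorbs Excel's
# fictitious 1900-02-29, so results are correct for serials of 61 and up.
EXCEL_EPOCH = Date.new(1899, 12, 30)

# Convert an Excel serial number to a Date, dropping any time-of-day fraction.
def excel_serial_to_date(serial)
  EXCEL_EPOCH + serial.floor
end

puts excel_serial_to_date(43_831)   # prints 2020-01-01
```

A parser still has to decide from the cell format whether a given number is a date at all, which is where the libraries differ.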
Most of the online examples including the author's website for the Spreadsheet gem demonstrate reading the entire contents of an Excel file into RAM. That's fine if your spreadsheet is small.
xls = Spreadsheet.open(file_path)
For anyone working with very large files, a better way is to stream-read the contents of the file. The Spreadsheet gem supports this--albeit not well documented at this time (circa 3/2015).
Spreadsheet.open(file_path).worksheets.first.each do |row|
  # do something with the row's cell values
end
CITE: https://github.com/zdavatz/spreadsheet
The RemoteTable library uses roo internally. It makes it easy to read spreadsheets in various formats (XLS, XLSX, CSV, etc.), which may be remote or stored inside a zip, gz, etc.
require 'remote_table'
r = RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/02data.zip', :filename => 'guide_jan28.xls'
r.each do |row|
  puts row.inspect
end
Output:
{"Class"=>"TWO SEATERS", "Manufacturer"=>"ACURA", "carline name"=>"NSX", "displ"=>"3.0", "cyl"=>"6.0", "trans"=>"Auto(S4)", "drv"=>"R", "bidx"=>"60.0", "cty"=>"17.0", "hwy"=>"24.0", "cmb"=>"20.0", "ucty"=>"19.1342", "uhwy"=>"30.2", "ucmb"=>"22.9121", "fl"=>"P", "G"=>"", "T"=>"", "S"=>"", "2pv"=>"", "2lv"=>"", "4pv"=>"", "4lv"=>"", "hpv"=>"", "hlv"=>"", "fcost"=>"1238.0", "eng dscr"=>"DOHC-VTEC", "trans dscr"=>"2MODE", "vpc"=>"4.0", "cls"=>"1.0"}
{"Class"=>"TWO SEATERS", "Manufacturer"=>"ACURA", "carline name"=>"NSX", "displ"=>"3.2", "cyl"=>"6.0", "trans"=>"Manual(M6)", "drv"=>"R", "bidx"=>"65.0", "cty"=>"17.0", "hwy"=>"24.0", "cmb"=>"19.0", "ucty"=>"18.7", "uhwy"=>"30.4", "ucmb"=>"22.6171", "fl"=>"P", "G"=>"", "T"=>"", "S"=>"", "2pv"=>"", "2lv"=>"", "4pv"=>"", "4lv"=>"", "hpv"=>"", "hlv"=>"", "fcost"=>"1302.0", "eng dscr"=>"DOHC-VTEC", "trans dscr"=>"", "vpc"=>"4.0", "cls"=>"1.0"}
{"Class"=>"TWO SEATERS", "Manufacturer"=>"ASTON MARTIN", "carline name"=>"ASTON MARTIN VANQUISH", "displ"=>"5.9", "cyl"=>"12.0", "trans"=>"Auto(S6)", "drv"=>"R", "bidx"=>"1.0", "cty"=>"12.0", "hwy"=>"19.0", "cmb"=>"14.0", "ucty"=>"13.55", "uhwy"=>"24.7", "ucmb"=>"17.015", "fl"=>"P", "G"=>"G", "T"=>"", "S"=>"", "2pv"=>"", "2lv"=>"", "4pv"=>"", "4lv"=>"", "hpv"=>"", "hlv"=>"", "fcost"=>"1651.0", "eng dscr"=>"GUZZLER", "trans dscr"=>"CLKUP", "vpc"=>"4.0", "cls"=>"1.0"}
Reference URL: https://stackoverflow.com/questions/3321011/parsing-xls-and-xlsx-ms-excel-files-with-ruby