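Introduction:
We regularly run into different requirements: converting JSON to a table, converting a table to JSON, and so on. The table here, however, is irregular (see the figure below). How can it be converted?
Prerequisite: the form is stored in a single field in the database.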
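Because the form's HTML sits in a database field, it can be read as a plain string and parsed directly rather than fetched over HTTP. Below is a minimal sketch that uses Python's built-in `sqlite3` module. It assumes a hypothetical table `forms` with columns `id` and `html`. The source does not say which database or schema is actually used.

```python
import sqlite3


def load_form_html(conn, form_id):
    """Return the raw HTML stored in one database field, or None if absent.

    The table `forms` and its columns `id`/`html` are hypothetical placeholders.
    """
    row = conn.execute('SELECT html FROM forms WHERE id = ?', (form_id,)).fetchone()
    return row[0] if row else None


if __name__ == '__main__':
    # In-memory demo database standing in for the real one
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE forms (id INTEGER PRIMARY KEY, html TEXT)')
    conn.execute('INSERT INTO forms (id, html) VALUES (?, ?)',
                 (1, '<table><tr><td>应用名称</td><td></td></tr></table>'))
    print(load_form_html(conn, 1))
```

You can pass the returned string to a parser such as `BeautifulSoup(html, 'html.parser')` in place of the downloaded page.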
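Common approaches:
1. A JS script, mainly using jQuery and similar methods; fairly convenient
2. A Python parsing module such as SGMLParser
3. Install Node.js and parse on the server side (somewhat overkill)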
The form looks like this:
Its HTML is as follows:
<table border="1" cellpadding="0" cellspacing="0" style="border-bottom: medium none; border-left: medium none; border-collapse: collapse; border-top: medium none; border-right: medium none" width="650"><tbody><tr style="height: 30px"><td style="border-bottom: windowtext 1pt solid; border-left: windowtext 1pt solid; padding-bottom: 0cm; padding-left: 5.4pt; width: 215px; padding-right: 5.4pt; height: 30px; border-top: windowtext 1pt solid; border-right: windowtext 1pt solid; padding-top: 0cm"><p align="right" style="text-align: right"><span style="font-family: 宋体"><span style="font-size: 12pt">应用名称<span style="font-size: 12pt"><span style="font-family: calibri">(</span></span><strong><span style="color: red"><span style="font-family: 宋体"><span style="font-size: 12pt">必填</span></span></span></strong><span style="font-size: 12pt"><span style="font-family: calibri">)</span></span></span></span></p></td><td style="border-bottom: windowtext 1pt solid; padding-bottom: 0cm; padding-left: 5.4pt; width: 151px; padding-right: 5.4pt; height: 30px; border-left-color: #f0f0f0; border-top: windowtext 1pt solid; border-right: windowtext 1pt solid; padding-top: 0cm"><p><br></p></td><td style="border-bottom: windowtext 1pt solid; padding-bottom: 0cm; padding-left: 5.4pt; width: 123px; padding-right: 5.4pt; height: ... (very, very long; truncated here)
URL:http://t.mreald.com/py.html
Now parse it with Python:
1. Common parsing modules:
HTMLParser, SGMLParser, pyQuery, BeautifulSoup
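SGMLParser belongs to Python 2's `sgmllib`, which was removed in Python 3. The standard library's `html.parser.HTMLParser` covers the same ground without any download. The minimal sketch below assumes the form is a single table with no nested tables. It collects the text of each `<td>`, row by row, and joins fragments split across nested `<span>`/`<strong>` tags. The names `TableTextParser` and `table_rows` are illustrative, not part of any library.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Assumes a single table; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the open <td>, or None outside cells

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag == 'td':
            if not self.rows:  # tolerate a <td> without an enclosing <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            # Join text split across nested <p>/<span>/<strong> tags
            self.rows[-1].append(''.join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return the table as a list of rows, each a list of cell texts."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For a form like the one above, `table_rows(html)` should give one list per `<tr>`. Empty input cells such as `<p><br></p>` come back as `''`, and you can then apply per-row rules to the resulting lists.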
Download: http://www.crummy.com/software/BeautifulSoup/bs4/download/
Documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#
2. Using BeautifulSoup
The code is as follows:
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page that holds the form's HTML and parse it
content = urlopen('http://t.mreald.com/py.html').read()
soup = BeautifulSoup(content, 'html.parser')
# print(soup.prettify())

# The table is irregular, so which cells to print depends on the row index
for i, tritem in enumerate(soup.find_all('tr')):
    tds = tritem.find_all('td')
    if i in [0, 5, 6, 7, 8, 9, 10, 11, 12]:
        # One label/value pair in cells 0 and 1
        print(tds[0].get_text() + ' ' + tds[1].get_text())
    elif i == 4:
        # Row 4: print cells 1 and 2, skipping cell 0
        print(tds[1].get_text() + ' ' + tds[2].get_text())
    elif i == 3:
        # Row 3: cells 0 and 1, with spaces trimmed from the value
        print(tds[0].get_text() + ' ' + tds[1].get_text().strip(' '))
        # print(tds[0].get_text() + ' ' + tds[1].get_text().strip(' ') + tds[2].get_text() + tds[3].get_text())
    else:
        # Remaining rows: two label/value pairs side by side in cells 0-3
        print(tds[0].get_text() + ' ' + tds[1].get_text())
        print(tds[2].get_text() + ' ' + tds[3].get_text())
Output:
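The BeautifulSoup script hard-codes, in an if/elif chain, which cells form a label/value pair in each row. One alternative is a lookup table from row index to column pairs. That keeps the rules in one place and makes it easy to emit JSON, the format mentioned at the start. The minimal sketch below assumes the rows have already been reduced to lists of cell text, for example with `find_all('td')` and `get_text()`. `ROW_RULES`, `rows_to_pairs` and `pairs_to_json` are hypothetical names. The row indices simply mirror the script above and only fit this particular form.

```python
import json

# Hypothetical rule table mirroring the if/elif chain in the script:
# row index -> list of (label_column, value_column) pairs.
ROW_RULES = {i: [(0, 1)] for i in [0, 3, 5, 6, 7, 8, 9, 10, 11, 12]}
ROW_RULES[4] = [(1, 2)]           # row 4 keeps its pair in cells 1 and 2
DEFAULT_PAIRS = [(0, 1), (2, 3)]  # other rows hold two pairs side by side


def rows_to_pairs(rows, rules=ROW_RULES, default=DEFAULT_PAIRS):
    """Turn rows of cell text into (label, value) tuples using per-row rules."""
    pairs = []
    for i, cells in enumerate(rows):
        for label_col, value_col in rules.get(i, default):
            # Skip a pair if this row is too short, instead of raising IndexError
            if max(label_col, value_col) < len(cells):
                pairs.append((cells[label_col].strip(), cells[value_col].strip()))
    return pairs


def pairs_to_json(pairs):
    """Serialize pairs as a JSON object; a repeated label keeps its last value."""
    return json.dumps(dict(pairs), ensure_ascii=False)


rows = [
    ['应用名称(必填)', ''],
    ['A', 'x', 'B', 'y'],
]
print(rows_to_pairs(rows))  # [('应用名称(必填)', ''), ('A', 'x'), ('B', 'y')]
```

Unlike the original, this version strips all surrounding whitespace from both label and value. Changing the layout then means editing `ROW_RULES` rather than the loop. If labels can repeat, serialize the list of pairs instead of a dict.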