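I hadn't written any code for over two months. Over the last two days I set out to download a 500-episode Douban list (豆单) and convert the FLV files to MP3.
First, go to http://www.flvcd.com/, enter the list's URL, let it resolve the FLV addresses, and save the result page as flvUrlParse.htm.
Second, let Internet Download Manager parse flvUrlParse.htm and download the files to F:/download/flv/
Third, convert the files to MP3.
Fourth, run a regular expression over flvUrlParse.htm to extract the titles, and use them to tidy up the names of the converted MP3 files.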
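The fourth step is the only one that needs code, so here is a minimal sketch of its core idea in Python 3. The sample rows, the example.com URLs, and the names `ROW` and `build_name_map` are assumptions made for illustration; the real page from flvcd.com may be laid out differently.

```python
import re

# Hypothetical rows, modelled on the table layout that the regular
# expression later in this post expects; the real page may differ.
SAMPLE_HTML = """
<tr><td class="n">1:</td>
<td><a href="http://example.com/v/1001.flv?key=abc" target="_blank">Playlist-First act</a></td></tr>
<tr><td class="n">2:</td>
<td><a href="http://example.com/v/1002.flv?key=def" target="_blank">Playlist-Second act</a></td></tr>
"""

ROW = re.compile(
    r'<td[^>]*>\s*(\d+):</td>\s*'                          # episode number, e.g. "1:"
    r'<td[^>]*><a href="[^"]*/(\d+)\.flv\?[^"]*"[^>]*>'    # numeric FLV id in the href
    r'[^<]*?-([^<]*)</a></td>',                            # title after the first "-"
    re.IGNORECASE,
)

def build_name_map(html):
    """Map each FLV id to a zero-padded 'NNN.title' base name."""
    return {
        flv_id: "%03d.%s" % (int(number), title.strip())
        for number, flv_id, title in ROW.findall(html)
    }

name_map = build_name_map(SAMPLE_HTML)
print(name_map)
```

Since the downloaded files are named after the numeric id, looking that id up in the map gives the new name for the matching MP3.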
It should have been simple, yet it took me more than two days....
I wasted a lot of time on several really dumb mistakes:
- while(fin.good()&&!fin.eof()){
- fin.getline(szBuffer,LEN);
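The code above is an infinite loop....
Testing the return value of fin.getline directly is enough, i.e. while(fin.getline(szBuffer,LEN)).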
Next, I used boost's xpressive for the regular expression:
- sregex rex = sregex::compile( "<td [^>]*><a href=\"([^\"]*/(\\d*)\\.flv\\?[^\"]*)\"[^>]*>([^>]*)</a></td>");
- int const subs[] = { 1, 2, 3 };
- sregex_token_iterator cur( strBuffer.begin(), strBuffer.end(), rex, subs );
- sregex_token_iterator end;
- for( ; cur != end; ++cur )
- {
-     std::cout << *cur << std::endl;
-     //getchar();
- }
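subs selects which parts of each match the iterator returns:
0 is the whole matched string
-1 is the unmatched text between matches
1, 2, 3 are the sub-matches, in order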
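For comparison, here is a rough Python 3 counterpart of those three selections, using the standard `re` module. The input string and the simplified pattern are assumptions made for illustration. One difference is that the token iterator yields the sub-matches as one flat sequence, while `finditer` groups them per match.

```python
import re

# Hypothetical input; the pattern is a simplified stand-in for the one above.
text = 'x<a href="/v/7.flv">Seven</a>y<a href="/v/8.flv">Eight</a>z'
link = re.compile(r'<a href="([^"]*/(\d+)\.flv)">([^<]*)</a>')

# Like subs = {1, 2, 3}: the sub-matches of every match, in order.
groups = [m.group(1, 2, 3) for m in link.finditer(text)]

# Like 0: the whole matched string.
whole = [m.group(0) for m in link.finditer(text)]

# Like -1: the text between matches. A pattern without capture groups is
# used here because re.split would otherwise interleave the groups.
between = re.split(r'<a href="[^"]*">[^<]*</a>', text)

print(groups, whole, between)
```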
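boost's xpressive really is powerful. Its only drawback is that the templates take far too long to compile: after pulling in xpressive, the build time jumped from 3 seconds to over a minute. Even with a precompiled header it still takes nearly 20 seconds, though to be fair the precompiled header does help.
Later I decided the compile cost was too high and switched to Python.....
And then the tragedies started again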
- if __name__ == "main":
....the first tragedy: the string has to be "__main__", so the block never ran. Then
- os.remove(flvFilename);
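raised WindowsError: [Error 2] ....
At first I thought the Chinese file names were the problem and kept changing the character encoding, with no success. Then I remembered that I had written similar file-deleting code before, went back to look at it, and realized that I have to check whether the file exists: removing a file that does not exist raises an exception -__-!!!
I've gotten really rusty..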
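Both Python pitfalls fit into one small sketch. It is written for Python 3, where the "[Error 2]" case surfaces as `FileNotFoundError`. The helper name `remove_if_exists` and the temporary-file demo are assumptions made for illustration, not code from the script.

```python
import os
import tempfile

def remove_if_exists(path):
    """Delete path; return True if a file was removed, False if it was absent."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Python 3 name for the "[Error 2]" case described above.
        return False

def main():
    # Hypothetical demo: create a temporary file, then delete it twice.
    handle, path = tempfile.mkstemp(suffix=".flv")
    os.close(handle)
    first = remove_if_exists(path)    # the file exists
    second = remove_if_exists(path)   # already gone, no exception
    return first, second

# The guard must compare against "__main__", underscores included;
# with "main" the condition is false when the file is run as a script.
if __name__ == "__main__":
    print(main())
```

Catching the specific exception rather than testing `os.path.exists` first also covers the case where the file disappears between the check and the delete.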
Download: changeDDName.py
- # -*- coding: utf-8 -*-
- # Python 2 script: rename the converted MP3 files after the titles in the saved HTML page.
- import glob
- import os
- import re
- flvdir = u"f:/download/莆仙戏"
- mp3dir = u"f:/download/莆仙戏/mp3"
- html = u"F:/tmp/parse.php.htm"
- ff = os.path.join(mp3dir, "*.mp3")
- # Groups: 1 = episode number, 2 = href, 3 = numeric FLV id, 4 = title.
- regexPattern = r'<td.*?>.*?(\d*):</td>\s*<td.*?><a href="([^"]*/(\d*)\.flv\?[^"]*)".*?>.*?-([^>]*)</a></td>'
- flvMap = {}
- def doClean():
-     # Delete every FLV that already has a converted MP3.
-     print u"Deleting files that already exist...."
-     for file in glob.glob(ff):
-         baseFilename = os.path.splitext(os.path.basename(file))
-         flvFilename = os.path.join(flvdir, baseFilename[0] + ".flv")
-         print flvFilename
-         try:
-             os.remove(flvFilename)
-         except OSError:
-             # The FLV is already gone; nothing to delete.
-             pass
- def loadHtml():
-     print u"Building the name map...."
-     with open(html) as fin:
-         htmlContent = fin.read()
-     regex = re.compile(regexPattern, re.IGNORECASE)
-     # Flags go to re.compile(); the second argument of a compiled pattern's findall() is a start position.
-     for item in regex.findall(htmlContent):
-         flvMap[item[2]] = {"no": item[0],
-                            "href": item[1],
-                            "title": item[3]}
-     for key in flvMap.keys():
-         print getFormatTitle(key)
- def getFormatTitle(key):
-     # Build "NNN.title" for an FLV id; return None when the id is unknown.
-     if key in flvMap:
-         result = "%03d.%s" % (
-             int(flvMap[key]["no"]),
-             flvMap[key]["title"])
-         # str --> unicode needs decode(); the page bytes are treated as GBK.
-         return result.decode("gbk")
-     else:
-         print "error@no such key:" + key
- def rename():
-     # First pass: rename "<flv id>.mp3" to "NNN.title.mp3".
-     for file in glob.glob(ff):
-         baseFilename = os.path.splitext(os.path.basename(file))
-         newBaseName = getFormatTitle(baseFilename[0])
-         print newBaseName
-         if newBaseName:
-             newName = os.path.join(mp3dir, newBaseName + baseFilename[1])
-             os.rename(file, newName)
-             #ren = "rename "+file +" ---> " + newName
-             #print unicode(ren)
- def renameAgain():
-     # Second pass: zero-pad the leading number of every MP3 name to three digits.
-     reg = re.compile(r"(.*)\\(\d+?)\.(.*)", re.IGNORECASE)
-     for file in glob.glob(ff):
-         result = reg.search(file)
-         if not result:
-             # The name does not start with a number; leave the file alone.
-             continue
-         newFilename = u"%s/%03d.%s" % (
-             result.group(1),
-             int(result.group(2)),
-             result.group(3))
-         os.rename(file, newFilename)
- if __name__ == "__main__":
-     doClean()
-     loadHtml()
-     rename()
-     renameAgain()
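One pitfall to keep in mind around the regex.findall call in loadHtml: on a compiled pattern, the second positional argument of `findall` is a start position, not a set of flags. A flag passed there is silently read as an index. The short Python 3 sketch below shows the effect; the pattern and the text are made up for illustration.

```python
import re

pattern = re.compile(r"\d+")
text = "12 345 6789 10"

# Scans the whole string.
all_numbers = pattern.findall(text)

# re.MULTILINE has the integer value 8, so this call starts scanning at
# index 8 and silently skips everything before it.
skipped = pattern.findall(text, re.MULTILINE)

print(all_numbers, skipped)
```

Flags therefore belong in `re.compile`, or in the module-level `re.findall(pattern, string, flags)`.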