Python Web Scraping: Removing Hyperlinks During Data Cleaning
Published: 2019-06-23


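When cleaning scraped data, we sometimes find hyperlinks embedded in it. How do we remove them? Take the following scraped text as an example: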

Provenance

Brand New Gallery, Milan

Acquired from the above by the present owner

Exhibited

Milan, Brand New Gallery, This is the story of America. Everybody's doing what they think they're supposed to do, November 21, 2013 - January 11, 2014

  • Artist Bio

    Ethan Cook

    American • 1983

    New York-based artist Ethan Cook is known for his abstract paintings on self-produced canvases. More recently, he has used handwoven strips of cotton and linen to create painterly compositions. Cook's woven canvases are contemporary in their minimalist focus on shape and color while referencing one of the most traditional art forms, weaving. Cook weaves his own canvases on a loom and juxtaposes these with store-bought canvas sheets in abstract arrangements. For the artist, the surface of the canvas itself becomes the focus of his practice. Using simple geometric shapes and a limited color palette, Cook's works nurture structural simplicity.

    View More Works

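    Method 1:

      Use a regular-expression substitution to replace href with hre1f. Without a valid href attribute, the tag no longer behaves as a link.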
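A minimal sketch of this substitution, assuming the scraped markup is available as a string. The function name disable_links and the sample URL are hypothetical and only serve the illustration:

```python
import re

def disable_links(html: str) -> str:
    """Rename every href attribute to hre1f so the anchors stop acting as links."""
    # \b anchors the match at the start of the attribute name; the look-ahead
    # requires a following "=", so the word "href" in ordinary text is left alone.
    return re.sub(r'\bhref(?=\s*=)', 'hre1f', html)

# Hypothetical sample markup, not taken from the original page
sample = '<a href="/artist/ethan-cook">View More Works</a>'
print(disable_links(sample))
```

Note that this only renames the attribute. The a tags themselves stay in the markup, so this suits cases where the text will still be rendered as HTML.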

    Method 2:

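    import re

    # result_div is the element scraped earlier in the crawler; work on its string form
    item_desc = str(result_div)
    # Collect the inner text of every tag, i.e. whatever sits between < and >
    result_div_list = re.findall('<(.*?)>', item_desc)
    for ii in result_div_list:
        if 'href' in ii:
            # Remove the whole opening tag, angle brackets included.
            # Update item_desc itself so that every href tag is removed, not only the last one.
            item_desc = item_desc.replace('<' + ii + '>', '')
    # Closing tags such as </a> contain no href and are left in place.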
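The tag-by-tag removal above can also be written as a single substitution that drops both the opening and the closing anchor tag while keeping the link text. This is a sketch under the assumption that the input is simple, well-formed markup held in a string. The names A_TAG and strip_links are hypothetical, and a real HTML parser is the safer choice for messy pages:

```python
import re

# Matches <a ...> and </a>; \b stops it from matching tags such as <abbr> or <article>
A_TAG = re.compile(r'</?a\b[^>]*>', flags=re.IGNORECASE)

def strip_links(html: str) -> str:
    """Remove anchor tags but keep the text they wrap."""
    return A_TAG.sub('', html)

# Hypothetical sample markup, not taken from the original page
sample = '<p>Ethan Cook <a href="/more">View More Works</a></p>'
print(strip_links(sample))
```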

    Noted here for future reference.

     

    Reposted from: http://vcpra.baihongyu.com/
