Who Says Crawlers Are Python-Only? C# Crawler Development and Demo

2024-11-29 22:09

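I. Advantages of C# for Crawler Development

II. A C# Crawler Example

Below is a simple C# crawler example that fetches the content of a given web page and extracts the page's title.

1. Fetching the page content with HttpClient

First, we use the HttpClient class to fetch the page content. In C#, HttpClient is the standard class for sending HTTP requests and receiving HTTP responses.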

using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
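        string url = "http://example.com"; // Replace with the URL of the page you want to crawl
        string content = await GetWebPageContentAsync(url);
        Console.WriteLine(content); // Print the page content
    }

    // Returns the response body as a string, hence Task<string> rather than Task.
    static async Task<string> GetWebPageContentAsync(string url)
    {
        HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode(); // Throw if the status code is not a success code
        return await response.Content.ReadAsStringAsync(); // Read the response body as a string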
    }
}
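
The fetch above throws on any failure. As a minimal sketch of a more defensive variant, the helper below sets a timeout and a User-Agent header and returns null instead of throwing. The names SafeFetcher and TryGetPageAsync, the 10-second timeout, and the User-Agent string are illustrative assumptions and are not part of the original example.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class SafeFetcher
{
    // One shared HttpClient for the whole process.
    static readonly HttpClient client = CreateClient();

    static HttpClient CreateClient()
    {
        // The 10-second timeout is an assumed value; tune it for your target site.
        var c = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };
        // Hypothetical User-Agent; replace it with one that identifies your crawler.
        c.DefaultRequestHeaders.UserAgent.ParseAdd("ExampleCrawler/0.1");
        return c;
    }

    // Returns the page body, or null if the request fails or times out.
    public static async Task<string> TryGetPageAsync(string url)
    {
        try
        {
            using (HttpResponseMessage response = await client.GetAsync(url))
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
        catch (HttpRequestException ex)
        {
            // Network errors and non-success status codes end up here.
            Console.Error.WriteLine($"Request to {url} failed: {ex.Message}");
            return null;
        }
        catch (TaskCanceledException)
        {
            // HttpClient reports a timeout as a cancelled task.
            Console.Error.WriteLine($"Request to {url} timed out.");
            return null;
        }
    }
}

class Program
{
    static async Task Main()
    {
        string content = await SafeFetcher.TryGetPageAsync("http://example.com");
        Console.WriteLine(content ?? "Fetch failed.");
    }
}
```

Returning null keeps the calling loop simple when many pages are fetched in a row; a real crawler might prefer to log the failure and retry later.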

2. Parsing the page content to extract the title

After fetching the page content, we need to parse it to extract the information we want. In this example, we use a regular expression to extract the content of the <title> tag from the HTML.

using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class Program
{
    // ... (HttpClient code omitted)

    static async Task Main(string[] args)
    {
        string url = "http://example.com"; // Replace with the URL of the page you want to crawl
        string content = await GetWebPageContentAsync(url);
        string title = ExtractTitleFromHtml(content);
        Console.WriteLine($"The title of the page is: {title}"); // Print the page title
    }

    static string ExtractTitleFromHtml(string html)
    {
        // Regular expression that matches the content of the <title> tag
        Regex titleRegex = new Regex(@"<title>\s*(.+?)\s*</title>", RegexOptions.IgnoreCase);
        Match match = titleRegex.Match(html);
        if (match.Success)
        {
            return match.Groups[1].Value; // Return the content inside the tag
        }
        else
        {
            return "No title found"; // Returned when no <title> tag is found
        }
    }
}

III. Notes and Extensions

- Follow the site's crawler rules: when developing a crawler, always comply with the target site's robots.txt file and with the applicable laws and regulations.
- Handle anti-crawler measures: some sites use anti-crawler measures such as CAPTCHAs or limits on access frequency. Take these into account when developing a crawler and respond to them appropriately.
- Use third-party libraries: to parse HTML or XML more efficiently, consider third-party libraries such as AngleSharp, which offer more powerful and flexible parsing than regular expressions.
- Error handling and logging: in real applications, add appropriate error handling and logging so that problems the crawler runs into can be found and fixed promptly.
- Multithreading and asynchronous programming: to make the crawler more efficient, use C#'s multithreading and asynchronous programming features to fetch and analyze multiple pages at the same time.

IV. Conclusion

Although Python is widely used for crawler development, C# is equally up to the task. The example code in this article shows C#'s potential and strengths in this area. In performance, type safety, and library support alike, C# makes a solid showing. We hope this article encourages more developers to try C# for crawler development.

Source: 程序员编程日记 (https://mp.weixin.qq.com/s?__biz=Mzg3ODAxNzM5OQ==&mid=2247502920&idx=1&sn=855d15416f62b58b2a021c3285abd07e&chksm=cf18a250f86f2b46b13b606b99a68d8b0e0373ef0f4597ddeb7c5b341ff2a3598bbe41f70417&mpshare=1&scene=23&srcid=0531NnXIJ4Qo)
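
The notes in Section III on access-frequency limits and on fetching several pages at once can be combined. The sketch below is one possible approach, not the article's own method: it uses SemaphoreSlim to cap the number of requests in flight and Task.WhenAll to wait for all of them. The names ThrottledCrawler and RunAsync, the limit of 2, and the second URL are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledCrawler
{
    // Runs fetch for every URL with at most maxConcurrency calls in flight.
    // Results are returned in the same order as the input URLs.
    public static async Task<TResult[]> RunAsync<TResult>(
        IEnumerable<string> urls,
        Func<string, Task<TResult>> fetch,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            async Task<TResult> FetchOne(string url)
            {
                await gate.WaitAsync(); // Wait for a free slot
                try
                {
                    return await fetch(url);
                }
                finally
                {
                    gate.Release(); // Free the slot even if fetch throws
                }
            }

            // ToList starts every task now; the semaphore limits how many proceed.
            List<Task<TResult>> tasks = urls.Select(FetchOne).ToList();
            return await Task.WhenAll(tasks);
        }
    }
}

class Program
{
    static readonly HttpClient client = new HttpClient();

    static async Task Main()
    {
        // Placeholder URLs; replace them with pages you are allowed to crawl.
        var urls = new[] { "http://example.com", "http://example.org" };
        string[] pages = await ThrottledCrawler.RunAsync(
            urls, url => client.GetStringAsync(url), maxConcurrency: 2);

        for (int i = 0; i < urls.Length; i++)
        {
            Console.WriteLine($"{urls[i]}: {pages[i].Length} characters");
        }
    }
}
```

Passing the fetch function as a parameter keeps the throttling logic independent of HttpClient, so it can be exercised with a stub that makes no network calls. A concurrency cap alone does not guarantee a polite request rate; a delay between requests may also be needed, depending on the site.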