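This article walks through an example of issuing concurrent requests from multiple threads with Java's HttpClient.
The code below is based on httpclient 4.5.2. Fetching a web page with a GET request through HttpClient is straightforward: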
public static String get(String url) {
    CloseableHttpResponse response = null;
    BufferedReader in = null;
    String result = "";
    try {
        // A new client is created on every call to get
        CloseableHttpClient httpclient = HttpClients.createDefault();
        HttpGet httpGet = new HttpGet(url);
        response = httpclient.execute(httpGet);
        in = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
        StringBuffer sb = new StringBuffer("");
        String line = "";
        String NL = System.getProperty("line.separator");
        // Read the body line by line, ending each line with the platform separator
        while ((line = in.readLine()) != null) {
            sb.append(line + NL);
        }
        in.close();
        result = sb.toString();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (null != response) response.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return result;
}
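The method above also works when GET requests are issued from multiple threads. However, it does so by creating a new HttpClient instance on every call to get, and each instance is discarded after a single use. That is clearly not the best approach.
HttpClient has its own support for multithreaded requests; see the "Pooling connection manager" section of the official documentation. That support is built on an internal connection pool, and the key class is PoolingHttpClientConnectionManager, which manages the pool. It exposes two key methods: setMaxTotal and setDefaultMaxPerRoute. setMaxTotal sets the maximum number of connections in the whole pool, and setDefaultMaxPerRoute sets the default maximum number of connections per route. A third method, setMaxPerRoute, sets the maximum for one specific site, for example: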
HttpHost host = new HttpHost("localhost", 80);
cm.setMaxPerRoute(new HttpRoute(host), 50);
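To make the interaction of the three limits concrete, here is a minimal toy model in plain Java. It is only a sketch and not how PoolingHttpClientConnectionManager is implemented: the class name PoolLimitModel, the use of a String as the route key, and the tryLease/release methods are all assumptions made for illustration. The real manager also makes a caller wait for a free connection instead of returning false.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Toy model only: NOT the implementation of PoolingHttpClientConnectionManager.
class PoolLimitModel {
    private final Semaphore total;          // plays the role of setMaxTotal
    private final int defaultMaxPerRoute;   // plays the role of setDefaultMaxPerRoute
    private final Map<String, Semaphore> routes = new ConcurrentHashMap<>();

    PoolLimitModel(int maxTotal, int defaultMaxPerRoute) {
        this.total = new Semaphore(maxTotal);
        this.defaultMaxPerRoute = defaultMaxPerRoute;
    }

    // Plays the role of setMaxPerRoute: overrides the default for one route.
    // In this sketch it must be called before any lease on that route.
    void setMaxPerRoute(String route, int max) {
        routes.put(route, new Semaphore(max));
    }

    // Returns true if a connection to the route may be leased right now.
    boolean tryLease(String route) {
        Semaphore perRoute =
                routes.computeIfAbsent(route, r -> new Semaphore(defaultMaxPerRoute));
        if (!perRoute.tryAcquire()) {
            return false;               // per-route limit reached
        }
        if (!total.tryAcquire()) {
            perRoute.release();         // undo: pool-wide limit reached
            return false;
        }
        return true;
    }

    // Must only be called after a successful tryLease on the same route.
    void release(String route) {
        routes.get(route).release();
        total.release();
    }

    public static void main(String[] args) {
        PoolLimitModel pool = new PoolLimitModel(3, 2);
        System.out.println(pool.tryLease("a")); // true
        System.out.println(pool.tryLease("a")); // true
        System.out.println(pool.tryLease("a")); // false: route "a" is at its limit of 2
        System.out.println(pool.tryLease("b")); // true: third connection overall
        System.out.println(pool.tryLease("b")); // false: pool-wide limit of 3 reached
    }
}
```

The last call shows why both numbers matter: route "b" is still below its own limit, but the pool as a whole is full.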
Following the documentation, we adjust our GET implementation slightly:
package com.zhyea.robin;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class HttpUtil {

    // One client, backed by a connection pool, shared by all callers
    private static CloseableHttpClient httpClient;

    static {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(200);
        // The original called setDefaultMaxPerRoute twice (20, then 50);
        // the later call wins, so only the effective value is kept.
        cm.setDefaultMaxPerRoute(50);
        httpClient = HttpClients.custom().setConnectionManager(cm).build();
    }

    public static String get(String url) {
        CloseableHttpResponse response = null;
        BufferedReader in = null;
        String result = "";
        try {
            HttpGet httpGet = new HttpGet(url);
            response = httpClient.execute(httpGet);
            in = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
            StringBuffer sb = new StringBuffer("");
            String line = "";
            String NL = System.getProperty("line.separator");
            while ((line = in.readLine()) != null) {
                sb.append(line + NL);
            }
            in.close();
            result = sb.toString();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                // Closing the response hands the connection back to the pool
                if (null != response) response.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(get("https://www.wei2008.com/"));
    }
}
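The class above is meant to be shared by several threads, so a short sketch of the calling side may help. The example below is an assumption-laden illustration and not part of the original code: the class name ConcurrentFetchDemo, the fetchAll method, and the thread count are made up, and the fetcher parameter stands in for HttpUtil::get so that the sketch runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical driver: fetches many URLs concurrently through one shared fetcher.
class ConcurrentFetchDemo {

    // Runs the fetcher for every URL on a fixed-size thread pool and returns
    // the results keyed by URL, in input order (duplicate URLs collapse to one entry).
    static Map<String, String> fetchAll(List<String> urls,
                                        Function<String, String> fetcher,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the requests overlap
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(CompletableFuture.supplyAsync(() -> fetcher.apply(url), pool));
            }
            // Then wait for each result in turn
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), futures.get(i).join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("https://www.wei2008.com/", "https://www.wei2008.com/a");
        // Stub fetcher; with the pooled client on the classpath this would be HttpUtil::get
        Function<String, String> fetcher = url -> "fetched " + url;
        fetchAll(urls, fetcher, 4).forEach((url, body) -> System.out.println(url + " -> " + body));
    }
}
```

When the fetcher is backed by a pooled client, it is reasonable to keep the thread count at or below the per-route limit for a single site, since extra threads would only wait for a free connection.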
That is about it. Personally, though, I prefer HttpClient's fluent API. The GET request we just implemented can be written as simply as this:
package com.zhyea.robin;

import org.apache.http.client.fluent.Request;

import java.io.IOException;

public class HttpUtil {

    public static String get(String url) {
        String result = "";
        try {
            result = Request.Get(url)
                    .connectTimeout(1000)
                    .socketTimeout(1000)
                    .execute().returnContent().asString();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(get("https://www.wei2008.com/"));
    }
}
All we have to do is replace the earlier httpclient dependency with the fluent-hc dependency:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>fluent-hc</artifactId>
    <version>4.5.2</version>
</dependency>
The fluent implementation also uses PoolingHttpClientConnectionManager out of the box. It sets maxTotal to 200 and defaultMaxPerRoute to 100:
CONNMGR = new PoolingHttpClientConnectionManager(sfr);
CONNMGR.setDefaultMaxPerRoute(100);
CONNMGR.setMaxTotal(200);
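The one annoyance is that Executor offers no method for adjusting these two values. The defaults are enough for most uses, though. If they really are not, you can consider writing your own version of Executor; Executor.newInstance(HttpClient) also accepts a client that has already been configured. The GET request can then be executed through Executor directly: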
Executor.newInstance().execute(Request.Get(url))
.returnContent().asString();
And that is all it takes.