This code grabs the HTML source from a given URL.
Change the URL in the code to a real one starting with http:// and the program will display the index page's source code in the console.

A nice trick is to spoof the connection so that it looks like it comes from a web browser, typically by sending a browser-like User-Agent header. This lets you reach sites like Google that normally block connections from non-browser applications.

There are several ways to get the HTML content of a URL from Java, and even more if you use open source Java libraries. For example, if the URL is www.google.com and you want to read its HTML source, you don't need a servlet to read data from a remote server. You can just use the java.net.URL or java.net.URLConnection class to read remote content from an HTTP server.

Some web sites do not allow visitors to view the HTML source of their pages. They can disable the right mouse button to block the "View Source" menu option, and other sites open their pages in a special window with no menu bar, so you cannot select "Source" from the "View" menu. Reading the page from code is not affected by either restriction.
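As a rough illustration of the browser spoofing described above, here is a minimal sketch that sets a browser-like User-Agent header on a URLConnection before any request is sent. The class name BrowserLikeConnection, the helper openAsBrowser and the exact User-Agent string are assumptions made for this example, not part of the original program, and a site may still refuse the request for other reasons.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

public class BrowserLikeConnection {
    // Example value only (an assumption): any current browser's User-Agent string could be used.
    public static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    /** Prepares a connection that identifies itself as a browser. No request is sent yet. */
    public static URLConnection openAsBrowser(String address) throws IOException {
        // openConnection() only creates the connection object; the request goes out
        // later, for example when getInputStream() is called.
        URLConnection connection = URI.create(address).toURL().openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        URLConnection connection = openAsBrowser("http://www.infostretch.com");
        // Shows the header that will be sent, without touching the network.
        System.out.println("User-Agent: " + connection.getRequestProperty("User-Agent"));
    }
}
```

The connection returned by openAsBrowser can then be read with getInputStream() in the same way as any other URLConnection.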
Apache Commons HttpClient
You can also use Apache Commons HttpClient, a library that is slightly easier to use.
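import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();
// GET request for the page to fetch (example URL; replace it with your own)
GetMethod method = new GetMethod("http://www.infostretch.com");
try {
    client.executeMethod(method);
    byte[] responseBody = method.getResponseBody();
    // Decodes the body with the platform default charset
    System.out.println(new String(responseBody));
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // Always release the connection, even if the request failed
    method.releaseConnection();
}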
Java Development Kit (JDK)
This class uses only the JDK: it opens a URLConnection, prints some of the response headers, and reads the page through an InputStream.
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Date;

public class WebsiteSource {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.infostretch.com");
        // getPort() returns -1 when the URL does not specify a port
        System.out.println("portno:" + url.getPort());
        System.out.println("Host:" + url.getHost());
        // System.out.println("file:" + url.getFile());

        URLConnection connection = url.openConnection();

        // The date accessors below return 0 when the header is not present
        long date = connection.getDate();
        if (date == 0) {
            System.out.println("No Date found");
        } else {
            System.out.println("Date is :" + new Date(date));
        }

        long expiration = connection.getExpiration();
        if (expiration == 0) {
            System.out.println("no expiration date found");
        } else {
            System.out.println("Exp Date :" + new Date(expiration));
        }

        long lastModified = connection.getLastModified();
        if (lastModified == 0) {
            System.out.println("no last modified date found");
        } else {
            System.out.println("Modified Date :" + new Date(lastModified));
        }

        // getContentLength() returns -1 when the length is not known
        int len = connection.getContentLength();
        if (len == -1) {
            System.out.println("Content length unavailable.");
        } else {
            System.out.println("Content-Length: " + len);
        }

        if (len != 0) {
            System.out.println("=== Html Contents page ===");
            // try-with-resources closes the stream even if reading fails
            try (InputStream input = connection.getInputStream()) {
                int htmlSrc;
                while ((htmlSrc = input.read()) != -1) {
                    System.out.print((char) htmlSrc);
                }
            }
        } else {
            System.out.println("No content available.");
        }
    }
}
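
The read loop in WebsiteSource casts each byte to a char, which only gives correct text for single-byte encodings. One possible alternative is sketched below: it decodes the stream through an InputStreamReader and collects the result in a String. The class name StreamToString and the helper readAll are made up for this example, and UTF-8 is an assumption; the real encoding of a page is normally announced in its Content-Type header.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamToString {
    /** Reads the whole stream, decodes it with the given charset, and closes it. */
    public static String readAll(InputStream input, Charset charset) throws IOException {
        StringBuilder html = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader reader = new InputStreamReader(input, charset)) {
            int count;
            // Read a block of characters at a time instead of one byte at a time
            while ((count = reader.read(buffer)) != -1) {
                html.append(buffer, 0, count);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(), so the sketch runs without a network.
        byte[] page = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```

In the program above, the same helper could be called as readAll(connection.getInputStream(), StandardCharsets.UTF_8) in place of the read loop.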