Web scraping means fetching a web page, parsing it, and extracting information from it.
Many libraries can be used for web scraping on Android; we will focus on Jsoup, one of the best-known libraries for this purpose. The first step is to create a simple Android application. Then download the library, which is distributed as a .jar file, and add it to your project.
This can be done by right-clicking your project -> Build Path -> Configure Build Path.
Click Add External JARs and provide the path of the downloaded JAR file.
You can download the Jsoup library from this link: http://jsoup.org/download
The next step is to add the parsing code to your activity. There are several ways to do that; a convenient one is an AsyncTask, which performs the work on a background thread and publishes the result on the UI thread without you having to manage threads or handlers yourself. Android does not allow network calls on the UI thread, so the request has to run in the background.
So we will create an AsyncTask inside our Activity class like this:
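// Requires imports: android.os.AsyncTask, android.widget.TextView,
// java.io.IOException, org.jsoup.Jsoup, org.jsoup.nodes.Document
private class MyTask extends AsyncTask<Void, Void, String> {

    // Runs on a background thread, so network access is allowed here.
    @Override
    protected String doInBackground(Void... params) {
        String title = "";
        Document doc;
        try {
            // Fetch and parse the page, then read its <title>.
            doc = Jsoup.connect("http://google.com/").get();
            title = doc.title();
            System.out.println(title);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return title;
    }

    // Runs on the UI thread once doInBackground has returned.
    @Override
    protected void onPostExecute(String result) {
        // Display the title in the TextView from the Activity layout.
        ((TextView) findViewById(R.id.myTextView)).setText(result);
    }
}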
To fetch and extract data from a site, we pass its URL to Jsoup's connect function:
doc = Jsoup.connect("http://google.com/").get();
The whole document is then available in our doc variable, and we can query it in different ways. Since the purpose of this post is to show the basic usage of Jsoup, we will just extract the page title from the document fetched from Google.
For that we call:
doc.title();
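The title is only one of the things a Document exposes. As a minimal sketch of another technique, the example below parses an HTML string with Jsoup.parse and collects every link using the CSS-style selector a[href]. The class name LinkExtractor, the helper extractLinks, and the sample markup are hypothetical and only serve the illustration; the string is parsed locally so the sketch runs without network access, and it assumes the Jsoup JAR is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class LinkExtractor {

    // Returns one "text -> href" entry for every <a> element that has an href attribute.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            links.add(link.text() + " -> " + link.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, not taken from any real site.
        String html = "<html><head><title>Sample</title></head><body>"
                + "<a href=\"http://example.com/a\">First</a>"
                + "<a href=\"http://example.com/b\">Second</a>"
                + "</body></html>";
        System.out.println(Jsoup.parse(html).title());
        for (String line : extractLinks(html)) {
            System.out.println(line);
        }
    }
}
```

The same select call should work on the doc returned by Jsoup.connect(...).get(), so it could be used inside doInBackground in place of doc.title().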
Now we need to show it in the activity. To do that we have created a TextView, and we set its text in the onPostExecute function of the AsyncTask. This function is executed on the UI thread once the background work has completed.
That is all: the task sets the title extracted from google.com as the text of the TextView in our Activity layout. To run it, start the task from the Activity, for example by calling new MyTask().execute() in onCreate.
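The two-phase flow described here (do the work on a background thread, then hand the result to a callback) is not specific to AsyncTask, which has been deprecated since Android API level 30. Below is a minimal plain-Java sketch of the same flow using java.util.concurrent. The names BackgroundFetch and runInBackground are hypothetical, the supplier is a stand-in for the Jsoup call, and the callback is a stand-in for updating the TextView. On Android the callback executor would need to be one that posts to the main thread, which this sketch does not set up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

class BackgroundFetch {

    // Runs `work` on the background executor (the doInBackground step), then
    // passes its result to `onResult` on the callback executor (the
    // onPostExecute step). The returned future completes after the callback.
    static <T> CompletableFuture<Void> runInBackground(
            Executor background, Executor callback,
            Supplier<T> work, Consumer<T> onResult) {
        return CompletableFuture.supplyAsync(work, background)
                .thenAcceptAsync(onResult, callback);
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();

        // Stand-in for the Jsoup call; no network access happens here.
        Supplier<String> fetchTitle = () -> "Example Title";
        // Stand-in for setting the TextView text.
        Consumer<String> showTitle = title -> System.out.println("Title: " + title);

        // For simplicity the same executor serves both roles in this sketch.
        runInBackground(worker, worker, fetchTitle, showTitle).join();
        worker.shutdown();
    }
}
```

Passing the two executors as parameters keeps the threading decision out of the fetching code, which is roughly the separation AsyncTask provides through doInBackground and onPostExecute.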
Do not forget to add the internet permission to your Android manifest file.
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <uses-permission android:name="android.permission.INTERNET" />
</manifest>