3.4.1 Webpage parsing/processing and JSON
Jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
To access a website that requires login, first get an authorised session cookie from the response of the login connection, then send that cookie with each subsequent access/connection. Refer to this SO for some info.
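The cookie hand-off described above can be sketched with plain HttpURLConnection, as a rough illustration only. The class name SessionCookies, its helper names and the sample cookie values are made up for this sketch and are not taken from the SO answer. With jsoup itself, the equivalent is to take cookies() from the login Connection.Response and pass that map to the next Jsoup.connect(...) call via cookies(...).

```java
import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the names here are illustrative, not from any library.
final class SessionCookies {

    // Turn the raw Set-Cookie values of a login response into one
    // Cookie request header value, e.g. "a=1; b=2".
    // Attributes such as Path or HttpOnly are dropped.
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        if (setCookieValues != null) {
            for (String value : setCookieValues) {
                // Everything before the first ';' is the name=value pair.
                String pair = value.split(";", 2)[0].trim();
                if (!pair.isEmpty()) {
                    pairs.add(pair);
                }
            }
        }
        return String.join("; ", pairs);
    }

    // Collect the cookies set by the login response.
    // Header names are compared case-insensitively; the status line has a null key.
    static String fromLoginResponse(HttpURLConnection loginConnection) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : loginConnection.getHeaderFields().entrySet()) {
            if ("Set-Cookie".equalsIgnoreCase(e.getKey())) {
                values.addAll(e.getValue());
            }
        }
        return toCookieHeader(values);
    }

    // Attach the session cookies to a later request; call this before connecting.
    static void apply(HttpURLConnection next, String cookieHeader) {
        if (!cookieHeader.isEmpty()) {
            next.setRequestProperty("Cookie", cookieHeader);
        }
    }
}
```

Typical use would be to call fromLoginResponse on the connection that posted the login form, keep the returned string, and call apply on every later connection to the same site.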
To parse elements with jsoup, the selector is very useful. The current jsoup 1.8.3 has certain limitations: for instance, if the class name contains spaces, it will only treat the first word as the class name. In that case, use a selector that matches the class as an attribute. For example, if myHtml is
<span class="value item_due_date out">Dec 11, 2015 </span>
then
myHtml.select("span[class*=item_due_date]").first().text()
should return "Dec 11, 2015" correctly.
To use it with Eclipse, just download the jar, then go to Project Properties => Java Build Path => Libraries => Add External JARs.
Several other open-source Java HTML parsers are available here.
Above all, Jsoup cannot process webpages that rely on JavaScript or other scripts. A script may create the page content dynamically; such a page might be rendered by a WebView instead.
If you are not planning to parse or render an HTML page, you can use URL/HttpURLConnection to retrieve the page source, or use Apache HttpClient (refer to this SO; for API 23+, add useLibrary 'org.apache.http.legacy' in build.gradle). If the page is big, try to use an AsyncTask (see 3.2.6).
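As a rough sketch of the URL/HttpURLConnection route, the page source can be read as below. The class name PageSource, the 10-second timeouts and the UTF-8 decoding are assumptions made for this example only. The call blocks on the network, which is why it belongs in something like an AsyncTask on Android.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class PageSource {

    // Drain a stream fully and decode it as UTF-8 (assumed encoding).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Fetch the raw source of a page without parsing or rendering it.
    static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setConnectTimeout(10_000); // milliseconds, arbitrary choice
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }
}
```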
To simulate web access from a mobile device on a PC with Firefox, install the Firefox 'User agent switcher' plugin.
To show the content, using a TextView is not very impressive. Html.fromHtml(String) does not support certain tags. I took the MyTagHandler code from this SO (also refer to this SO) and it is a little better. When overriding the handleTag method, note that for each tag the call of tag.equals() will return true twice, once for the opening tag and once for the closing tag; the first boolean parameter of handleTag tells the two apart.
JSON: according to Wikipedia, JSON (JavaScript Object Notation) is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is the primary data format used for asynchronous browser/server communication (AJAJ), largely replacing XML (used by AJAX). Although originally derived from the JavaScript scripting language, JSON is a language-independent data format. Code for parsing and generating JSON data is readily available in many programming languages.
JSON is also widely used for storing information, either online or locally. For example, this code parses a JSON update.txt with Gson:
Gson gson = new Gson();
String jsontxt = MyApp.executeHttpGet(StartActivity.this.apkInfoUrl, true, false);
UpdateInfo apkInfo = gson.fromJson(jsontxt, UpdateInfo.class);
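Gson fills an object by matching JSON attribute names to Java field names, so UpdateInfo can be a plain class that mirrors update.txt. Neither the class nor the file is shown in this post, so the fields, the sample JSON and the isNewerThan helper below are purely hypothetical, a sketch of what such a class might look like.

```java
// Hypothetical shape of UpdateInfo; the real fields are not shown in this post.
// With these fields, an update.txt such as
//   {"versionCode": 12, "versionName": "1.2", "apkUrl": "http://example.com/app.apk"}
// would populate all three of them by name.
final class UpdateInfo {
    int versionCode;
    String versionName;
    String apkUrl;

    // True when the parsed info describes a newer build than the installed one.
    boolean isNewerThan(int installedVersionCode) {
        return versionCode > installedVersionCode;
    }
}
```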
Invalid JSON may cause a MalformedJsonException (refer to this SO); use a lenient JsonReader to deal with it, like below:
Gson gson = new Gson();
JsonReader reader = new JsonReader(new StringReader(data));
reader.setLenient(true);
LiveDataInfo info = (LiveDataInfo) gson.fromJson(reader, LiveDataInfo.class);
Refer to this for validating JSON against a JSON schema. First run
sudo pip install json-spec
then
json validate --schema-file=schema.json < data.json