Monday, March 11, 2019

Android Development Notes - ch3.4.1 Web page parsing/processing and JSON


3.4.1 Web page parsing/processing and JSON

Jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. To access a website that requires login, get the authorised session cookie from the response to the login connection, then send that cookie with every subsequent connection. Refer to this SO for some info.
To parse elements with jsoup, selectors are very useful. The current jsoup, 1.8.3, has certain limitations: for example, if the class attribute value contains spaces, a class selector will only treat the first word as the class name. In that case, select on class as an attribute instead. For example, if myHtml is the parsed document containing
<span class="value item_due_date out">Dec 11, 2015 </span>
then myHtml.select("span[class*=item_due_date]").first().text() should return "Dec 11, 2015" correctly.
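Here is a minimal sketch of the attribute-selector lookup above, assuming the jsoup jar is on the classpath. The class name DueDateExample and the method extractDueDate are made up for illustration; only Jsoup.parse, select, first and text are jsoup calls. Note that [class*=...] is a substring match on the whole class attribute.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
class DueDateExample {
    /**
     * Returns the text of the first span whose class attribute contains
     * "item_due_date", or null when no such span exists.
     */
    static String extractDueDate(String html) {
        Document doc = Jsoup.parse(html);
        // first() returns null when the selection is empty, so guard against it.
        Element span = doc.select("span[class*=item_due_date]").first();
        // text() normalizes whitespace and trims the result.
        return span == null ? null : span.text();
    }
}
```

Calling DueDateExample.extractDueDate on the span shown above should give back the date without the trailing space.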
To use it with Eclipse, just download the jar, then go to Project Properties => Java Build Path => Libraries => Add External JARs.
Several other open-source Java HTML parsers are available here. Above all, note that jsoup cannot process web pages that rely on JavaScript or other scripts. A script may create the page dynamically, in which case the page can be rendered by a WebView instead.

If you do not plan to parse or render an HTML page, you can use URL/HttpURLConnection to retrieve the page source, or use Apache HttpClient (refer to this SO; for API 23+, add useLibrary 'org.apache.http.legacy' in build.gradle). But if the page is big, do the download in an AsyncTask (see 3.2.6).
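As a rough sketch of the URL/HttpURLConnection route, the helper below downloads a page and returns its source as a String. The class name PageFetcher, the method names, the 10-second timeouts and the UTF-8 assumption are all mine, not from any library; a real page may declare a different charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class PageFetcher {
    /** Reads the whole stream as UTF-8 text, appending '\n' after each line. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Downloads the page source at pageUrl (assumed to be UTF-8). */
    static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setConnectTimeout(10_000); // assumed timeout, in milliseconds
        conn.setReadTimeout(10_000);
        try {
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }
}
```

On Android, PageFetcher.fetch would have to be called off the main thread, for example from the doInBackground method of an AsyncTask.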

To simulate web access from a mobile device on a PC with Firefox, install the Firefox 'User Agent Switcher' add-on.

Showing the content in a TextView is not very impressive: Html.fromHtml(String) does not support certain tags. I took the MyTagHandler code from this SO (also refer to this SO) and it is a little better. When overriding the handleTag method, note that for each tag the tag.equals() check will return true twice, once for the opening tag and once for the closing tag; the first boolean parameter of handleTag tells you which one it is.
For cookies, refer to Wikipedia and NCZOnline.

JSON: according to Wikipedia, JSON (JavaScript Object Notation) is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is the primary data format used for asynchronous browser/server communication (AJAJ), largely replacing XML (used by AJAX). Although originally derived from the JavaScript scripting language, JSON is a language-independent data format. Code for parsing and generating JSON data is readily available in many programming languages.
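JSON is also widely used to store information, either online or locally, as in this code, which parses a JSON file update.txt with Gson:
Gson gson = new Gson();
// MyApp.executeHttpGet is the app's own helper; here it returns the text of update.txt
String jsontxt = MyApp.executeHttpGet(StartActivity.this.apkInfoUrl, true, false);
// fromJson(String, Class<T>) already returns T, so no cast is needed
UpdateInfo apkInfo = gson.fromJson(jsontxt, UpdateInfo.class);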
Invalid JSON may cause a MalformedJsonException (refer to this SO); use a JsonReader in lenient mode to deal with it, like below:
Gson gson = new Gson();
// needs com.google.gson.stream.JsonReader and java.io.StringReader
JsonReader reader = new JsonReader(new StringReader(data));
reader.setLenient(true); // accept JSON that is not strictly valid
LiveDataInfo info = gson.fromJson(reader, LiveDataInfo.class);
Refer to this for validating JSON against a JSON Schema. Install the tool with
sudo pip install json-spec
then run
json validate --schema-file=schema.json < data.json
There are also two GitHub projects: the older and more popular jsonlint, and the newer json-validator.
