Previous posts in this series:
Cracking Youdao Translate parameters
Cracking Baidu Translate parameters
Cracking JSON encrypted strings
Target website for this crawl: Population mobility big data
1. Packet capture and Base64 decoding
There is no data for the most recent days, so to demonstrate the case more clearly, the date is set to December 23, 2020. Capturing packets in Google Chrome gives the following result:
As the figure above shows, the data finally returned is one long encoded string ending in == (standard Base64 pads its output with equals signs at the end, and there are always 0, 1 or 2 of them). So we can make a preliminary guess that the original string is Base64-encoded. Copy this string into an online Base64 decoder; the result is as follows:
The result confirms that the string is indeed Base64-encoded. After decoding, however, we still get an encrypted JSON string. Next, we decrypt it following the steps introduced in the previous post, Cracking JSON encrypted strings.
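The padding rule used above to spot Base64 can be turned into a quick check before decoding. Below is a minimal standard-library sketch; `looks_like_base64` and `decode_outer_layer` are hypothetical helper names, not part of the site's code, and the sketch assumes the response body is a single standard-alphabet Base64 string, as seen in the capture. For this site the parsed dict should carry the iv and value fields that dataDecode reads further down.

```python
import base64
import binascii
import json
import re

# Standard Base64 alphabet, with 0-2 '=' padding characters allowed only at the end
_B64_RE = re.compile(r"^[A-Za-z0-9+/]*={0,2}$")


def looks_like_base64(text: str) -> bool:
    """Heuristic: standard Base64 has a length divisible by 4 and ends with 0, 1 or 2 '='."""
    text = text.strip()
    if not text or len(text) % 4 != 0 or not _B64_RE.match(text):
        return False
    try:
        base64.b64decode(text, validate=True)
    except binascii.Error:
        return False
    return True


def decode_outer_layer(text: str) -> dict:
    """Base64-decode the response body and parse the JSON string inside it."""
    raw = base64.b64decode(text.strip(), validate=True)
    return json.loads(raw.decode("utf-8"))
```

A check like this only says the string is plausibly Base64; the decoded JSON is what tells you the next layer is encrypted.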
2. Finding the decryption JS source code
From the analysis in the previous post, we know that decrypting a JSON string like this involves a call to JavaScript's JSON.parse function. So we can search directly with JSON.parse as the keyword (or with var iv, since var is how JS declares variables). The search results are as follows:
Five JS files turn up in the search, but a quick look shows that the decryption code we need must be in index.js (fields such as cityCodeList and flightCodeList are clearly the data we want to crawl). Open index.js, search again with JSON.parse as the keyword, and it lands on line 110. Set a breakpoint there and refresh the page. The result is as follows:
As the result shows, dataDecode is the entry function we are looking for, and data is the Base64 string we analyzed at the beginning. So we copy this part of the JS code and save it as pop.js.
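function dataDecode(data) {
    // Outer layer: Base64-decode the response, then parse the JSON inside it
    var base = new Base64();
    var d = JSON.parse(base.decode(data));
    // Hard-coded AES key (Base64) and the IV carried in d.iv
    var key = 'UVJgCE+OFIff3hK5BT5sPBbGZzjR6FwntjSCwOA9tUQ=';
    var key1 = CryptoJS.enc.Base64.parse(key);
    var iv1 = CryptoJS.enc.Base64.parse(d.iv);
    // Inner layer: AES-CBC decryption of d.value with PKCS7 padding
    var decrypted = CryptoJS.AES.decrypt(d.value, key1, {
        iv: iv1,
        mode: CryptoJS.mode.CBC,
        padding: CryptoJS.pad.Pkcs7
    });
    // Convert the decrypted bytes to a UTF-8 string and parse it as JSON
    var d = decrypted.toString(CryptoJS.enc.Utf8);
    return JSON.parse(d);
}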
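To make the two layers in dataDecode more concrete, here is a standard-library sketch of the pieces that do not need an AES implementation: decoding the key and IV the way CryptoJS.enc.Base64.parse does, and the PKCS#7 unpadding that CryptoJS.pad.Pkcs7 performs after CBC decryption. The hard-coded key decodes to 32 bytes, which means CryptoJS runs AES-256 here. The helper names are hypothetical and this is an illustration only; the AES-CBC decryption itself is still left to CryptoJS (or a crypto library you trust).

```python
import base64

# Key copied verbatim from dataDecode above
KEY_B64 = "UVJgCE+OFIff3hK5BT5sPBbGZzjR6FwntjSCwOA9tUQ="
AES_BLOCK_SIZE = 16  # AES always works on 16-byte (128-bit) blocks


def parse_key_material(key_b64: str, iv_b64: str) -> tuple[bytes, bytes]:
    """Decode key and IV like CryptoJS.enc.Base64.parse, then check their sizes."""
    key = base64.b64decode(key_b64)
    iv = base64.b64decode(iv_b64)
    if len(key) not in (16, 24, 32):  # AES-128, AES-192, AES-256
        raise ValueError(f"unexpected AES key length: {len(key)} bytes")
    if len(iv) != AES_BLOCK_SIZE:
        raise ValueError(f"CBC mode needs a {AES_BLOCK_SIZE}-byte IV, got {len(iv)}")
    return key, iv


def pkcs7_unpad(data: bytes, block_size: int = AES_BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, the job CryptoJS.pad.Pkcs7 does after decryption."""
    if not data or len(data) % block_size != 0:
        raise ValueError("length must be a non-zero multiple of the block size")
    pad_len = data[-1]
    # Each padding byte stores the padding length, which is 1..block_size
    if not 1 <= pad_len <= block_size or data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-pad_len]
```

Assuming d.value is plain Base64 ciphertext, decrypting it with AES-256-CBC, passing the result through pkcs7_unpad, then calling .decode("utf-8") and json.loads would roughly mirror decrypted.toString(CryptoJS.enc.Utf8) and JSON.parse in the JS.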
Next, we find and fill in whatever is missing.
3. Checking for gaps and completing the JS code
As in the previous posts, we use the execjs library to run the JS code. The Python code for this part is shown below:
import execjs
import requests


def get_jscode(file_path):
    # Read pop.js as one source string for execjs
    with open(file_path, 'r', encoding='utf8') as f:
        jscode = f.read()
    return jscode


def get_data(url):
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'
    }
    # Send the browser User-Agent with the request
    data = requests.get(url, headers=header).text
    return data


if __name__ == '__main__':
    url = 'https://unicom_trip.133.cn/api/v1/city/source-top/V0152900?date=20201223'
    data = get_data(url)  # Base64 string returned by the API
    jscode = get_jscode('c:/users/dell/Desktop/pop.js')
    # Compile pop.js and call the entry function on the response
    rst = execjs.compile(jscode).call('dataDecode', data)
    print(rst)
The first error is that Base64 is not defined. Go back to the website and click the Base64() function to jump to the Base64.js file; the result is as follows:
Since the original JS is too long, it is not reproduced here. Copy all of it into pop.js and run the Python code again; the result is shown below:
The second error is that CryptoJS is not defined. Again, go back to the website and click the CryptoJS.enc.Base64.parse function to jump to the CryptoJS source file.
(Note: if Node.js is not installed on the computer, an error saying JSON is not defined will be reported first!)
The result is as follows:
Note that copying only this piece of JS code would fix the CryptoJS problem only for the CryptoJS.enc.Base64.parse calls in the entry function. The entry function also uses CryptoJS.AES.decrypt and other CryptoJS members (CryptoJS.mode.CBC, CryptoJS.pad.Pkcs7, CryptoJS.enc.Utf8). For convenience, we copy the whole CryptoJS object here.
Again, this JS code is too long to show here. Add it to pop.js and run the Python code again; the result is as follows:
Data returned successfully!
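The gap-filling loop in this step comes down to building one pop.js that holds the Base64 class, the CryptoJS library and dataDecode, because execjs compiles a single source string. Here is a minimal sketch, assuming you saved the three pieces as separate files; the file names and helper functions are hypothetical. It also checks whether a node executable is on PATH, since the note above says a missing Node.js runtime shows up as a "JSON is not defined" error.

```python
import shutil
from pathlib import Path

# Hypothetical file names for the pieces copied out of the site's JS
PARTS = ["base64.js", "crypto-js.js", "data_decode.js"]


def combine_js(pieces: list[tuple[str, str]]) -> str:
    """Join (label, source) pairs into one script, libraries first, keeping the given order."""
    return "\n".join(f"// ---- {label} ----\n{source}\n" for label, source in pieces)


def build_pop_js(parts_dir: str, out_path: str, parts: list[str] = PARTS) -> str:
    """Read each piece from parts_dir, combine them and write the result to out_path."""
    pieces = [(name, Path(parts_dir, name).read_text(encoding="utf8")) for name in parts]
    combined = combine_js(pieces)
    Path(out_path).write_text(combined, encoding="utf8")
    return combined


def node_available() -> bool:
    """True if a node executable is on PATH; otherwise execjs may pick a different runtime."""
    return shutil.which("node") is not None
```

Keeping the pieces in separate files makes it easy to swap in a fresh copy of one of them if the site updates its JS.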
4. Finding the city code data
Although we now have the data, the returned results do not tell us which city each source_city code corresponds to. Remember the cityCodeList field we found in index.js at the start? Locate cityCodeList, set a breakpoint at line 90 and refresh the page, as shown in the figure below. cityList is the data we are looking for!
Enter cityList in the console to get the city code data as JSON. Copy it and save it as a JSON file.
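The final code indexes the saved file as citycode[code]['city_name'], which implies that cityList maps each code to an object with a city_name field. If a code in the ranking is missing from the saved file, that lookup raises KeyError. Below is a small, hypothetical pair of helpers that loads the file and falls back gracefully; the JSON shape is inferred from that indexing, not checked against the site.

```python
import json


def load_city_codes(json_path: str) -> dict:
    """Load the cityList object copied from the browser console and saved as JSON."""
    with open(json_path, "r", encoding="utf8") as f:
        return json.load(f)


def city_name_for(citycode: dict, code: str, default: str = "") -> str:
    """Return citycode[code]['city_name'], or default (the raw code if empty) when missing."""
    entry = citycode.get(code)
    if isinstance(entry, dict) and "city_name" in entry:
        return entry["city_name"]
    return default or code
```

Falling back to the raw code keeps the row in the output instead of aborting the whole run, so unknown codes can be looked up by hand afterwards.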
The final complete code is as follows
import execjs
import requests
import json
import pandas as pd


def get_jscode(file_path):
    # Read the completed pop.js (Base64 class + CryptoJS + dataDecode)
    with open(file_path, 'r', encoding='utf8') as f:
        jscode = f.read()
    return jscode


def get_data(url):
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'
    }
    # Send the browser User-Agent with the request
    data = requests.get(url, headers=header).text
    return data


def save_data(data, json_path):
    # city_code.json holds the cityList object copied from the console
    with open(json_path, 'r', encoding='utf8') as f:
        citycode = json.load(f)
    data_list = data['data']
    df = []
    for city in data_list:
        code = city['source_city']
        # Translate the source city code into a city name
        city_name = citycode[code]['city_name']
        user_percent = city['user_percent']
        df.append([city_name, user_percent])
    df1 = pd.DataFrame(df, columns=['City name', 'Inflow ratio'])
    # gbk so that Excel on a Chinese-locale Windows shows the city names correctly
    df1.to_csv('c:/users/dell/desktop/flight.csv', index=False, encoding='gbk')
    print(df1)


if __name__ == '__main__':
    url = 'https://unicom_trip.133.cn/api/v1/city/source-top/V0152900?date=20201223'
    data = get_data(url)
    jscode = get_jscode('c:/users/dell/Desktop/pop.js')
    rst = execjs.compile(jscode).call('dataDecode', data)
    save_data(rst, 'c:/users/dell/Desktop/city_code.json')
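The request date is hard-coded as date=20201223 in the URL. If you want several days, one option is to generate the URLs from a date range and feed each one through get_data and dataDecode. This is a minimal sketch, assuming the endpoint accepts other dates in the same YYYYMMDD format (only 20201223 was tried here); build_urls is a hypothetical helper, and V0152900 is the code taken from the original URL.

```python
from datetime import date, timedelta

# URL pattern taken from the request above; only the date part is varied
BASE_URL = "https://unicom_trip.133.cn/api/v1/city/source-top/{city}?date={day}"


def build_urls(start: date, end: date, city: str = "V0152900") -> list[str]:
    """One URL per day from start to end inclusive, with the date formatted as YYYYMMDD."""
    urls = []
    day = start
    while day <= end:
        urls.append(BASE_URL.format(city=city, day=day.strftime("%Y%m%d")))
        day += timedelta(days=1)
    return urls
```

Since the article notes that recent days may have no data, it is worth checking each decoded response before saving it.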
The crawled data:
That's all for this post~