Chapter 978 She Has Done Her Best
The competition problem was long, taking up a whole page.
It opened with a lengthy paragraph of background.
Tang Su boiled it down: contestants had to crawl, clean, organize, compute, and analyze data from a virtual website provided by the competition, and finally visualize the results as charts.
Although it was only the preliminary round, Tang Su found the problem quite difficult, especially since they were only freshmen who had not yet studied some of the required material, let alone taken a course on data visualization.
Tang Su had taught herself some data visualization, but not in depth.
Tang Su clicked the link given in the problem and prepared to start crawling the data.
But before she could begin, she saw some classmates leaving.
Tang Su looked around and saw that her roommates Yang Lu and Qiu Xiao were among them. Twenty or thirty students had left, many of them her classmates.
Tang Su took a deep breath.
It seemed that many students had no idea where to start, or had to give up because they lacked the relevant skills.
Tang Su ignored everyone else and got to work.
She first installed and deployed the Hadoop components she needed, chiefly Hive.
With that done, she began crawling the data in Python.
Tang Su had crawled data from websites before, so this step was not very difficult for her. It is also one of the basic skills every big data student needs to master.
In the second step, Tang Su extracted the valid data from what she had crawled and converted it to JSON. Having done this before, she finished it quickly.
The third step, cleaning and analyzing the data, was the most critical. After some thought, Tang Su wrote a MapReduce program in Java to clean the data. She then loaded the cleaned data into Hive and ran HQL queries to complete the analysis and statistics. Finally, she executed an SQL script in Hive to view the data in the tables.
This series of operations took a long time, and when Tang Su checked, two hours had already passed.
She had only one hour left to finish.
The fourth step was data visualization. After some thought, Tang Su used bar charts, line charts, and radar charts to present the data she had analyzed.
The theme of this competition was a comparative analysis of the salaries of IT workers across regions, with conclusions drawn from that analysis.
The fifth step was to write a data analysis report.
There was still half an hour left before the competition ended.
By now, fewer than a third of the contestants remained.
Many had given up and left, while others had finished early and gone.
With the charts in hand, Tang Su's analysis went smoothly, and she finished the report within the time limit.
After writing the report, Tang Su clicked submit and left the competition site.
The scores would not be announced on the spot, so Tang Su would have to wait a few days for the semi-final list to find out whether she had advanced.
Nearly 150 people took part in the preliminary round, but only 30 would make the semi-final. Tang Su didn't know whether she would be one of them, but she had done her best and worked through every step of the problem.
If she failed to advance in the end, it would only mean that her current professional level was not yet good enough.
(End of this chapter)