Information is the foundation of the learning process. Data acquisition and wrangling are crucial parts of Data Science that make it possible to extract knowledge from information. When data sets are large and difficult to transfer, remote access is the rule, almost always through a command-line interface. Luckily, a few tricks make it easy to access and visualize data on a remote machine in a user-friendly way. As you explore this section, you will also learn how to manage Excel spreadsheets and efficiently manipulate large data sets with Python.
Table of contents
1. Remote Data Access
- 1.1 Remote Data Transfer
- Tutorial: Copying Data using SSH
- Tutorial: Copying Data using Globus
- Tutorial: File Transfer using iRODS
- Tutorial: File Transfer using SRA Toolkit
- Tutorial: Downloading Online Data using wget
- Tutorial: Downloading Online Data using Web Scraping
- Tutorial: Downloading Online GitHub Repos using Git
- Tutorial: Downloading Online GitHub Folders using SVN
- 1.2 Remote Data Preview without Downloading
2. Data Manipulation
- 2.1 Manipulating Excel Data Sheets
- 2.2 Manipulating Text Files with Python
- Tutorial: Read, Write, Split, Select Data
- Tutorial: JSON Module - Encoding & Decoding JSON Data
- Tutorial: Math Module - Various Mathematical Functions
- Tutorial: Pandas Library - Data Structure Manipulation Tool
- Tutorial: NumPy Library - Multi-Dimensional Arrays Parser
- Tutorial: SciPy Library - Algorithms for Scientific Computing