csv 1: Overview

23 May 2018

Intro

All of pandas' CSV and text file parsing is done through the read_csv() and read_table() functions. These, in turn, borrow much of their behavior from the csv module in the Python standard library.

Because the end result of “parse data to get to a DataFrame” is almost always tabular, it’s worth having a good understanding of how these two functions work. That holds even when you work with higher-order data, as the methods for reading it tend to leverage these two on the backend.

The `filepath_or_buffer` argument

This is a pretty broad argument that represents the object being parsed for information. It can take:

A path to a file (a str or pathlib.Path)
A URL (with the scheme included, e.g. http://)
Any object with a read() method (e.g. StringIO object)
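
As a minimal sketch of the input types listed above, the snippet below reads the same data from a StringIO buffer, a pathlib.Path, and a str path. The sample data, the scores.csv file name, and the example.com URL are made up for illustration; the URL line is left commented out because that address is hypothetical.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample data, used only for this illustration
csv_text = "name,score\nada,90\ngrace,95\n"

# Any object with a read() method, here an in-memory text buffer
from_buffer = pd.read_csv(io.StringIO(csv_text))

# A path to a file, given as either a pathlib.Path or a str
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "scores.csv"
    csv_path.write_text(csv_text)
    from_path = pd.read_csv(csv_path)
    from_str = pd.read_csv(str(csv_path))

# A URL is passed the same way (hypothetical address, so not executed)
# from_url = pd.read_csv("http://example.com/scores.csv")

print(from_buffer.equals(from_path) and from_path.equals(from_str))
```

All three calls should produce the same DataFrame, since only the source of the text differs.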

All of the other arguments

To try to enumerate the 50 or so other arguments here would be unwieldy. Instead, see the other notebooks.

The other options broadly fall into 5 categories:

Indexing
Type Inference
Datetime Parsing
Unclean Data
Iterating
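
To make the five categories a little more concrete, here is a rough sketch that uses one argument from each. The column names and the "missing" placeholder are invented for the example, and the arguments shown are just one representative per category, not a complete list.

```python
import io

import pandas as pd

# Made-up data with one deliberately dirty value in the amount column
raw = (
    "id,when,amount\n"
    "1,2018-05-01,10.5\n"
    "2,2018-05-02,missing\n"
    "3,2018-05-03,7\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    index_col="id",               # Indexing: use the id column as the index
    dtype={"amount": "float64"},  # Type inference: state the dtype explicitly
    parse_dates=["when"],         # Datetime parsing: convert this column to datetimes
    na_values=["missing"],        # Unclean data: treat this placeholder as NaN
)

# Iterating: chunksize returns an iterator of DataFrames instead of one big frame
chunks = pd.read_csv(io.StringIO(raw), chunksize=2)
chunk_sizes = [len(chunk) for chunk in chunks]

print(df.dtypes)
print(chunk_sizes)
```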

Engine

The actual parsing can be done either in Python (more feature-complete) or in C (much faster).

Generally speaking, if you’re looking to use the C engine, you’re going to want to be as explicit as possible in all of your arguments and not rely on the ‘smart typing’ that arguments such as sep, parse_dates, etc. provide.

Some arguments are only supported by the C engine, for example:

float_precision
lineterminator
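
A small sketch of the trade-off, assuming a made-up semicolon-delimited input: the first call spells everything out and runs on the C engine, while the second leans on delimiter sniffing (sep=None), which needs the Python engine.

```python
import io

import pandas as pd

# Made-up semicolon-delimited data for illustration
raw = "a;b\n1;2.5\n3;4.5\n"

# Explicit arguments throughout, so the C engine can handle everything
explicit = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    engine="c",
    dtype={"a": "int64", "b": "float64"},
    float_precision="round_trip",  # C-engine-only option
)

# sep=None asks pandas to sniff the delimiter, which requires the Python engine
sniffed = pd.read_csv(io.StringIO(raw), sep=None, engine="python")

print(explicit)
print(sniffed)
```

On an input this small the speed difference is invisible; the point is only which arguments each engine accepts.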
