csv 1: Overview
Intro
All pandas CSV and text-file parsing goes through the read_csv() and read_table() functions. These, in turn, take many of their conventions (such as dialects and quoting rules) from the csv module in the Python standard library.
Because the end result of “parse data to get to a DataFrame” is so often tabular, it’s worth having a good understanding of how these two functions work, even when you are loading more complex data, since those higher-level methods lean on them on the backend.
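As a minimal sketch of how the two functions relate: read_table() behaves like read_csv() with a tab as the default separator. The sample strings and variable names below are made up for illustration.

```python
import io

import pandas as pd

# Two hypothetical in-memory files holding the same data,
# one comma-separated and one tab-separated.
csv_text = "a,b\n1,2\n3,4\n"
tsv_text = "a\tb\n1\t2\n3\t4\n"

# read_csv() splits on commas by default.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table() splits on tabs by default.
df_tsv = pd.read_table(io.StringIO(tsv_text))

# Passing sep="," makes read_table() parse comma-separated text as well.
df_csv_via_table = pd.read_table(io.StringIO(csv_text), sep=",")

print(df_csv.equals(df_tsv))
print(df_csv.equals(df_csv_via_table))
```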
The filepath_or_buffer argument
This is a pretty broad argument that represents the object being parsed for information. It can take:
- A path to a file (a str or pathlib.Path)
- A URL (with http included)
- Any object with a read() method (e.g. a StringIO object)
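The sketch below passes the same data to read_csv() as a str path, a pathlib.Path, and a StringIO buffer. The file name scores.csv and the sample rows are invented for illustration, and the URL case is left out so the example runs without network access.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for this example.
csv_text = "name,score\nada,90\ngrace,95\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "scores.csv"
    csv_path.write_text(csv_text)

    # 1. A path given as a plain string.
    from_str_path = pd.read_csv(str(csv_path))

    # 2. A path given as a pathlib.Path object.
    from_path_obj = pd.read_csv(csv_path)

# 3. Any object with a read() method, here an in-memory text buffer.
from_buffer = pd.read_csv(io.StringIO(csv_text))

print(from_str_path.equals(from_path_obj))
print(from_path_obj.equals(from_buffer))
```

All three calls should produce the same DataFrame; only the source of the text differs.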
All of the other arguments
To try to enumerate the 50 or so other arguments here would be unwieldy. Instead, see the other notebooks.
The other options broadly fall into 5 categories:
- Indexing
- Type Inference
- Datetime Parsing
- Unclean Data
- Iterating
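To make the five categories concrete, here is a small sketch that uses one argument from each. The choice of arguments (index_col, dtype, parse_dates, na_values, chunksize) is my own selection as representative examples, and the sample data is invented.

```python
import io

import pandas as pd

# Hypothetical data; "missing" is a made-up sentinel for an absent price.
csv_text = (
    "id,when,price,qty\n"
    "1,2021-01-01,9.5,3\n"
    "2,2021-01-02,missing,4\n"
    "3,2021-01-03,7.25,5\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="id",            # Indexing: use the id column as the index
    dtype={"qty": "int64"},    # Type inference: state the dtype explicitly
    parse_dates=["when"],      # Datetime parsing: convert this column to datetimes
    na_values=["missing"],     # Unclean data: treat the sentinel as NaN
)
print(df.dtypes)

# Iterating: chunksize returns an iterator of DataFrames instead of one frame.
chunk_lengths = [
    len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)
]
print(chunk_lengths)
```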
Engine
The actual parsing can either be done in Python (easier to use) or in C (much faster).
Generally speaking, if you’re looking to use the C engine, you’re going to want to be as explicit as possible in all of your arguments rather than relying on the ‘smart typing’ that arguments such as sep, parse_dates, etc. provide.
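A rough sketch of the trade-off, using invented semicolon-separated data: the C engine is given an explicit separator and dtypes, while the Python engine is asked to detect the separator itself via sep=None.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data for illustration.
text = "city;temp\noslo;3\nrome;15\n"

# C engine: spell out the separator and the column types.
explicit_c = pd.read_csv(
    io.StringIO(text),
    sep=";",
    dtype={"city": str, "temp": "int64"},
    engine="c",
)

# Python engine: sep=None asks pandas to sniff the delimiter.
sniffed_py = pd.read_csv(io.StringIO(text), sep=None, engine="python")

print(explicit_c)
print(sniffed_py)
```

Delimiter sniffing with sep=None is handled by the Python engine, which is one reason explicit arguments pair naturally with engine="c".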
float_precision
lineterminator