All of the pandas CSV and text-file parsing is done through the
read_csv() and read_table() functions. Many of their parsing options (delimiter, quotechar, quoting, escapechar, and so on) mirror the dialect options of the
csv module in the Python standard library.
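The sketch below shows two things. First, read_table() behaves like read_csv() with a tab as its default separator. Second, the standard-library csv.reader accepts the same delimiter and quotechar option names. The sample text is made up for illustration.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated sample; one quoted field contains a tab.
raw = 'name\tcity\n"Smith, J."\tBoston\nLee\t"New\tYork"\n'

# read_table() defaults to sep="\t"; read_csv() needs it spelled out.
df_table = pd.read_table(io.StringIO(raw))
df_csv = pd.read_csv(io.StringIO(raw), sep="\t")

# The csv module reader, configured with the same option names.
rows = list(csv.reader(io.StringIO(raw), delimiter="\t", quotechar='"'))

print(df_table.equals(df_csv))  # True
```

Both readers honor the quoting, so the tab inside "New\tYork" stays part of the field instead of splitting it.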
Because so much work boils down to “parse this data into a
DataFrame”, it’s worth having a good understanding of how these two functions work. This holds even when you read data through higher-level methods, since those often use the same machinery on the backend.
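The first argument, filepath_or_buffer, is a pretty broad one: it represents the object to be parsed. It can take:
- A path to a file (a string or a path object such as pathlib.Path)
- A URL (with the scheme, e.g. http://, included)
- Any object with a read() method, such as an open file handle or io.StringIO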
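A minimal sketch of the input forms listed above follows. The temporary file and its name are hypothetical, and the URL call is left commented out because it needs network access.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

text = "a,b\n1,2\n3,4\n"

# A file-like object: anything with a .read() method, e.g. StringIO.
from_buffer = pd.read_csv(io.StringIO(text))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"  # hypothetical file name
    path.write_text(text)

    # A path, either as a plain string or as a pathlib.Path object.
    from_str_path = pd.read_csv(str(path))
    from_path_obj = pd.read_csv(path)

    # An open file handle is also a file-like object.
    with open(path) as fh:
        from_handle = pd.read_csv(fh)

# A URL is passed as a string with its scheme (not run here):
# pd.read_csv("https://example.com/data.csv")
```

All four calls produce the same DataFrame; only the source of the bytes differs.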
All of the other arguments
Enumerating the 50 or so other arguments here would be unwieldy; see the other notebooks instead.
The other options broadly fall into 5 categories:
- Indexing
- Type Inference
- Datetime Parsing
- Iterating
- Unclean Data
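The sketch below uses one option each for type inference (dtype), datetime parsing (parse_dates), and unclean data (na_values). It also uses index_col for indexing and chunksize for iterating. The data is invented, and "--" is an assumed missing-value marker. It was chosen because it is not in pandas' default NA list.

```python
import io

import pandas as pd

raw = """date,ticker,price
2021-01-04,AAA,10.5
2021-01-05,AAA,--
2021-01-06,AAA,11.0
2021-01-07,AAA,11.25
"""

df = pd.read_csv(
    io.StringIO(raw),
    index_col="date",              # indexing: use a column as the index
    parse_dates=["date"],          # datetime parsing: convert to datetime64
    dtype={"ticker": "category"},  # type inference: override with an explicit dtype
    na_values=["--"],              # unclean data: extra strings treated as missing
)

# Iterating: read the same text in chunks of 2 rows instead of all at once.
chunks = list(pd.read_csv(io.StringIO(raw), chunksize=2))
```
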
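The actual parsing can be done either in Python or in C, selected with the engine argument. The Python engine is slower but more flexible, since it supports regular-expression separators and skipfooter, for example. The C engine is the default and is much faster.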
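As a sketch of the trade-off between the two engines, the Python engine can split messy input with a regular-expression separator. The C engine expects a simple separator, so here it reads a cleanly formatted copy of the same invented data. If a regex separator is passed without choosing an engine, pandas falls back to the Python engine and emits a warning.

```python
import io

import pandas as pd

# Hypothetical data with inconsistent spacing around ";".
messy = "x ; y\n1 ;2\n3;  4\n"
clean = "x;y\n1;2\n3;4\n"

# Python engine: a regex separator absorbs the stray whitespace.
df_py = pd.read_csv(io.StringIO(messy), sep=r"\s*;\s*", engine="python")

# C engine: a single-character separator on well-formed input.
df_c = pd.read_csv(io.StringIO(clean), sep=";", engine="c")
```
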
Generally speaking, if you’re using the C engine, you’ll want to be as explicit as possible with your arguments rather than relying on the ‘smart typing’ that arguments such as
parse_dates provide.
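One way to be explicit with the C engine is sketched below. It names the columns to read, gives each a dtype, and then converts dates with a fixed format instead of letting pandas infer it. This is an illustrative pattern under those assumptions, not necessarily the approach the other notebooks take, and the data is made up.

```python
import io

import pandas as pd

raw = "id,when,amount\n1,2021-03-01,9.99\n2,2021-03-02,15.00\n"

# Spell out which columns to read and what type each one is.
df = pd.read_csv(
    io.StringIO(raw),
    engine="c",
    usecols=["id", "when", "amount"],
    dtype={"id": "int64", "when": str, "amount": "float64"},
)

# Convert dates with an explicit format rather than relying on inference.
df["when"] = pd.to_datetime(df["when"], format="%Y-%m-%d")
```
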