

CSV files are very popular for storing tabular data because they are simple text files with very few rules. This makes them very interoperable, because CSV readers and writers are relatively easy to implement. Interoperability is probably the main reason why someone would choose to save data in CSV format.

Although the rules for writing and reading CSV files, which are explained in the next chapter, are fairly well known and widely accepted, one rule is an exception: determining which character will be used as the separator. As the name Comma Separated Values says, CSV files should use a comma as the separator, but many CSV files use a semicolon or a horizontal tab instead. So, in order to build a generic CSV reader that reads a CSV file regardless of its separator, the reader must first figure out which character is used as the separator. This article gives one possible solution to this problem.

The rules for writing CSV files are pretty simple. First, if a value contains the separator character or a new line character, or begins with a quote, enclose the value in quotes. Second, if a value is enclosed in quotes, every quote character contained in the value must be followed by an additional quote character.

These two simple rules make it possible to write a CSV writer in just a few minutes. Implementing a CSV reader is much more problematic, because the CSV stream has to be parsed sequentially, character by character, and additional state has to be stored, which effectively makes a CSV reader a state machine. Many CSV readers out there are implemented incorrectly because they do not follow the rules stated above.

Now that we have defined the rules for CSV files, we can implement a CSV reader that is able to find out which character is used as the separator. Here are the key statements of the C# method that detects the separator in a CSV stream:

    public static char Detect(TextReader reader, int rowCount, IList<char> separators)
    // ...
    if (reader.Peek() != '"') // Value is quoted and current character is " and next character is not ".
    // ...
    reader.Read(); // Value is quoted and current and next characters are "" - read (skip) peeked quote.
    // ...
    if (firstChar) // Set value as quoted only if this quote is the first char in the value.
    // ...
    int index = separators.IndexOf((char)character);
    // ...
    return maxCount == 0 ? '\0' : separators[...];

The CSV stream is represented by the reader parameter, which is used for reading characters from the stream. The rowCount parameter tells the method how many rows to read before determining the separator, and the separators parameter is the list of characters that the method considers possible separators.

The method takes care when reading quotes, separators and new line characters that are part of a quoted value. It tracks whether the current character is the first character of a value (firstChar), because a value is considered to be enclosed in quotes only if the opening quote is the first character of the CSV entry. Inside a quoted value, when a quote is read, the method peeks into the CSV stream to see whether the next character is also a quote. If it is, the pair is an escaped quote and the peeked quote is skipped; otherwise the quote is treated as the closing quote. New line and separator characters are ignored if they are contained in a quoted value.

When rowCount rows have been read, or the CSV stream has been read to the end, the method returns the first of the possible separators that has the maximum number of occurrences as a separator in the CSV stream. If none of the possible separators ever occurred as a separator, '\0' is returned.

For example, in the Employees.csv file, the method detects the correct separator even though another candidate character occurs more often in total: 8 times, compared with 6 occurrences of the detected separator. That is because the last 4 occurrences of the other candidate are enclosed in quotes, so they don't qualify as possible separators. The other candidate therefore occurs only 4 times as a separator, while the detected separator occurs 6 times, which makes it the most probable CSV separator.

Bundled with this article is a WPF solution that demonstrates automatic detection of the CSV separator in action. The application is located in the bin/Release folder.

It shouldn't be too hard to derive an entire CSV reader from the code presented in this article. However, tabular data can come in many different formats, and implementing a reader and a writer for each of them may not be so easy and could really hurt your productivity. For that reason, you could use a third-party component that supports various file formats.
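The two writing rules described in this article can be sketched in C# as follows. This is a minimal illustration, not code from the bundled solution. The class CsvWriterSketch and its methods FormatValue and FormatRow are hypothetical names, and treating a carriage return like a new line character is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvWriterSketch
{
    // Applies the two CSV writing rules to a single value.
    public static string FormatValue(string value, char separator)
    {
        // Rule 1: quote the value if it contains the separator or a new line,
        // or if it begins with a quote. Treating '\r' like '\n' is an assumption.
        bool mustQuote = value.IndexOf(separator) >= 0
            || value.IndexOf('\n') >= 0
            || value.IndexOf('\r') >= 0
            || value.StartsWith("\"", StringComparison.Ordinal);

        if (!mustQuote)
            return value;

        // Rule 2: inside a quoted value, every quote is followed by an additional quote.
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Formats one row by escaping each value and joining with the separator.
    public static string FormatRow(IEnumerable<string> values, char separator) =>
        string.Join(separator.ToString(), values.Select(v => FormatValue(v, separator)));
}
```

A quote in the middle of an unquoted value (for example `5" screen`) is left untouched, because under these rules a quote only has special meaning as the first character of a value.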
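For reference, here is a complete, self-contained sketch of a separator detector that follows the rules described in this article and uses the same Detect signature. It was written for illustration and is not the verbatim source of the bundled solution. The class name CsvSeparatorDetector is hypothetical. Ties are resolved in favor of the candidate that appears first in separators, and '\0' is returned when no candidate ever occurred as a separator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvSeparatorDetector
{
    public static char Detect(TextReader reader, int rowCount, IList<char> separators)
    {
        int[] separatorsCount = new int[separators.Count];
        int row = 0;
        bool quoted = false;    // Are we inside a quoted value?
        bool firstChar = true;  // Is the next character the first one of a value?

        while (row < rowCount)
        {
            int character = reader.Read();
            switch (character)
            {
                case '"':
                    if (quoted)
                    {
                        if (reader.Peek() != '"')
                            quoted = false;   // Current char is " and next is not: closing quote.
                        else
                            reader.Read();    // "" pair inside a quoted value: skip the peeked quote.
                    }
                    else if (firstChar)
                    {
                        quoted = true;        // Opening quote counts only as the first char of a value.
                    }
                    break;
                case '\n':
                    if (!quoted)
                    {
                        row++;
                        firstChar = true;
                        continue;
                    }
                    break;
                case -1:
                    row = rowCount;           // End of stream.
                    break;
                default:
                    if (!quoted)
                    {
                        int index = separators.IndexOf((char)character);
                        if (index != -1)
                        {
                            separatorsCount[index]++;
                            firstChar = true;
                            continue;
                        }
                    }
                    break;
            }
            firstChar = false;
        }

        // Pick the first candidate with the maximum count.
        int maxIndex = -1, maxCount = 0;
        for (int i = 0; i < separatorsCount.Length; i++)
        {
            if (separatorsCount[i] > maxCount)
            {
                maxCount = separatorsCount[i];
                maxIndex = i;
            }
        }
        return maxCount == 0 ? '\0' : separators[maxIndex];
    }
}
```

For example, `CsvSeparatorDetector.Detect(new StringReader(text), 10, new[] { ',', ';', '\t' })` examines at most 10 rows of text. A plain char array can be passed because arrays implement IList<char>.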
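To illustrate why a CSV reader is described as a state machine, here is a minimal sketch of a record reader that applies the same quoting rules once the separator is known (for example, the character returned by Detect). CsvReaderSketch and ReadAll are hypothetical names. Stripping a trailing carriage return before each new line is an assumption made to handle \r\n line endings, and the sketch does not report malformed input.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvReaderSketch
{
    // Reads all records from the stream, using the given separator.
    public static List<List<string>> ReadAll(TextReader reader, char separator)
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var value = new StringBuilder();
        bool quoted = false;    // State: inside a quoted value?
        bool firstChar = true;  // State: is the next char the first one of a value?

        while (true)
        {
            int c = reader.Read();
            if (c == -1)
            {
                // Flush the last record if the stream does not end with a new line.
                if (!firstChar || record.Count > 0)
                {
                    record.Add(value.ToString());
                    records.Add(record);
                }
                return records;
            }

            if (quoted)
            {
                if (c == '"')
                {
                    if (reader.Peek() == '"') { reader.Read(); value.Append('"'); } // Escaped "".
                    else quoted = false;                                           // Closing quote.
                }
                else
                {
                    value.Append((char)c); // Separators and new lines are literal here.
                }
                continue;
            }

            if (c == '"' && firstChar)
            {
                quoted = true;
                firstChar = false;
            }
            else if (c == separator)
            {
                record.Add(value.ToString());
                value.Clear();
                firstChar = true;
            }
            else if (c == '\n')
            {
                // Assumption: drop a trailing '\r' so that \r\n line endings work.
                if (value.Length > 0 && value[value.Length - 1] == '\r')
                    value.Length--;
                record.Add(value.ToString());
                records.Add(record);
                record = new List<string>();
                value.Clear();
                firstChar = true;
            }
            else
            {
                value.Append((char)c);
                firstChar = false;
            }
        }
    }
}
```

The state is just the two flags plus the value being built. Each character is interpreted differently depending on that state, which is exactly what makes reading harder than writing.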
