C Programming - read a file line by line with fgets and getline, implement a portable getline version
Posted on April 3, 2019 by Paul
In this article, I will show you how to read a text file line by line in C using the standard C function fgets and the POSIX getline function. At the end of the article, I will write a portable implementation of the getline function that can be used with any standard C compiler.
Reading a file line by line is a trivial problem in many programming languages, but not in C. The standard way of reading a line of text in C is to use the fgets function, which is fine if you know in advance how long a line of text could be.
You can find all the code examples and the input file at the GitHub repo for this article.
Let's start with a simple example of using fgets to read chunks from a text file:
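A minimal sketch of such a program is shown here. The helper name print_chunks, the 128 byte chunk size and the "|*" marker string are assumptions made for illustration; the chunk size was picked so that each chunk holds at most 127 characters plus the terminating null character.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 128  /* assumed: 127 characters plus the terminating '\0' */

/* Reads fp in chunks of at most CHUNK_SIZE - 1 characters and writes each
 * chunk to out, followed by a marker string.
 * Returns the number of chunks that were read. */
int print_chunks(FILE *fp, FILE *out)
{
    char chunk[CHUNK_SIZE];
    int count = 0;

    assert(fp != NULL && out != NULL);

    /* fgets stops after a newline, at end of file, or when the chunk is full */
    while (fgets(chunk, sizeof(chunk), fp) != NULL) {
        fputs(chunk, out);
        fputs("|*\n", out);  /* marker showing where each chunk ends */
        count++;
    }
    return count;
}
```

To try it, open lorem.txt with fopen in main, check the result for NULL, and call print_chunks(fp, stdout) before closing the file.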
For testing the code I've used a simple dummy file, lorem.txt. This is a piece from the output of the above program on my machine:
The code prints the content of the chunk array, as filled after every call to fgets, and a marker string.
If you look carefully, by scrolling the above text snippet to the right, you can see that the output was truncated to 127 characters per line of text. This was expected because our code can store an entire line from the original text file only if the line can fit inside our chunk array, and fgets reserves one byte of the array for the terminating null character.
What if you need to have the entire line of text available for further processing and not a piece of a line? A possible solution is to copy or concatenate chunks of text in a separate line buffer until we find the end of line character.
Let's start by creating a line buffer that will store the chunks of text. Initially this will have the same length as the chunk array:
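One way to sketch this step is a small helper that allocates the buffer and reports its capacity. The name create_line_buffer and the 128 byte CHUNK_SIZE are assumptions, not names taken from the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 128  /* assumed size of the chunk array */

/* Allocates an empty line buffer with the same capacity as the chunk array.
 * On success stores the capacity in *len and returns the buffer,
 * otherwise returns NULL. The caller must free the buffer. */
char *create_line_buffer(size_t *len)
{
    assert(len != NULL);

    char *line = malloc(CHUNK_SIZE);
    if (line == NULL) {
        perror("Unable to allocate memory for the line buffer");
        return NULL;
    }

    line[0] = '\0';    /* start with an empty string, ready for appending */
    *len = CHUNK_SIZE;
    return line;
}
```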
Next, we are going to append the content of the chunk array to the end of the line string, until we find the end of line character. If necessary, we'll resize the line buffer:
Please note that in the above code, every time the line buffer needs to be resized its capacity is doubled.
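The append and resize logic could look roughly like the sketch below. The function name read_full_line and its return convention are assumptions; it expects a buffer that was already allocated with a capacity of at least CHUNK_SIZE bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 128  /* assumed size of the chunk array */

/* Reads the next full line from fp into *line, a malloc'ed buffer of
 * capacity *len (at least CHUNK_SIZE), doubling the capacity whenever
 * the next chunk does not fit.
 * Returns 1 if a line was read, 0 at end of file, -1 if realloc fails. */
int read_full_line(FILE *fp, char **line, size_t *len)
{
    char chunk[CHUNK_SIZE];
    size_t used = 0;  /* characters stored in *line so far */

    assert(fp != NULL && line != NULL && *line != NULL);
    assert(len != NULL && *len >= CHUNK_SIZE);

    (*line)[0] = '\0';
    while (fgets(chunk, sizeof(chunk), fp) != NULL) {
        size_t chunk_len = strlen(chunk);

        /* Grow the buffer if the chunk plus the '\0' would not fit */
        if (*len - used < chunk_len + 1) {
            size_t new_len = *len * 2;
            char *tmp = realloc(*line, new_len);
            if (tmp == NULL) return -1;
            *line = tmp;
            *len = new_len;
        }

        /* Append the chunk, including its terminating '\0' */
        memcpy(*line + used, chunk, chunk_len + 1);
        used += chunk_len;

        /* A newline at the end means the line is complete */
        if (used > 0 && (*line)[used - 1] == '\n') return 1;
    }

    /* End of file: report a last line that has no trailing newline */
    return used > 0 ? 1 : 0;
}
```

Doubling once is always enough here, because the capacity never drops below CHUNK_SIZE and a chunk never holds more than CHUNK_SIZE - 1 characters.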
This is the result of running the above code on my machine. For brevity, I kept only the first lines of output:
You can see that, this time, we can print full lines of text and not fixed length chunks like in the initial approach.
Let's modify the above code in order to print the line length instead of the actual text:
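As a rough illustration, the sketch below reports the length of every line. It takes a shortcut compared to the buffering approach: since only the length is needed, it adds up the chunk lengths and never stores the full line. The name print_line_lengths and the output format are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 128  /* assumed size of the chunk array */

/* Writes the length of every line in fp (newline included) to out.
 * Returns the number of lines seen. */
size_t print_line_lengths(FILE *fp, FILE *out)
{
    char chunk[CHUNK_SIZE];
    size_t line_len = 0;
    size_t lines = 0;

    assert(fp != NULL && out != NULL);

    while (fgets(chunk, sizeof(chunk), fp) != NULL) {
        size_t chunk_len = strlen(chunk);
        line_len += chunk_len;

        /* A chunk ending in '\n' completes the current line */
        if (chunk_len > 0 && chunk[chunk_len - 1] == '\n') {
            fprintf(out, "line length: %zu\n", line_len);
            line_len = 0;
            lines++;
        }
    }

    /* Last line of the file without a trailing newline */
    if (line_len > 0) {
        fprintf(out, "line length: %zu\n", line_len);
        lines++;
    }
    return lines;
}
```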
This is the result of running the modified code on my machine:
In the next example, I will show you how to use the getline function available on POSIX systems like Linux, Unix and macOS. Microsoft Visual Studio doesn't have an equivalent function, so you won't be able to easily test this example on a Windows system. However, you should be able to test it if you are using Cygwin or Windows Subsystem for Linux.
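A minimal sketch of reading a file with getline follows. The helper name print_lines_with_getline and the output format are assumptions; the getline call itself follows the POSIX interface, and the code will only build on a system that provides it.

```c
#define _POSIX_C_SOURCE 200809L  /* ask <stdio.h> to expose getline */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Reads fp line by line with getline and writes the number of characters,
 * the buffer capacity and the text of each line to out.
 * Returns the number of lines read. */
long print_lines_with_getline(FILE *fp, FILE *out)
{
    char *line = NULL;  /* getline allocates the buffer when this is NULL */
    size_t len = 0;     /* buffer capacity, updated by getline */
    ssize_t nread;
    long lines = 0;

    assert(fp != NULL && out != NULL);

    /* getline returns -1 at end of file or on error */
    while ((nread = getline(&line, &len, fp)) != -1) {
        fprintf(out, "chars=%zd, buf size=%zu, contents: %s", nread, len, line);
        lines++;
    }

    free(line);  /* the buffer belongs to the caller, even after a failure */
    return lines;
}
```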
Please note how simple it is to use POSIX's getline versus manually buffering chunks of line like in my previous example. It is unfortunate that the standard C library doesn't include an equivalent function.
When you use getline, don't forget to free the line buffer when you don't need it anymore. Also, calling getline more than once will overwrite the line buffer, so make a copy of the line content if you need to keep it for further processing.
This is the result of running the above getline example on a Linux machine:
It is interesting to note that for this particular example the getline function on Linux resizes the line buffer to a max of 960 bytes. If you run the same code on macOS the line buffer is resized to 1024 bytes. This is due to the different ways in which getline is implemented on different Unix like systems.
As mentioned before, getline is not present in the C standard library. It could be an interesting exercise to implement a portable version of this function. The idea here is not to implement the most performant version of getline, but rather to implement a simple replacement for non POSIX systems.
We are going to take the above example and replace the POSIX getline version with our own implementation, say my_getline. Obviously, if you are on a POSIX system, you should use the version provided by the operating system, which was tested by countless users and tuned for optimal performance.
The POSIX getline function has this signature:
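For reference, this is the prototype as given by POSIX.1-2008. On a POSIX system <stdio.h> already declares it, so repeating the declaration as done here is only for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask <stdio.h> to expose POSIX.1-2008 names */
#include <stdio.h>
#include <sys/types.h>           /* ssize_t */

/* Same prototype that <stdio.h> provides on POSIX systems */
ssize_t getline(char **restrict lineptr, size_t *restrict n,
                FILE *restrict stream);
```

Here *lineptr is the line buffer (it may be NULL, in which case getline allocates it), *n is the buffer capacity, and the return value is the number of characters read, newline included, or -1 at end of file or on error.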
Since ssize_t is also a POSIX defined type, usually a 64-bit signed integer on 64-bit systems, this is how we are going to declare our version:
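One possible declaration, assuming int64_t from <stdint.h> as the stand-in for ssize_t, is sketched below. The parameter names are hypothetical and simply mirror the POSIX prototype.

```c
#include <stdint.h>  /* int64_t */
#include <stdio.h>   /* FILE, size_t */

/* Hypothetical declaration of a portable getline replacement.
 * int64_t replaces the POSIX-only ssize_t as the return type. */
int64_t my_getline(char **restrict line, size_t *restrict len, FILE *restrict fp);
```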
In principle we are going to implement the function using the same approach as in one of the above examples, where I've defined a line buffer and kept copying chunks of text in the buffer until we found the end of line character:
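A simplified sketch of such an implementation is given below. It is not a drop-in match for every detail of POSIX getline (for example, it does not set errno), and the 128 byte CHUNK_SIZE and the parameter names are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 128  /* assumed size of the chunk array */

/* Hypothetical portable replacement for POSIX getline.
 * Returns the number of characters read (newline included), or -1 at
 * end of file or on error. The caller must free(*line) when done. */
int64_t my_getline(char **restrict line, size_t *restrict len, FILE *restrict fp)
{
    char chunk[CHUNK_SIZE];
    size_t used = 0;  /* characters stored in *line so far */

    if (line == NULL || len == NULL || fp == NULL) return -1;

    /* Allocate the buffer on first use, or grow one that is too small
     * to hold a single chunk */
    if (*line == NULL || *len < sizeof(chunk)) {
        char *tmp = realloc(*line, sizeof(chunk));
        if (tmp == NULL) return -1;
        *line = tmp;
        *len = sizeof(chunk);
    }
    (*line)[0] = '\0';

    while (fgets(chunk, sizeof(chunk), fp) != NULL) {
        size_t chunk_len = strlen(chunk);

        /* Double the capacity if the chunk plus the '\0' would not fit */
        if (*len - used < chunk_len + 1) {
            size_t new_len = *len * 2;
            char *tmp = realloc(*line, new_len);
            if (tmp == NULL) return -1;
            *line = tmp;
            *len = new_len;
        }
        assert(*len - used >= chunk_len + 1);  /* one doubling is enough */

        /* Append the chunk, including its terminating '\0' */
        memcpy(*line + used, chunk, chunk_len + 1);
        used += chunk_len;

        /* A newline at the end means the line is complete */
        if (used > 0 && (*line)[used - 1] == '\n') return (int64_t)used;
    }

    /* End of file: return a last line without newline, otherwise -1 */
    return used > 0 ? (int64_t)used : -1;
}
```

It is used like getline: start with a NULL line pointer and a zero length, call my_getline in a loop until it returns -1, and free the buffer at the end.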