1- "Hello, world!"

This is the first post in the “C++ for the self-taught” series – the second if you count the introduction. We will take a look at how to create your first C++ application.

In order to create an application in C++, you need a compiler: unlike interpreted languages such as Perl, Python and PHP, C++ is first compiled into machine code and then executed. For brevity, we won’t go into that process now (it actually involves several more steps than compiling); instead, we’ll dive directly into the code and see the machinery at work.

Let’s take our first look at a program written in C++:

#include <iostream>
 
using namespace std;
 
int main()
{
        cout << "Hello, world!" << endl;
}

Right off the bat, we have many of the features of C++ used in eight lines of code: we’re using the preprocessor, we’re using operator overloading, we’re using namespaces and we’ve defined a function – and all it does is output “Hello, world!” to the console.

Building the example

  1. copy the code to a file called main.cpp
  2. make sure you have GNU Make and G++ installed
    on Cygwin, the necessary packages are called make and gcc-g++ and you can install them using Cygwin’s Setup;
    on Debian, the necessary packages are called make and g++ and you can install them by running apt-get install make g++.
  3. from the console, in the same directory as your new main.cpp file, run make main. GNU Make will figure out how to make main.

You now have an executable that says “Hello, world!” when you run it. Now let’s take a look at how that works.

Dissecting Hello

The preprocessor

Here’s the example again:

#include <iostream>
 
using namespace std;
 
int main()
{
        cout << "Hello, world!" << endl;
}

On the very first line, we use the fact that C++ is a preprocessed language. As we will see later (when we need it) we can use the preprocessor for a wide variety of things, but we can also use it to simply include one file’s contents in another – which is what we do here. The reason for doing this is that, unlike Java and some other programming languages, C++ does not allow you to use anything that has not been declared in the translation unit being compiled. This means that if you need to use a variable, a type, a function or anything else, you need to include its declaration, usually by using an #include directive.

Translation unit

A translation unit consists of one source file and all of the files included, using #include directives, by that file. In our example, the main.cpp file includes a file called iostream, which will in turn include other files. All those files, concatenated together, form a translation unit.

Some people, and some literature, call these compilation units. It is the same thing.

A program, in C++, can consist of many translation units. In the case of our example, there is only one translation unit that we provide to the compiler. However, behind the scenes, we are using functions that are provided to us by the implementation – by the run-time library which, itself, consists of many translation units. These translation units, once compiled, are linked together to form the executable.

It is important to understand that the compiler can only see what’s in the translation unit it is compiling. It will not magically start reading another C++ file without you telling it to. That is what #include directives are for.
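To make this concrete, here is a minimal sketch of the same rule at work inside a single file. The function names greeting and shout are made up for this illustration. The compiler reads the translation unit from top to bottom, so a name has to be declared before it is used, and a header file is essentially a collection of such declarations that #include pastes in for you.

```cpp
#include <string>

// Declaration only: tells the compiler that greeting() exists and what it
// returns. Without this line, the call inside shout() would not compile,
// because the definition only appears further down in the translation unit.
std::string greeting();

// Uses greeting() before the compiler has seen its definition.
std::string shout()
{
        return greeting() + "!";
}

// Definition: the actual body that the declaration above promised.
std::string greeting()
{
        return "Hello, world";
}
```

Removing the declaration near the top should make the compiler reject the call in shout(), even though the definition is right there in the same file, a few lines later.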

#include directive

C++ is a pre-processed language. That means that before the compiler tries to translate your code into something the computer will understand, it pre-processes your code looking for directives on what to do with it. One of those directives tells the pre-processor (the program the compiler uses for pre-processing) to start reading another file and pretend that it’s part of the same file. That directive is the #include directive.

There are two syntaxes for the #include directive:

#include <header.h>

and

#include "other-header.h"

The former is used to include files that are installed on the system – such as those that come with the compiler and are part of the language, or those that come with libraries that you can use to extend your program’s functionality and not have to write everything yourself. The latter is used to include files that you write yourself.

Files that are meant to be included like this are called header files. Files that contain source code that isn’t meant to be included like this are called source files. Header files usually have the .h or .hpp extension whereas source files usually have the .cpp or .cc extension. C source files usually have a .c extension.

Namespaces

On line 3 of our example, we find using namespace std. This tells the compiler that, if it can’t find a name we’re using in our code, it should look for that name in a different namespace – namely the one called std.

Namespaces are an important part of the C++ programming language: they help you to structure your programs and make sure that when you use a name, the compiler understands what you mean by that name and doesn’t confuse it for something else. Let’s say, for example, that you have two types called A, one of which launches satellites into orbit while the other herds sheep. If you declare an instance of A, like this:

A a;

you need to know whether you’ve just created a shepherd or a rocket. In order to be able to distinguish the two and still use them in the same file, they have to be declared in different namespaces. We’ll get to doing that later. For now, it’s just important to know that this exists.
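As a small preview of what that looks like, here is a sketch of the two types living side by side. The namespace names space and farm, the describe member function and the describe_both helper are all invented for this illustration; the details of declaring namespaces come later in the series.

```cpp
#include <string>

// Hypothetical namespace for the rocket-launching A.
namespace space
{
        struct A
        {
                std::string describe() const { return "launches satellites"; }
        };
}

// Hypothetical namespace for the sheep-herding A.
namespace farm
{
        struct A
        {
                std::string describe() const { return "herds sheep"; }
        };
}

// The qualified names space::A and farm::A tell the compiler which A we
// mean, so both types can be used in the same file without confusion.
std::string describe_both()
{
        space::A rocket;
        farm::A shepherd;
        return rocket.describe() + " / " + shepherd.describe();
}
```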

We’re using this directive in our example because cout and endl are both declared in the std namespace, as is almost anything else that belongs to the standard library.
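For comparison, here is a sketch of the same output written without the using directive, with every standard name qualified explicitly. The function names say_hello and hello_text are assumptions made for this example; the second one writes into a std::ostringstream instead of the console only so that the result can be inspected.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// No "using namespace std;" here: ostream and endl are spelled std::ostream
// and std::endl, so the compiler never has to guess where they come from.
void say_hello(std::ostream & out)
{
        out << "Hello, world!" << std::endl;
}

// Captures the text in a string stream rather than printing it.
std::string hello_text()
{
        std::ostringstream buffer;
        say_hello(buffer);
        return buffer.str();
}
```

Calling say_hello(std::cout) would write the same line to the console, since std::cout is also an output stream.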

The main function

The main function is where your program starts. It is a function that returns an integer value that is used by the operating system to know whether it was successful or not. By convention we return 0 if everything was OK, non-zero (usually 1) if it wasn’t.

This function is the only function in C++ that, although it returns an integer, is not required to explicitly return it: if control reaches the end of main without a return statement, the effect is the same as returning 0. All other functions that return a value must have a return statement to indicate the value to be returned. By convention, we will always put the return statement in, but I left it out this time to have a good reason to tell you this 🙂

Every program written in C++ must have a main function. It always returns int but it can take arguments as well. Valid declarations of main are:

int main()
int main(int argc, char * argv[])

Implementations may allow for other signatures as well, and common signatures are:

int main(int argc, char **argv)
int main(int argc, char * argv[], char * env[])

but the standard [basic.start.main] only says that any implementation must accept the first two.
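To give an idea of what the two parameters are for, here is a sketch of a helper that a main with the second signature could call. The name user_arguments is hypothetical, and the loop it uses is something we have not covered yet. The sketch assumes the usual convention that argv[0] holds the program's name and that the arguments typed by the user start at argv[1].

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies the arguments that main received into a
// vector of strings, skipping argv[0] (conventionally the program name).
std::vector<std::string> user_arguments(int argc, char * argv[])
{
        std::vector<std::string> result;
        for (int i = 1; i < argc; ++i)
        {
                result.push_back(argv[i]);
        }
        return result;
}
```

A main declared as int main(int argc, char * argv[]) could pass its parameters straight through with user_arguments(argc, argv).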

You may have noted the way I quoted the standard: [basic.start.main]. I cite it this way because a new version of the standard is coming soon (hopefully) and chapter and section numbers may change, but section identifiers usually don’t. If you get a PDF version of the standard, you can search it for the tag I use – it will appear in it in exactly the same way.

Implementation

When the standard that defines C++ talks about an implementation, it means the combination of the pre-processor, the compiler and the rest of the building machinery, the standard library, and the services provided by the run-time library and operating system.

cout << "Hello, world!" << endl;

On line 7 of our example, we do the actual work of this program: we output some text. We do this by using an overloaded operator<< and an output stream.
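One way to see that operator<< really is a function is to call it like one. The sketch below (with the made-up names chained and spelled_out) writes the same text twice: once with the << notation and once with explicit function calls. It uses a std::ostringstream instead of cout so that the result ends up in a string, and it assumes the standard library's non-member operator<< for character strings, which returns the stream it was given. That return value is what allows several << to be chained on one line.

```cpp
#include <sstream>
#include <string>

// The notation used in the example: two << operators chained on one stream.
std::string chained()
{
        std::ostringstream out;
        out << "Hello, " << "world!";
        return out.str();
}

// The same thing written as ordinary function calls. The inner call returns
// the stream, which then becomes the first argument of the outer call.
std::string spelled_out()
{
        std::ostringstream out;
        std::operator<<(std::operator<<(out, "Hello, "), "world!");
        return out.str();
}
```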

Overloading

C++ allows you to define more than one version of almost any function, based on the types of its arguments. It allows this for member functions (functions that are part of a class, also called methods) as well as non-member functions (functions that are not part of a class) and operators. An operator is a special kind of function that is called by using one of the math symbols (such as less-than <, greater-than >, plus +, minus -, etc.) or one of the words reserved for that purpose (such as new and delete). In this case we used operator <<.

Overloading is an important feature of C++, as is operator overloading. It allows you to write code that is much more readable than it might be if this were not possible (such as in C).
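Here is a minimal sketch of both kinds of overloading. Everything in it is invented for illustration: the Point type, the two describe functions and the show helper. The compiler picks the version of describe to call from the type of the argument, and the extra operator<< teaches output streams how to print a Point.

```cpp
#include <sstream>
#include <string>

// A made-up type for this sketch.
struct Point
{
        int x;
        int y;
};

// Function overloading: two functions with the same name, told apart by
// the type of their argument.
std::string describe(int) { return "an integer"; }
std::string describe(const std::string &) { return "a string"; }

// Operator overloading: a version of operator<< that accepts a Point.
std::ostream & operator<<(std::ostream & out, const Point & p)
{
        return out << "(" << p.x << ", " << p.y << ")";
}

// Uses the overloaded operator and returns the text it produced.
std::string show(const Point & p)
{
        std::ostringstream out;
        out << p;
        return out.str();
}
```

With this in place, writing cout << p for a Point p reads just like printing a number or a string, which is the readability gain mentioned above.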

Conclusion

So, we now have our very first program in C++ and we’ve seen that it actually exercises a lot of features from C++ – and we’ve only touched the tip of the iceberg.

Try having some fun with this: make it say something different, or make it say something more than once. We’ll look into loops next.

About rlc

Software Analyst in embedded systems and C++, C and VHDL developer, I specialize in security, communications protocols and time synchronization, and am interested in concurrency, generic meta-programming and functional programming and their practical applications. I take a pragmatic approach to project management, focusing on the management of risk and scope. I have over two decades of experience as a software professional and a background in science.

3 Responses to 1- "Hello, world!"

  1. Pingback: The Quest For Bug-Free Software @ Making Life Easier

  2. Paercebal says:

    I did some research on the two kinds of includes.


    #include <MyHeader.hpp>

    will search the headers in well defined places


    #include "MyHeader.hpp"

    will try compiler-dependent places (GCC and MSVC don’t agree on this), usually something like “search first around the including file”.

    Because sometimes I hate letting the compiler choose for me, I use the following:

    #include <MyLibrary/MyHeader.hpp>

    The “MyLibrary” is used as a namespace, and will protect my own code from accidental inclusion, so I use this notation even for “private” headers.

    I asked the question of include semantics on Stack Overflow:

    http://stackoverflow.com/questions/179213/c-include-semantics

    And after researching and reading the answers, I found my own satisfying answer, complete with justification:

    http://stackoverflow.com/questions/179213/c-include-semantics/1251308#1251308

    • Hi Raoul,

      Thank you for your comment.

      Yes, the way #include "filename" is handled is “implementation-defined”, but let’s see what the standard has to say about this, shall we?

      16.2 Source file inclusion [cpp.include]

      A #include directive shall identify a header or source file that can be processed by the implementation.
      A preprocessing directive of the form

      # include < h-char-sequence > new-line

      searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
      A preprocessing directive of the form

      # include "q-char-sequence" new-line

      causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters. The named source file is searched for in an implementation-defined manner. If this search is not supported, or if the search fails, the directive is reprocessed as if it read

      # include < h-char-sequence > new-line

      with the identical contained sequence (including > characters, if any) from the original directive.
      A preprocessing directive of the form

      # include pp-tokens new-line

      (that does not match one of the two previous forms) is permitted. The preprocessing tokens after include in the directive are processed just as in normal text (each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens). If the directive resulting after all replacements does not match one of the two previous forms, the behavior is undefined. The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single header name preprocessing token is implementation-defined.
      The mapping between the delimited sequence and the external source file name is implementation-defined. The implementation provides unique mappings for sequences consisting of one or more nondigits (2.10) followed by a period (.) and a single nondigit. The implementation may ignore the distinctions of alphabetical case.
      A #include preprocessing directive may appear in a source file that has been read because of a #include directive in another file, up to an implementation-defined nesting limit.

      As you can see, in either case, the behavior is implementation-defined. That means that the implementor gets to decide, but has to tell you what they decided. Most common implementations have #include "filename" start at the source file location and #include <filename> start at some configured location, which is the “traditional” way of doing things. Microsoft compilers just happen to work around a common programming “error” and turn that into a feature – and that is just fine as far as I’m concerned.

      I agree with you that directories should be named for namespaces – at least in most cases. I agree that, because of that, #include directives should contain the namespace name in the way you described. Personally, I am quite comfortable with the way #include "filename" works on most compilers and the argument that “the compiler shouldn’t get to decide”, IMO, doesn’t hold much water. Firstly because it’s implementation-defined in either case and secondly because the preprocessor takes care of preprocessing (in most implementations by far) – not the compiler.

      PS. Note that the implementation is only required to provide unique mappings for single-character (non-digit) filename extensions, so the way “.hpp” works is arguably implementation-defined as well 😉
