Java – Zip Folder Tutorial

Learn how to zip a folder in Java

“Quotation, n: The act of repeating erroneously the words of another.”
― Ambrose Bierce, The Unabridged Devil’s Dictionary

1. Introduction

Java provides good support for reading and writing zip files. In this article, let us learn how to create a zip file from the contents of a folder in java.

Continue reading “Java – Zip Folder Tutorial”

Java – Reading a Large File Efficiently

Efficiently read text and binary files in Java

“The world is a book and those who do not travel read only one page.”
― Augustine of Hippo

1. Introduction

What’s the most efficient and easiest way to read a large file in java? Well, one way is to read the whole file at once into memory. Let us examine some issues that arise when doing so.

Continue reading “Java – Reading a Large File Efficiently”

Java String Format Examples

1. Introduction

Have you tried to read and understand Java’s String format documentation? I have and found it hard to understand. While it does include all the information, the organization leaves something to be desired.

This guide is an attempt to bring some clarity and ease the usage of string formatting in java.

2. String Formatting

Most common way of formatting a string in java is using String.format(). If there were a “java sprintf”, this would be it.

String output = String.format("%s = %d", "joe", 35);

For formatted console output, you can use printf() or the format() method of System.out and System.err PrintStreams.

System.out.printf("My name is: %s%n", "joe");

Create a Formatter and link it to a StringBuilder. Output formatted using the format() method will be appended to the StringBuilder.

StringBuilder sbuf = new StringBuilder();
Formatter fmt = new Formatter(sbuf);
fmt.format("PI = %f%n", Math.PI);
System.out.print(sbuf.toString());
// you can continue to append data to sbuf here.

3. Format Specifiers

Here is a quick reference to all the conversion specifiers supported.

Specifier Applies to Output
%a floating point (except BigDecimal) Hex output of floating point number
%b Any type “true” if non-null, “false” if null
%c character Unicode character
%d integer (incl. byte, short, int, long, bigint) Decimal Integer
%e floating point decimal number in scientific notation
%f floating point decimal number
%g floating point decimal number, possibly in scientific notation depending on the precision and value.
%h any type Hex String of value from hashCode() method.
 %n none Platform-specific line separator.
%o integer (incl. byte, short, int, long, bigint) Octal number
%s any type String value
%t Date/Time (incl. long, Calendar, Date and TemporalAccessor) %t is the prefix for Date/Time conversions. More formatting flags are needed after this. See Date/Time conversion below.
%x integer (incl. byte, short, int, long, bigint) Hex string.

3.1. Date and Time Formatting

Note: Using the formatting characters with “%T” instead of “%t” in the table below makes the output uppercase.

 Flag Notes
 %tA Full name of the day of the week, e.g. “Sunday“, “Monday
 %ta Abbreviated name of the week day e.g. “Sun“, “Mon“, etc.
 %tB Full name of the month e.g. “January“, “February“, etc.
 %tb Abbreviated month name e.g. “Jan“, “Feb“, etc.
 %tC Century part of year formatted with two digits e.g. “00” through “99”.
 %tc Date and time formatted with “%ta %tb %td %tT %tZ %tY” e.g. “Fri Feb 17 07:45:42 PST 2017
 %tD Date formatted as “%tm/%td/%ty
 %td Day of the month formatted with two digits. e.g. “01” to “31“.
 %te Day of the month formatted without a leading 0 e.g. “1” to “31”.
%tF ISO 8601 formatted date with “%tY-%tm-%td“.
%tH Hour of the day for the 24-hour clock e.g. “00” to “23“.
%th Same as %tb.
%tI Hour of the day for the 12-hour clock e.g. “01” – “12“.
%tj Day of the year formatted with leading 0s e.g. “001” to “366“.
%tk Hour of the day for the 24 hour clock without a leading 0 e.g. “0” to “23“.
%tl Hour of the day for the 12-hour click without a leading 0 e.g. “1” to “12“.
%tM Minute within the hour formatted a leading 0 e.g. “00” to “59“.
%tm Month formatted with a leading 0 e.g. “01” to “12“.
%tN Nanosecond formatted with 9 digits and leading 0s e.g. “000000000” to “999999999”.
%tp Locale specific “am” or “pm” marker.
%tQ Milliseconds since epoch Jan 1 , 1970 00:00:00 UTC.
%tR Time formatted as 24-hours e.g. “%tH:%tM“.
%tr Time formatted as 12-hours e.g. “%tI:%tM:%tS %Tp“.
%tS Seconds within the minute formatted with 2 digits e.g. “00” to “60”. “60” is required to support leap seconds.
%ts Seconds since the epoch Jan 1, 1970 00:00:00 UTC.
%tT Time formatted as 24-hours e.g. “%tH:%tM:%tS“.
%tY Year formatted with 4 digits e.g. “0000” to “9999“.
%ty Year formatted with 2 digits e.g. “00” to “99“.
%tZ Time zone abbreviation. e.g. “UTC“, “PST“, etc.
%tz Time Zone Offset from GMT e.g. “-0800“.

4. Argument Index

An argument index is specified as a number ending with a “$” after the “%” and selects the specified argument in the argument list.

String.format("%2$s", 32, "Hello");
// prints: "Hello"

5. Formatting an Integer

With the %d format specifier, you can use an argument of all integral types including byte, short, int, long and BigInteger.

Default formatting:
String.format("%d", 93);
// prints 93
Specifying a width:
String.format("|%20d|", 93);
// prints: |                  93|
Left-justifying within the specified width:
String.format("|%-20d|", 93);
// prints: |93                  |
Pad with zeros:
String.format("|%020d|", 93);
// prints: |00000000000000000093|
Print positive numbers with a “+”:

(Negative numbers always have the “-” included):

String.format("|%+20d|', 93);
// prints: |                 +93|
A space before positive numbers.

A “-” is included for negative numbers as per normal.

String.format("|% d|", 93);
// prints: | 93|

String.format("|% d|", -36);
// prints: |-36|
Use locale-specific thousands separator.

For the US locale, it is “,”:

String.format("|%,d|", 10000000);
// prints: |10,000,000|
Enclose negative numbers within parantheses (“()”) and skip the “-“:
String.format("|%(d|", -36);
// prints: |(36)|
Octal Output
String.format("|%o|"), 93);
// prints: 135
Hex Output
String.format("|%x|", 93);
// prints: 5d
Alternate Representation for Octal and Hex Output

Prints octal numbers with a leading “0” and hex numbers with leading “0x“.

String.format("|%#o|", 93);
// prints: 0135

String.format("|%#x|", 93);
// prints: 0x5d

String.format("|%#X|", 93);
// prints: 0X5D

6. String and Character Conversion

Default formatting:

Prints the whole string.

String.format("|%s|", "Hello World");
// prints: "Hello World"
Specify Field Length
String.format("|%30s|", "Hello World");
// prints: |                   Hello World|
Left Justify Text
String.format("|%-30s|", "Hello World");
// prints: |Hello World                   |
Specify Maximum Number of Characters
String.format("|%.5s|", "Hello World");
// prints: |Hello|
Field Width and Maximum Number of Characters
String.format("|%30.5s|", "Hello World");
|                         Hello|

Summary

This guide explained String formatting in Java. We covered the supported format specifiers. Both numeric and string formatting support a variety of flags for alternative formats.

Java File Examples

1. Introduction

Java provides the File class as an abstraction of file and directory path names. A path is presented as a hierarchical operating system independent view of files.

The File class supports a bunch of operations on files and directories which work on both Unix-like and Windows systems. Let us look at some of these operations in more detail.

2. Creating a New File

To create a new file, use the method createNewFile(). The following creates a new file “joe” in the current directory.

File file = new File("joe");
if ( file.createNewFile() ) System.out.println("created");

If a file specified already exists, the method returns false. In this case, the existing file is not touched or modified.

2.1. Hierarchy Not Created

The method does not automatically create the directory hierarchy if any of the parent directories do not exist. The following throws an exception if directory “joe” does not exist in the current directory.

File file = new File("joe/jack");
if ( file.createNewFile() ) System.out.println("created.");

// throws: java.io.IOException: Not a directory

However, if you run the above code after creating the parent directory, it works normally.

2.2. Creating Absolute Path File

In addition to specifying relative paths as above, you can also specify an absolute path as shown below:

File file = new File("/tmp/joe");
if ( file.createNewFile() ) System.out.println("created.");

// prints "created."

2.3. Permission Denied

When attempting to create a file where you don’t have permission, you get an exception:

File file = new File("/joe");
if ( file.createNewFile() ) System.out.println("created.");

// throws java.io.IOException: Permission denied

3. Delete a File

The File class provides a delete() method to delete a file. It is used as follows:

File file = new File("joe");
if ( file.delete() ) System.out.println("deleted.");

The method does not throw an exception if the file does not exists. It just returns false.

4. List Files

When a File object is created with a directory, you can use the list() method to list the files and directories in that directory. Here is an example of listing the files in the current directory using Java 8 streams.

Arrays.stream(file.list()).forEach(System.out::println);

You can also specify a FilenameFilter to select the required files. FilenameFilter is a functional interface so let us use a lambda expression to select and print the “.class” in the directory.

File file = new File(dirPath);
Arrays
    .stream(file.list((d, f) -> f.endsWith(".class")))
    .forEach(System.out::println);

In the above example, we use the FilenameFilter for filtering files by name. However, FileFilter is another functional interface provided by java that can be used to select files by some file property. In the following example, we select empty files.

File file = new File(path);
File[] emptyFiles = Arrays
    .stream(file.listFiles(f -> f.length() == 0));

Or select hidden files for processing.

File file = new File(path);
List<File> hiddenFiles = Arrays
    .stream(file.listFiles(f -> f.isHidden()))
    .collect(Collectors.toList());

5. Creating a Directory

The File class provides the method mkdir() to create a directory. It returns false if the directory cannot be created which could be for a variety of reasons. In other words, exceptions are not thrown including for the following reasons:

  • File or directory already exists.
  • No permission.
  • Hierarchy does not exist.
File file = new File(path);
if ( file.mkdir() ) System.out.println("created.");

If you want to create all non-existent directories of the hierarchy, use mkdirs() instead.

6. Renaming a File

To rename a file, use renameTo(). You must pass in a target File object to rename the file to.

File source = new File(pathA);
File target = new File(pathB);
if ( source.renameTo(target) ) System.out.println("renamed.");

Note that this method comes with a bunch of caveats including:

  • Might not be able to move a file from one file system to another.
  • Might not work if the target path exists. (In other words, might not overwrite the target file.)
  • It might not be atomic. In other words, removing the old file and creating the new file are two separate operations in which one might fail!

With all these warnings, it is better to check the return value to be sure the operation succeeded.

Summary

The Java File class is an system independent abstraction of file and directory paths. It provides a bunch of operations to create a file, remove a file, create a directory including the hierarchy and renaming files.

Java NIO – Using ByteBuffer

1. Introduction

Java provides a class ByteBuffer which is an abstraction of a buffer storing bytes. While just a little bit more involved than using plain byte[] arrays, it is touted as more performant. in this article, let us examine some uses of ByteBuffer and friends.

Continue reading “Java NIO – Using ByteBuffer”

How do I Create a Java String from the Contents of a File?

1. Introduction

Here we present a few ways to read the whole contents of a file into a String in Java.

2. Using java.nio.file in Java 7+

Quite simple to load the whole file into a String using the java.nio.file package:

String str = new String(Files.readAllBytes(Paths.get(pathname)),
                        StandardCharsets.UTF_8);

Here is how the code works. Read all the bytes into a byte buffer using the Files and Paths available in the java.nio.file package.

byte[] buf = Files.readAllBytes(Paths.get(pathname));

Convert the byte buffer into a String by specifying the character set.

String str = new String(buf, StandardCharsets.UTF_8);

3. Scan for end-of-input

Another way to read the whole file into a String is to use the Scanner class. Create a scanner with the file as input, set the appropriate delimiter and read the next token.

Note: the actual delimiter used in the code below is the beginning-of-input marker which will not match anywhere other than the beginning of input.

Scanner scanner = null;
try {
    scanner = new Scanner(new File(pathname), "UTF-8");
    return scanner.useDelimiter("\\A").next();
} finally {
    if ( scanner != null ) scanner.close();
}

4. Memory Mapped File Reading

This method maps the file contents directly into memory using the MappedByteBuffer class. Memory mapping the contents directly might lead one to expect enhanced performance. However this advantage is only available if the buffer is used directly. In our case, since we are creating a String from the contents of the file, the speed advantage of the memory mapped buffers is probably not visible.

static private String readFile3(String pathname)
    throws java.io.IOException
{
    File f = new File(pathname);
    RandomAccessFile file = new RandomAccessFile(pathname, "r");
    MappedByteBuffer buffer = file.getChannel().map(MapMode.READ_ONLY,
						    0,
						    f.length());
    file.close();
    return new StringBuilder(StandardCharsets.UTF_8.decode(buffer))
	.toString();
}

ByteBuffer provides a method asCharBuffer() which returns a “view” of the byte buffer as a character buffer. However, there is no way to specify the encoding for converting bytes to characters with this method — probably an oversight in the Java API. The correct way to convert a ByteBuffer to CharBuffer is to use CharSet.decode(ByteBuffer) with the appropriate CharSet instance.

5. Simple Way Using java.io

Of course, there is always the “old” way (pre-Java 1.7) of reading a whole file into a String: reading the characters in a loop and appending to a buffer.

static private String readFile3(String pathname)
    throws java.io.IOException
{
    FileReader in = null;
    try {
	in = new FileReader(pathname);
	char[] buf = new char[2048];
	int len;
	StringBuilder sbuf = new StringBuilder();
	while ((len = in.read(buf, 0, buf.length)) != -1) {
	    sbuf.append(buf, 0, len);
	}
	return sbuf.toString();
    } finally {
	if ( in != null ) in.close();
    }
}

6. Benchmarking Various Approaches

Since we have several methods of reading a whole file into a string, it is interesting to see how the methods stack up against one another in performance. To this end, we implemented a simple benchmarking method using System.currentTimeMillis(). The following is the average time for each run over 1000 runs of the method.

simple       235 ms for 1000 iters: 0.235000 ms/op
nio          213 ms for 1000 iters: 0.213000 ms/op
scanner      629 ms for 1000 iters: 0.629000 ms/op
mmap         285 ms for 1000 iters: 0.285000 ms/op

For the set of conditions under which the application ran, we can conclude that the NIO method is the fastest followed by the Simple method. Slowest is the Scanner method which is somewhat expected since a regular expression search is involved. A note of warning: do not use these benchmark numbers to pick the method to use. Rather use the method closest to your paradigm of problem solving.

Conclusion

You are now aware of various methods of reading the whole contents of a file into a String. Pick whatever suits you best and use it!

How to Convert UTF-16 Text File to UTF-8 in Java?

1. Introduction

In this article, we show how to convert a text file from UTF-16 encoding to UTF-8. Such a conversion might be required because certain tools can only read UTF-8 text. Furthermore, the conversion procedure demonstrated here can be applied to convert a text file from any supported encoding to another.

UTF-8 is a character encoding that can represent all characters (or code points) defined by Unicode. It is designed to be backward compatible with legacy encodings such as ASCII.

UTF-16 is another character encoding that encodes characters in one or two 16-bit code units whereas UTF-8 encodes characters in a variable number of 8-bit code units.

2. Supported Character Sets

You can find the characters sets supported by the JVM using the class java.nio.charset.Charset as follows:

for (Map.Entry e : Charset.availableCharsets().entrySet()) {
     System.out.println(e.getKey());
}

// prints the following
Big5
Big5-HKSCS
CESU-8
IBM-Thai
...
US-ASCII
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UTF-8
...

3. Conversion Using java.io Classes

Java provides java.io.InputStreamReader class as a bridge between byte streams to character streams. Open the file using this class to be able to read character buffers in the specified encoding:

Reader in = new InputStreamReader(new FileInputStream(infile), "UTF-16");

Analogously, the class java.io.OutputStreamWriter acts as a bridge between characters streams and bytes streams. Create a Writer with this class to be able to write bytes to the file:

Writer out = new OutputStreamWriter(new FileOutputStream(outfile), "UTF-8");

With the Reader and Writer in place, it is trivial to copy data from the input file to the output file:

char cbuf[] = new char[2048];
int len;
while ((len = in.read(cbuf, 0, cbuf.length)) != -1) {
    out.write(cbuf, 0, len);
}

And that’s it! You have successfully read and converted data from UTF-16 to UTF-8. You can use this code to perform the conversion between any two character sets supported by your JVM.

4. Using String for Converting Bytes

Sometimes, you may have a byte array which you need converted and output in a specific encoding. You can use the String class for these cases as shown below. First convert the byte array into a String:

String str = new String(bytes, 0, len, "UTF-16");

Next, obtain the bytes in the required encoding by using the String.getBytes(String) method:

byte[] outbytes = str.getBytes("UTF-8");

Write the byte array to an OutputStream:

OutputStream out = new FileOutputStream(outfile);
out.write(outbytes);
out.close();

Note that while you could use the String class as shown to convert bytes, you should prefer using Reader/Writer combination when possible to avoid problems with multi-byte characters. Specifically, the byte array you have read may contain an incomplete multi-byte character at the beginning or the end. This may lead to character encoding errors.

Conclusion

When you need to convert text from one character encoding to another in Java, you have several options:

  • Using InputStreamReader and OutputStreamWriter bridge classes for conversion.
  • Using the String class directly with specified encoding.
  • A more advanced option is to use CharsetEncoder and CharsetDecoder class (not presented in this article).