CharSequence instances. Some methods defined in this
class duplicate functionality already provided by the standard String class,
but work on a generic CharSequence instance instead of String.
Unicode support
Every method defined in this class works on code points instead of characters when appropriate. Consequently, those methods should behave correctly with characters outside the Basic Multilingual Plane (BMP).
Policy on space characters
Java defines two methods for testing if a character is a white space: Character.isWhitespace(int) and Character.isSpaceChar(int).
Those two methods differ in the way they handle no-break spaces, tabulations and line feeds. The general policy in the SIS library is:
- Use isWhitespace(…) when separating entities (words, numbers, tokens, etc.) in a list. Using that method, characters separated by a no-break space are considered as part of the same entity.
- Use isSpaceChar(…) when parsing a single entity, for example a single word. Using this method, no-break spaces are considered as part of the entity while line feeds or tabulations are entity boundaries.
For example, when parsing a list of numbers, isWhitespace(…) is appropriate for skipping spaces between the numbers.
But if there are spaces to skip inside a single number, then isSpaceChar(…) is a good choice
for accepting no-break spaces and for stopping the parse operation at tabulations or line feed characters.
A tabulation or line feed between two characters is very likely to separate two distinct values.
In practice, the Format implementations in the SIS library typically use
isSpaceChar(…) while most of the rest of the SIS library, including this
CharSequences class, consistently uses isWhitespace(…).
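The difference between the two tests can be checked with the JDK alone. The following is a minimal sketch, not code from the SIS library; the class name SpacePolicyDemo and its methods are hypothetical names chosen for illustration. It shows how each test classifies a code point, and how separating entities with isWhitespace(…) keeps a number containing a no-break space in one piece.

```java
final class SpacePolicyDemo {
    /** Describes how the two JDK tests classify the given code point. */
    static String classify(int codePoint) {
        return "isWhitespace=" + Character.isWhitespace(codePoint)
             + ", isSpaceChar=" + Character.isSpaceChar(codePoint);
    }

    /** Counts the entities separated by characters accepted by isWhitespace. */
    static int countEntities(CharSequence text) {
        int count = 0;
        boolean inside = false;
        for (int i = 0; i < text.length();) {
            int c = Character.codePointAt(text, i);
            i += Character.charCount(c);
            if (Character.isWhitespace(c)) {
                inside = false;             // Separator: the current entity ends.
            } else if (!inside) {
                inside = true;              // First character of a new entity.
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("no-break space: " + classify(0x00A0));
        System.out.println("tabulation:     " + classify('\t'));
        // Two numbers with no-break spaces as group separators, separated by a tabulation.
        System.out.println(countEntities("1\u00A0234\t5\u00A0678"));
    }
}
```

The no-break space is rejected by isWhitespace(…) but accepted by isSpaceChar(…), while the tabulation is the opposite, which is why the two numbers above are counted as two entities.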
Note that the String.trim() method does not follow either of those policies and should
generally be avoided. That trim() method removes every ISO control character without
distinction about whether the character is a space or not, and ignores all Unicode spaces.
The trimWhitespaces(String) method defined in this class can be used as an alternative.
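The following sketch contrasts String.trim() with a trim based on Character.isWhitespace(int). It uses only the JDK; the class TrimDemo and the method trimByIsWhitespace are hypothetical names, and the loop is an illustration of the policy, not the library's implementation.

```java
final class TrimDemo {
    /** Removes leading and trailing code points accepted by Character.isWhitespace(int). */
    static String trimByIsWhitespace(String text) {
        int lower = 0;
        int upper = text.length();
        while (lower < upper) {
            int c = text.codePointAt(lower);
            if (!Character.isWhitespace(c)) break;
            lower += Character.charCount(c);
        }
        while (upper > lower) {
            int c = text.codePointBefore(upper);
            if (!Character.isWhitespace(c)) break;
            upper -= Character.charCount(c);
        }
        return text.substring(lower, upper);
    }

    public static void main(String[] args) {
        String escape  = "\u001B[1mbold";       // Starts with the ESC control character.
        String emSpace = "\u2003text\u2003";    // Surrounded by EM SPACE (U+2003).
        System.out.println(escape.trim().length() + " versus " + trimByIsWhitespace(escape).length());
        System.out.println(emSpace.trim().length() + " versus " + trimByIsWhitespace(emSpace).length());
    }
}
```

With these inputs, trim() drops the ESC character and keeps the EM SPACE characters, while the isWhitespace-based loop does the opposite.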
Handling of null values
Most methods in this class accept a null CharSequence argument. In such cases
the method return value is either a null CharSequence, an empty array, or a
0 or false primitive type calculated as if the input was an empty string.
- Since:
- 0.3
Field Summary
Fields -
Method Summary
Modifier and Type | Method | Description
static CharSequence | camelCaseToAcronym(CharSequence text) | Creates an acronym from the given text.
static CharSequence | camelCaseToSentence(CharSequence identifier) | Given a string in camel cases (typically an identifier), returns a string formatted like an English sentence.
static CharSequence | camelCaseToWords(CharSequence identifier, boolean toLowerCase) | Given a string in camel cases, returns a string with the same words separated by spaces.
static int | codePointCount(CharSequence text) | Returns the number of Unicode code points in the given characters sequence, or 0 if null.
static int | codePointCount(CharSequence text, int fromIndex, int toIndex) | Returns the number of Unicode code points in the given characters sub-sequence, or 0 if null.
static CharSequence | commonPrefix(CharSequence s1, CharSequence s2) | Returns the longest sequence of characters which is found at the beginning of the two given texts.
static CharSequence | commonSuffix(CharSequence s1, CharSequence s2) | Returns the longest sequence of characters which is found at the end of the two given texts.
static CharSequence | commonWords(CharSequence s1, CharSequence s2) | Returns the words found at the beginning and end of both texts.
static void | copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length) | Copies a sequence of characters in the given char[] array.
static int | count(CharSequence text, char toSearch) | Counts the number of occurrences of the given character in the given character sequence.
static int | count(CharSequence text, String toSearch) | Returns the number of occurrences of the toSearch string in the given text.
static boolean | endsWith(CharSequence text, CharSequence suffix, boolean ignoreCase) | Returns true if the given character sequence ends with the given suffix.
static boolean | equals(CharSequence s1, CharSequence s2) | Returns true if the two given texts are equal.
static boolean | equalsFiltered(CharSequence s1, CharSequence s2, Characters.Filter filter, boolean ignoreCase) | Returns true if the given texts are equal, optionally ignoring case and filtered-out characters.
static boolean |  | Returns true if the two given texts are equal, ignoring case.
static int | indexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) | Returns the index within the given character sequence of the first occurrence of the specified character, starting the search at the specified index.
static int | indexOf(CharSequence text, CharSequence toSearch, int fromIndex, int toIndex) | Returns the index within the given strings of the first occurrence of the specified part, starting at the specified index.
static int | indexOfLineStart(CharSequence text, int numLines, int fromIndex) | Returns the index of the first character after the given number of lines.
static boolean | isAcronymForWords(CharSequence acronym, CharSequence words) | Returns true if the first string is likely to be an acronym of the second string.
static boolean | isUnicodeIdentifier(CharSequence identifier) | Returns true if the given identifier is a legal Unicode identifier.
static boolean | isUpperCase(CharSequence text) | Returns true if the given text is non-null, contains at least one upper-case character and no lower-case character.
static int | lastIndexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) | Returns the index within the given character sequence of the last occurrence of the specified character, searching backward in the given index range.
static int | length(CharSequence text) | Returns the length of the given characters sequence, or 0 if null.
static byte[] | parseBytes(CharSequence values, char separator, int radix) |
static double[] | parseDoubles(CharSequence values, char separator) |
static float[] | parseFloats(CharSequence values, char separator) |
static int[] | parseInts(CharSequence values, char separator, int radix) |
static long[] | parseLongs(CharSequence values, char separator, int radix) |
static short[] | parseShorts(CharSequence values, char separator, int radix) |
static boolean | regionMatches(CharSequence text, int fromIndex, CharSequence part) | Returns true if the given text at the given offset contains the given part, in a case-sensitive comparison.
static boolean | regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase) | Returns true if the given text at the given offset contains the given part, optionally in a case-insensitive way.
static CharSequence | replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy) | Replaces all occurrences of a given string in the given character sequence.
static CharSequence | shortSentence(CharSequence text, int maxLength) | Makes sure that the text string is not longer than maxLength characters.
static int | skipLeadingWhitespaces(CharSequence text, int fromIndex, int toIndex) | Returns the index of the first non-white character in the given range.
static int | skipTrailingWhitespaces(CharSequence text, int fromIndex, int toIndex) | Returns the index after the last non-white character in the given range.
static CharSequence | spaces(int length) | Returns a character sequence of the specified length filled with white spaces.
static CharSequence[] | split(CharSequence text, char separator) | Splits a text around the given character.
static CharSequence[] | splitOnEOL(CharSequence text) | Splits a text around the End Of Line (EOL) characters.
static boolean | startsWith(CharSequence text, CharSequence prefix, boolean ignoreCase) | Returns true if the given character sequence starts with the given prefix.
static CharSequence | toASCII(CharSequence text) | Replaces some Unicode characters by ASCII characters on a "best effort basis".
static CharSequence | token(CharSequence text, int fromIndex) | Returns the token starting at the given offset in the given text.
static CharSequence | trimFractionalPart(CharSequence value) | Trims the fractional part of the given formatted number, provided that it doesn't change the value.
static CharSequence | trimWhitespaces(CharSequence text) | Returns a text with leading and trailing whitespace characters omitted.
static CharSequence | trimWhitespaces(CharSequence text, int lower, int upper) | Returns a sub-sequence with leading and trailing whitespace characters omitted.
static String | trimWhitespaces(String text) | Deprecated, for removal: This API element is subject to removal in a future version.
static CharSequence | upperCaseToSentence(CharSequence identifier) | Given a string in upper cases (typically a Java constant), returns a string formatted like an English sentence.
-
Field Details
-
EMPTY_ARRAY
A zero-length array. This constant plays a role equivalent to Collections.EMPTY_LIST.
-
-
Method Details
-
spaces
Returns a character sequence of the specified length filled with white spaces.
Use case
This method is typically invoked for performing right-alignment of text on the console or other device using monospaced font. Callers compute a value for the length argument by (desired width - used width). Since the used width value may be greater than expected, this method handles negative length values as if the value was zero.
- Parameters:
length - the string length. Negative values are clamped to 0.
- Returns:
- a string of length length filled with white spaces.
-
length
Returns the length of the given characters sequence, or 0 if null.
- Parameters:
text - the character sequence from which to get the length, or null.
- Returns:
- the length of the character sequence, or 0 if the argument is null.
-
codePointCount
Returns the number of Unicode code points in the given characters sequence, or 0 if null. Unpaired surrogates within the text count as one code point each.
- Parameters:
text - the character sequence from which to get the count, or null.
- Returns:
- the number of Unicode code points, or 0 if the argument is null.
-
codePointCount
Returns the number of Unicode code points in the given characters sub-sequence, or 0 if null. Unpaired surrogates within the text count as one code point each.
This method performs the same work as the standard Character.codePointCount(CharSequence, int, int) method, except that it tries to delegate to the optimized methods from the String, StringBuilder, StringBuffer or CharBuffer classes if possible.
- Parameters:
text - the character sequence from which to get the count, or null.
fromIndex - the index from which to start the computation.
toIndex - the index after the last character to take into account.
- Returns:
- the number of Unicode code points, or 0 if the argument is null.
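The delegation described above can be sketched with JDK calls only. This is an illustrative sketch under the assumption that a simple instanceof dispatch is used; the class name CodePointCountDemo is hypothetical, and the CharBuffer case is left to the generic fallback.

```java
final class CodePointCountDemo {
    /** Counts code points in a range, delegating to the class-specific method when one exists. */
    static int codePointCount(CharSequence text, int fromIndex, int toIndex) {
        if (text == null) {
            return 0;
        }
        if (text instanceof String) {
            return ((String) text).codePointCount(fromIndex, toIndex);
        }
        if (text instanceof StringBuilder) {
            return ((StringBuilder) text).codePointCount(fromIndex, toIndex);
        }
        if (text instanceof StringBuffer) {
            return ((StringBuffer) text).codePointCount(fromIndex, toIndex);
        }
        // Generic fallback for any other CharSequence implementation.
        return Character.codePointCount(text, fromIndex, toIndex);
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b";     // 'a', one supplementary code point, 'b'.
        System.out.println(text.length() + " chars, "
                + codePointCount(text, 0, text.length()) + " code points");
    }
}
```

The sample text has four char values but three code points, since the surrogate pair counts once.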
-
count
Returns the number of occurrences of the toSearch string in the given text. The search is case-sensitive.
- Parameters:
text - the character sequence to count occurrences, or null.
toSearch - the string to search in the given text. It shall contain at least one character.
- Returns:
- the number of occurrences of toSearch in text, or 0 if text was null or empty.
- Throws:
NullPointerException - if the toSearch argument is null.
IllegalArgumentException - if the toSearch argument is empty.
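The contract above can be approximated with String.indexOf. This is a sketch, not the library's code: the class name CountDemo is hypothetical, and counting non-overlapping occurrences is an assumption, since the text above does not say how overlapping matches are handled.

```java
final class CountDemo {
    /** Counts non-overlapping, case-sensitive occurrences of toSearch in text. */
    static int count(CharSequence text, String toSearch) {
        if (toSearch.isEmpty()) {       // Also throws NullPointerException if toSearch is null.
            throw new IllegalArgumentException("toSearch shall contain at least one character.");
        }
        if (text == null) {
            return 0;
        }
        String s = text.toString();
        int n = 0;
        // Assumption: the search resumes after the end of the previous match.
        for (int i = s.indexOf(toSearch); i >= 0; i = s.indexOf(toSearch, i + toSearch.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("a,b,,c", ","));
        System.out.println(count(null, ","));
    }
}
```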
-
count
Counts the number of occurrences of the given character in the given character sequence.
- Parameters:
text - the character sequence to count occurrences, or null.
toSearch - the character to count.
- Returns:
- the number of occurrences of the given character, or 0 if the text is null.
-
indexOf
Returns the index within the given text of the first occurrence of the specified part, starting at the specified index. This method is equivalent to the following method call, except that this method works on arbitrary CharSequence objects instead of Strings only, and that the upper limit can be specified:
    return text.indexOf(part, fromIndex);
There is no restriction on the value of fromIndex. If negative or greater than toIndex, then the behavior of this method is as if the search started from 0 or toIndex respectively. This is consistent with the String.indexOf(String, int) behavior.
- Parameters:
text - the string in which to perform the search.
toSearch - the substring for which to search.
fromIndex - the index from which to start the search.
toIndex - the index after the last character where to perform the search.
- Returns:
- the index within the text of the first occurrence of the specified part, starting at the specified index, or -1 if no occurrence has been found or if the text argument is null.
- Throws:
NullPointerException - if the toSearch argument is null.
IllegalArgumentException - if the toSearch argument is empty.
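A bounded search over an arbitrary CharSequence can be sketched as a plain nested loop. The class name IndexOfDemo is hypothetical, and the sketch assumes that a match must lie entirely before toIndex, which the text above does not state explicitly.

```java
final class IndexOfDemo {
    /** Searches toSearch in text, between fromIndex (inclusive) and toIndex (exclusive). */
    static int indexOf(CharSequence text, CharSequence toSearch, int fromIndex, int toIndex) {
        int length = toSearch.length();     // Throws NullPointerException if toSearch is null.
        if (length == 0) {
            throw new IllegalArgumentException("toSearch shall not be empty.");
        }
        if (text == null) {
            return -1;
        }
        if (fromIndex < 0) {
            fromIndex = 0;                  // Negative values behave as 0.
        }
        // Assumption: the whole match must end at or before toIndex.
        int last = Math.min(toIndex, text.length()) - length;
        search:
        for (int i = fromIndex; i <= last; i++) {
            for (int j = 0; j < length; j++) {
                if (text.charAt(i + j) != toSearch.charAt(j)) {
                    continue search;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        CharSequence text = new StringBuilder("abcabc");
        System.out.println(indexOf(text, "bc", 2, 6));
        System.out.println(indexOf(text, "bc", 0, 2));
    }
}
```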
-
indexOf
Returns the index within the given character sequence of the first occurrence of the specified character, starting the search at the specified index. If the character is not found, then this method returns -1.
There is no restriction on the value of fromIndex. If negative or greater than toIndex, then the behavior of this method is as if the search started from 0 or toIndex respectively. This is consistent with the behavior documented in String.indexOf(int, int).
- Parameters:
text - the character sequence in which to perform the search, or null.
toSearch - the Unicode code point of the character to search.
fromIndex - the index to start the search from.
toIndex - the index after the last character where to perform the search.
- Returns:
- the index of the first occurrence of the given character in the specified sub-sequence, or -1 if no occurrence has been found or if the text argument is null.
-
lastIndexOf
Returns the index within the given character sequence of the last occurrence of the specified character, searching backward in the given index range. If the character is not found, then this method returns -1.
There is no restriction on the value of toIndex. If greater than the text length or less than fromIndex, then the behavior of this method is as if the search started from length or fromIndex respectively. This is consistent with the behavior documented in String.lastIndexOf(int, int).
- Parameters:
text - the character sequence in which to perform the search, or null.
toSearch - the Unicode code point of the character to search.
fromIndex - the index of the first character in the range where to perform the search.
toIndex - the index after the last character in the range where to perform the search.
- Returns:
- the index of the last occurrence of the given character in the specified sub-sequence, or -1 if no occurrence has been found or if the text argument is null.
-
indexOfLineStart
Returns the index of the first character after the given number of lines. This method counts the number of occurrences of '\n', '\r' or "\r\n" starting from the given position. When numLines occurrences have been found, the index of the first character after the last occurrence is returned.
If the numLines argument is positive, this method searches forward. If negative, this method searches backward. If 0, this method returns the beginning of the current line.
If this method reaches the end of text while searching forward, then text.length() is returned. If this method reaches the beginning of text while searching backward, then 0 is returned.
- Parameters:
text - the string in which to skip a determined amount of lines.
numLines - the number of lines to skip. Can be positive, zero or negative.
fromIndex - index at which to start the search, from 0 to text.length() inclusive.
- Returns:
- index of the first character after the last skipped line.
- Throws:
NullPointerException - if the text argument is null.
IndexOutOfBoundsException - if fromIndex is out of bounds.
-
skipLeadingWhitespaces
Returns the index of the first non-white character in the given range. If the given range contains only space characters, then this method returns the index of the first character after the given range, which is always equal to or greater than toIndex. Note that this character may not exist if toIndex is equal to the text length.
Special cases:
- If fromIndex is greater than toIndex, then this method unconditionally returns fromIndex.
- If the given range contains only space characters and the character at toIndex-1 is the high surrogate of a valid supplementary code point, then this method returns toIndex+1, which is the index of the next code point.
- If fromIndex is negative or toIndex is greater than the text length, then the behavior of this method is undefined.
Space characters are identified by the Character.isWhitespace(int) method.
- Parameters:
text - the string in which to perform the search (cannot be null).
fromIndex - the index from which to start the search (cannot be negative).
toIndex - the index after the last character where to perform the search.
- Returns:
- the index within the text of the first occurrence of a non-space character, starting at the specified index, or a value equal to or greater than toIndex if none.
- Throws:
NullPointerException - if the text argument is null.
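The behavior described for skipLeadingWhitespaces, including the special cases, follows naturally from a loop that advances one code point at a time. The sketch below uses only the JDK; the class name SkipLeadingDemo is hypothetical and the code is an illustration rather than the library's implementation.

```java
final class SkipLeadingDemo {
    /** Returns the index of the first code point in the range rejected by isWhitespace. */
    static int skipLeadingWhitespaces(CharSequence text, int fromIndex, int toIndex) {
        while (fromIndex < toIndex) {
            int c = Character.codePointAt(text, fromIndex);
            if (!Character.isWhitespace(c)) {
                break;
            }
            // Advance by 1 or 2 chars, so the result may exceed toIndex by one
            // if a surrogate pair starts at toIndex-1.
            fromIndex += Character.charCount(c);
        }
        return fromIndex;
    }

    public static void main(String[] args) {
        System.out.println(skipLeadingWhitespaces("  \tabc", 0, 6));
        System.out.println(skipLeadingWhitespaces("\u00A0x", 0, 2));
    }
}
```

Because the loop condition is tested first, a fromIndex greater than toIndex is returned unchanged, and a no-break space is not skipped since Character.isWhitespace(int) rejects it.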
-
skipTrailingWhitespaces
Returns the index after the last non-white character in the given range. If the given range contains only space characters, then this method returns the index of the first character in the given range, which is always equal to or lower than fromIndex.
Special cases:
- If fromIndex is greater than toIndex, then this method unconditionally returns toIndex.
- If the given range contains only space characters and the character at fromIndex is the low surrogate of a valid supplementary code point, then this method returns fromIndex-1, which is the index of the code point.
- If fromIndex is negative or toIndex is greater than the text length, then the behavior of this method is undefined.
Space characters are identified by the Character.isWhitespace(int) method.
- Parameters:
text - the string in which to perform the search (cannot be null).
fromIndex - the index from which to start the search (cannot be negative).
toIndex - the index after the last character where to perform the search.
- Returns:
- the index after the last non-space character in the given range, or a value equal to or lower than fromIndex if none.
- Throws:
NullPointerException - if the text argument is null.
-
split
Splits a text around the given character. The array returned by this method contains all subsequences of the given text that are terminated by the given character or by the end of the text. The subsequences in the array are in the order in which they occur in the given text. If the character is not found in the input, then the resulting array has just one element, which is the whole given text.
This method is similar to the standard String.split(String) method except for the following:
- It accepts generic character sequences.
- It accepts null argument, in which case an empty array is returned.
- The separator is a simple character instead of a regular expression.
- If the separator argument is '\n' or '\r', then this method splits around any of "\r", "\n" or "\r\n" characters sequences.
- The leading and trailing spaces of each subsequence are trimmed.
- Parameters:
text - the text to split, or null.
separator - the delimiting character (typically the comma).
- Returns:
- the array of subsequences computed by splitting the given text around the given character, or an empty array if text was null.
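The main differences with String.split(String) can be illustrated with a short JDK-only sketch. The class name SplitDemo is hypothetical; the special handling of '\n' and '\r' separators is omitted, and the trimming works on char values rather than code points for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitDemo {
    /** Splits around a single character and trims each item. EOL handling is omitted. */
    static CharSequence[] split(CharSequence text, char separator) {
        if (text == null) {
            return new CharSequence[0];
        }
        List<CharSequence> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            // An item ends at each separator and at the end of the text.
            if (i == text.length() || text.charAt(i) == separator) {
                parts.add(trim(text, start, i));
                start = i + 1;
            }
        }
        return parts.toArray(new CharSequence[0]);
    }

    /** Returns the sub-sequence without leading and trailing whitespaces. */
    private static CharSequence trim(CharSequence text, int lower, int upper) {
        while (lower < upper && Character.isWhitespace(text.charAt(lower))) lower++;
        while (upper > lower && Character.isWhitespace(text.charAt(upper - 1))) upper--;
        return text.subSequence(lower, upper);
    }

    public static void main(String[] args) {
        for (CharSequence item : split(" a , b,,c ", ',')) {
            System.out.println("[" + item + "]");
        }
    }
}
```

Unlike String.split(String), empty items are kept in this sketch, which is what allows methods such as parseDoubles to map them to NaN.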
-
splitOnEOL
Splits a text around the End Of Line (EOL) characters. EOL characters can be any of "\r", "\n" or "\r\n" sequences. Each element in the returned array will be a single line. If the given text is already a single line, then this method returns a singleton containing only the given text.
Notes:
- Unlike split(toSplit, '\n'), this method does not remove whitespaces.
- This method does not check for Unicode line separator and paragraph separator.
Performance note
Prior to Java 8 this method was usually cheap because all string instances created by String.substring(int,int) shared the same char[] internal array. However, since Java 8, the new String implementation copies the data in new arrays. Consequently, it is better to use indices rather than this method for splitting large Strings. However, this method is still useful for other CharSequence implementations providing an efficient subSequence(int,int) method.
- Parameters:
text - the multi-line text from which to get the individual lines, or null.
- Returns:
- the lines in the text, or an empty array if the given text was null.
-
parseDoubles
public static double[] parseDoubles(CharSequence values, char separator) throws NumberFormatException
Splits the given text around the given character, then parses each item as a double. Empty sub-sequences are parsed as Double.NaN.
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
- Returns:
- the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
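The split-then-parse contract can be sketched with the JDK as follows. This is not the library's implementation: the class name ParseDoublesDemo is hypothetical, the splitting relies on String.split with a quoted separator, and String.strip() assumes JDK 11 or later.

```java
import java.util.regex.Pattern;

final class ParseDoublesDemo {
    /** Splits around the separator and parses each item, mapping empty items to NaN. */
    static double[] parseDoubles(CharSequence values, char separator) {
        if (values == null) {
            return new double[0];
        }
        // Limit -1 keeps trailing empty items; quoting makes the separator a literal.
        String[] items = values.toString().split(Pattern.quote(String.valueOf(separator)), -1);
        double[] result = new double[items.length];
        for (int i = 0; i < items.length; i++) {
            String item = items[i].strip();
            result[i] = item.isEmpty() ? Double.NaN : Double.parseDouble(item);
        }
        return result;
    }

    public static void main(String[] args) {
        for (double value : parseDoubles("1.5, ,3", ',')) {
            System.out.println(value);
        }
    }
}
```

Double.parseDouble throws NumberFormatException for an item that is not a number, which matches the Throws clause above.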
-
parseFloats
Splits the given text around the given character, then parses each item as a float. Empty sub-sequences are parsed as Float.NaN.
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
- Returns:
- the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
parseLongs
public static long[] parseLongs(CharSequence values, char separator, int radix) throws NumberFormatException
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
radix - the radix to be used for parsing. This is usually 10.
- Returns:
- the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
parseInts
public static int[] parseInts(CharSequence values, char separator, int radix) throws NumberFormatException
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
radix - the radix to be used for parsing. This is usually 10.
- Returns:
- the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
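The role of the radix argument in the integer variants can be illustrated with Integer.parseInt(String, int). The sketch below is hypothetical (class name ParseIntsDemo, JDK 11 or later for String.strip()); treating an empty item as an error is an assumption, since the text above does not describe that case.

```java
import java.util.regex.Pattern;

final class ParseIntsDemo {
    /** Splits around the separator and parses each item in the given radix. */
    static int[] parseInts(CharSequence values, char separator, int radix) {
        if (values == null) {
            return new int[0];
        }
        String[] items = values.toString().split(Pattern.quote(String.valueOf(separator)), -1);
        int[] result = new int[items.length];
        for (int i = 0; i < items.length; i++) {
            // Assumption: an empty item is not a number, so NumberFormatException is thrown.
            result[i] = Integer.parseInt(items[i].strip(), radix);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int value : parseInts("ff, 10, -1", ',', 16)) {
            System.out.println(value);
        }
    }
}
```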
-
parseShorts
public static short[] parseShorts(CharSequence values, char separator, int radix) throws NumberFormatException
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
radix - the radix to be used for parsing. This is usually 10.
- Returns:
- the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
parseBytes
public static byte[] parseBytes(CharSequence values, char separator, int radix) throws NumberFormatException
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
radix - the radix to be used for parsing. This is usually 10.
- Returns:
- the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
toASCII
Replaces some Unicode characters by ASCII characters on a "best effort basis". For example, the “ é ” character is replaced by “ e ” (without accent), the “ ″ ” symbol for minutes of angle is replaced by straight double quotes “ " ”, and combined characters like ㎏, ㎎, ㎝, ㎞, ㎢, ㎦, ㎖, ㎧, ㎩, ㎐, etc. are replaced by the corresponding sequences of characters.
Note: the replacement of Greek letters is a more complex task than what this method can do, since it depends on the context. For example if the Greek letters are abbreviations for coordinate system axes like φ and λ, then the replacements depend on the enclosing coordinate system. See Transliterator for more information.
- Parameters:
text - the text to scan for Unicode characters to replace by ASCII characters, or null.
- Returns:
- the given text with substitutions applied, or text if no replacement has been applied, or null if the given text was null.
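Part of this behavior can be approximated with the JDK's java.text.Normalizer. The sketch below is a swapped-in technique, not how toASCII is implemented: it applies compatibility decomposition (NFKD) and then drops combining marks. It handles accented letters and unit symbols such as ㎏, but not every replacement listed above (the ″ symbol, for instance, is not turned into a straight quote). The names ToAsciiDemo and toAsciiSketch are hypothetical.

```java
import java.text.Normalizer;

final class ToAsciiDemo {
    /** Approximates a "best effort" ASCII conversion using Unicode normalization. */
    static String toAsciiSketch(String text) {
        if (text == null) {
            return null;
        }
        // NFKD splits "é" into "e" + combining accent, and "㎏" into "kg".
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFKD);
        // Remove the combining marks left by the decomposition.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(toAsciiSketch("caf\u00E9"));     // Text with "é".
        System.out.println(toAsciiSketch("5 \u338F"));      // Text with "㎏".
    }
}
```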
-
trimWhitespaces
Deprecated, for removal: This API element is subject to removal in a future version.
Replaced by String.strip() in JDK 11.
Returns a string with leading and trailing whitespace characters omitted. This method is similar in purpose to String.trim(), except that the latter considers every ISO control code below 32 to be a whitespace. That String.trim() behavior has the side effect of removing the heading of ANSI escape sequences (a.k.a. X3.64), and of ignoring Unicode spaces. This trimWhitespaces(…) method is built on the more accurate Character.isWhitespace(int) method instead.
This method performs the same work as trimWhitespaces(CharSequence), but is overloaded for the String type because of its frequent use.
- Parameters:
text - the text from which to remove leading and trailing whitespaces, or null.
- Returns:
- a string with leading and trailing whitespaces removed, or null if the given text was null.
-
trimWhitespaces
Returns a text with leading and trailing whitespace characters omitted. Space characters are identified by the Character.isWhitespace(int) method.
This method is the generic version of trimWhitespaces(String).
- Parameters:
text - the text from which to remove leading and trailing whitespaces, or null.
- Returns:
- a characters sequence with leading and trailing whitespaces removed, or null if the given text was null.
-
trimWhitespaces
Returns a sub-sequence with leading and trailing whitespace characters omitted. Space characters are identified by the Character.isWhitespace(int) method.
Invoking this method is functionally equivalent to the following code snippet, except that the subSequence method is invoked only once instead of two times:
    text = trimWhitespaces(text.subSequence(lower, upper));
- Parameters:
text - the text from which to remove leading and trailing white spaces.
lower - index of the first character to consider for inclusion in the sub-sequence.
upper - index after the last character to consider for inclusion in the sub-sequence.
- Returns:
- a characters sequence with leading and trailing white spaces removed, or null if the text argument is null.
- Throws:
IndexOutOfBoundsException - if lower or upper is out of bounds.
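The "single subSequence call" idea can be sketched by first narrowing the two indices and only then creating the sub-sequence. The class name TrimRangeDemo is hypothetical and the code uses JDK calls only; it illustrates the approach rather than reproducing the library's code.

```java
final class TrimRangeDemo {
    /** Trims whitespaces inside [lower, upper) and invokes subSequence only once. */
    static CharSequence trimWhitespaces(CharSequence text, int lower, int upper) {
        if (text == null) {
            return null;
        }
        if (lower < 0 || upper > text.length() || lower > upper) {
            throw new IndexOutOfBoundsException("Illegal range: " + lower + " to " + upper);
        }
        // Move the lower index forward over leading whitespaces.
        while (lower < upper) {
            int c = Character.codePointAt(text, lower);
            if (!Character.isWhitespace(c)) break;
            lower += Character.charCount(c);
        }
        // Move the upper index backward over trailing whitespaces.
        while (upper > lower) {
            int c = Character.codePointBefore(text, upper);
            if (!Character.isWhitespace(c)) break;
            upper -= Character.charCount(c);
        }
        return text.subSequence(lower, upper);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimWhitespaces("[  abc  ]", 1, 8) + "]");
    }
}
```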
-
trimFractionalPart
Trims the fractional part of the given formatted number, provided that it doesn't change the value. This method assumes that the number is formatted in the US locale, typically by the Double.toString(double) method.
More specifically if the given value ends with a '.' character followed by a sequence of '0' characters, then those characters are omitted. Otherwise this method returns the text unchanged. This is an "all or nothing" method: either the fractional part is completely removed, or it is left unchanged.
Examples
This method returns "4" if the given value is "4.", "4.0" or "4.00", but returns "4.10" unchanged (including the trailing '0' character) if the input is "4.10".
Use case
This method is useful before parsing a number if that number should preferably be parsed as an integer before attempting to parse it as a floating point number.
- Parameters:
value - the value to trim if possible, or null.
- Returns:
- the value without the trailing ".0" part (if any), or null if the given text was null.
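The "all or nothing" rule can be sketched as a backward scan over trailing zeros. The class name TrimFractionDemo is hypothetical, and the sketch covers only the cases listed in the examples above (signs, exponents and other special inputs are left unchanged or unhandled).

```java
final class TrimFractionDemo {
    /** Removes a trailing '.' followed by zero or more '0' characters, if present. */
    static CharSequence trimFractionalPart(CharSequence value) {
        if (value == null) {
            return null;
        }
        int i = value.length();
        // Skip the trailing '0' characters, scanning backward.
        while (i > 0 && value.charAt(i - 1) == '0') {
            i--;
        }
        // Trim only if the zeros are immediately preceded by the decimal separator.
        if (i > 0 && value.charAt(i - 1) == '.') {
            return value.subSequence(0, i - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(trimFractionalPart("4.00"));
        System.out.println(trimFractionalPart("4.10"));
    }
}
```

A caller could then try Integer.parseInt on the result and fall back to Double.parseDouble if that fails.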
-
shortSentence
Makes sure that the text string is not longer than maxLength characters. If text is not longer, then it is returned unchanged. Otherwise this method returns a copy of text with some characters substituted by the "(…)" string.
If the text needs to be shortened, then this method tries to apply the above-cited substitution between two words. For example, the following text:
"This sentence given as an example is way too long to be included in a short name."
may be shortened to something like this:
"This sentence given (…) in a short name."
- Parameters:
text - the sentence to reduce if it is too long, or null.
maxLength - the maximum length allowed for text.
- Returns:
- a sentence not longer than maxLength, or null if the given text was null.
-
upperCaseToSentence
Given a string in upper cases (typically a Java constant), returns a string formatted like an English sentence. This heuristic method performs the following steps:
- Replace all occurrences of '_' by spaces.
- Convert all letters except the first one to lower case letters using Character.toLowerCase(int). Note that this method does not use the String.toLowerCase() method. Consequently, the system locale is ignored. This method behaves as if the conversion were done in the root locale.
Note that those heuristic rules may be modified in future SIS versions, depending on the practical experience gained.
- Parameters:
identifier - the name of a Java constant, or null.
- Returns:
- the identifier like an English sentence, or null if the given identifier argument was null.
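The two steps listed for upperCaseToSentence can be sketched in a single pass over the code points. The class name UpperCaseToSentenceDemo is hypothetical; the sketch follows the steps as documented and makes no claim about additional heuristics the library may apply.

```java
final class UpperCaseToSentenceDemo {
    /** Replaces '_' by spaces and lower-cases every letter except the first one. */
    static String upperCaseToSentence(CharSequence identifier) {
        if (identifier == null) {
            return null;
        }
        StringBuilder buffer = new StringBuilder(identifier.length());
        for (int i = 0; i < identifier.length();) {
            int c = Character.codePointAt(identifier, i);
            i += Character.charCount(c);
            if (c == '_') {
                c = ' ';
            } else if (buffer.length() != 0) {
                // Locale-independent conversion, as opposed to String.toLowerCase().
                c = Character.toLowerCase(c);
            }
            buffer.appendCodePoint(c);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseToSentence("HALF_DOWN"));
    }
}
```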
-
camelCaseToSentence
Given a string in camel cases (typically an identifier), returns a string formatted like an English sentence. This heuristic method performs the following steps:
- Invoke camelCaseToWords(CharSequence, boolean), which separates the words on the basis of character case. For example, "transferFunctionType" becomes "transfer function type". This works fine for ISO 19115 identifiers.
- Next, replace all occurrences of '_' by spaces in order to take into account another common naming convention, which uses '_' as a word separator. This convention is used by netCDF attributes like "project_name".
- Finally, ensure that the first character is upper-case.
Exception to the above rules
If the given identifier contains only upper-case letters, digits and the '_' character, then the identifier is returned "as is" except for the '_' characters which are replaced by '-'. This works well for identifiers like "UTF-8" or "ISO-LATIN-1" for instance.
Note that those heuristic rules may be modified in future SIS versions, depending on the practical experience gained.
- Parameters:
identifier - an identifier with no space, words begin with an upper-case character, or null.
- Returns:
- the identifier with spaces inserted after what looks like words, or null if the given identifier argument was null.
-
camelCaseToWords
Given a string in camel cases, returns a string with the same words separated by spaces. A word begins with an upper-case character following a lower-case character. For example if the given string is "PixelInterleavedSampleModel", then this method returns "Pixel Interleaved Sample Model" or "Pixel interleaved sample model" depending on the value of the toLowerCase argument.
If toLowerCase is false, then this method inserts spaces but does not change the case of characters. If toLowerCase is true, then this method changes to lower case the first character after each space inserted by this method (note that this intentionally excludes the very first character in the given string), except if the second character is upper case, in which case the word is assumed to be an acronym.
The given string is usually a programmatic identifier like a class name or a method name.
- Parameters:
identifier - an identifier with no space, words begin with an upper-case character.
toLowerCase - true for changing the first character of words to lower case, except for the first word and acronyms.
- Returns:
- the identifier with spaces inserted after what looks like words, or null if the given identifier argument was null.
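The word-splitting rule and the acronym exception can be sketched as follows. The class name CamelCaseDemo is hypothetical, and the sketch works on char values (Basic Multilingual Plane only) for brevity; it follows the rules as described above, not necessarily every heuristic of the library.

```java
final class CamelCaseDemo {
    /** Inserts a space before each upper-case character that follows a lower-case one. */
    static String camelCaseToWords(CharSequence identifier, boolean toLowerCase) {
        if (identifier == null) {
            return null;
        }
        int length = identifier.length();
        StringBuilder buffer = new StringBuilder(length + 8);
        for (int i = 0; i < length; i++) {
            char c = identifier.charAt(i);
            if (i > 0 && Character.isUpperCase(c) && Character.isLowerCase(identifier.charAt(i - 1))) {
                buffer.append(' ');
                // A word whose second character is upper case is assumed to be an acronym.
                boolean acronym = (i + 1 < length) && Character.isUpperCase(identifier.charAt(i + 1));
                if (toLowerCase && !acronym) {
                    c = Character.toLowerCase(c);
                }
            }
            buffer.append(c);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(camelCaseToWords("PixelInterleavedSampleModel", false));
        System.out.println(camelCaseToWords("PixelInterleavedSampleModel", true));
    }
}
```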
-
camelCaseToAcronym
Creates an acronym from the given text. This method returns a string containing the first character of each word, where the words are separated by the camel case convention, the '_' character, or any character which is not a Unicode identifier part (including spaces).
An exception to the above rule happens if the given text is a Unicode identifier without the '_' character, and every character is upper case. In such case the text is returned unchanged on the assumption that it is already an acronym.
Examples: given "northEast", this method returns "NE". Given "Open Geospatial Consortium", this method returns "OGC".
- Parameters:
text - the text for which to create an acronym, or null.
- Returns:
- the acronym, or null if the given text was null.
-
isAcronymForWords
Returns true if the first string is likely to be an acronym of the second string. An acronym is a sequence of letters or digits built from at least one character of each word in the words string. More than one character from the same word may appear in the acronym, but they must always be the first consecutive characters. The comparison is case-insensitive. If any of the given arguments is null, this method returns false.
Example
Given the "Open Geospatial Consortium" words, the following strings are recognized as acronyms: "OGC", "ogc", "O.G.C.", "OpGeoCon".
- Parameters:
acronym - a possible acronym of the sequence of words, or null.
words - the sequence of words, or null.
- Returns:
- true if the first string is an acronym of the second one.
-
isUnicodeIdentifier
Returns true if the given identifier is a legal Unicode identifier. This method returns true if the identifier length is greater than zero, the first character is a Unicode identifier start and all remaining characters (if any) are Unicode identifier parts.
Relationship with legal XML identifiers
Most legal Unicode identifiers are also legal XML identifiers, but the converse is not true. The most noticeable differences are the ‘:’, ‘-’ and ‘.’ characters, which are legal in XML identifiers but not in Unicode.
Characters legal in one set but not in the other:
Not legal in Unicode | Not legal in XML
: (colon) | µ (micro sign)
- (hyphen or minus) | ª (feminine ordinal indicator)
. (dot) | º (masculine ordinal indicator)
· (middle dot) | ⁔ (inverted undertie)
Many punctuation, symbols, etc. | Identifier ignorable characters.
Note that the ‘_’ (underscore) character is legal according to both Unicode and XML, while spaces, ‘!’, ‘#’, ‘*’, ‘/’, ‘?’ and most other punctuation characters are not.
Usage in Apache SIS
In its handling of identifiers, Apache SIS favors Unicode identifiers without ignorable characters since those identifiers are legal XML identifiers except for the above-cited rarely used characters. As a side effect, this policy excludes ‘:’, ‘-’ and ‘.’ which would normally be legal in XML identifiers. But since those characters could easily be confused with namespace separators, this exclusion is considered desirable.
- Parameters:
identifier - the character sequence to test, or null.
- Returns:
- true if the given character sequence is a legal Unicode identifier.
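The test described above maps directly onto two standard JDK methods, Character.isUnicodeIdentifierStart(int) and Character.isUnicodeIdentifierPart(int). The sketch below is one possible reading of the rule, not the Apache SIS source; the class name UnicodeIdentifierSketch is hypothetical. Note that isUnicodeIdentifierPart also accepts identifier-ignorable characters, which the policy described above prefers to avoid.

```java
/** Hypothetical JDK-only sketch; not the Apache SIS implementation. */
final class UnicodeIdentifierSketch {
    private UnicodeIdentifierSketch() {
    }

    /** Returns true if the text is non-empty, starts with an identifier start and continues with identifier parts. */
    static boolean isUnicodeIdentifier(CharSequence identifier) {
        if (identifier == null || identifier.length() == 0) {
            return false;
        }
        final int length = identifier.length();
        int c = Character.codePointAt(identifier, 0);
        if (!Character.isUnicodeIdentifierStart(c)) {
            return false;
        }
        // Advance by code point so that supplementary characters are tested as a whole.
        for (int i = Character.charCount(c); i < length; i += Character.charCount(c)) {
            c = Character.codePointAt(identifier, i);
            if (!Character.isUnicodeIdentifierPart(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeIdentifier("WGS_84"));
        System.out.println(isUnicodeIdentifier("EPSG:4326"));
    }
}
```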
-
isUpperCase
Returns true if the given text is non-null, contains at least one upper-case character and no lower-case character. Space and punctuation are ignored.
- Parameters:
text - the character sequence to test (may be null).
- Returns:
- true if non-null, contains at least one upper-case character and no lower-case character.
- Since:
- 0.7
-
equalsFiltered
public static boolean equalsFiltered(CharSequence s1, CharSequence s2, Characters.Filter filter, boolean ignoreCase)
Returns true if the given texts are equal, optionally ignoring case and filtered-out characters. This method is sometimes used for comparing identifiers in a lenient way.
Example: the following call compares the two strings ignoring case and any characters which are not letters or digits. In particular, spaces and punctuation characters like '_' and '-' are ignored:
assert equalsFiltered("WGS84", "WGS_84", Characters.Filter.LETTERS_AND_DIGITS, true) == true;
- Parameters:
s1 - the first character sequence to compare, or null.
s2 - the second character sequence to compare, or null.
filter - the subset of characters to compare, or null for comparing all characters.
ignoreCase - true for ignoring case, or false for requiring an exact match.
- Returns:
- true if both arguments are null or if the two given texts are equal, optionally ignoring case and filtered-out characters.
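A filtered comparison of this kind can be sketched as two cursors that each skip rejected code points before comparing. The code below is an illustration only: it substitutes a standard java.util.function.IntPredicate for Characters.Filter, and the class and helper names are hypothetical.

```java
import java.util.function.IntPredicate;

/** Hypothetical JDK-only sketch; IntPredicate stands in for Characters.Filter. */
final class FilteredEqualsSketch {
    private FilteredEqualsSketch() {
    }

    /** Compares the code points accepted by the filter, optionally ignoring case. */
    static boolean equalsFiltered(CharSequence s1, CharSequence s2, IntPredicate filter, boolean ignoreCase) {
        if (s1 == s2) {
            return true;                    // Same instance, or both null.
        }
        if (s1 == null || s2 == null) {
            return false;
        }
        int i1 = 0, i2 = 0;
        while (true) {
            i1 = nextAccepted(s1, i1, filter);
            i2 = nextAccepted(s2, i2, filter);
            final boolean end1 = i1 >= s1.length();
            final boolean end2 = i2 >= s2.length();
            if (end1 || end2) {
                return end1 && end2;        // Equal only if both texts are exhausted together.
            }
            final int c1 = Character.codePointAt(s1, i1);
            final int c2 = Character.codePointAt(s2, i2);
            if (c1 != c2 && !(ignoreCase && sameIgnoringCase(c1, c2))) {
                return false;
            }
            i1 += Character.charCount(c1);
            i2 += Character.charCount(c2);
        }
    }

    /** Returns the index of the first code point at or after i that the filter accepts. */
    private static int nextAccepted(CharSequence text, int i, IntPredicate filter) {
        while (i < text.length()) {
            final int c = Character.codePointAt(text, i);
            if (filter == null || filter.test(c)) {
                break;
            }
            i += Character.charCount(c);
        }
        return i;
    }

    private static boolean sameIgnoringCase(int c1, int c2) {
        c1 = Character.toUpperCase(c1);
        c2 = Character.toUpperCase(c2);
        return c1 == c2 || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    public static void main(String[] args) {
        System.out.println(equalsFiltered("WGS84", "WGS_84", Character::isLetterOrDigit, true));
    }
}
```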
-
equalsIgnoreCase
Returns true if the two given texts are equal, ignoring case. This method is similar to String.equalsIgnoreCase(String), except it works on arbitrary character sequences and compares code points instead of characters.
- Parameters:
s1 - the first string to compare, or null.
s2 - the second string to compare, or null.
- Returns:
- true if the two given texts are equal, ignoring case, or if both arguments are null.
-
equals
Returns true if the two given texts are equal. This method delegates to String.contentEquals(CharSequence) if possible. This method never invokes CharSequence.toString(), in order to avoid a potentially large copy of data.
- Parameters:
s1 - the first string to compare, or null.
s2 - the second string to compare, or null.
- Returns:
- true if the two given texts are equal, or if both arguments are null.
-
regionMatches
Returns true if the given text at the given offset contains the given part, in a case-sensitive comparison. This method is equivalent to the following code, except that this method works on arbitrary CharSequence objects instead of Strings only:
return text.regionMatches(fromIndex, part, 0, part.length());
This method does not throw IndexOutOfBoundsException. Instead, if fromIndex < 0 or fromIndex + part.length() > text.length(), then this method returns false.
- Parameters:
text - the character sequence in which to test for the presence of part.
fromIndex - the offset in text where to test for the presence of part.
part - the part which may be present in text.
- Returns:
- true if text contains part at the given fromIndex.
- Throws:
NullPointerException - if any of the arguments is null.
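The bounds rule above (return false rather than throw) can be illustrated with a short JDK-only sketch. It is not the Apache SIS source and the class name RegionMatchesSketch is hypothetical; the bounds check is written as a subtraction so that a very large fromIndex cannot overflow.

```java
/** Hypothetical JDK-only sketch; not the Apache SIS implementation. */
final class RegionMatchesSketch {
    private RegionMatchesSketch() {
    }

    /** Case-sensitive test for the presence of part in text at fromIndex, without bounds exceptions. */
    static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part) {
        final int length = part.length();
        // Same condition as (fromIndex + length > text.length()), but safe against int overflow.
        if (fromIndex < 0 || length > text.length() - fromIndex) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (text.charAt(fromIndex + i) != part.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(regionMatches(new StringBuilder("Apache SIS"), 7, "SIS"));
        System.out.println(regionMatches(new StringBuilder("Apache SIS"), 8, "SIS"));
    }
}
```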
-
regionMatches
public static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase)
Returns true if the given text at the given offset contains the given part, optionally in a case-insensitive way. This method is equivalent to the following code, except that this method works on arbitrary CharSequence objects instead of Strings only:
return text.regionMatches(ignoreCase, fromIndex, part, 0, part.length());
This method does not throw IndexOutOfBoundsException. Instead, if fromIndex < 0 or fromIndex + part.length() > text.length(), then this method returns false.
- Parameters:
text - the character sequence in which to test for the presence of part.
fromIndex - the offset in text where to test for the presence of part.
part - the part which may be present in text.
ignoreCase - true if the case should be ignored.
- Returns:
- true if text contains part at the given fromIndex.
- Throws:
NullPointerException - if any of the arguments is null.
- Since:
- 0.4
-
startsWith
Returns true if the given character sequence starts with the given prefix.
- Parameters:
text - the character sequence to test.
prefix - the expected prefix.
ignoreCase - true if the case should be ignored.
- Returns:
- true if the given sequence starts with the given prefix.
- Throws:
NullPointerException - if any of the arguments is null.
-
endsWith
Returns true if the given character sequence ends with the given suffix.
- Parameters:
text - the character sequence to test.
suffix - the expected suffix.
ignoreCase - true if the case should be ignored.
- Returns:
- true if the given sequence ends with the given suffix.
- Throws:
NullPointerException - if any of the arguments is null.
-
commonPrefix
Returns the longest sequence of characters which is found at the beginning of the two given texts. If one of those texts is null, then the other text is returned. If there is no common prefix, then this method returns an empty string.
- Parameters:
s1 - the first text, or null.
s2 - the second text, or null.
- Returns:
- the common prefix of both texts (may be empty), or null if both texts are null.
-
commonSuffix
Returns the longest sequence of characters which is found at the end of the two given texts. If one of those texts is null, then the other text is returned. If there is no common suffix, then this method returns an empty string.
- Parameters:
s1 - the first text, or null.
s2 - the second text, or null.
- Returns:
- the common suffix of both texts (may be empty), or null if both texts are null.
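The null handling and the empty-string result described for commonPrefix and commonSuffix can be sketched as follows. This is an illustration under simplifying assumptions, not the Apache SIS source: the class name CommonAffixSketch is hypothetical, and the comparison is done on UTF-16 char values, so a production version would probably also avoid splitting a surrogate pair.

```java
/** Hypothetical JDK-only sketch; compares UTF-16 chars and ignores surrogate pairs. */
final class CommonAffixSketch {
    private CommonAffixSketch() {
    }

    /** Longest sequence found at the beginning of both texts. */
    static CharSequence commonPrefix(CharSequence s1, CharSequence s2) {
        if (s1 == null) return s2;          // Also returns null if both are null.
        if (s2 == null) return s1;
        final int max = Math.min(s1.length(), s2.length());
        int n = 0;
        while (n < max && s1.charAt(n) == s2.charAt(n)) {
            n++;
        }
        return s1.subSequence(0, n);
    }

    /** Longest sequence found at the end of both texts. */
    static CharSequence commonSuffix(CharSequence s1, CharSequence s2) {
        if (s1 == null) return s2;          // Also returns null if both are null.
        if (s2 == null) return s1;
        final int l1 = s1.length();
        final int l2 = s2.length();
        final int max = Math.min(l1, l2);
        int n = 0;
        while (n < max && s1.charAt(l1 - 1 - n) == s2.charAt(l2 - 1 - n)) {
            n++;
        }
        return s1.subSequence(l1 - n, l1);
    }

    public static void main(String[] args) {
        System.out.println(commonPrefix("baroclinic_eastward_velocity", "baroclinic_northward_velocity"));
        System.out.println(commonSuffix("baroclinic_eastward_velocity", "baroclinic_northward_velocity"));
    }
}
```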
-
commonWords
Returns the words found at the beginning and end of both texts. The returned string is the concatenation of the common prefix with the common suffix, with the prefix and suffix shortened if needed to avoid cutting in the middle of a word.
The purpose of this method is to create a global identifier from a list of component identifiers. The latter are often eastward and northward components of a vector, in which case this method provides an identifier for the vector as a whole.
If one of the given texts is null, then the other text is returned. If there are no common words, then this method returns an empty string.
Example
Given the following inputs:
"baroclinic_eastward_velocity"
"baroclinic_northward_velocity"
this method returns "baroclinic_velocity". Note that the "ward" characters are a common suffix of both texts but are nevertheless omitted because they would cut a word.
Possible future evolution
The current implementation searches only for a common prefix and a common suffix, ignoring any common words that may appear in the middle of the strings. A character is considered the beginning of a word if it is a letter or digit which is not preceded by another letter or digit (as the leading "s" and "c" in "snake_case"), or if it is an upper-case letter preceded by a lower-case letter or no letter (as both "C" in "CamelCase").
- Parameters:
s1 - the first text, or null.
s2 - the second text, or null.
- Returns:
- the common words of both texts (may be empty), or null if both texts are null.
- Since:
- 1.1
-
token
Returns the token starting at the given offset in the given text. For the purpose of this method, a "token" is any sequence of consecutive characters of the same type, as defined below.
Let c be the first non-blank character located at an index equal to or greater than the given offset. Then the characters that are considered of the same type are:
- If c is a Unicode identifier start, then any following characters that are Unicode identifier parts.
- Otherwise any character for which Character.getType(int) returns the same value as for c.
- Parameters:
text - the text for which to get the token.
fromIndex - index of the first character to consider in the given text.
- Returns:
- a sub-sequence of text starting at the given offset, or an empty string if there is no non-blank character at or after the given offset.
- Throws:
NullPointerException - if the text argument is null.
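The two rules above can be sketched with standard JDK calls. This is an illustration, not the Apache SIS source: the class name TokenSketch is hypothetical, and treating "blank" as Character.isWhitespace(int) is an assumption based on the whitespace policy stated for this class.

```java
/** Hypothetical JDK-only sketch; "blank" is assumed to mean Character.isWhitespace. */
final class TokenSketch {
    private TokenSketch() {
    }

    /** Returns the run of same-type characters starting at the first non-blank character. */
    static CharSequence token(CharSequence text, int fromIndex) {
        final int length = text.length();
        int start = fromIndex;
        while (start < length) {            // Skip leading blanks.
            final int c = Character.codePointAt(text, start);
            if (!Character.isWhitespace(c)) {
                break;
            }
            start += Character.charCount(c);
        }
        if (start >= length) {
            return "";
        }
        final int first = Character.codePointAt(text, start);
        final boolean identifier = Character.isUnicodeIdentifierStart(first);
        final int type = Character.getType(first);
        int end = start + Character.charCount(first);
        while (end < length) {
            final int c = Character.codePointAt(text, end);
            final boolean sameType = identifier ? Character.isUnicodeIdentifierPart(c)
                                                : Character.getType(c) == type;
            if (!sameType) {
                break;
            }
            end += Character.charCount(c);
        }
        return text.subSequence(start, end);
    }

    public static void main(String[] args) {
        System.out.println(token("  foo_bar+baz", 0));
        System.out.println(token("a <= b", 1));
    }
}
```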
-
replace
public static CharSequence replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy)
Replaces all occurrences of a given string in the given character sequence. If no occurrence of toSearch is found in the given text or if toSearch is equal to replaceBy, then this method returns the text unchanged. Otherwise this method returns a new character sequence with all occurrences replaced by replaceBy.
This method is similar to String.replace(CharSequence, CharSequence) except that it accepts arbitrary CharSequence objects. As of Java 10, another difference is that this method does not create a new String if toSearch is equal to replaceBy.
- Parameters:
text - the character sequence in which to perform the replacements, or null.
toSearch - the string to replace.
replaceBy - the replacement for the searched string.
- Returns:
- the given text with replacements applied, or text if no replacement has been applied, or null if the given text was null.
- Since:
- 0.4
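The "return the same instance when nothing changes" contract can be sketched by allocating the output buffer lazily. This is an illustration, not the Apache SIS source: the class name ReplaceSketch is hypothetical, and returning the text unchanged for an empty toSearch is an assumption, since that case is not specified above.

```java
/** Hypothetical JDK-only sketch; not the Apache SIS implementation. */
final class ReplaceSketch {
    private ReplaceSketch() {
    }

    /** Replaces all occurrences of toSearch, returning the same instance if nothing changes. */
    static CharSequence replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy) {
        if (text == null) {
            return null;
        }
        final int length = text.length();
        final int n = toSearch.length();
        // Assumption: an empty search string leaves the text unchanged.
        if (n == 0 || (n == replaceBy.length() && matchesAt(toSearch, 0, replaceBy))) {
            return text;
        }
        StringBuilder buffer = null;        // Created only when a first occurrence is found.
        int copied = 0;                     // Index after the last character copied to the buffer.
        for (int i = 0; i <= length - n;) {
            if (matchesAt(text, i, toSearch)) {
                if (buffer == null) {
                    buffer = new StringBuilder(length);
                }
                buffer.append(text, copied, i).append(replaceBy);
                i += n;
                copied = i;
            } else {
                i++;
            }
        }
        return (buffer == null) ? text : buffer.append(text, copied, length);
    }

    /** Tests whether part is found in text at the given index; the caller guarantees the bounds. */
    private static boolean matchesAt(CharSequence text, int index, CharSequence part) {
        for (int i = 0; i < part.length(); i++) {
            if (text.charAt(index + i) != part.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(replace(new StringBuilder("north_east"), "_", " "));
    }
}
```

Because the buffer is created lazily, a caller can detect whether anything was replaced with an identity comparison against the original text.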
-
copyChars
public static void copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length)
Copies a sequence of characters into the given char[] array.
- Parameters:
src - the character sequence from which to copy characters.
srcOffset - index of the first character from src to copy.
dst - the array where to copy the characters.
dstOffset - index where to write the first character in dst.
length - number of characters to copy.
- See Also:
-
String.strip() in JDK 11.