Discussion:
Diff crash when comparing 2 big files with error message "diff.exe: memory exhausted"
Kissinger Chen
2008-10-13 06:36:19 UTC
Hi, Ladies and Gentlemen:

This is Kissinger, a Gnuwin32 user. Today I used diff to compare some big
files and retrieve the differences between them.

For some big files, diff works well and I get the differences.
But for some other big files, diff crashes with this
error message: "diff.exe: memory exhausted". These files are more
than 500 MB, pure text files.
I am using diff 2.8.7 on Windows XP.

It seems there is an issue in the Gnuwin32 diff tool. Has this issue
been reported? I could not find it in the help.

If convenient, can you tell me how to resolve this issue? Is there
another diff-like tool? Thanks.


Kissinger




This email was sent to you by Thomson Reuters, the global news and information company.
Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters.
Bob Proulx
2008-10-13 18:48:46 UTC
Post by Kissinger Chen
This is Kissinger, a Gnuwin32 user. Today I used diff to compare some big
files and retrieve the differences between them.
For some big files, diff works well and I get the differences.
Sounds good.
Post by Kissinger Chen
But for some other big files, diff crashes with this
error message: "diff.exe: memory exhausted". These files are more
than 500 MB, pure text files.
I am using diff 2.8.7 on Windows XP.
GNU diff is a program and requires a certain amount of memory to
analyze files. Bigger files with complex differences may consume more
memory than smaller files. When diff encounters an error it will
report it. Running out of memory is an error and is being reported.
Running out of memory in the case of processing large files is not a
program crash. The program did not crash. The program detected that
it was out of resources and reported this error condition.
Post by Kissinger Chen
It seems there is an issue in the Gnuwin32 diff tool. Has this issue
been reported? I could not find it in the help.
The error message you have received is stating "memory exhausted"
meaning that there is not enough virtual process memory for the
program to run to successful completion. This is a physical
limitation of your platform. You have run out of memory.

How much memory do you have on your system?

How much memory is diff currently consuming before running out?
Post by Kissinger Chen
If convenient, can you tell me how to resolve this issue? Is there
another diff-like tool? Thanks.
Here are some suggestions:

1. If your 32-bit system has less than 4G of memory then adding more
memory to your system may allow this program to complete. A 32-bit
program can address at most 4G of memory so that is the
upper limit possible.

2. Run this task on a different system that has more memory than your
current machine.

3. Run this task on a 64-bit system with more than 4G of memory. A
64-bit operating system has much more capability to address very
large amounts of physical memory.

Note that in extreme cases diff may consume very large amounts of
memory and still may exhaust all available memory. You may need to
change the problem statement in order to reduce the system resources
needed to accomplish what you want to accomplish. You may need to
reorganize what you are trying to do in order to make the problem
tractable within current system memory limitations.
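
As one illustration of changing the problem statement: if the two files are expected to line up line for line, and it is enough to know which line positions differ, the comparison can be streamed so that only one line from each file is in memory at a time. This is a minimal Python sketch of that idea, not something diff itself offers; the function name and the `limit` parameter are made up for the example. It is not a real diff: a single inserted or deleted line makes every later line count as different.

```python
from itertools import zip_longest


def positional_mismatches(path_a, path_b, limit=10):
    """Return up to `limit` 1-based line numbers where the two files differ.

    Lines are compared position by position, so this only makes sense when
    the files are expected to stay aligned. It does not detect insertions
    or deletions the way diff does.
    """
    mismatches = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        # zip_longest yields None for the shorter file once it runs out,
        # so extra trailing lines are reported as mismatches too.
        for lineno, (a, b) in enumerate(zip_longest(fa, fb), start=1):
            if a != b:
                mismatches.append(lineno)
                if len(mismatches) >= limit:
                    break
    return mismatches
```

Memory use stays small regardless of file size because the files are read as streams. Whether this reduced question is an acceptable substitute depends on what the differences are needed for.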

Bob
Kissinger Chen
2008-10-14 02:19:25 UTC
Hi, Bob,

Thanks for your answer.

Today I copied these big pure text files to another computer. This computer
is a 32-bit machine running Windows Server 2003 Enterprise Edition (SP1)
and has 8G of memory.

Then I compared these files with diff.exe (version 2.8.7), but I got the
same result. For some big files, diff works well and I get the
differences; in this case, diff.exe consumed 1059M of memory
and 6614M was available. (I checked the memory usage with Windows Task
Manager.)

For some other big files, diff fails with the message "diff.exe:
memory exhausted"; in this case, diff.exe consumed 1007M of memory
and 6667M was free.

As you can see, I now have enough memory, and I remember that a process can
use 2000M of memory on a 32-bit computer.

So maybe I have hit one of the very extreme cases: diff.exe reports an error
even though there is enough memory. Has this issue been reported? And if
convenient, would you please tell me how I can handle this issue? Thanks.

Thanks again.

Kissinger




Bob Proulx
2008-10-14 19:41:29 UTC
Post by Kissinger Chen
Today I copied these big pure text files to another computer. This computer
is a 32-bit machine running Windows Server 2003 Enterprise Edition (SP1)
and has 8G of memory.
Even though the system has 8G of memory, an individual 32-bit program
cannot address more than 4G of it. Usually there
is a limitation below that value. Typical GNU/Linux based programs
can access up to 3G of memory on a 32-bit system. I do not know what
limitations your MS system imposes as I do not use such systems.
Post by Kissinger Chen
Then I compared these files with diff.exe (version 2.8.7), but I got
the same result. For some big files, diff works well and I
get the differences; in this case, diff.exe consumed 1059M of memory
and 6614M was available. (I checked the memory usage
with Windows Task Manager.)
Memory use topping out at about 1059M seems suspiciously like your program is
limited to 1G of memory. This is probably a limitation in the way the
program was built on your MS operating system. You are probably not
able to use the rest of the memory even though it is available.
Post by Kissinger Chen
memory exhausted"; in this case, diff.exe consumed 1007M of memory
and 6667M was free.
This appears to be a limitation to 1G in the way the program was built
for your MS operating system. Since I do not use MS I cannot help you
further with this problem.
Post by Kissinger Chen
As you can see, I now have enough memory, and I remember that a process can
use 2000M of memory on a 32-bit computer.
Typically on 32-bit GNU/Linux systems programs can access up to 3G of
memory.
Post by Kissinger Chen
So maybe I have hit one of the very extreme cases: diff.exe reports an error
even though there is enough memory.
But diff hasn't been able to allocate that memory from the system.
Your diff.exe program has failed to allocate more memory than 1G in
both of the cases you showed. It doesn't matter if the memory is in
the system if the system does not give it to the program to use.

I do not use MS operating systems and so cannot suggest what you would
need to do to produce a program that could make use of the memory
there. You might ask your question on a mailing list that covers MS
platforms, such as the Cygwin mailing lists. They might have an answer for you
specific to your operating system. Here we are using the GNU
operating system and it does not have this restriction.

GNU diff needs enough memory to read both files into memory before
doing the diff. Because both of the files you have are more than 500M,
together they exceed the 1G that your system is allowing you to use for diff.
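
The arithmetic behind that statement can be written out as a rough check. This is only a lower-bound sketch in Python: it counts the two files and ignores whatever bookkeeping diff keeps per line, so the real requirement is higher. The file sizes are hypothetical (the thread only says "more than 500 MB"), and the 1G ceiling is the limit suspected above, not a documented figure.

```python
MIB = 1024 * 1024


def minimum_bytes_needed(size_a, size_b):
    """Lower bound on memory use when both files are held in memory at once.

    Per-line bookkeeping is deliberately left out, so the real figure
    is higher than this.
    """
    return size_a + size_b


def fits_in_limit(size_a, size_b, limit_bytes):
    """True if the two files alone fit under the per-process limit."""
    return minimum_bytes_needed(size_a, size_b) <= limit_bytes


# Hypothetical sizes for two files that are each "more than 500 MB".
size_a = 520 * MIB
size_b = 520 * MIB
limit = 1024 * MIB  # assumed 1G per-process ceiling

print(minimum_bytes_needed(size_a, size_b) // MIB)  # 1040
print(fits_in_limit(size_a, size_b, limit))          # False
```

For real files the sizes could come from `os.path.getsize`. A result of True here does not guarantee that diff will succeed, since this is only the lower bound.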
Post by Kissinger Chen
Has this issue been reported? And if convenient, would you please
tell me how I can handle this issue? Thanks.
I searched the web and found several different reports of people
having issues with diff running out of memory. Some suggested using
the rdiff program from librsync. That seems to be more
efficient, but it produces a completely different output format, so it
isn't suitable if you need normal diff output. I didn't see any other
suggestions.

The brute force solution would be to move to a 64-bit operating system
with more memory. Otherwise you will simply need to reduce the size
of the files so as to reduce the amount of required memory.
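
One possible way to reduce the size of the files is to cut each one into pieces with the same number of lines and then run diff on the matching pieces. This is a minimal Python sketch under a strong assumption: the two files stay roughly aligned, so a change in one piece does not shift lines across a piece boundary. If lines are inserted or deleted, every later piece will show spurious differences. The function name and the piece naming scheme are invented for the example.

```python
import os


def split_by_lines(path, lines_per_piece, out_dir):
    """Write consecutive pieces of `path` into `out_dir`; return their paths.

    The file is streamed line by line, so memory use does not depend on
    the size of the input.
    """
    if lines_per_piece < 1:
        raise ValueError("lines_per_piece must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    pieces = []
    out = None
    try:
        with open(path, "rb") as src:
            for index, line in enumerate(src):
                # Start a new piece every `lines_per_piece` lines.
                if index % lines_per_piece == 0:
                    if out is not None:
                        out.close()
                    piece = os.path.join(out_dir, "%s.%04d" % (base, len(pieces)))
                    out = open(piece, "wb")
                    pieces.append(piece)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return pieces
```

After splitting both files with the same `lines_per_piece`, each pair of pieces can be passed to diff separately, which keeps the memory needed per run proportional to the piece size instead of the whole file.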

Bob
